Reliability, Transactions, Distributed Systems

Total Page:16

File Type:pdf, Size:1020Kb

Reliability, Transactions, Distributed Systems Important “ilities” CS162 • Availability: the probability that the system can accept and process Operating Systems and requests – Often measured in “nines” of probability. So, a 99.9% probability is Systems Programming considered “3-nines of availability” Lecture 20 – Key idea here is independence of failures • Durability: the ability of a system to recover data despite faults Reliability, Transactions – This idea is fault tolerance applied to data Distributed Systems – Doesn’t necessarily imply availability: information on pyramids was very durable, but could not be accessed until discovery of Rosetta Stone November 6th, 2017 • Reliability: the ability of a system or component to perform its Prof. Ion Stoica required functions under stated conditions for a specified period of time (IEEE definition) http://cs162.eecs.Berkeley.edu – Usually stronger than simply availability: means that the system is not only “up”, but also working correctly – Includes availability, security, fault tolerance/durability – Must make sure data survives system crashes, disk crashes, etc 11/6/17 CS162 © UCB Fall 2017 Lec 20.2 How to Make File System Durable? RAID: Redundant Arrays of Inexpensive Disks • Disk blocks contain Reed-Solomon error correcting codes (ECC) to deal with small defects in disk drive • Invented by David Patterson, Garth A. Gibson, and – Can allow recovery of data from small media defects Randy Katz here at UCB in 1987 • Make sure writes survive in short term – Either abandon delayed writes or • Data stored on multiple disks (redundancy) – Use special, battery-backed RAM (called non-volatile RAM or NVRAM) for dirty blocks in buffer cache • Either in software or hardware • Make sure that data survives in long term – In hardware case, done by disk controller; file system may not even know that there is more than one disk in use – Need to replicate! More than one copy of data! – Important element: independence of failure » Could put copies on one disk, but if disk head fails… • Initially, five levels of RAID (more now) » Could put copies on different disks, but if server fails… » Could put copies on different servers, but if building is struck by lightning…. » Could put copies on servers in different continents… 11/6/17 CS162 © UCB Fall 2017 Lec 20.3 11/6/17 CS162 © UCB Fall 2017 Lec 20.4 Page 1 RAID 1: Disk Mirroring/Shadowing RAID 5+: High I/O Rate Parity • Data stripped across Stripe multiple disks Unit – Successive blocks D0 D1 D2 D3 P0 recovery stored on successive Increasing group (non-parity) disks D4 D5 D6 P1 D7 Logical – Increased bandwidth Disk • Each disk is fully duplicated onto its “shadow” over single disk Addresses – For high I/O rate, high availability environments D8 D9 P2 D10 D11 • Parity block (in green) – Most expensive solution: 100% capacity overhead constructed by XORing D12 P3 D13 D14 D15 • Bandwidth sacrificed on write: data bocks in stripe – Logical write = two physical writes – P0=D0ÅD1ÅD2ÅD3 P4 D16 D17 D18 D19 – Highest bandwidth when disk heads and rotation fully – Can destroy any one synchronized (hard to do exactly) disk and still • Reads may be optimized reconstruct data D20 D21 D22 D23 P5 – Can have two independent reads to same data – Suppose Disk 3 fails, then can reconstruct: Disk 1 Disk 2 Disk 3 Disk 4 Disk 5 • Recovery: D2=D0ÅD1ÅD3ÅP0 – Disk failure Þ replace disk and copy data to new disk • Can spread information widely across internet for durability – Hot Spare: idle disk already attached to system to be used for immediate replacement – Overview now, more later in semester 11/6/17 CS162 © UCB Fall 2017 Lec 20.5 11/6/17 CS162 © UCB Fall 2017 Lec 20.6 Higher Durability/Reliability through Geographic Replication File System Reliability • Highly durable – hard to destroy all copies • What can happen if disk loses power or software crashes? • Highly available for reads – read any copy – Some operations in progress may complete • Low availability for writes – Some operations in progress may be lost – Can’t write if any one replica is not up – Overwrite of a block may only partially complete – Or – need relaxed consistency model • Reliability? – availability, security, durability, fault-tolerance • Having RAID doesn’t necessarily protect against all such failures Replica #1 – No protection against writing bad state – What if one disk of RAID group not written? Replica #2 • File system needs durability (as a minimum!) – Data previously stored can be retrieved (maybe after some recovery step), regardless of failure Replica #n 11/6/17 CS162 © UCB Fall 2017 Lec 20.7 11/6/17 CS162 © UCB Fall 2017 Lec 20.8 Page 2 Storage Reliability Problem Threats to Reliability • Single logical file operation can involve updates to • Interrupted Operation multiple physical disk blocks – Crash or power failure in the middle of a series of related – inode, indirect block, data block, bitmap, … updates may leave stored data in an inconsistent state – With sector remapping, single update to physical disk block – Example: Transfer funds from one bank account to another can require multiple (even lower level) updates to sectors – What if transfer is interrupted after withdrawal and before deposit? • At a physical level, operations complete one at a time – Want concurrent operations for performance • Loss of stored data • How do we guarantee consistency regardless of when – Failure of non-volatile storage media may cause previously crash occurs? stored data to disappear or be corrupted 11/6/17 CS162 © UCB Fall 2017 Lec 20.9 11/6/17 CS162 © UCB Fall 2017 Lec 20.10 Reliability Approach #1: Careful Ordering FFS: Create a File • Sequence operations in a specific order Normal operation: Recovery: – Careful design to allow sequence to be interrupted safely • Allocate data block • Scan inode table • Write data block • If any unlinked files (not in • Post-crash recovery • Allocate inode any directory), delete or put in lost & found dir – Read data structures to see if there were any operations in • Write inode block progress • Compare free block – Clean up/finish as needed • Update bitmap of free bitmap against inode trees blocks and inodes • Scan directories for missing • Update directory with update/access times • Approach taken by file name ® inode – FAT and FFS (fsck) to protect filesystem structure/metadata number – Many app-level recovery schemes (e.g., Word, emacs autosaves) • Update modify time for Time proportional to disk size directory 11/6/17 CS162 © UCB Fall 2017 Lec 20.11 11/6/17 CS162 © UCB Fall 2017 Lec 20.12 Page 3 Reliability Approach #2: Copy on Write File Layout COW with Smaller-Radix Blocks • To update file system, write a new version of the file old version new version system containing the update – Never update in place – Reuse existing unchanged disk blocks • Seems expensive! But – Updates can be batched – Almost all disk writes can occur in parallel Write • Approach taken in network file server appliances • If file represented as a tree of blocks, just need to – NetApp’s Write Anywhere File Layout (WAFL) update the leading fringe – ZFS (Sun/Oracle) and OpenZFS 11/6/17 CS162 © UCB Fall 2017 Lec 20.13 11/6/17 CS162 © UCB Fall 2017 Lec 20.14 ZFS and OpenZFS More General Reliability Solutions • Variable sized blocks: 512 B – 128 KB • Use Transactions for atomic updates • Symmetric tree – Ensure that multiple related updates are performed atomically – i.e., if a crash occurs in the middle, the state of the systems – Know if it is large or small when we make the copy reflects either all or none of the updates • Store version number with pointers – Most modern file systems use transactions internally to update filesystem structures and metadata – Can create new version by adding blocks and new pointers – Many applications implement their own transactions • Buffers a collection of writes before creating a new version with them • Provide Redundancy for media failures – Redundant representation on media (Error Correcting Codes) • Free space represented as tree of extents in each block group – Replication across media (e.g., RAID disk array) – Delay updates to freespace (in log) and do them all when block group is activated 11/6/17 CS162 © UCB Fall 2017 Lec 20.15 11/6/17 CS162 © UCB Fall 2017 Lec 20.16 Page 4 Transactions Key Concept: Transaction • Closely related to critical sections for manipulating shared data structures • An atomic sequence of actions (reads/writes) on a storage system (or database) • They extend concept of atomic update from memory to stable storage • That takes it from one consistent state to another – Atomically update multiple persistent data structures transaction consistent state 1 consistent state 2 • Many ad hoc approaches – FFS carefully ordered the sequence of updates so that if a crash occurred while manipulating directory or inodes the disk scan on reboot would detect and recover the error (fsck) – Applications use temporary files and rename 11/6/17 CS162 © UCB Fall 2017 Lec 20.17 11/6/17 CS162 © UCB Fall 2017 Lec 20.18 Typical Structure “Classic” Example: Transaction • Begin a transaction – get transaction id BEGIN; --BEGIN TRANSACTION UPDATE accounts SET balance = balance - 100.00 WHERE name = 'Alice'; • Do a bunch of updates UPDATE branches SET balance = balance - 100.00 WHERE name = (SELECT branch_name FROM accounts WHERE name – If any fail along the way, roll-back = 'Alice'); – Or, if any conflicts with other transactions, roll-back UPDATE accounts SET balance = balance + 100.00 WHERE name = 'Bob'; • Commit the transaction UPDATE branches SET balance = balance + 100.00 WHERE
Recommended publications
  • W4118: Linux File Systems
    W4118: Linux file systems Instructor: Junfeng Yang References: Modern Operating Systems (3rd edition), Operating Systems Concepts (8th edition), previous W4118, and OS at MIT, Stanford, and UWisc File systems in Linux Linux Second Extended File System (Ext2) . What is the EXT2 on-disk layout? . What is the EXT2 directory structure? Linux Third Extended File System (Ext3) . What is the file system consistency problem? . How to solve the consistency problem using journaling? Virtual File System (VFS) . What is VFS? . What are the key data structures of Linux VFS? 1 Ext2 “Standard” Linux File System . Was the most commonly used before ext3 came out Uses FFS like layout . Each FS is composed of identical block groups . Allocation is designed to improve locality inodes contain pointers (32 bits) to blocks . Direct, Indirect, Double Indirect, Triple Indirect . Maximum file size: 4.1TB (4K Blocks) . Maximum file system size: 16TB (4K Blocks) On-disk structures defined in include/linux/ext2_fs.h 2 Ext2 Disk Layout Files in the same directory are stored in the same block group Files in different directories are spread among the block groups Picture from Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639 3 Block Addressing in Ext2 Twelve “direct” blocks Data Data BlockData Inode Block Block BLKSIZE/4 Indirect Data Data Blocks BlockData Block Data (BLKSIZE/4)2 Indirect Block Data BlockData Blocks Block Double Block Indirect Indirect Blocks Data Data Data (BLKSIZE/4)3 BlockData Data Indirect Block BlockData Block Block Triple Double Blocks Block Indirect Indirect Data Indirect Data BlockData Blocks Block Block Picture from Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc.
    [Show full text]
  • Certifying a File System
    Certifying a file system: Correctness in the presence of crashes Tej Chajed, Haogang Chen, Stephanie Wang, Daniel Ziegler, Adam Chlipala, Frans Kaashoek, and Nickolai Zeldovich MIT CSAIL 1 / 28 New file systems (and bugs) are introduced over time Some bugs are serious: security exploits, data loss, etc. File systems are complex and have bugs File systems are complex (e.g., Linux ext4 is ∼60,000 lines of code) and have many bugs: 500 ext3 400 300 200 100 # patches for bugs 0 Jan'04 Jan'05 Jan'06 Jan'07 Jan'08 Jan'09 Jan'10 Jan'11 Cumulative number of patches for file-system bugs in Linux; data from [Lu et al., FAST’13] 2 / 28 Some bugs are serious: security exploits, data loss, etc. File systems are complex and have bugs File systems are complex (e.g., Linux ext4 is ∼60,000 lines of code) and have many bugs: 500 ext3 400 ext4 xfs 300 reiserfs 200 jfs btrfs 100 # patches for bugs 0 Jan'04 Jan'05 Jan'06 Jan'07 Jan'08 Jan'09 Jan'10 Jan'11 Cumulative number of patches for file-system bugs in Linux; data from [Lu et al., FAST’13] New file systems (and bugs) are introduced over time 2 / 28 File systems are complex and have bugs File systems are complex (e.g., Linux ext4 is ∼60,000 lines of code) and have many bugs: 500 ext3 400 ext4 xfs 300 reiserfs 200 jfs btrfs 100 # patches for bugs 0 Jan'04 Jan'05 Jan'06 Jan'07 Jan'08 Jan'09 Jan'10 Jan'11 Cumulative number of patches for file-system bugs in Linux; data from [Lu et al., FAST’13] New file systems (and bugs) are introduced over time Some bugs are serious: security exploits, data loss, etc.
    [Show full text]
  • XFS: There and Back ...And There Again? Slide 1 of 38
    XFS: There and Back.... .... and There Again? Dave Chinner <[email protected]> <[email protected]> XFS: There and Back .... and There Again? Slide 1 of 38 Overview • Story Time • Serious Things • These Days • Shiny Things • Interesting Times XFS: There and Back .... and There Again? Slide 2 of 38 Story Time • Way back in the early '90s • Storage exceeding 32 bit capacities • 64 bit CPUs, large scale MP • Hundreds of disks in a single machine • XFS: There..... Slide 3 of 38 "x" is for Undefined xFS had to support: • Fast Crash Recovery • Large File Systems • Large, Sparse Files • Large, Contiguous Files • Large Directories • Large Numbers of Files • - Scalability in the XFS File System, 1995 http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html XFS: There..... Slide 4 of 38 The Early Years XFS: There..... Slide 5 of 38 The Early Years • Late 1994: First Release, Irix 5.3 • Mid 1996: Default FS, Irix 6.2 • Already at Version 4 • Attributes • Journalled Quotas • link counts > 64k • feature masks • • XFS: There..... Slide 6 of 38 The Early Years • • Allocation alignment to storage geometry (1997) • Unwritten extents (1998) • Version 2 directories (1999) • mkfs time configurable block size • Scalability to tens of millions of directory entries • • XFS: There..... Slide 7 of 38 What's that Linux Thing? • Feature development mostly stalled • Irix development focussed on CXFS • New team formed for Linux XFS port! • Encumberance review! • Linux was missing lots of bits XFS needed • Lot of work needed • • XFS: There and..... Slide 8 of 38 That Linux Thing? XFS: There and..... Slide 9 of 38 Light that fire! • 2000: SGI releases XFS under GPL • • 2001: First stable XFS release • • 2002: XFS merged into 2.5.36 • • JFS follows similar timeline • XFS: There and....
    [Show full text]
  • Filesystems HOWTO Filesystems HOWTO Table of Contents Filesystems HOWTO
    Filesystems HOWTO Filesystems HOWTO Table of Contents Filesystems HOWTO..........................................................................................................................................1 Martin Hinner < [email protected]>, http://martin.hinner.info............................................................1 1. Introduction..........................................................................................................................................1 2. Volumes...............................................................................................................................................1 3. DOS FAT 12/16/32, VFAT.................................................................................................................2 4. High Performance FileSystem (HPFS)................................................................................................2 5. New Technology FileSystem (NTFS).................................................................................................2 6. Extended filesystems (Ext, Ext2, Ext3)...............................................................................................2 7. Macintosh Hierarchical Filesystem − HFS..........................................................................................3 8. ISO 9660 − CD−ROM filesystem.......................................................................................................3 9. Other filesystems.................................................................................................................................3
    [Show full text]
  • Acronis® Disk Director® 12 User's Guide
    User Guide Copyright Statement Copyright © Acronis International GmbH, 2002-2015. All rights reserved. "Acronis", "Acronis Compute with Confidence", "Acronis Recovery Manager", "Acronis Secure Zone", Acronis True Image, Acronis Try&Decide, and the Acronis logo are trademarks of Acronis International GmbH. Linux is a registered trademark of Linus Torvalds. VMware and VMware Ready are trademarks and/or registered trademarks of VMware, Inc. in the United States and/or other jurisdictions. Windows and MS-DOS are registered trademarks of Microsoft Corporation. All other trademarks and copyrights referred to are the property of their respective owners. Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder. Distribution of this work or derivative work in any standard (paper) book form for commercial purposes is prohibited unless prior permission is obtained from the copyright holder. DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Third party code may be provided with the Software and/or Service. The license terms for such third-parties are detailed in the license.txt file located in the root installation directory. You can always find the latest up-to-date list of the third party code and the associated license terms used with the Software and/or Service at http://kb.acronis.com/content/7696 Acronis patented technologies Technologies, used in this product, are covered and protected by one or more U.S.
    [Show full text]
  • Journaling File Systems
    Linux Journaling File Systems Linux onzSeries Journaling File Systems Volker Sameske ([email protected]) Linux on zSeries Development IBM Lab Boeblingen, Germany Share Anaheim,California February27 –March 4,2005 Session 9257 ©2005 IBM Corporation Linux Journaling File Systems Agenda o File systems. • Overview, definitions. • Reliability, scalability. • File system features. • Common grounds & differences. o Volume management. • LVM, EVMS, MD. • Striping. o Measurement results. • Hardware/software setup. • throughput. • CPU load. 2 Session 9257 © 2005 IBM Corporation Linux Journaling File Systems A file system should... o ...store data o ...organize data o ...administrate data o ...organize data about the data o ...assure integrity o ...be able to recover integrity problems o ...provide tools (expand, shrink, check, ...) o ...be able to handle many and large files o ...be fast o ... 3 Session 9257 © 2005 IBM Corporation Linux Journaling File Systems File system-definition o Informally • The mechanism by which computer files are stored and organized on a storage device. o More formally, • A set of abstract data types that are necessary for the storage, hierarchical organization, manipulation, navigation, access and retrieval of data. 4 Session 9257 © 2005 IBM Corporation Linux Journaling File Systems Why a journaling file system? o Imagine your Linux system crashs while you are saving an edited file: • The system crashs after the changes have been written to disk à good crash • The system crashs before the changes have been written to disk à bad crash but bearable if you have an older version • The sytem crashs just in the moment your data will be written: à very bad crash your file could be corrupted and in worst case the file system could be corrupted à That‘s why you need a journal 5 Session 9257 © 2005 IBM Corporation Linux Journaling File Systems Somefilesystemterms o Meta data • "Data about the data" • File system internal data structure (e.g.
    [Show full text]
  • Partitioning Disks with Parted
    Partitioning Disks with parted Author: Yogesh Babar Technical Reviewer: Chris Negus 10/6/2017 Storage devices in Linux (such as hard drives and USB drives) need to be structured in some way before use. In most cases, large storage devices are divided into separate sections, which in Linux are referred to as partitions. A popular tool for creating, removing and otherwise manipulating disk partitions in Linux is the parted command. Procedures in this tech brief step you through different ways of using the parted command to work with Linux partitions. UNDERSTANDING PARTED The parted command is particularly useful with large disk devices and many disk partitions. Differences between parted and the more common fdisk and cfdisk commands include: • GPT Format: The parted command can create can be used to create Globally Unique Identifiers Partition Tables (GPT), while fdisk and cfdisk are limited to msdos partition tables. • Larger disks: An msdos partition table can only format up to 2TB of disk space (although up to 16TB is possible in some cases). A GPT partition table, however, have the potential to address up to 8 zebibytes of space. • More partitions: Using primary and extended partitions, msdos partition tables allow only 16 partitions. With GPT, you get up to 128 partitions by default and can choose to have many more. • Reliability: Only one copy of the partition table is stored in an msdos partition. GPT keeps two copies of the partition table (at the beginning and end of the disk). The GPT also uses a CRC checksum to check the partition table's integrity (which is not done with msdos partitions).
    [Show full text]
  • Data Interchange & Optical Standards
    311 East Carrillo Street Santa Barbara, CA 93101 Telephone (805) 963-3853 Fax (805) 962-1541 Email: [email protected] http://www.osta.org Data Interchange & Optical Standards February 12, 1996 Revision 2 Data Interchange and Optical Standards Optical disk media provides high capacity removable storage with an extremely long life. The removability of the media provides us with the opportunity to easily transport and interchange of large amounts of data. The goal of data interchange is to have the ability to store files on optical media using any type of computer and then be able to access these files using any other computer system. Removable media data interchange occurs at three different levels: sector, file and application. Sector Level Interchange Sector interchange is the ability to interchange data at the sector level. This level of interchange defines the ability to move media universally among different optical drives. The data being interchanged at this level is comprised of sectors. When it is stated that two different optical drives can interchange data at the sector level, it means that the drives can read and write sectors on media created by each other. For example, assume you have two optical drives, one manufactured by company A and the other drive by company B, and you write to sectors 100 through 200 on an optical disk in drive A. With sector level interchange, you can place the disk in drive B and read sectors 100 through 200 that were written by drive A. You can then write to sectors 300 through 400 on the disk with drive B, and then move the disk back to drive A and read the sectors written by drive B.
    [Show full text]
  • The Evolution of File Systems
    The Evolution of File Systems Thomas Rivera, Hitachi Data Systems Craig Harmer, April 2011 SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individuals may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced without modification The SNIA must be acknowledged as source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the Author nor the Presenter is an attorney and nothing in this presentation is intended to be nor should be construed as legal advice or opinion. If you need legal advice or legal opinion please contact an attorney. The information presented herein represents the Author's personal opinion and current understanding of the issues involved. The Author, the Presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. The Evolution of File Systems 2 © 2012 Storage Networking Industry Association. All Rights Reserved. 2 Abstract The File Systems Evolution Over time additional file systems appeared focusing on specialized requirements such as: data sharing, remote file access, distributed file access, parallel files access, HPC, archiving, security, etc. Due to the dramatic growth of unstructured data, files as the basic units for data containers are morphing into file objects, providing more semantics and feature- rich capabilities for content processing This presentation will: Categorize and explain the basic principles of currently available file system architectures (e.g.
    [Show full text]
  • 08-Filesystems.Pdf
    ECE590-03 Enterprise Storage Architecture Fall 2017 File Systems Tyler Bletsch Duke University The file system layer User code open, read, write, seek, close, stat, mkdir, rmdir, unlink, ... Kernel VFS layer File system drivers ext4 fat nfs ... Disk driver NIC driver read_block, write_block packets Could be a single drive or a RAID HDD / SSD 2 High-level motivation • Disks are dumb arrays of blocks. • Need to allocate/deallocate (claim and free blocks) • Need logical containers for blocks (allow delete, append, etc. on different buckets of data – files) • Need to organize such containers (how to find data – directories) • May want to access control (restrict data access per user) • May want to dangle additional features on top Result: file systems 3 Disk file systems • All have same goal: • Fulfill file system calls (open, seek, read, write, close, mkdir, etc.) • Store resulting data on a block device • The big (non-academic) file systems • FAT (“File Allocation Table”): Primitive Microsoft filesystem for use on floppy disks and later adapted to hard drives • FAT32 (1996) still in use (default file system for USB sticks, SD cards, etc.) • Bad performance, poor recoverability on crash, but near-universal and easy for simple systems to implement • ext2, ext3, ext4: Popular Linux file system. • Ext2 (1993) has inode-based on-disk layout – much better scalability than FAT • Ext3 (2001) adds journaling – much better recoverability than FAT • Ext4 (2008) adds various smaller benefits • NTFS: Current Microsoft filesystem (1993). • Like ext3, adds journaling to provide better recoverability than FAT • More expressive metadata (e.g. Access Control Lists (ACLs)) • HFS+: Current Mac filesystem (1998).
    [Show full text]
  • Filesystem On-Disk Structures for FAT, HPFS, NTFS, JFS, Extn and Reiserfs Presentation Contents
    On-disk filesystem structures Jan van Wijk Filesystem on-disk structures for FAT, HPFS, NTFS, JFS, EXTn and ReiserFS Presentation contents Generic filesystem architecture (Enhanced) FAT(32), File Allocation Table variants HPFS, High Performance FileSystem (OS/2 only) NTFS, New Technology FileSystem (Windows) JFS, Journaled File System (IBM classic or bootable) EXT2, EXT3 and EXT4 Linux filesystems ReiserFS, Linux filesystem FS-info: FAT, HPFS, NTFS, JFS, EXTn, Reiser © 2019 JvW Who am I ? Jan van Wijk Software Engineer, C, C++, Rexx, PHP, Assembly Founded FSYS Software in 2001, developing and supporting DFSee from version 4 to 16.x First OS/2 experience in 1987, developing parts of OS/2 1.0 EE (Query Manager, later DB2) Used to be a systems-integration architect at a large bank, 500 servers and 7500 workstations Developing embedded software for machine control and appliances from 2008 onwards Home page: https://www.dfsee.com/ FS-info: FAT, HPFS, NTFS, JFS, EXTn, Reiser © 2019 JvW Information in a filesystem Generic volume information Boot sector, super blocks, special files ... File and directory descriptive info Directories, FNODEs, INODEs, MFT-records Tree hierarchy of files and directories Free space versus used areas Allocation-table, bitmap Used disk-sectors for each file/directory Allocation-table, run-list, bitmap FS-info: FAT, HPFS, NTFS, JFS, EXTn, Reiser © 2019 JvW File Allocation Table The FAT filesystem was derived from older CPM filesystems for the first (IBM) PC Designed for diskettes and small hard
    [Show full text]
  • The Zettabyte File System
    The Zettabyte File System Jeff Bonwick, Matt Ahrens, Val Henson, Mark Maybee, Mark Shellenbaum Jeff[email protected] Abstract Today’s storage environment is radically different from that of the 1980s, yet many file systems In this paper we describe a new file system that continue to adhere to design decisions made when provides strong data integrity guarantees, simple 20 MB disks were considered big. Today, even a per- administration, and immense capacity. We show sonal computer can easily accommodate 2 terabytes that with a few changes to the traditional high-level of storage — that’s one ATA PCI card and eight file system architecture — including a redesign of 250 GB IDE disks, for a total cost of about $2500 the interface between the file system and volume (according to http://www.pricewatch.com) in late manager, pooled storage, a transactional copy-on- 2002 prices. Disk workloads have changed as a result write model, and self-validating checksums — we of aggressive caching of disk blocks in memory; since can eliminate many drawbacks of traditional file reads can be satisfied by the cache but writes must systems. We describe a general-purpose production- still go out to stable storage, write performance quality file system based on our new architecture, now dominates overall performance[8, 16]. These the Zettabyte File System (ZFS). Our new architec- are only two of the changes that have occurred ture reduces implementation complexity, allows new over the last few decades, but they alone warrant a performance improvements, and provides several reexamination of file system architecture from the useful new features almost as a side effect.
    [Show full text]