Ext4 File System and Crash Consistency
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Copy on Write Based File Systems Performance Analysis and Implementation
Copy On Write Based File Systems Performance Analysis And Implementation Sakis Kasampalis Kongens Lyngby 2010 IMM-MSC-2010-63 Technical University of Denmark Department Of Informatics Building 321, DK-2800 Kongens Lyngby, Denmark Phone +45 45253351, Fax +45 45882673 [email protected] www.imm.dtu.dk Abstract In this work I am focusing on Copy On Write based file systems. Copy On Write is used on modern file systems for providing (1) metadata and data consistency using transactional semantics, (2) cheap and instant backups using snapshots and clones. This thesis is divided into two main parts. The first part focuses on the design and performance of Copy On Write based file systems. Recent efforts aiming at creating a Copy On Write based file system are ZFS, Btrfs, ext3cow, Hammer, and LLFS. My work focuses only on ZFS and Btrfs, since they support the most advanced features. The main goals of ZFS and Btrfs are to offer a scalable, fault tolerant, and easy to administrate file system. I evaluate the performance and scalability of ZFS and Btrfs. The evaluation includes studying their design and testing their performance and scalability against a set of recommended file system benchmarks. Most computers are already based on multi-core and multiple processor architec- tures. Because of that, the need for using concurrent programming models has increased. Transactions can be very helpful for supporting concurrent program- ming models, which ensure that system updates are consistent. Unfortunately, the majority of operating systems and file systems either do not support trans- actions at all, or they simply do not expose them to the users. -
CERIAS Tech Report 2017-5 Deceptive Memory Systems by Christopher N
CERIAS Tech Report 2017-5 Deceptive Memory Systems by Christopher N. Gutierrez Center for Education and Research Information Assurance and Security Purdue University, West Lafayette, IN 47907-2086 DECEPTIVE MEMORY SYSTEMS ADissertation Submitted to the Faculty of Purdue University by Christopher N. Gutierrez In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy December 2017 Purdue University West Lafayette, Indiana ii THE PURDUE UNIVERSITY GRADUATE SCHOOL STATEMENT OF DISSERTATION APPROVAL Dr. Eugene H. Spa↵ord, Co-Chair Department of Computer Science Dr. Saurabh Bagchi, Co-Chair Department of Computer Science Dr. Dongyan Xu Department of Computer Science Dr. Mathias Payer Department of Computer Science Approved by: Dr. Voicu Popescu by Dr. William J. Gorman Head of the Graduate Program iii This work is dedicated to my wife, Gina. Thank you for all of your love and support. The moon awaits us. iv ACKNOWLEDGMENTS Iwould liketothank ProfessorsEugeneSpa↵ord and SaurabhBagchi for their guidance, support, and advice throughout my time at Purdue. Both have been instru mental in my development as a computer scientist, and I am forever grateful. I would also like to thank the Center for Education and Research in Information Assurance and Security (CERIAS) for fostering a multidisciplinary security culture in which I had the privilege to be part of. Special thanks to Adam Hammer and Ronald Cas tongia for their technical support and Thomas Yurek for his programming assistance for the experimental evaluation. I am grateful for the valuable feedback provided by the members of my thesis committee, Professor Dongyen Xu, and Professor Math ias Payer. -
System Calls System Calls
System calls We will investigate several issues related to system calls. Read chapter 12 of the book Linux system call categories file management process management error handling note that these categories are loosely defined and much is behind included, e.g. communication. Why? 1 System calls File management system call hierarchy you may not see some topics as part of “file management”, e.g., sockets 2 System calls Process management system call hierarchy 3 System calls Error handling hierarchy 4 Error Handling Anything can fail! System calls are no exception Try to read a file that does not exist! Error number: errno every process contains a global variable errno errno is set to 0 when process is created when error occurs errno is set to a specific code associated with the error cause trying to open file that does not exist sets errno to 2 5 Error Handling error constants are defined in errno.h here are the first few of errno.h on OS X 10.6.4 #define EPERM 1 /* Operation not permitted */ #define ENOENT 2 /* No such file or directory */ #define ESRCH 3 /* No such process */ #define EINTR 4 /* Interrupted system call */ #define EIO 5 /* Input/output error */ #define ENXIO 6 /* Device not configured */ #define E2BIG 7 /* Argument list too long */ #define ENOEXEC 8 /* Exec format error */ #define EBADF 9 /* Bad file descriptor */ #define ECHILD 10 /* No child processes */ #define EDEADLK 11 /* Resource deadlock avoided */ 6 Error Handling common mistake for displaying errno from Linux errno man page: 7 Error Handling Description of the perror () system call. -
Study of File System Evolution
Study of File System Evolution Swaminathan Sundararaman, Sriram Subramanian Department of Computer Science University of Wisconsin {swami, srirams} @cs.wisc.edu Abstract File systems have traditionally been a major area of file systems are typically developed and maintained by research and development. This is evident from the several programmer across the globe. At any point in existence of over 50 file systems of varying popularity time, for a file system, there are three to six active in the current version of the Linux kernel. They developers, ten to fifteen patch contributors but a single represent a complex subsystem of the kernel, with each maintainer. These people communicate through file system employing different strategies for tackling individual file system mailing lists [14, 16, 18] various issues. Although there are many file systems in submitting proposals for new features, enhancements, Linux, there has been no prior work (to the best of our reporting bugs, submitting and reviewing patches for knowledge) on understanding how file systems evolve. known bugs. The problems with the open source We believe that such information would be useful to the development approach is that all communication is file system community allowing developers to learn buried in the mailing list archives and aren’t easily from previous experiences. accessible to others. As a result when new file systems are developed they do not leverage past experience and This paper looks at six file systems (Ext2, Ext3, Ext4, could end up re-inventing the wheel. To make things JFS, ReiserFS, and XFS) from a historical perspective worse, people could typically end up doing the same (between kernel versions 1.0 to 2.6) to get an insight on mistakes as done in other file systems. -
System Calls and I/O
System Calls and I/O CS 241 January 27, 2012 Copyright ©: University of Illinois CS 241 Staff 1 This lecture Goals Get you familiar with necessary basic system & I/O calls to do programming Things covered in this lecture Basic file system calls I/O calls Signals Note: we will come back later to discuss the above things at the concept level Copyright ©: University of Illinois CS 241 Staff 2 System Calls versus Function Calls? Copyright ©: University of Illinois CS 241 Staff 3 System Calls versus Function Calls Function Call Process fnCall() Caller and callee are in the same Process - Same user - Same “domain of trust” Copyright ©: University of Illinois CS 241 Staff 4 System Calls versus Function Calls Function Call System Call Process Process fnCall() sysCall() OS Caller and callee are in the same Process - Same user - OS is trusted; user is not. - Same “domain of trust” - OS has super-privileges; user does not - Must take measures to prevent abuse Copyright ©: University of Illinois CS 241 Staff 5 System Calls System Calls A request to the operating system to perform some activity System calls are expensive The system needs to perform many things before executing a system call The computer (hardware) saves its state The OS code takes control of the CPU, privileges are updated. The OS examines the call parameters The OS performs the requested function The OS saves its state (and call results) The OS returns control of the CPU to the caller Copyright ©: University of Illinois CS 241 Staff 6 Steps for Making a System Call -
Membrane: Operating System Support for Restartable File Systems Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C
Membrane: Operating System Support for Restartable File Systems Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift Computer Sciences Department, University of Wisconsin, Madison Abstract and most complex code bases in the kernel. Further, We introduce Membrane, a set of changes to the oper- file systems are still under active development, and new ating system to support restartable file systems. Mem- ones are introduced quite frequently. For example, Linux brane allows an operating system to tolerate a broad has many established file systems, including ext2 [34], class of file system failures and does so while remain- ext3 [35], reiserfs [27], and still there is great interest in ing transparent to running applications; upon failure, the next-generation file systems such as Linux ext4 and btrfs. file system restarts, its state is restored, and pending ap- Thus, file systems are large, complex, and under develop- plication requests are serviced as if no failure had oc- ment, the perfect storm for numerous bugs to arise. curred. Membrane provides transparent recovery through Because of the likely presence of flaws in their imple- a lightweight logging and checkpoint infrastructure, and mentation, it is critical to consider how to recover from includes novel techniques to improve performance and file system crashes as well. Unfortunately, we cannot di- correctness of its fault-anticipation and recovery machin- rectly apply previous work from the device-driver litera- ery. We tested Membrane with ext2, ext3, and VFAT. ture to improving file-system fault recovery. File systems, Through experimentation, we show that Membrane in- unlike device drivers, are extremely stateful, as they man- duces little performance overhead and can tolerate a wide age vast amounts of both in-memory and persistent data; range of file system crashes. -
Relieving the Burden of Track Switch in Modern Hard Disk Drives
Multimedia Systems DOI 10.1007/s00530-010-0218-5 REGULAR PAPER Relieving the burden of track switch in modern hard disk drives Jongmin Gim • Youjip Won Received: 11 November 2009 / Accepted: 22 November 2010 Ó Springer-Verlag 2010 Abstract In this work, we propose a novel hard disk 128 KByte, 17% of the disk space becomes unusable. technique, ‘‘AV Disk’’, for modern multimedia applica- Despite the decreased storage area, track aligning tech- tions. Modern hard disk drives adopt complex sector layout nique increases the overall performance of the hard disk. mechanisms to reduce track and head switch overhead. According to our simulation-based experiment, overall disk While these complex sector layout mechanism can reduce performance increases about 5–25%. Given that capacity of average overhead involved in the track and head switch, hard disk increases 100% every year, we cautiously regard they bring larger variability in the overhead. From a it as reasonable tradeoff to increase the I/O latency of the multimedia application’s point of view, it is important to disk. minimize the worst case I/O latency rather than to improve the average IO latency. We focus our effort to minimize Keyword Hard disk drive Á Multimedia Á Track align Á track switch overhead as well as the variability in track Track switch Á Sector geometry Á Audio and video switch overhead involved in disk I/O. We propose that track of the hard disk drive is aligned with a certain IO size. In this work, we develop an elaborate performance model 1 Introduction with which we can compute the optimal IO unit size for multimedia applications. -
The Power Supply Subsystem
The Power Supply Subsystem Sebastian Reichel Collabora October 24, 2018 Open First Sebastian Reichel I Embedded Linux engineer at Collabora I Open Source Consultancy I Based in Oldenburg, Germany I Open Source contributor I Debian Developer I HSI and power-supply subsystem maintainer I Cofounder of Oldenburg's Hack(er)/Makerspace Open First The power-supply subsystem I batteries / fuel gauges I chargers I (board level poweroff/reset) I Originally written and maintained by Anton Vorontsov (2007-2014) Created by Blink@design from the Noun Project Created by Jenie Tomboc I Temporarily maintained by Dmitry from the Noun Project Eremin-Solenikov (2014) Open First Userspace Interface root@localhost# ls /sys/class/power_supply/ AC BAT0 BAT1 root@localhost# ls /sys/class/power_supply/BAT0 alarm energy_full_design status capacity energy_now subsystem capacity_level manufacturer technology charge_start_threshold model_name type charge_stop_threshold power uevent cycle_count power_now voltage_min_design ... root@localhost# cat /sys/class/power_supply/BAT0/capacity 65 Open First Userspace Interface root@localhost# udevadm info /sys/class/power_supply/BAT0 E: POWER_SUPPLY_CAPACITY=79 E: POWER_SUPPLY_ENERGY_FULL=15200000 E: POWER_SUPPLY_ENERGY_FULL_DESIGN=23200000 E: POWER_SUPPLY_ENERGY_NOW=12010000 E: POWER_SUPPLY_POWER_NOW=5890000 E: POWER_SUPPLY_STATUS=Discharging E: POWER_SUPPLY_VOLTAGE_MIN_DESIGN=11100000 E: POWER_SUPPLY_VOLTAGE_NOW=11688000 ... Open First Userspace Interface I one power-supply device = one physical device I All values are in uV, -
Fragmentation in Large Object Repositories
Fragmentation in Large Object Repositories Experience Paper Russell Sears Catharine van Ingen University of California, Berkeley Microsoft Research [email protected] [email protected] ABSTRACT 1. INTRODUCTION Fragmentation leads to unpredictable and degraded application Application data objects continue to increase in size. performance. While these problems have been studied in detail Furthermore, the increasing popularity of web services and other for desktop filesystem workloads, this study examines newer network applications means that systems that once managed static systems such as scalable object stores and multimedia archives of “finished” objects now manage frequently modified repositories. Such systems use a get/put interface to store objects. versions of application data. Rather than updating these objects in In principle, databases and filesystems can support such place, typical archives either store multiple versions of the objects applications efficiently, allowing system designers to focus on (the V of WebDAV stands for “versioning” [25]), or simply do complexity, deployment cost and manageability. wholesale replacement (as in SharePoint Team Services [19]). Similarly, applications such as personal video recorders and Although theoretical work proves that certain storage policies media subscription servers continuously allocate and delete large, behave optimally for some workloads, these policies often behave transient objects. poorly in practice. Most storage benchmarks focus on short-term behavior or do not measure fragmentation. We compare SQL Applications store large objects as some combination of files in Server to NTFS and find that fragmentation dominates the filesystem and as BLOBs (binary large objects) in a database. performance when object sizes exceed 256KB-1MB. NTFS Only folklore is available regarding the tradeoffs. -
ECE 598 – Advanced Operating Systems Lecture 19
ECE 598 { Advanced Operating Systems Lecture 19 Vince Weaver http://web.eece.maine.edu/~vweaver [email protected] 7 April 2016 Announcements • Homework #7 was due • Homework #8 will be posted 1 Why use FAT over ext2? • FAT simpler, easy to code • FAT supported on all major OSes • ext2 faster, more robust filename and permissions 2 btrfs • B-tree fs (similar to a binary tree, but with pages full of leaves) • overwrite filesystem (overwite on modify) vs CoW • Copy on write. When write to a file, old data not overwritten. Since old data not over-written, crash recovery better Eventually old data garbage collected • Data in extents 3 • Copy-on-write • Forest of trees: { sub-volumes { extent-allocation { checksum tree { chunk device { reloc • On-line defragmentation • On-line volume growth 4 • Built-in RAID • Transparent compression • Snapshots • Checksums on data and meta-data • De-duplication • Cloning { can make an exact snapshot of file, copy-on- write different than link, different inodles but same blocks 5 Embedded • Designed to be small, simple, read-only? • romfs { 32 byte header (magic, size, checksum,name) { Repeating files (pointer to next [0 if none]), info, size, checksum, file name, file data • cramfs 6 ZFS Advanced OS from Sun/Oracle. Similar in idea to btrfs indirect still, not extent based? 7 ReFS Resilient FS, Microsoft's answer to brtfs and zfs 8 Networked File Systems • Allow a centralized file server to export a filesystem to multiple clients. • Provide file level access, not just raw blocks (NBD) • Clustered filesystems also exist, where multiple servers work in conjunction. -
CS 5600 Computer Systems
CS 5600 Computer Systems Lecture 10: File Systems What are We Doing Today? • Last week we talked extensively about hard drives and SSDs – How they work – Performance characterisEcs • This week is all about managing storage – Disks/SSDs offer a blank slate of empty blocks – How do we store files on these devices, and keep track of them? – How do we maintain high performance? – How do we maintain consistency in the face of random crashes? 2 • ParEEons and MounEng • Basics (FAT) • inodes and Blocks (ext) • Block Groups (ext2) • Journaling (ext3) • Extents and B-Trees (ext4) • Log-based File Systems 3 Building the Root File System • One of the first tasks of an OS during bootup is to build the root file system 1. Locate all bootable media – Internal and external hard disks – SSDs – Floppy disks, CDs, DVDs, USB scks 2. Locate all the parEEons on each media – Read MBR(s), extended parEEon tables, etc. 3. Mount one or more parEEons – Makes the file system(s) available for access 4 The Master Boot Record Address Size Descripon Hex Dec. (Bytes) Includes the starEng 0x000 0 Bootstrap code area 446 LBA and length of 0x1BE 446 ParEEon Entry #1 16 the parEEon 0x1CE 462 ParEEon Entry #2 16 0x1DE 478 ParEEon Entry #3 16 0x1EE 494 ParEEon Entry #4 16 0x1FE 510 Magic Number 2 Total: 512 ParEEon 1 ParEEon 2 ParEEon 3 ParEEon 4 MBR (ext3) (swap) (NTFS) (FAT32) Disk 1 ParEEon 1 MBR (NTFS) 5 Disk 2 Extended ParEEons • In some cases, you may want >4 parEEons • Modern OSes support extended parEEons Logical Logical ParEEon 1 ParEEon 2 Ext. -
W4118: Linux File Systems
W4118: Linux file systems Instructor: Junfeng Yang References: Modern Operating Systems (3rd edition), Operating Systems Concepts (8th edition), previous W4118, and OS at MIT, Stanford, and UWisc File systems in Linux Linux Second Extended File System (Ext2) . What is the EXT2 on-disk layout? . What is the EXT2 directory structure? Linux Third Extended File System (Ext3) . What is the file system consistency problem? . How to solve the consistency problem using journaling? Virtual File System (VFS) . What is VFS? . What are the key data structures of Linux VFS? 1 Ext2 “Standard” Linux File System . Was the most commonly used before ext3 came out Uses FFS like layout . Each FS is composed of identical block groups . Allocation is designed to improve locality inodes contain pointers (32 bits) to blocks . Direct, Indirect, Double Indirect, Triple Indirect . Maximum file size: 4.1TB (4K Blocks) . Maximum file system size: 16TB (4K Blocks) On-disk structures defined in include/linux/ext2_fs.h 2 Ext2 Disk Layout Files in the same directory are stored in the same block group Files in different directories are spread among the block groups Picture from Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639 3 Block Addressing in Ext2 Twelve “direct” blocks Data Data BlockData Inode Block Block BLKSIZE/4 Indirect Data Data Blocks BlockData Block Data (BLKSIZE/4)2 Indirect Block Data BlockData Blocks Block Double Block Indirect Indirect Blocks Data Data Data (BLKSIZE/4)3 BlockData Data Indirect Block BlockData Block Block Triple Double Blocks Block Indirect Indirect Data Indirect Data BlockData Blocks Block Block Picture from Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc.