File Systems I


Disk Storage and File Systems
CS 256/456 — 11/8/2018
Dept. of Computer Science, University of Rochester

Characteristics of I/O Devices
• Data transfer mode – block vs. character
• Access method – sequential vs. random
• Transfer schedule – synchronous vs. asynchronous
• Sharing mode – dedicated vs. sharable
• Device speed – latency, seek time, transfer rate, occupancy/delay between operations
• I/O direction – R, W, R/W

Recap: Disk Storage
• Disk drive
  – mechanical parts (cylinders, tracks, sectors) and how they move to access disk data
  – electronic part (disk controller) exposes a one-dimensionally addressable set of blocks
  – large seek/rotation time
• Logical disk partitioning
  – one or more groups of cylinders
  – sector 0: master boot record loaded by BIOS firmware, which contains partition information
  – boot record points to boot partition

Disk Management
• Formatting
  – Header: sector number etc.
  – Footer/tail: ECC codes
  – Gap
  – Initialize mapping from logical block number to defect-free sectors

Disk Drive – Mechanical Parts
(Figure: a multi-surface disk; a cylinder is the set of tracks at the same position on every disk surface.)

Disk Structure
• Disk drives are addressed as large 1-dimensional arrays of logical blocks, where the logical block is the smallest unit of transfer.
• The 1-dimensional array of logical blocks is mapped into the sectors of the disk sequentially.
  – Sector 0 is the first sector of the first track on the outermost cylinder.
  – Mapping proceeds in order through that track, then the rest of the tracks in that cylinder, and then through the rest of the cylinders from outermost to innermost.

Disk Performance Characteristics
• A disk operation has three major components
  – Seek – moving the heads to the cylinder containing the desired sector
  – Rotation – rotating the desired sector to the disk head
  – Transfer – sequentially moving data to or from disk
(Figures: seek time in milliseconds vs. seek distance; transfer/read throughput in MB/sec vs. starting transfer address, for a Seagate SCSI drive and an IBM SCSI drive.)

Improvement of HDD characteristics over time (https://en.wikipedia.org/wiki/Hard_disk_drive)
• Capacity (formatted): 3.75 megabytes (1957) → 14 terabytes (2017), 3.73-million-to-one
• Physical volume: 68 cubic feet (1.9 m³) → 2.1 cubic inches (34 cm³), 56,000-to-one
• Weight: 2,000 pounds (910 kg) → 2.2 ounces (62 g), 15,000-to-one
• Average access time: approx. 600 milliseconds → 2.5 ms to 10 ms (RW RAM dependent), about 200-to-one
• Price: US$9,200 per megabyte (1961) → US$0.032 per gigabyte by 2015, 300-million-to-one
• Data density: 2,000 bits per square inch → 1.3 terabits per square inch in 2015, 650-million-to-one
• Average lifespan (MTBF): c. 2,000 hrs → c. 2,500,000 hrs (~285 years), 1250-to-one
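The sequential mapping from logical block numbers to physical locations described under Disk Structure above can be written down directly. Below is a minimal sketch in C, assuming an idealized geometry with a fixed number of heads (surfaces) and sectors per track; the function name and geometry values are illustrative, not taken from the slides, and real drives use zoned recording and remap defective sectors, so a real controller's mapping is more involved.

/* Minimal sketch of the logical-block-to-sector mapping described in the
 * Disk Structure slide. Assumes an idealized geometry with a fixed number
 * of sectors per track. */
#include <stdio.h>

struct chs { unsigned cylinder, head, sector; };

/* Logical blocks are numbered through one track, then the remaining
 * tracks (surfaces) of the same cylinder, then the next cylinder,
 * moving from the outermost cylinder inward. */
static struct chs lba_to_chs(unsigned lba, unsigned heads, unsigned sectors_per_track)
{
    struct chs c;
    c.cylinder = lba / (heads * sectors_per_track);
    c.head     = (lba / sectors_per_track) % heads;
    c.sector   = lba % sectors_per_track;
    return c;
}

int main(void)
{
    /* Hypothetical geometry: 16 surfaces, 63 sectors per track. */
    struct chs c = lba_to_chs(123456, 16, 63);
    printf("cylinder %u, head %u, sector %u\n", c.cylinder, c.head, c.sector);
    return 0;
}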
Disk Scheduling
• Disk scheduling – choose from the outstanding disk requests when the disk is ready for a new request
  – can be done in both the disk controller and the operating system
  – disk scheduling is non-preemptible
• Goals of disk scheduling
  – overall efficiency – small resource consumption for completing the disk I/O workload
  – fairness – prevent starvation

FCFS (First-Come-First-Serve)
• Illustration shows the total head movement is 640.
• Starvation?

SSTF (Shortest-Seek-Time-First)
• Selects the request with the minimum seek time from the current head position.
• SSTF scheduling is a form of SJF scheduling.
• Illustration shows the total head movement is 236.
• Starvation?

SCAN
• The disk arm starts at one end of the disk and moves toward the other end, servicing requests until it gets to the other end, where the head movement is reversed and servicing continues.
• Sometimes called the elevator algorithm.
• Illustration shows the total head movement is 208.
• Starvation?

C-SCAN (Circular SCAN)
• Provides a more uniform wait time than SCAN.
• The head moves from one end of the disk to the other, servicing requests as it goes. When it reaches the other end, however, it immediately returns to the beginning of the disk, without servicing any requests on the return trip.
• Starvation?

C-LOOK
• Variation of C-SCAN.
• The arm only goes as far as the last request in each direction, then reverses direction immediately, without first going all the way to the end of the disk.

Deadline Scheduling in Linux
• A regular elevator-style scheduler similar to C-LOOK.
• Additionally, all I/O requests are put into a FIFO queue with an expiration time (e.g., 500 ms).
• When the head request in the FIFO queue expires, it will be executed next (even if it is not next in line according to C-LOOK).
• A mix of performance and fairness.

Concurrent I/O
• Consider two request handlers in a Web server
  – each accesses a different stream of sequential data (a file) on disk;
  – each reads a chunk (the buffer size) at a time, does a little CPU processing, and reads the next chunk
• What happens?
(Figure: timeline of the two threads/processes alternating between disk I/O and CPU or waiting for I/O.)
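The head-movement totals quoted for FCFS and SSTF above can be reproduced in a few lines of C. This is a sketch under an assumption: the slides' illustrations are not reproduced in the text, so the request queue below is the standard textbook one (cylinders 98, 183, 37, 122, 14, 124, 65, 67, with the head starting at cylinder 53), which happens to give the quoted totals of 640 and 236.

/* Total head movement under FCFS and SSTF for one request queue.
 * The queue and starting head position are assumptions (the classic
 * textbook example); they reproduce the 640 and 236 totals above. */
#include <stdio.h>
#include <stdlib.h>

static int fcfs(const int *req, int n, int head)
{
    int total = 0;
    for (int i = 0; i < n; i++) {            /* serve strictly in arrival order */
        total += abs(req[i] - head);
        head = req[i];
    }
    return total;
}

static int sstf(const int *req, int n, int head)
{
    int done[64] = {0};                      /* assumes n <= 64 */
    int total = 0;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++)          /* closest pending request wins */
            if (!done[i] && (best < 0 || abs(req[i] - head) < abs(req[best] - head)))
                best = i;
        done[best] = 1;
        total += abs(req[best] - head);
        head = req[best];
    }
    return total;
}

int main(void)
{
    int req[] = {98, 183, 37, 122, 14, 124, 65, 67};
    int n = (int)(sizeof req / sizeof req[0]);
    printf("FCFS total head movement: %d\n", fcfs(req, n, 53));   /* 640 */
    printf("SSTF total head movement: %d\n", sstf(req, n, 53));   /* 236 */
    return 0;
}

SCAN, C-SCAN, and C-LOOK can be simulated the same way by sorting the pending requests and sweeping the head in one direction at a time.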
How to Deal with It?
• Aggressive prefetching
• Anticipatory scheduling [Iyer & Druschel, SOSP 2001]
  – at the completion of an I/O request, the disk scheduler will wait a bit (despite the fact that there is other work to do), in anticipation that a new request with strong locality will be issued; schedule another request if no such new request appears before a timeout
  – included in Linux 2.6

Two Disks: Disk Striping
• Blocks divided into subblocks
• Subblocks stored on different disks
(Figure: CPU writing the subblocks of each block to Disk 1 and Disk 2.)

Two Disks: Mirroring
• Make a copy of each block on each disk
• Provides redundancy
(Figure: CPU writing the same block to Disk 1 and Disk 2.)

Multiple Disks: Parity Block
• Have one disk contain parity bits of blocks on the other devices
• Provides redundancy without a full copy
(Figure: CPU with Disk 1, Disk 2, and Disk 3, one of which holds the parity blocks.)
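The parity block is simply the bitwise XOR of the corresponding blocks on the data disks, so any single lost block can be rebuilt by XOR-ing the surviving blocks with the parity. A minimal sketch in C follows; the block size and contents are illustrative only, and a real array would update the parity incrementally on every write.

/* Parity-block sketch: parity = XOR of the data blocks; a lost block is
 * rebuilt by XOR-ing the survivors with the parity. */
#include <stdio.h>

#define BLOCK 8
#define DATA_DISKS 2

static void xor_into(unsigned char *dst, const unsigned char *src)
{
    for (int i = 0; i < BLOCK; i++)
        dst[i] ^= src[i];
}

int main(void)
{
    unsigned char disk[DATA_DISKS][BLOCK] = { "blk-one", "blk-two" };
    unsigned char parity[BLOCK] = {0};

    for (int d = 0; d < DATA_DISKS; d++)     /* write path: fold each block in */
        xor_into(parity, disk[d]);

    /* Pretend disk 1 ("blk-two") failed: rebuild it from disk 0 and parity. */
    unsigned char rebuilt[BLOCK] = {0};
    xor_into(rebuilt, parity);
    xor_into(rebuilt, disk[0]);

    printf("rebuilt block: %s\n", (char *)rebuilt);   /* prints "blk-two" */
    return 0;
}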
Exploiting Concurrency
• RAID: Redundant Arrays of Independent Disks
  – RAID 0: data striping at block level, no redundancy
  – RAID 1: mirrored disks (100% overhead)
  – RAID 2: bit-level striping with parity bits, synchronized writes
  – RAID 3: data striping at the bit level with parity disk, synchronized writes
  – RAID 4: data striping at block level with parity disk
  – RAID 5: scattered parity
  – RAID 6: handles multiple disk failures

Solid State Drives
• No mechanical components (moving parts)
• Lower energy requirements
• Speed
  – Reads and writes on the order of 10s of microseconds (reading faster than writing)
  – Erase on the order of a millisecond
• Finite number of erase and write cycles, requiring what is called "wear leveling"

Flash Memory (Based on Charge)
• Based on the floating-gate transistor

File Systems
• A file system is the OS abstraction for storage resources
  – File is a logical storage unit in the OS abstract interface for storage resources
    • Extension of address space (temporary files)
    • Non-volatile storage that survives the execution of an individual program (persistent files)
  – Directory is a logical "container" for a group of files

Operations Supported
• Create – associate a name with a file
• Delete – remove the file
• Rename – associate a new name with a file
• Open – create cached context that is associated implicitly with future reads and writes
• Write – store data in a file
• Read – access the data associated with a file
• Close – discard cached context
• Seek – random access to any record or byte
• Map – place in address space for convenience (memory-based loads and stores) and speed; disadvantages: lengths that are not multiples of the page size, consistency with the open/read/write interface

File System Issues
• File naming and other attributes:
  – name, size, access time, sharing/protection, location
• Intra-file structure
  – None – sequence of words, bytes
  – Complex structures
    • records/formatted document/executable
• File system organization: efficiency of disk access
• Concurrent access: allow multiple processes to read/write
• Reliability: integrity in the presence of failures
• Protection: sharing/protection attributes and access control lists (ACLs)

File Naming
• Fixed vs. variable length
  – Fixed: 8-255 characters
  – Variable: length:value encoding
• File extensions – system supported vs. convention

Naming Files Using Directory Structures
• Directory: maps names to files; directories may themselves be files
  – Single level (flat): no two files may have the same name
  – Two level: per-user single-level directory
  – Hierarchical: generalization of two level; each file system is assigned the root of a tree
  – Acyclic (or cyclic) graph: allow sharing of files across directories; hard versus soft (symbolic) links

Shared Files: Links
• File appears simultaneously in different directories
• File system is now a directed acyclic graph (DAG) (see the sketch below)

File Types
• Control operations allowed on files
• Use file name extensions to indicate type (in Unix, this is just a convention)
• Structured vs.
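The hard versus soft (symbolic) links above correspond to two different POSIX calls, sketched below. The path names are purely illustrative: link() adds a second directory entry for an existing file, while symlink() creates a small file whose content is a path name.

/* Hard vs. symbolic links (POSIX). Path names are illustrative. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Hard link: another directory entry for the same underlying file,
       turning the directory structure into a DAG. */
    if (link("/tmp/original.txt", "/tmp/hardlink.txt") != 0)
        perror("link");

    /* Symbolic (soft) link: a file whose content is a path name. */
    if (symlink("/tmp/original.txt", "/tmp/symlink.txt") != 0)
        perror("symlink");

    char target[256];
    ssize_t n = readlink("/tmp/symlink.txt", target, sizeof target - 1);
    if (n >= 0) {
        target[n] = '\0';
        printf("symlink points to: %s\n", target);
    }
    return 0;
}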