Totalcow: Unleash the Power of Copy-On-Write for Thin-Provisioned Containers
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Copy on Write Based File Systems Performance Analysis and Implementation
Copy On Write Based File Systems Performance Analysis And Implementation Sakis Kasampalis Kongens Lyngby 2010 IMM-MSC-2010-63 Technical University of Denmark Department Of Informatics Building 321, DK-2800 Kongens Lyngby, Denmark Phone +45 45253351, Fax +45 45882673 [email protected] www.imm.dtu.dk Abstract In this work I am focusing on Copy On Write based file systems. Copy On Write is used on modern file systems for providing (1) metadata and data consistency using transactional semantics, (2) cheap and instant backups using snapshots and clones. This thesis is divided into two main parts. The first part focuses on the design and performance of Copy On Write based file systems. Recent efforts aiming at creating a Copy On Write based file system are ZFS, Btrfs, ext3cow, Hammer, and LLFS. My work focuses only on ZFS and Btrfs, since they support the most advanced features. The main goals of ZFS and Btrfs are to offer a scalable, fault tolerant, and easy to administrate file system. I evaluate the performance and scalability of ZFS and Btrfs. The evaluation includes studying their design and testing their performance and scalability against a set of recommended file system benchmarks. Most computers are already based on multi-core and multiple processor architec- tures. Because of that, the need for using concurrent programming models has increased. Transactions can be very helpful for supporting concurrent program- ming models, which ensure that system updates are consistent. Unfortunately, the majority of operating systems and file systems either do not support trans- actions at all, or they simply do not expose them to the users. -
The Page Cache Today’S Lecture RCU File System Networking(Kernel Level Syncmem
2/21/20 COMP 790: OS Implementation COMP 790: OS Implementation Logical Diagram Binary Memory Threads Formats Allocators User System Calls Kernel The Page Cache Today’s Lecture RCU File System Networking(kernel level Syncmem. Don Porter management) Memory Device CPU Management Drivers Scheduler Hardware Interrupts Disk Net Consistency 1 2 1 2 COMP 790: OS Implementation COMP 790: OS Implementation Recap of previous lectures Background • Page tables: translate virtual addresses to physical • Lab2: Track physical pages with an array of PageInfo addresses structs • VM Areas (Linux): track what should be mapped at in – Contains reference counts the virtual address space of a process – Free list layered over this array • Hoard/Linux slab: Efficient allocation of objects from • Just like JOS, Linux represents physical memory with a superblock/slab of pages an array of page structs – Obviously, not the exact same contents, but same idea • Pages can be allocated to processes, or to cache file data in memory 3 4 3 4 COMP 790: OS Implementation COMP 790: OS Implementation Today’s Problem The address space abstraction • Given a VMA or a file’s inode, how do I figure out • Unifying abstraction: which physical pages are storing its data? – Each file inode has an address space (0—file size) • Next lecture: We will go the other way, from a – So do block devices that cache data in RAM (0---dev size) physical page back to the VMA or file inode – The (anonymous) virtual memory of a process has an address space (0—4GB on x86) • In other words, all page -
CERIAS Tech Report 2017-5 Deceptive Memory Systems by Christopher N
CERIAS Tech Report 2017-5 Deceptive Memory Systems by Christopher N. Gutierrez Center for Education and Research Information Assurance and Security Purdue University, West Lafayette, IN 47907-2086 DECEPTIVE MEMORY SYSTEMS ADissertation Submitted to the Faculty of Purdue University by Christopher N. Gutierrez In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy December 2017 Purdue University West Lafayette, Indiana ii THE PURDUE UNIVERSITY GRADUATE SCHOOL STATEMENT OF DISSERTATION APPROVAL Dr. Eugene H. Spa↵ord, Co-Chair Department of Computer Science Dr. Saurabh Bagchi, Co-Chair Department of Computer Science Dr. Dongyan Xu Department of Computer Science Dr. Mathias Payer Department of Computer Science Approved by: Dr. Voicu Popescu by Dr. William J. Gorman Head of the Graduate Program iii This work is dedicated to my wife, Gina. Thank you for all of your love and support. The moon awaits us. iv ACKNOWLEDGMENTS Iwould liketothank ProfessorsEugeneSpa↵ord and SaurabhBagchi for their guidance, support, and advice throughout my time at Purdue. Both have been instru mental in my development as a computer scientist, and I am forever grateful. I would also like to thank the Center for Education and Research in Information Assurance and Security (CERIAS) for fostering a multidisciplinary security culture in which I had the privilege to be part of. Special thanks to Adam Hammer and Ronald Cas tongia for their technical support and Thomas Yurek for his programming assistance for the experimental evaluation. I am grateful for the valuable feedback provided by the members of my thesis committee, Professor Dongyen Xu, and Professor Math ias Payer. -
HTTP-FUSE Xenoppix
HTTP-FUSE Xenoppix Kuniyasu Suzaki† Toshiki Yagi† Kengo Iijima† Kenji Kitagawa†† Shuichi Tashiro††† National Institute of Advanced Industrial Science and Technology† Alpha Systems Inc.†† Information-Technology Promotion Agency, Japan††† {k.suzaki,yagi-toshiki,k-iijima}@aist.go.jp [email protected], [email protected] Abstract a CD-ROM. Furthermore it requires remaking the entire CD-ROM when a bit of data is up- dated. The other solution is a Virtual Machine We developed “HTTP-FUSE Xenoppix” which which enables us to install many OSes and ap- boots Linux, Plan9, and NetBSD on Virtual plications easily. However, that requires in- Machine Monitor “Xen” with a small bootable stalling virtual machine software. (6.5MB) CD-ROM. The bootable CD-ROM in- cludes boot loader, kernel, and miniroot only We have developed “Xenoppix” [1], which and most part of files are obtained via Internet is a combination of CD/DVD bootable Linux with network loopback device HTTP-FUSE “KNOPPIX” [2] and Virtual Machine Monitor CLOOP. It is made from cloop (Compressed “Xen” [3, 4]. Xenoppix boots Linux (KNOP- Loopback block device) and FUSE (Filesys- PIX) as Host OS and NetBSD or Plan9 as Guest tem USErspace). HTTP-FUSE CLOOP can re- OS with a bootable DVD only. KNOPPIX construct a block device from many small block is advanced in automatic device detection and files of HTTP servers. In this paper we describe driver integration. It prepares the Xen environ- the detail of the implementation and its perfor- ment and Guest OSes don’t need to worry about mance. lack of device drivers. -
Dynamically Tuning the JFS Cache for Your Job Sjoerd Visser Dynamically Tuning the JFS Cache for Your Job Sjoerd Visser
Dynamically Tuning the JFS Cache for Your Job Sjoerd Visser Dynamically Tuning the JFS Cache for Your Job Sjoerd Visser The purpose of this presentation is the explanation of: IBM JFS goals: Where was Journaled File System (JFS) designed for? JFS cache design: How the JFS File System and Cache work. JFS benchmarking: How to measure JFS performance under OS/2. JFS cache tuning: How to optimize JFS performance for your job. What do these settings say to you? [E:\]cachejfs SyncTime: 8 seconds MaxAge: 30 seconds BufferIdle: 6 seconds Cache Size: 400000 kbytes Min Free buffers: 8000 ( 32000 K) Max Free buffers: 16000 ( 64000 K) Lazy Write is enabled Do you have a feeling for this? Do you understand the dynamic cache behaviour of the JFS cache? Or do you just rely on the “proven” cachejfs settings that the eCS installation presented to you? Do you realise that the JFS cache behaviour may be optimized for your jobs? November 13, 2009 / page 2 Dynamically Tuning the JFS Cache for Your Job Sjoerd Visser Where was Journaled File System (JFS) designed for? 1986 Advanced Interactive eXecutive (AIX) v.1 based on UNIX System V. for IBM's RT/PC. 1990 JFS1 on AIX was introduced with AIX version 3.1 for the RS/6000 workstations and servers using 32-bit and later 64-bit IBM POWER or PowerPC RISC CPUs. 1994 JFS1 was adapted for SMP servers (AIX 4) with more CPU power, many hard disks and plenty of RAM for cache and buffers. 1995-2000 JFS(2) (revised AIX independent version in c) was ported to OS/2 4.5 (1999) and Linux (2000) and also was the base code of the current JFS2 on AIX branch. -
CIS 191 Linux Lab Exercise
CIS 191 Linux Lab Exercise Lab 9: Backup and Restore Fall 2008 Lab 9: Backup and Restore The purpose of this lab is to explore the different types of backups and methods of restoring them. We will look at three utilities often used for backups and learn the advantages and disadvantages of each. Supplies • VMWare Server 1.05 or higher • Benji VM Preconfiguration • Labs 6 and Labs 7 completed on a pristine Benji VM [root@benji /]# fdisk -l Disk /dev/sda: 5368 MB, 5368709120 bytes 255 heads, 63 sectors/track, 652 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sda1 * 1 382 3068383+ 83 Linux /dev/sda2 383 447 522112+ 82 Linux swap / Solaris /dev/sda3 448 511 514080 83 Linux /dev/sda4 512 652 1132582+ 5 Extended /dev/sda5 512 549 305203+ 83 Linux /dev/sda6 550 556 56196 83 Linux /dev/sda7 557 581 200781 83 Linux [root@benji /]# [root@benji /]# mount /dev/sda1 on / type ext3 (rw) proc on /proc type proc (rw) sysfs on /sys type sysfs (rw) devpts on /dev/pts type devpts (rw,gid=5,mode=620) tmpfs on /dev/shm type tmpfs (rw) /dev/sda5 on /opt type ext3 (rw) /dev/sda3 on /var type ext3 (rw) none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw) sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw) /dev/sda7 on /home type ext3 (rw,usrquota) [root@benji /]# cat /etc/fstab LABEL=/1 / ext3 defaults 1 1 devpts /dev/pts devpts gid=5,mode=620 0 0 tmpfs /dev/shm tmpfs defaults 0 0 LABEL=/opt /opt ext3 defaults 1 2 proc /proc proc defaults 0 0 sysfs /sys sysfs defaults 0 0 LABEL=/var /var ext3 defaults 1 2 LABEL=SWAP-sda2 swap swap defaults 0 0 LABEL=/home /home ext3 usrquota,defaults 1 2 [root@benji /]# Forum If you get stuck on one of the steps below don’t beat your head against the wall. -
NOVA: a Log-Structured File System for Hybrid Volatile/Non
NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories Jian Xu and Steven Swanson, University of California, San Diego https://www.usenix.org/conference/fast16/technical-sessions/presentation/xu This paper is included in the Proceedings of the 14th USENIX Conference on File and Storage Technologies (FAST ’16). February 22–25, 2016 • Santa Clara, CA, USA ISBN 978-1-931971-28-7 Open access to the Proceedings of the 14th USENIX Conference on File and Storage Technologies is sponsored by USENIX NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories Jian Xu Steven Swanson University of California, San Diego Abstract Hybrid DRAM/NVMM storage systems present a host of opportunities and challenges for system designers. These sys- Fast non-volatile memories (NVMs) will soon appear on tems need to minimize software overhead if they are to fully the processor memory bus alongside DRAM. The result- exploit NVMM’s high performance and efficiently support ing hybrid memory systems will provide software with sub- more flexible access patterns, and at the same time they must microsecond, high-bandwidth access to persistent data, but provide the strong consistency guarantees that applications managing, accessing, and maintaining consistency for data require and respect the limitations of emerging memories stored in NVM raises a host of challenges. Existing file sys- (e.g., limited program cycles). tems built for spinning or solid-state disks introduce software Conventional file systems are not suitable for hybrid mem- overheads that would obscure the performance that NVMs ory systems because they are built for the performance char- should provide, but proposed file systems for NVMs either in- acteristics of disks (spinning or solid state) and rely on disks’ cur similar overheads or fail to provide the strong consistency consistency guarantees (e.g., that sector updates are atomic) guarantees that applications require. -
Connecting the Storage System to the Solaris Host
Configuration Guide for Solaris™ Host Attachment Hitachi Virtual Storage Platform Hitachi Universal Storage Platform V/VM FASTFIND LINKS Document Organization Product Version Getting Help Contents MK-96RD632-05 Copyright © 2010 Hitachi, Ltd., all rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, or stored in a database or retrieval system for any purpose without the express written permission of Hitachi, Ltd. (hereinafter referred to as “Hitachi”) and Hitachi Data Systems Corporation (hereinafter referred to as “Hitachi Data Systems”). Hitachi Data Systems reserves the right to make changes to this document at any time without notice and assumes no responsibility for its use. This document contains the most current information available at the time of publication. When new and/or revised information becomes available, this entire document will be updated and distributed to all registered users. All of the features described in this document may not be currently available. Refer to the most recent product announcement or contact your local Hitachi Data Systems sales office for information about feature and product availability. Notice: Hitachi Data Systems products and services can be ordered only under the terms and conditions of the applicable Hitachi Data Systems agreement(s). The use of Hitachi Data Systems products is governed by the terms of your agreement(s) with Hitachi Data Systems. Hitachi is a registered trademark of Hitachi, Ltd. in the United States and other countries. Hitachi Data Systems is a registered trademark and service mark of Hitachi, Ltd. -
Hibachi: a Cooperative Hybrid Cache with NVRAM and DRAM for Storage Arrays
Hibachi: A Cooperative Hybrid Cache with NVRAM and DRAM for Storage Arrays Ziqi Fan, Fenggang Wu, Dongchul Parkx, Jim Diehl, Doug Voigty, and David H.C. Du University of Minnesota–Twin Cities, xIntel Corporation, yHewlett Packard Enterprise Email: [email protected], ffenggang, park, [email protected], [email protected], [email protected] Abstract—Adopting newer non-volatile memory (NVRAM) Application Application Application technologies as write buffers in slower storage arrays is a promising approach to improve write performance. However, due OS … OS … OS to DRAM’s merits, including shorter access latency and lower Buffer/Cache Buffer/Cache Buffer/Cache cost, it is still desirable to incorporate DRAM as a read cache along with NVRAM. Although numerous cache policies have been Application, web or proposed, most are either targeted at main memory buffer caches, database servers or manage NVRAM as write buffers and separately manage DRAM as read caches. To the best of our knowledge, cooperative hybrid volatile and non-volatile memory buffer cache policies specifically designed for storage systems using newer NVRAM Storage Area technologies have not been well studied. Network (SAN) This paper, based on our elaborate study of storage server Hibachi Cache block I/O traces, proposes a novel cooperative HybrId NVRAM DRAM NVRAM and DRAM Buffer cACHe polIcy for storage arrays, named Hibachi. Hibachi treats read cache hits and write cache hits differently to maximize cache hit rates and judiciously adjusts the clean and the dirty cache sizes to capture workloads’ tendencies. In addition, it converts random writes to sequential writes for high disk write throughput and further exploits storage server I/O workload characteristics to improve read performance. -
Managing File Systems in Oracle® Solaris 11.4
® Managing File Systems in Oracle Solaris 11.4 Part No: E61016 November 2020 Managing File Systems in Oracle Solaris 11.4 Part No: E61016 Copyright © 2004, 2020, Oracle and/or its affiliates. License Restrictions Warranty/Consequential Damages Disclaimer This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. Warranty Disclaimer The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. Restricted Rights Notice If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial -
DMFS - a Data Migration File System for Netbsd
DMFS - A Data Migration File System for NetBSD William Studenmund Veridian MRJ Technology Solutions NASAAmes Research Center" Abstract It was designed to support the mass storage systems de- ployed here at NAS under the NAStore 2 system. That system supported a total of twenty StorageTek NearLine ! have recently developed DMFS, a Data Migration File tape silos at two locations, each with up to four tape System, for NetBSD[I]. This file system provides ker- drives each. Each silo contained upwards of 5000 tapes, nel support for the data migration system being devel- and had robotic pass-throughs to adjoining silos. oped by my research group at NASA/Ames. The file system utilizes an underlying file store to provide the file The volman system is designed using a client-server backing, and coordinates user and system access to the model, and consists of three main components: the vol- files. It stores its internal metadata in a flat file, which man master, possibly multiple volman servers, and vol- resides on a separate file system. This paper will first man clients. The volman servers connect to each tape describe our data migration system to provide a context silo, mount and unmount tapes at the direction of the for DMFS, then it will describe DMFS. It also will de- volman master, and provide tape services to clients. The scribe the changes to NetBSD needed to make DMFS volman master maintains a database of known tapes and work. Then it will give an overview of the file archival locations, and directs the tape servers to move and mount and restoration procedures, and describe how some typi- tapes to service client requests. -
Unionfs: User- and Community-Oriented Development of a Unification File System
Unionfs: User- and Community-Oriented Development of a Unification File System David Quigley, Josef Sipek, Charles P. Wright, and Erez Zadok Stony Brook University {dquigley,jsipek,cwright,ezk}@cs.sunysb.edu Abstract If a file exists in multiple branches, the user sees only the copy in the higher-priority branch. Unionfs allows some branches to be read-only, Unionfs is a stackable file system that virtually but as long as the highest-priority branch is merges a set of directories (called branches) read-write, Unionfs uses copy-on-write seman- into a single logical view. Each branch is as- tics to provide an illusion that all branches are signed a priority and may be either read-only writable. This feature allows Live-CD develop- or read-write. When the highest priority branch ers to give their users a writable system based is writable, Unionfs provides copy-on-write se- on read-only media. mantics for read-only branches. These copy- on-write semantics have lead to widespread There are many uses for namespace unifica- use of Unionfs by LiveCD projects including tion. The two most common uses are Live- Knoppix and SLAX. In this paper we describe CDs and diskless/NFS-root clients. On Live- our experiences distributing and maintaining CDs, by definition, the data is stored on a read- an out-of-kernel module since November 2004. only medium. However, it is very convenient As of March 2006 Unionfs has been down- for users to be able to modify the data. Uni- loaded by over 6,700 unique users and is used fying the read-only CD with a writable RAM by over two dozen other projects.