University of Warwick Institutional Repository


Monitoring, Analysis and Optimisation of I/O in Parallel Applications

by Steven Alexander Wright

A thesis submitted to The University of Warwick in partial fulfilment of the requirements for admission to the degree of Doctor of Philosophy. Department of Computer Science, The University of Warwick, July 2014.

Abstract

High performance computing (HPC) is changing the way science is performed in the 21st Century; experiments that once took enormous amounts of time, were dangerous and often produced inaccurate results can now be performed and refined in a fraction of the time in a simulation environment. Current generation supercomputers are running in excess of 10¹⁶ floating point operations per second, and the push towards exascale will see this increase by two orders of magnitude. To achieve this level of performance it is thought that applications may have to scale to potentially billions of simultaneous threads, pushing hardware to its limits and severely impacting failure rates. To reduce the cost of these failures, many applications use checkpointing to periodically save their state to persistent storage, such that, in the event of a failure, computation can be restarted without significant data loss.

As computational power has grown by approximately 2× every 18–24 months, persistent storage has lagged behind; checkpointing is fast becoming a bottleneck to performance. Several software and hardware solutions have been presented to solve the current I/O problem being experienced in the HPC community, and this thesis examines some of these. Specifically, this thesis presents a tool designed for analysing and optimising the I/O behaviour of scientific applications, as well as a tool designed to allow the rapid analysis of one software solution to the problem of parallel I/O, namely the parallel log-structured file system (PLFS). This thesis ends with an analysis of a modern Lustre file system under contention from multiple applications and multiple compute nodes running the same problem through PLFS. The results and analysis presented outline a framework through which application settings and procurement decisions can be made.

This thesis is dedicated to the memory of my Grandad, Ian Haig Henderson (1928 – 2014).

Acknowledgements

Since walking into the Department of Computer Science for the first time in October 2006, I have been fortunate enough to meet, work with and enjoy the company of many special people. First and foremost, I would like to thank my supervisor, Professor Stephen Jarvis, for all his help and hard work over the past 4 years and for allowing me the opportunity to undertake a Ph.D. I would also like to thank him for providing me with funding and a post-doctoral position in the department.

Secondly, I would like to thank the two best office-mates I'm ever likely to have, Dr. Simon Hammond and Dr. John Pennycook. I have been in a lull since each of them moved on to pastures new. I consider my time sharing an office with each of them to be both the most productive (with Si) and most entertaining (with John) time in my Warwick career, and I miss their daily company dearly. Thirdly, I acknowledge my current office-mates, Robert Bird and Richard Bunt. Both provide nice light entertainment and interesting discussions to break up the day and make my working environment a more pleasant place to spend my time.
Additionally, I thank the other members of the High Performance and Scientific Computing group, past and present – David Beckingsale, Dr. Adam Chester, Peter Coetzee, James Davis, Timothy Law, Andy Mallinson, Dr. Gihan Mudalige, Dr. Oliver Perks and Stephen Roberts – for their lunchtime company and occasional pointless discussions. Finally, within the University, I would like to thank the group of individuals that have helped me through both my undergraduate and postgraduate studies – Dr. Abhir Bhalerao, Jane Clarke, Dr. Matt Ismail, Dr. Arshad Jhumka, Dr. Matthew Leeke, Dr. Christine Leigh, Prof. Chang-Tsun Li, Rod Moore, Dr. Roger Packwood, Catherine Pillet, Jackie Pinks, Gill Reeves-Brown, Phillip Taylor, Stuart Valentine, Dr. Justin Ward, Paul Williamson, amongst many others.

Outside of University, thanks go to the organisations that have contributed resources and expertise to much of the material in this thesis: the team at the Lawrence Livermore National Laboratory for allowing access to the Sierra and Cab supercomputers; the team at Daresbury Laboratory for granting time on their IBM BlueGene/P; and finally, to both Meghan Wingate-McLelland, at Xyratex, and John Bent, at EMC2, for contributing their time and expertise to my investigations into the parallel log-structured file system.

Special thanks are reserved for my closest friend for four years of undergraduate studies and current snowboarding partner, Chris Straffon. I only wish our snowboarding excursions were both longer and more frequent; they're currently the week of the year I look forward to the most. And last, but certainly not least, thanks go to my family – Mum, Dad, Gemma and Paula; my beloved Gran and Grandad; my aunties and uncles; and my nieces and nephews – Chloe, Sophie, Megan, Lauren, Charlie, Holly-Mae, Liam, Tyler, Aimee and Isabelle – who often make me laugh uncontrollably and cost me so much money each Christmas time. And finally, huge thanks go to my girlfriend, Jessie, for putting up with me throughout my Ph.D. and making my life more enjoyable.

Declarations

This thesis is submitted to the University of Warwick in support of the author's application for the degree of Doctor of Philosophy. It has been composed by the author and has not been submitted in any previous application for any degree. The work presented (including data generated and data analysis) was carried out by the author except in the cases outlined below:

• Performance data in Chapters 4 and 5 for the Sierra supercomputer were collected by Dr. Simon Hammond.

Parts of this thesis have been previously published by the author in the following publications:

[141] S. A. Wright, S. D. Hammond, S. J. Pennycook, R. F. Bird, J. A. Herdman, I. Miller, A. Vadgama, A. H. Bhalerao, and S. A. Jarvis. Parallel File System Analysis Through Application I/O Tracing. The Computer Journal, 56(2):141–155, February 2013

[142] S. A. Wright, S. D. Hammond, S. J. Pennycook, and S. A. Jarvis. Lightweight Parallel I/O Analysis at Scale. Lecture Notes in Computer Science (LNCS), 6977:235–249, October 2011

[143] S. A. Wright, S. D. Hammond, S. J. Pennycook, I. Miller, J. A. Herdman, and S. A. Jarvis. LDPLFS: Improving I/O Performance without Application Modification. In Proceedings of the 26th IEEE International Parallel & Distributed Processing Symposium Workshops & PhD Forum (IPDPSW'12), pages 1352–1359, Shanghai, China, 2012. IEEE Computer Society, Washington, DC

[144] S. A. Wright, S. J. Pennycook, S. D. Hammond, and S. A. Jarvis.
RIOT – A Parallel Input/Output Tracer. In Proceedings of the 27th Annual UK Performance Engineering Workshop (UKPEW'11), pages 25–39, Bradford, UK, July 2011. The University of Bradford, Bradford, UK

[145] S. A. Wright, S. J. Pennycook, and S. A. Jarvis. Towards the Automated Generation of Hard Disk Models Through Physical Geometry Discovery. In Proceedings of the 3rd International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems (PMBS'12), pages 1–8, Salt Lake City, UT, November 2012. ACM, New York, NY

In addition, research conducted during the period of registration has also led to the following publications which, while not directly focused on parallel I/O, have nevertheless shaped the research presented in this thesis:

[12] R. F. Bird, S. J. Pennycook, S. A. Wright, and S. A. Jarvis. Towards a Portable and Future-proof Particle-in-Cell Plasma Physics Code. In Proceedings of the 1st International Workshop on OpenCL (IWOCL'13), Atlanta, GA, May 2013. Georgia Institute of Technology, GA

[13] R. F. Bird, S. A. Wright, D. A. Beckingsale, and S. A. Jarvis. Performance Modelling of Magnetohydrodynamics Codes. Lecture Notes in Computer Science (LNCS), 7587:197–209, July 2013

[98] S. J. Pennycook, S. D. Hammond, G. R. Mudalige, S. A. Wright, and S. A. Jarvis. On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures. The Computer Journal, 55(2):138–153, February 2012

[99] S. J. Pennycook, S. D. Hammond, S. A. Wright, J. A. Herdman, I. Miller, and S. A. Jarvis. An Investigation of the Performance Portability of OpenCL. Journal of Parallel and Distributed Computing (JPDC), 73(11):1439–1450, November 2013

Sponsorship and Grants

The research presented in this thesis was made possible by the support of the following benefactors and sources:

• The University of Warwick, United Kingdom: Engineering and Physical Sciences Research Council Doctoral Training Accounts Studentship (2010–2014)

• Royal Society: Industry Fellowship Scheme (IF090020/AM)

• UK Atomic Weapons Establishment: "The Production of Predictive Models for Future Computing Requirements" (CDK0660); "AWE Technical Outreach Programme" (CDK0724); "TSB Knowledge Transfer Partnership" (KTP006740)

Abbreviations

ADIO – Abstract Device Interface for Parallel I/O
ANL – Argonne National Laboratory
API – Application Programmable Interface
ATA – Advanced Technology Attachment
B/W – Bandwidth
BG/P – IBM BlueGene/P
CPU – Central Processing Unit
DFS – Distributed File System
EMC2 – EMC Corporation
FLOP/s – Floating-Point Operations per Second
FUSE – File System in User Space
GB – 1024³ Bytes
GFLOP/s – 10⁹ FLOP/s
Recommended publications
  • CS 5600 Computer Systems
CS 5600 Computer Systems – Lecture 10: File Systems.

What are We Doing Today?
• Last week we talked extensively about hard drives and SSDs – how they work, and their performance characteristics.
• This week is all about managing storage – disks/SSDs offer a blank slate of empty blocks. How do we store files on these devices, and keep track of them? How do we maintain high performance? How do we maintain consistency in the face of random crashes?

Topics: Partitions and Mounting; Basics (FAT); inodes and Blocks (ext); Block Groups (ext2); Journaling (ext3); Extents and B-Trees (ext4); Log-based File Systems.

Building the Root File System. One of the first tasks of an OS during bootup is to build the root file system:
1. Locate all bootable media – internal and external hard disks, SSDs, floppy disks, CDs, DVDs, USB sticks.
2. Locate all the partitions on each media – read MBR(s), extended partition tables, etc.
3. Mount one or more partitions – makes the file system(s) available for access.

The Master Boot Record. Each of the four partition entries includes the starting LBA and the length of the partition:

Address (Hex / Dec)    Description            Size (Bytes)
0x000   /   0          Bootstrap code area    446
0x1BE   / 446          Partition Entry #1     16
0x1CE   / 462          Partition Entry #2     16
0x1DE   / 478          Partition Entry #3     16
0x1EE   / 494          Partition Entry #4     16
0x1FE   / 510          Magic Number           2
                       Total:                 512

[Figure: example layouts – Disk 1 with an MBR and four partitions (ext3, swap, NTFS, FAT32); Disk 2 with an MBR and a single NTFS partition.]

Extended Partitions. In some cases, you may want more than 4 partitions; modern OSes support extended partitions. [Figure: an extended partition containing Logical Partitions 1 and 2.] A minimal parser for the primary table above is sketched below.
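To make the table concrete, here is a minimal C sketch that reads the partition table of a raw disk image at the offsets given above (entries of 16 bytes starting at 0x1BE, magic bytes 0x55 0xAA at 0x1FE). It is illustrative only – a real parser must also walk extended partitions and handle GPT – and the struct field names are my own, not taken from the slides:

```c
/* mbrdump.c - dump the four primary MBR partition entries.
 * Build: cc -o mbrdump mbrdump.c ; run: ./mbrdump disk.img
 * Assumes a little-endian host (LBA fields are little-endian on disk). */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

struct mbr_entry {           /* one 16-byte partition entry */
    uint8_t  status;         /* 0x80 = bootable */
    uint8_t  chs_first[3];   /* legacy CHS address of first sector */
    uint8_t  type;           /* e.g. 0x83 = Linux, 0x07 = NTFS */
    uint8_t  chs_last[3];    /* legacy CHS address of last sector */
    uint32_t lba_first;      /* starting LBA of the partition */
    uint32_t sectors;        /* length of the partition in sectors */
};

int main(int argc, char **argv)
{
    uint8_t sector[512];
    FILE *f = argc > 1 ? fopen(argv[1], "rb") : NULL;
    if (!f || fread(sector, 1, sizeof sector, f) != sizeof sector)
        return 1;

    if (sector[0x1FE] != 0x55 || sector[0x1FF] != 0xAA) {
        fprintf(stderr, "no MBR magic number at offset 0x1FE\n");
        return 1;
    }
    for (int i = 0; i < 4; i++) {        /* entries start at 0x1BE */
        struct mbr_entry e;
        memcpy(&e, sector + 0x1BE + 16 * i, sizeof e);
        printf("partition %d: type=0x%02x start_lba=%u sectors=%u\n",
               i + 1, e.type, e.lba_first, e.sectors);
    }
    fclose(f);
    return 0;
}
```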
  • W4118: Linux File Systems
W4118: Linux file systems. Instructor: Junfeng Yang. References: Modern Operating Systems (3rd edition), Operating Systems Concepts (8th edition), previous W4118, and OS courses at MIT, Stanford, and UWisc.

File systems in Linux:
• Linux Second Extended File System (Ext2) – What is the EXT2 on-disk layout? What is the EXT2 directory structure?
• Linux Third Extended File System (Ext3) – What is the file system consistency problem? How to solve the consistency problem using journaling?
• Virtual File System (VFS) – What is VFS? What are the key data structures of Linux VFS?

Ext2, the "standard" Linux file system, was the most commonly used before ext3 came out. It uses an FFS-like layout: each FS is composed of identical block groups, and allocation is designed to improve locality. Inodes contain pointers (32 bits) to blocks: direct, indirect, double indirect and triple indirect. Maximum file size: 4.1 TB (4K blocks); maximum file system size: 16 TB (4K blocks). On-disk structures are defined in include/linux/ext2_fs.h.

Ext2 disk layout: files in the same directory are stored in the same block group; files in different directories are spread among the block groups. [Picture from Tanenbaum, Modern Operating Systems, 3rd ed., (c) 2008 Prentice-Hall.]

Block addressing in Ext2: twelve "direct" block pointers in the inode, then an indirect block addressing BLKSIZE/4 further data blocks, a double indirect block addressing (BLKSIZE/4)² blocks, and a triple indirect block addressing (BLKSIZE/4)³ blocks. [Diagram from Tanenbaum, Modern Operating Systems, 3rd ed.] The sketch below works out what this scheme can address.
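As a rough check on the 4.1 TB figure quoted above, this sketch (my own, assuming the 32-bit block pointers the slides describe) computes how many blocks the 12-direct/indirect/double/triple scheme can address for a given block size; with 4 KiB blocks the block map alone reaches about 4 TiB, with the remaining difference coming from other on-disk size fields:

```c
/* ext2 block-map capacity for a given block size (illustrative). */
#include <stdio.h>
#include <stdint.h>

static uint64_t ext2_max_blocks(uint64_t blksize)
{
    uint64_t ptrs = blksize / 4;   /* 4-byte (32-bit) block pointers */
    return 12                      /* twelve direct blocks           */
         + ptrs                    /* single indirect: BLKSIZE/4     */
         + ptrs * ptrs             /* double indirect: (BLKSIZE/4)^2 */
         + ptrs * ptrs * ptrs;     /* triple indirect: (BLKSIZE/4)^3 */
}

int main(void)
{
    for (uint64_t bs = 1024; bs <= 4096; bs *= 2)
        printf("%5llu-byte blocks: ~%llu GiB per file\n",
               (unsigned long long)bs,
               (unsigned long long)((ext2_max_blocks(bs) * bs) >> 30));
    return 0;
}
```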
  • Ext3 = Ext2 + Journaling
FS: a file system is the set of methods and data structures that an operating system uses to store data. Structure of a file system: (1) a header – the data necessary for the file system to function; (2) structures for organising data on the medium – metadata; (3) the data itself – files and directories. Data structures needed to implement a file system: the PCB (Partition Control Block); the BCB (Boot Control Block); control structures for file allocation (the i-node table on Linux); directory structures containing the file control blocks; the FCB (File Control Block).

THE VIRTUAL FILE SYSTEM (VFS): Linux supports a large number of file systems (ext2, ext3, XFS, FAT, NTFS...). VFS is an object-oriented way of implementing file systems that lets the user access all file systems in the same manner. How a user reaches the file system: user → API; VFS → file system.

Linux FS: Linux views each file system as an independent hierarchical structure of objects (files and directories) with the root (/) directory at the top. The objects of a Linux file system: the header (superblock), the i-node table, data blocks, directory blocks, and blocks of indirect pointers. [Diagram: superblock, i-node table, data area.] An i-node describes an object and occupies about 128 B on disk (a simplified rendering of such an inode is sketched below). There is a compromise between the size of the i-node table and the speed of the file system: the first 10–12 pointers point directly to data blocks; a single indirection block is used to allocate larger files; for even larger files ...
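For illustration, here is a simplified C rendering of a subset of the roughly 128-byte on-disk inode just described, with a dozen direct pointers followed by indirection blocks. The field names and the exact field set are my own simplification, not the real ext2/ext3 layout:

```c
/* Simplified on-disk inode in the spirit of the description above. */
#include <stdint.h>

#define N_DIRECT 12               /* first pointers go straight to data */

struct disk_inode {
    uint16_t mode;                /* object type and permissions */
    uint16_t uid;                 /* owner */
    uint32_t size;                /* size in bytes */
    uint32_t atime, ctime, mtime; /* access/change/modification times */
    uint16_t links_count;         /* directory entries pointing here */
    uint32_t blocks;              /* allocated data blocks */
    uint32_t direct[N_DIRECT];    /* direct data-block pointers */
    uint32_t single_indirect;     /* block holding further pointers */
    uint32_t double_indirect;     /* block of pointers to pointer blocks */
    uint32_t triple_indirect;     /* one more level, for the largest files */
};
```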
  • Measuring Parameters of the Ext4 File System
File System Forensics: Measuring Parameters of the ext4 File System. Madhu Ramanathan and Venkatesh Karthik Srinivasan, Department of Computer Sciences, UW Madison. [email protected], [email protected]

Abstract: Operating systems are rather complex software systems. The File System component of Operating Systems is defined by a set of parameters that impact both the correct functioning as well as the performance of the File System. In order to completely understand and modify the behavior of the File System, correct measurement of those parameters and a thorough analysis of the results is mandatory. In this project, we measure the various key parameters and a few interesting properties of the Fourth Extended File System (ext4). The ext4 has become the de facto File System of Linux kernels 2.6.28 and above, and has become the default file system of ...

An extent is a group of physically contiguous blocks. Allocating extents instead of indirect blocks reduces the size of the block map, thus aiding the quick retrieval of logical disk block numbers, and also minimizes external fragmentation. An extent is represented in an inode by 96 bits, with 48 bits to represent the physical block number and 15 bits to represent the length. This allows one extent to have a length of 2¹⁵ blocks. An inode can have at most 4 extents. If the file is fragmented, every extent typically has less than 2¹⁵ blocks. If the file needs more than four extents, either due to fragmentation or due to growth, an extent HTree rooted at the inode is created. This record maps naturally onto a small struct, sketched below.
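The 96-bit extent record described above fits a 12-byte C struct; the real on-disk structure (struct ext4_extent in the kernel) has this shape, though the field names below are paraphrased rather than copied from the kernel headers:

```c
/* One on-disk extent: 32-bit logical block + 16-bit length
 * + 48-bit physical block, 96 bits in total. */
#include <stdint.h>

struct extent {
    uint32_t logical_block;  /* first file (logical) block covered */
    uint16_t len;            /* number of blocks, up to 2^15 */
    uint16_t start_hi;       /* high 16 bits of the physical block */
    uint32_t start_lo;       /* low 32 bits of the physical block */
};

/* Reassemble the 48-bit physical block number from its two halves. */
static inline uint64_t extent_pblock(const struct extent *e)
{
    return ((uint64_t)e->start_hi << 32) | e->start_lo;
}
```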
  • Migrating from Netware to OES 2 Linux
Best Practice Guide (www.novell.com): Migrating from NetWare to OES 2, prepared for the Novell OES 2 User Community. Published: November 2007.

Disclaimer: Novell, Inc. makes no representations or warranties with respect to the contents or use of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose.

Trademarks: Novell is a registered trademark of Novell, Inc. in the United States and other countries. All third-party trademarks are property of their respective owners.

Copyright 2007 Novell, Inc. All rights reserved. No part of this publication may be reproduced, photocopied, stored on a retrieval system, or transmitted without the express written consent of Novell, Inc. Novell, Inc., 404 Wyman, Suite 500, Waltham, Massachusetts 02451, USA. Prepared by Novell Services and User Community.

The latest version of this document, along with other OES 2 Linux Best Practice Guides, can be found with the NetWare to Linux Migration Resources at: http://www.novell.com/products/openenterpriseserver/netwaretolinux/view/all/-9/tle/all

Contents: Acknowledgments; Getting Started; Why OES 2?; Which Services Are Right for OES 2?
  • The Third Extended File System with Copy-On-Write
Limiting Liability in a Federally Compliant File System. Zachary N. J. Peterson, The Johns Hopkins University. Hopkins Storage Systems Lab, Department of Computer Science.

Regulatory Requirements:
• Data maintenance acts and regulations – HIPAA, GISRA, SOX, GLB – 4,000+ state and federal laws and regulations with regards to storage.
• Audit trail – creating a "chain of trust": files are versioned over time, with authenticated block sharing (copy-on-write) between versions.
• Disk encryption – privacy and confidentiality; non-repudiation.

Secure Deletion in a Regulatory Environment:
• Desire to limit liability when audited – records that go out of audit scope do so forever; when a disk is subpoenaed, old or irrelevant data are inaccessible.
• Existing techniques – secure overwrite [Gutmann]; file key disposal in disk-encrypted systems [Boneh & Lipton].
• Existing solutions don't work well in block-versioning file systems.

Technical Problems:
• Secure overwriting of noncontiguous data blocks is slow and inefficient – when versions share blocks, data to be overwritten may be noncontiguous.
• Cannot dispose of file keys in a versioning file system – blocks encrypted with a particular key need to be available in future versions.
• User-space tools are inadequate – they can't delete metadata, can't be interposed between file operations, truncate may leak data, and it is difficult to be synchronous. The sketch below shows the user-space baseline these points argue against.
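For contrast with those criticisms, here is roughly what the user-space baseline looks like: a single-pass overwrite-in-place in C. This is my own sketch, not the slides' method – Gutmann's technique uses many carefully chosen passes – and, exactly as the slides note, it cannot reach metadata or blocks shared with older versions:

```c
/* Single-pass user-space overwrite of a file's data (illustrative). */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

int shred_once(const char *path)
{
    struct stat st;
    int fd = open(path, O_WRONLY);
    if (fd < 0 || fstat(fd, &st) < 0) return -1;

    char buf[4096];
    off_t left = st.st_size;
    while (left > 0) {
        size_t n = left > (off_t)sizeof buf ? sizeof buf : (size_t)left;
        for (size_t i = 0; i < n; i++)
            buf[i] = rand() & 0xff;       /* toy randomness, not crypto */
        if (write(fd, buf, n) != (ssize_t)n) { close(fd); return -1; }
        left -= n;
    }
    fsync(fd);   /* push the overwrite to disk; older versions survive */
    close(fd);
    return 0;
}

int main(int argc, char **argv)
{
    return argc > 1 ? shred_once(argv[1]) : 1;
}
```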
  • Features and Differences of Linux Ext, Ext2, Ext3 and Ext4 File Systems
International Journal of Computer Engineering and Applications, Volume X, Special Issue, ICRTCST-2016. www.ijcea.com, ISSN 2321-3469.

FEATURES AND DIFFERENCES OF LINUX EXT, EXT2, EXT3 AND EXT4 FILE SYSTEMS. Sanku Sinha and Sadique Nayeem, Department of Computer Science and Engineering, R.V.S. College of Engineering and Technology, Jamshedpur.

Abstract: A file system is the means by which an operating system stores and organizes files so that they can be accessed efficiently. The file system is a component of the operating system, and it interacts with the abstraction of the underlying hardware known as the block layer. An open-source environment like Linux has had over 50 different file systems over its existence. Though in recent years the speed of CPUs has increased and hard drive technology has seen tremendous growth, file systems have also evolved so that the operating system can provide better performance to the user. To measure the performance of file systems, different benchmarks are used by different developers. In this work, a study of Linux file system evolution has been carried out. As there are several file systems for different Linux distributions and different Linux kernel versions, some popular versions of the extended file systems – ext, ext2, ext3 and ext4 – have been considered. The features and differences in the inode structures of these file systems are discussed.

Keywords: ext, ext2, ext3, ext4, inode, Linux.

INTRODUCTION: The operating system of a particular machine is chosen by a user on the basis of the applications to be used or better system performance. For better system performance, the performance of the file system is a key part.
  • Ntfs, Fat, Fat32, Ext2, Ext3, Ext4)
International Journal for Research in Applied Science & Engineering Technology (IJRASET). ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 6.887. Volume 6, Issue VIII, August 2018. Available at www.ijraset.com

Comparative Study of File Systems (NTFS, FAT, FAT32, EXT2, EXT3, EXT4). Akash Bundele and Prof. Dr. S. E. Yedey, PG Department of Computer Science & Technology, Hanuman Vyayam Prasarak Mandal, Amravati, Maharashtra.

Abstract: Over the years, hard drives and the systems used to store data on them have constantly evolved. There are Windows file systems and Linux file systems, each with several advantages and disadvantages. File systems have traditionally been a major area of research and development. This is evident from the existence of over 50 file systems of varying popularity in the current version of the Linux kernel. Windows 2000 supports several file systems, the most important of which are FAT-16, FAT-32, and NTFS (NT File System). This paper looks at various file systems (FAT, NTFS, EXT2, EXT3, EXT4) and performs a comparative study.

INTRODUCTION: In today's world everything revolves around data. Data is critical for the day-to-day operation of any system. Data management is taken care of by file systems, which reliably store data on disks. Users typically have varied requirements, ranging from scalability, availability, fault-tolerance and performance guarantees in business environments to small memory footprints, security and reliability in desktop environments. This has driven the file system community to develop a variety of systems that cater to different user requirements. Since they were developed over twenty years ago, the role of personal computers in our lives has drastically increased.
  • Analyzing Metadata Performance in Distributed File Systems
Inaugural dissertation for the attainment of the doctoral degree of the Faculty of Natural Sciences and Mathematics of Ruprecht-Karls-Universität Heidelberg, submitted by Diplom-Informatiker Christoph Biardzki of Thorn. Date of the oral examination: 19.1.2009. Analyzing Metadata Performance in Distributed File Systems. Reviewer: Prof. Dr. Thomas Ludwig.

Abstract: Distributed file systems are important building blocks in modern computing environments. The challenge of increasing I/O bandwidth to files has been largely resolved by the use of parallel file systems and sufficient hardware. However, determining the best means by which to manage large amounts of metadata, which contains information about files and directories stored in a distributed file system, has proved a more difficult challenge. The objective of this thesis is to analyze the role of metadata and present past and current implementations and access semantics. Understanding the development of the current file system interfaces and functionality is a key to understanding their performance limitations.

Based on this analysis, a distributed metadata benchmark termed DMetabench is presented. DMetabench significantly improves on existing benchmarks and allows stress on metadata operations in a distributed file system in a parallelized manner. Both intra-node and inter-node parallelism, current trends in computer architecture, can be explicitly tested with DMetabench. This is due to the fact that a distributed file system can have different semantics inside a client node than between multiple nodes. As measurements in larger distributed environments may exhibit performance artifacts that are difficult to explain by reference to average numbers, DMetabench uses a time-logging technique to record time-related changes in the performance of metadata operations, and also logs additional details of the runtime environment for post-benchmark analysis. A minimal illustration of such per-operation time logging follows.
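The time-logging idea can be illustrated in a few lines of C. This is my own minimal sketch of the technique, not DMetabench itself: it creates a batch of files and records each create's latency individually, so spikes remain visible instead of vanishing into an average:

```c
/* Log per-operation latency of a metadata operation (file create). */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(void)
{
    char name[64];
    struct timespec t0, t1;

    for (int i = 0; i < 1000; i++) {
        snprintf(name, sizeof name, "meta_bench_%d", i);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        int fd = open(name, O_CREAT | O_EXCL | O_WRONLY, 0644);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (fd < 0) return 1;
        close(fd);

        long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
                + (t1.tv_nsec - t0.tv_nsec);
        printf("%d %ld\n", i, ns);   /* one timestamped sample per op */
        unlink(name);
    }
    return 0;
}
```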
  • Linux Filesystem Comparisons
Linux Filesystem Comparisons. Jerry Feldman, Boston Linux and Unix. Presentation prepared in LibreOffice Impress, 12/17/2014.

Background: I've worked as a computer programmer/software engineer almost continually since 1972. I have used the Unix operating system since about 1980, and Linux since about 1993. While I am not an expert in file systems, I do have some knowledge and experience. I have been a member and director of the BLU since its founding in 1994.

Overview: My talk is intended to focus primarily on the end user. Servers and NAS systems have different requirements, so what might be good for you may not be appropriate for a high-traffic NAS system. I will try to explain some of the file system issues that you should be aware of when you decide to install Linux. While I will mention distributions, all Linux distributions allow you to use just about any file system you want. Please feel free to ask questions at any time.

Terminology: RAID (redundant array of independent disks) is used to incorporate stripe sets and to add redundancy. In general, RAID 0 is striping and RAID 1 is mirroring; there are other RAID levels that I won't discuss here. STRIPE – data is spanned among multiple volumes or devices (the address arithmetic is sketched below). MIRRORING – data is written to 2 or more volumes. LVM – Logical Volume Manager; this has been around for a while.
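The striping idea reduces to simple address arithmetic. A hedged sketch of my own, assuming a stripe unit of one block and no parity:

```c
/* RAID 0: map a logical block to (member disk, block within disk). */
#include <stdio.h>

struct stripe_loc { int disk; long block; };

static struct stripe_loc raid0_map(long logical, int ndisks)
{
    struct stripe_loc loc = {
        .disk  = (int)(logical % ndisks),  /* round-robin across disks */
        .block = logical / ndisks,         /* row within each disk */
    };
    return loc;
}

int main(void)
{
    for (long lb = 0; lb < 8; lb++) {      /* 8 blocks over 3 disks */
        struct stripe_loc l = raid0_map(lb, 3);
        printf("logical %ld -> disk %d, block %ld\n", lb, l.disk, l.block);
    }
    return 0;
}
```

Mirroring (RAID 1) would instead write every logical block to all member disks, trading capacity for redundancy.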
  • The Second Extended File System
The Second Extended File System: Internal Layout. Dave Poirier, [email protected]. Copyright © 2001-2011 Dave Poirier.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license can be acquired electronically from http://www.fsf.org/licenses/fdl.html or by writing to 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA.

Table of Contents: About this book; 1. Historical Background; 2. Definitions – 2.1 Blocks, 2.2 Block Groups, 2.3 Directories, 2.4 Inodes.
  • Performance Evaluation of Filesystems Compression Features
UNIVERSITY OF OSLO, Department of Informatics. Performance Evaluation of File Systems' Compression Features. Master thesis in the field of network and system administration. Solomon Legesse, Oslo and Akershus University College (HiOA), in collaboration with the University of Oslo (UiO). May 20, 2014.

Abstract: The Linux operating system already provides a vast number of file systems to the user community. In general, having a file system that can provide scalability, excellent performance and reliability is a requirement, especially in light of the very large data sizes being utilized by most IT data centers. Recently, modern file systems have begun to include transparent compression among the main features in their design strategy. Transparent compression is the method of compressing and decompressing data so that it takes relatively less space. Transparent compression can also improve I/O performance by reducing I/O traffic and seek distance, and has a negative impact on performance only when single-thread I/O latency is critical. Two of the newer file system technologies that aim at addressing today's I/O challenges are ZFS and Btrfs. Using high-speed transparent compression algorithms like LZ4 and LZO with Btrfs and ZFS can greatly help to improve I/O performance. The goal of this paper is threefold: first, to evaluate the impact of transparent compression on performance for Btrfs and ZFS, respectively; second, to compare the two file systems' compression features on performance. A minimal round-trip through the LZ4 API mentioned above is sketched below.
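To show why LZ4-class codecs are attractive for inline use, here is a minimal userspace compression round-trip against the liblz4 API (LZ4_compress_default / LZ4_decompress_safe). The 4096-byte "block" and its contents are stand-ins of my own choosing for what a file system would compress transparently:

```c
/* One-block LZ4 round-trip; build with: cc lz4demo.c -llz4 */
#include <stdio.h>
#include <string.h>
#include <lz4.h>

int main(void)
{
    char src[4096];                    /* pretend this is one FS block */
    memset(src, 'A', sizeof src);      /* highly compressible payload */

    char dst[LZ4_COMPRESSBOUND(4096)]; /* worst-case output size */
    int clen = LZ4_compress_default(src, dst, sizeof src, sizeof dst);
    if (clen <= 0) return 1;
    printf("4096-byte block -> %d compressed bytes\n", clen);

    char back[4096];
    if (LZ4_decompress_safe(dst, back, clen, sizeof back) != sizeof back)
        return 1;
    return memcmp(src, back, sizeof back) != 0;  /* 0 on success */
}
```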