XFS in Rapid Development

Jeff Liu <[email protected]>

"We have many requests to provide a supported option for the XFS file system on Oracle Linux"
- Oracle Linux Blog, Feb 28, 2013

About This Talk
• Introduction
  - About XFS
  - XFS Development Community
• How Fast XFS Is Going
  - Kernel changes (> Linux 3.0)
  - User space programs
  - XFS test suite
• Upcoming Features
  - Kernel and user space
  - Preview of the self-describing metadata

About XFS
• Full 64-bit journaling file system
• Well known for high performance and scalability
• Maximum filesystem size/file size: 16 EiB/8 EiB
• Variable block sizes: 512 bytes to 64 KB
• Freeze/thaw to support volume-level snapshots - xfs_freeze(8)
• Online filesystem/file defragmentation - xfs_fsr(8)
• Online filesystem resize - xfs_growfs(8)
• Internal log space/external log volume
• Realtime subvolume
  - Provides very deterministic data rates suitable for media streaming applications

XFS Development Community
• Developers from corporations
  - SGI, Red Hat, Oracle, SUSE, IBM
• Main contributors (in alphabetical order)
  Dave Chinner, Christoph Hellwig
  - Preeminent individual contributors
    Brian Foster, Carlos Maiolino, Chandra Seetharaman, Eric Sandeen, Jan Kara, Jeff Liu, Mark Tinguely, <leave the seat of honour open for you>
• Maintainer
  Ben Myers @SGI
• Join us via the mailing list: [email protected] and the IRC channel: irc.freenode.net#xfs
• Newcomers are always welcome!

How Fast XFS Is Going
[Chart: number of files changed, insertions and deletions for Btrfs, XFS, and Ext4 with JBD2; statistics of code changes between Linux v3.0 and v3.10-rc1 (Jul 21 2011 - May 11 2013)]
    git diff --stat --minimal -C -M v3.0..v3.10-rc1 fs/[btrfs|xfs|ext4 -- with jbd2]

How Fast XFS Is Going
• XFS changes were made up of
  - Improvements: performance/scalability improvements, code base refactoring
  - New features: anything new
  - Bug fixes
  - Misc: trivial fixes, code style adjustments, dead code cleanups

How Fast XFS Is Going
[Pie chart: proportion of the XFS kernel changes between Linux 3.0 and Linux 3.10-rc1, based on the number of patches; categories: Improvement, New feature, Bug fix, Misc]

How Fast XFS Is Going
[Pie chart: proportion of the XFS kernel changes between Linux 3.0 and Linux 3.10-rc1, based on the lines changed (+/-); categories: Improvement, New feature, Bug fix, Misc]

How Fast XFS Is Going
• Xfsprogs v3.1.6 ~ v3.1.11 (Oct 11 2011 ~ May 09 2013)
  - 15 contributors
  - 106 patches
    $ git diff --stat --minimal -C -M v3.1.6 v3.1.11 | grep changed
    108 files changed, 11113 insertions(+), 11418 deletions(-)

How Fast XFS Is Going
• XFS test suite - xfstests
  - A generic test tool for Linux local filesystems
  - 300+ test cases overall
  - 170+ special test cases for XFS
• Test cases are well organized for different filesystems
    $ ls -l xfstests/tests/
    btrfs/ ext4/ generic/ Makefile shared/ udf/ xfs/

Speedup Direct-IO R/W On High IOPS Devices
• XFS inode locking modes, e.g. shared/exclusive
  - The naming convention is inherited from SGI IRIX
  - The equivalent on Linux is the read/write modes
• Issues faced before Linux 3.2
  - The exclusive lock range is too extensive
  - Concurrent direct-IO reads are serialized on the page cache check
  - Exclusive lock mode is used for direct-IO writes by default

Speedup Direct-IO R/W On High IOPS Devices
• Solutions
  - Use the shared lock for direct-IO reads; take the exclusive lock only if page invalidation is needed
  - Use the shared lock for direct-IO writes by default; take the exclusive lock during IO submission if extent allocation is required

Speedup Direct-IO R/W On High IOPS Devices
• FIO scenario (fio version 2.1)
    direct=1
    rw=randrw
    bs=4k
    size=10G
    numjobs=10 #[20,40,80]
    runtime=120
    thread
    ioengine=psync
• Storage formatted with default options; simplified output of xfs_info(8)
    Metadata: isize=256   agcount=4       agsize=937408 blks  sectsz=512
    Data:     bsize=4096  blocks=3749632  sunit=0             swidth=0 blks
    Log:      internal    bsize=4096      blocks=2560         version=2

Speedup Direct-IO R/W On High IOPS Devices
[Chart: XFS read IOPS on a SATA3 SSD, vanilla 3.7.0 vs 2.6.39 in delaylog mode; x-axis: threads (10, 20, 40, 80); y-axis: input/output operations per second]

Speedup Direct-IO R/W On High IOPS Devices
[Chart: XFS write IOPS on a SATA3 SSD, vanilla 3.7.0 vs 2.6.39 in delaylog mode; x-axis: threads (10, 20, 40, 80); y-axis: input/output operations per second]

Sync Story
• Improved concurrency for fsync(2) on files
  - Unlock the inode before the log force
• Optimizations for fsync(2) on directories
  - Directories are only updated transactionally
  - No file data needs to be flushed
  - Disk caches do not have to be flushed except as part of a transaction commit
• Improved sync behavior in the face of aggressive dirtying
  - Issue: XFS writes data out itself two times per filesystem sync, overriding the livelock protection in the core writeback code path

Sync Story
• The xfssyncd workqueue was removed; instead:
  - New dedicated workqueue for inode reclaim
  - New dedicated workqueue for log operations
  - The sync work is now only the periodic log work, controlled by the xfssyncd_centisecs sysctl

Efficient Sparse File Handling
• SEEK_DATA/SEEK_HOLE options to lseek(2)
  - Derived from Solaris ZFS
  - Neater call interface than the FIEMAP ioctl(2)
• Usage scenarios
  - cp(1), GNU tar(1), etc.
  - Virtual machine image (Xen, KVM) backup
  - Sparse file detection

Efficient Sparse File Handling
• Refinement for unwritten extents
• Create a sparse file with unwritten extents mixed with data and holes
    #!/bin/bash
    # Preallocate 10G (unwritten extents), then write data at several offsets.
    xfs_io -F -f -c "falloc 0 10G" /xfs/sparse
    for i in $(seq 0 30 120); do
        offset=$((i * (1 << 20)))
        xfs_io -c "pwrite $offset 500m" /xfs/sparse
    done

Efficient Sparse File Handling
• Layout of the created sparse file
    $ filefrag -v sparse
    Filesystem type is: 58465342
    File size of sparse is 10737418240 (2621440 blocks, blocksize 4096)
     ext  logical  physical  expected   length  flags
       0        0  43547551             151040
       1   151040  43698591            1946111  unwritten
       2  2097151  43008572  45644702   524289  unwritten,eof
    sparse: 2 extents found

Efficient Sparse File Handling
[Chart: sparse file copy via xfstests/seek_copy_test on a laptop with a normal SATA disk, with/without the unwritten extents refinement (improved vs non-improved); y-axis: time in seconds]

Quota Improvements
• XFS disk quota supports
  - User quota
  - Group quota
  - Project quota: per-directory quota (limits disk usage per directory)

Quota Improvements
• Bad scalability when searching tens of thousands of in-memory dquots. Why?
  - User/Group/Project dquots are stored in a global hash table that is shared between file systems
• A hash table is at worst O(n) for search/insert/delete, while a radix tree is at worst O(k) for insertion and deletion, where k is the key length
• Solutions
  - Replace the global hash tables with per-filesystem radix trees
  - Replace the global dquot LRU lists with per-filesystem lists
  - Remove the global xfs_Gqm structure

Fighting With Process 8K Stack Space Limitation
• 8K process stack space for x86_64 in Linux 2.6 by default
  - Every process has a dedicated kernel stack
  - Kernel stacks are a fixed size and cannot be expanded as required
  - Kernel stacks cannot be swapped
• Extreme stack use in the Linux VM/VFS call chain
• The old problems for XFS
  - Significant issues with the amount of stack that allocation in XFS uses, especially in memory reclaim situations (writeback path)

Fighting With Process 8K Stack Space Limitation
• A buffer cache miss that triggers I/O vs a CPU cache miss
• Solution
  - Reduce stack usage in the allocation call chain, e.g. delayed allocation
  - Move all allocations to a new worker thread combined with a completion
  - Avoid the context switch overhead if an allocation request comes in with a large stack

Bounds Checking Enabled XFS Kernel
• Alternative: CONFIG_XFS_WARN support
  - Depends on XFS_FS && !XFS_DEBUG
  - Converts ASSERT() checks to WARN_ON(1)
  - Does not modify algorithms
  - Does not cause the kernel to panic on non-fatal errors
  - Makes strange "out of bounds" problems easier to find
  - Already turned on in Fedora kernel-debug packages
• Suggestion: enable this feature in other Linux distributions with XFS support

Bounds Checking Enabled Kernel
• XFS with CONFIG_XFS_DEBUG
  - A very efficient buddy for developers
  - Weak points from a user perspective
    . Significant overhead in production environments
    . Changes the behavior of algorithms (such as allocation) to improve test coverage, e.g. xfs_alloc_ag_vextent_near()
    . Intentionally panics the machine on non-fatal errors, by design
• Only advisable for debugging purposes

Misc Changes
• Mount options
  - nodelaylog mode is removed; delaylog mode is used by default (>= Linux 3.3)
  - inode64 is re-mountable
  - inode32 is re-mountable
• Speculative preallocation improvements
  - Trimming the speculative preallocation near ENOSPC/quota limits/sparse files
• Discontiguous buffers
  - Virtually contiguous in the buffers, but non-contiguous on disk

Upcoming - Self Describing Metadata Preview
• XFS utilities for forensic analysis of the file system structures
  - xfs_repair(8)
  - xfs_db(8)
• Analyzing the structure of 100TB to 1PB of storage :(
• Primary concern for supporting PB-scale file systems
  - Minimize the time and effort required for basic forensic analysis of the file system structures

Self Describing Metadata Preview
• Problems with the current metadata format
  - A magic number is the only way to identify a metadata block
  - Lack of magic number identifying in AGFL, remote
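
As an illustration of the SEEK_DATA/SEEK_HOLE interface from the "Efficient Sparse File Handling" slides, the following is a minimal Python sketch of how a copy tool might walk a file's data segments and skip holes. The function names `data_segments` and `sparse_copy` are hypothetical and not taken from xfstests or any XFS tool; whether unwritten extents are reported as holes depends on the kernel and filesystem in use.

```python
import errno
import os


def data_segments(fd):
    """Return (start, end) byte ranges that the filesystem reports as data."""
    size = os.fstat(fd).st_size
    if not hasattr(os, "SEEK_DATA"):
        # Assumption: on platforms without SEEK_DATA, treat the whole file as data.
        return [(0, size)] if size else []
    segments = []
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:  # no data at or after this offset
                break
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)  # end of file counts as a hole
        if end <= start:  # defensive: never loop without making progress
            break
        segments.append((start, end))
        offset = end
    return segments


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy only the data segments, leaving holes in the destination unwritten."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for start, end in data_segments(src):
            pos = start
            while pos < end:
                buf = os.pread(src, min(chunk, end - pos), pos)
                if not buf:  # file shrank while we were copying
                    break
                pos += os.pwrite(dst, buf, pos)
        # Extend the destination to the source size so a trailing hole is kept.
        os.ftruncate(dst, os.fstat(src).st_size)
    finally:
        os.close(src)
        os.close(dst)
```

Compared with FIEMAP, this needs only two lseek(2) calls per segment and no extent structures, which is the "neater call interface" the slide refers to.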
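
The O(k) bound quoted on the "Quota Improvements" slide can be made concrete with a toy radix tree keyed by a numeric ID. This is only a sketch, not the kernel's radix tree: the class name `IdRadixTree`, the 32-bit key width, the 6 bits per level, and the use of dictionaries as nodes are all assumptions chosen for illustration. The point it shows is that the number of steps in a lookup is bounded by the number of levels, regardless of how many entries are stored.

```python
class IdRadixTree:
    """Toy radix tree for integer IDs (hypothetical; not the kernel implementation)."""

    BITS_PER_LEVEL = 6                       # assumption: 64 slots per node
    KEY_BITS = 32                            # assumption: IDs fit in 32 bits
    LEVELS = -(-KEY_BITS // BITS_PER_LEVEL)  # ceil(32 / 6) == 6

    def __init__(self):
        self.root = {}

    def _slots(self, key):
        """Split the key into one slot index per level, most significant first."""
        mask = (1 << self.BITS_PER_LEVEL) - 1
        return [
            (key >> (level * self.BITS_PER_LEVEL)) & mask
            for level in reversed(range(self.LEVELS))
        ]

    def insert(self, key, value):
        """Store value under key, creating interior nodes as needed."""
        slots = self._slots(key)
        node = self.root
        for slot in slots[:-1]:
            node = node.setdefault(slot, {})
        node[slots[-1]] = value

    def lookup(self, key):
        """Return (value, steps); steps never exceeds LEVELS."""
        slots = self._slots(key)
        node = self.root
        steps = 0
        for slot in slots[:-1]:
            steps += 1
            node = node.get(slot)
            if node is None:
                return None, steps
        steps += 1
        return node.get(slots[-1]), steps
```

A per-filesystem arrangement would then amount to keeping one such tree per mounted filesystem and quota type, so lookups on one filesystem never walk entries that belong to another.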
