Storage Gardening: Using a Virtualization Layer for Efficient Defragmentation in the WAFL File System

Ram Kesavan, Matthew Curtis-Maury, Vinay Devadas, Kesari Mishra
NetApp Inc.

Outline
• Some WAFL Background
• Defragmentation techniques in WAFL
  – Multi-core systems, multiple storage devices of different media
  – Trade-offs for enterprise-grade system
  – Multiple applications and use-cases
  – All features (e.g. snapshots) must be preserved
• Evaluation & some history

Forms of Fragmentation
• File layout fragmentation
  – Impacts sequential read performance
  – Mitigation: predictive/speculative readahead algorithms
  – Unavoidable: more read IOs
• Free space fragmentation
  – Impacts write throughput of file/storage system
  – Impacts file layout
• Intra-block fragmentation
  – WAFL supports sub-block compaction & addressing
  – Impacts achieved storage efficiency
• Briefly discussed: Objects in an object tier

WAFL Background
• Runs in the kernel space of proprietary OS: ONTAP
• Multi-core, multi-protocol (NFS, SMB, SCSI, NVMe, etc.), multi-workload
• Multi-media: HDDs, SSDs, object store, cloud, flash devices, and more
• WAFL is feature-rich
• Typical deployment:
  – One node: 100’s of FlexVols (file systems), 100’s of TiB, 1000’s of LUNs, with dozens of applications
  – Several features enabled: snapshots, file cloning, file system cloning, replication, dedupe, compression, mobility, encryption, etc.
• Copy-On-Write: so has the potential to fragment
  – Storage gardening techniques

FlexVols & Aggregates
• Aggregate: pool of storage
• FlexVol: namespace exported to clients (files, dirs, LUNs)
• Each FlexVol & Aggr is a WAFL file system
  – Tree of blocks rooted at superblock, leaves of tree contain data of user files & metafiles
• Each FlexVol stored as a container file
[Figure: FlexVol1, FlexVol2, ..., FlexVoln, each stored as a container file (Container File 1 ... Container File n) in aggregate aggrA, on top of an RG of HDDs, an RG of SSDs, and an object store]

FlexVols & Aggregates
• 4KB block number spaces:
  – V in FlexVol
  – P in aggregate
• FlexVol structures cache P
  – For performance
• V->P indirection used for several features in WAFL
  – Also for storage gardening techniques
[Figure: File A and File B in a FlexVol with block pointers (V1, P1) and (V2, P2); the container file in the Aggr maps V1 and V2 to P1 and P2]

Free Space Fragmentation
• Results in "partial" stripe writes
  – More reads & compute for xor/parity, more write IOs, poor file layout
• Latency of write-op not directly affected
  – Ack’ed after logging to fast NVRAM
• WAFL checkpoints ~GBs of dirty content
  – Should complete in few seconds
  – Lower device write throughput => longer checkpoint
  – Impacts op latency & IOPS throughput
[Figure: stripes across D1, D2, D3, P with used and free blocks]

WAFL: Segment Cleaning
• Each block stored with WAFL context
  – For protection against lost writes
  – Identifies file + offset
  – FlexVol block => container file
• Loaded as dirty container leaf block
• Rewritten (moved) by next CP
  – Preserves snapshots in FlexVol
  – More efficient
• Lazy fixup of stale cached P
  – WAFL context used to catch stale P
  – Read redirected to container file
• CSC: JIT cleaning of segments
[Figure: used blocks P1 to P5 rewritten as P1’ to P5’ across D1, D2, D3, D4; used and free blocks]

CSC Evaluation: Summary
• All-HDD:
  – With CSC: Op latency & write chain length stabilize after 35 days
• SSD+HDD:
  – Hot spots of working set stay in SSD tier; SSDs fragment quickly
  – HDD write chain length stabilizes after 22 days
  – CSC in SSD-tier: Ameliorates flash wear-out (early gen. SSDs)
• All-SSD:
  – Expectation of high & consistent performance: CPU is the bottleneck
  – Disk bandwidth & wear-out is less of a concern (modern gen. SSDs)
  – CSC improves write chain lengths, but without op latency improvement
  – Pure random overwrites: op latency regresses (0.7ms → 1.3ms)

File Layout Defragmentation
• Re-dirtying file blocks can trivially fix contiguity in P space
  – But that impacts efficiency of diff’ing-snapshots of the FlexVol
• Special fake-dirty that retains its V
  – Diff’ing of snapshots finds V unchanged
  – Stale P handled by consulting the container
• Performance: pre-fragmented data
  – All-HDD: results in lower latency, higher throughput
  – All-SSD:
    • Beneficial for pure read workloads
    • But, op latency increases with read/write op mix (1.7ms → 2.5ms)

Sub-block Compaction in WAFL
• Tail-end of files & of FlexVol compression groups
• Compacted into leaves of the container file
• Benefits:
  – No fixed sizes for sub-blocks; potential for workload-aware compaction
  – Compaction-related metadata overhead proportional to savings
  – Crunching/re-compaction of FlexVol snapshots
[Figure: blocks V1, V2, V3 of File A and File B all point to compacted block P1 in the container file for the FlexVol; P1 holds a (V, len, offset) entry per sub-block plus empty space; the RefCount metafile records Initial and Current counts for P1]

Re-Compaction
• Background scan recovers wasted fragments
• Walks container file
  – Compares ref_i with ref_curr per P; if worth re-compacting…
  – …each fragment loaded as a dirty container leaf
  – Re-compacted (moved) by next CP
• Same handling of stale cached P
• Future work:
  – Make re-compaction less intrusive
  – Make re-compaction autonomous

Conclusion
• Fourth form of fragmentation
  – WAFL aggregate can include (on-prem/remote) object tier
  – Cold blocks packaged into large objects
  – Objects can get fragmented; defragmentation based on cost-metrics
• Defragmentation techniques help in all-HDD or mixed-media systems
  – All-SSD systems are sensitive to CPU consumption
  – Ideally, defragmentation should be autonomous and based on system load
• Customer data show
  – All-SSD systems have sufficient CPU headroom
  – Autonomous defragmentation is feasible
  – Consistent & predictable performance
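
Illustration: segment cleaning through the V->P indirection
The segment cleaning slide says that a block is moved by rewriting it through the container file, that the cached P in FlexVol structures is fixed up lazily, and that the stored WAFL context is what catches a stale P. The Python sketch below is a toy model of that idea only, not WAFL code. Every name in it (Block, Aggregate, FlexVol, write, clean_segment, read) is hypothetical, the context is reduced to a (volume id, V) pair, and checkpoints, RAID layout, and snapshots are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    data: bytes
    context: tuple  # hypothetical stand-in for the WAFL context: (volume id, V)


@dataclass
class Aggregate:
    blocks: dict = field(default_factory=dict)  # P -> Block


@dataclass
class FlexVol:
    vol_id: str
    container: dict = field(default_factory=dict)  # V -> P, authoritative map
    cached_p: dict = field(default_factory=dict)   # V -> P, cached copy that may go stale


def write(aggr, vol, v, p, data):
    """Store data at physical block p and record V -> P in both maps."""
    aggr.blocks[p] = Block(data, (vol.vol_id, v))
    vol.container[v] = p
    vol.cached_p[v] = p


def clean_segment(aggr, vols, segment, free_ps):
    """Move every used block in `segment` to a free P.

    Only the container map is updated. V stays the same and the
    cached P is deliberately left stale.
    """
    free_iter = iter(free_ps)
    for p in segment:
        blk = aggr.blocks.pop(p, None)
        if blk is None:
            continue  # block was already free
        vol_id, v = blk.context  # the context says which container entry to update
        new_p = next(free_iter)
        aggr.blocks[new_p] = blk
        vols[vol_id].container[v] = new_p


def read(aggr, vol, v):
    """Read via the cached P; on a context mismatch, redirect to the container."""
    p = vol.cached_p[v]
    blk = aggr.blocks.get(p)
    if blk is None or blk.context != (vol.vol_id, v):
        p = vol.container[v]   # read redirected to the container file
        vol.cached_p[v] = p    # lazy fixup of the stale cached P
        blk = aggr.blocks[p]
    return blk.data


if __name__ == "__main__":
    aggr, vol = Aggregate(), FlexVol("vol1")
    write(aggr, vol, v=1, p=10, data=b"A")
    write(aggr, vol, v=2, p=12, data=b"B")
    clean_segment(aggr, {"vol1": vol}, segment=range(10, 14), free_ps=[100, 101])
    print(read(aggr, vol, 1), vol.cached_p[1])
```

In this model the cleaner never touches the file-level structures, which is why V (and anything keyed on V, such as snapshot diffs) is unaffected by the move.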

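Illustration: choosing blocks for re-compaction
The re-compaction slide describes a background scan that compares the initial and current reference counts of each compacted block and re-compacts the surviving fragments when that is worthwhile. The slides do not say how "worth re-compacting" is decided or how fragments are repacked, so the sketch below makes two assumptions that are not from the source: a live-fraction threshold (the hypothetical `threshold` parameter) and simple first-fit packing. The function names are hypothetical as well.

```python
BLOCK_SIZE = 4096  # 4KB blocks, as in the slides


def recompaction_candidates(refcounts, threshold=0.5):
    """Return the Ps whose live fraction ref_curr / ref_i is at or below `threshold`.

    refcounts maps P -> (ref_i, ref_curr). The threshold policy is an
    assumption; the slides only say "if worth re-compacting".
    """
    candidates = []
    for p, (ref_i, ref_curr) in sorted(refcounts.items()):
        if ref_i <= 0 or ref_curr == 0:
            continue  # nothing live to move; the whole block can simply be freed
        if ref_curr / ref_i <= threshold:
            candidates.append(p)
    return candidates


def repack(fragments, block_size=BLOCK_SIZE):
    """First-fit pack live fragments into new compacted blocks.

    fragments is a list of (v, length) pairs. The result is a list of
    blocks, each a list of (v, length, offset) entries, mirroring the
    per-sub-block (V, len, offset) entries shown on the compaction slide.
    """
    blocks, used = [], []
    for v, length in fragments:
        if length > block_size:
            raise ValueError("fragment larger than a block")
        for i, filled in enumerate(used):
            if filled + length <= block_size:
                blocks[i].append((v, length, filled))
                used[i] += length
                break
        else:
            # no existing block has room, so start a new one
            blocks.append([(v, length, 0)])
            used.append(length)
    return blocks


if __name__ == "__main__":
    counts = {1: (3, 3), 2: (4, 1), 3: (2, 0), 4: (4, 2)}
    print(recompaction_candidates(counts))
    print(repack([(1, 3000), (2, 2000), (3, 1000)]))
```

A real scan would also need to weigh CPU and IO cost against the space recovered, which ties in with the stated future work of making re-compaction less intrusive and autonomous.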