The Btrfs Filesystem


Chris Mason, Oracle
September 2007

Storage
- Drives are getting bigger, but not much faster
- File sizes are often small
- Fsck times can be very large
- Filesystem scaling issues: performance, management, space utilization

A New Filesystem?
- Existing filesystems are not well suited
- Disk format compatibility
- Upstream integration

Btrfs Design
- Extent-based file storage
- Space-efficient packing of small files
- Space-efficient indexed directories
- Integrity checksumming (data and metadata)
- Writable snapshots
- Efficient incremental backups
- Object-level striping and mirroring
- Online filesystem check

Btrfs Btree Basics
- Simple copy-on-write tree
- Reference counting for filesystem trees
- Everything stored as a key/value pair
- Large file data extents stored outside the tree
- Defragments before writeback and via an ioctl
- Btree blocks allocated from metadata-only block groups

Btrfs Trees
- The superblock points to the tree of tree roots, whose directory items ('.', '..', 'default', 'snap') hold pointers to the extent tree and to each subvolume tree
- Extent tree: block allocation information and reference counts
- Subvolume trees: files and directories
[Figure: superblock, root tree, extent tree, and subvolume/snapshot trees]

Btrfs Files
- Small files stored inside the btree (inline)
- Large files stored in large extents
- Written via delayed allocation
- Variable-sized checksums stored in the btree

Btrfs Directories
- Double indexed, by name and by inode number
- Compact: space is reclaimed as names are removed
- Backpointers from the file to the directory
- Used to implement xattrs

Directory Layout Benchmarks
- Create one million files in a single directory (512 bytes or 16 KB)
- Time find, backup, and delete operations

[Figure: Create one million 512-byte files (seekwatcher graphs: disk IO by offset, seeks/sec, and MB/s throughput for Ext3, XFS, and Btrfs)]
[Figure: Create one million 512-byte files (seekwatcher graphs for Ext3 and Btrfs)]
[Figure: Find one million 512-byte files (seekwatcher graphs for Ext3, XFS, and Btrfs)]
[Figure: Read one million 512-byte files (seekwatcher graphs for XFS and Btrfs)]

Btrfs Snapshots
- Subvolume trees and file extents are reference counted
- Snapshots are writable and can be snapshotted again
[Figure: Before COW, the root and each leaf have a reference count of 1; after COW, the cloned root shares the leaves, whose reference counts rise to 2]

Btrfs Snapshots (continued)
- Transactions are snapshots that are deleted after commit
[Figure: Modification after COW copies only the affected leaf under a new root; after deletion of the old root, the untouched leaves keep their shared references]

Btrfs Allocation Trees
- Will help model underlying storage (DM integration)
- Record reference counts for all extents
- Contain resizable block groups
- Btrees can pull from one or more allocation trees

Btrfs Benchmarking
- Single threaded, focus on layout
- mkfs.xfs -d agcount=3 -l version=2,logsize=128m
- mount -t xfs -o logbsize=256k
- mount -t ext3 -o data=writeback
- Single SATA drive, no barriers used, atimes left on
- Graphs from http://oss.oracle.com/~mason/seekwatcher

[Figure: Create one million 16 KB files (seekwatcher graphs for Ext3, XFS, and Btrfs)]
[Figure: Create one million 16 KB files (seekwatcher graphs for Ext3 and Btrfs)]
[Figure: Find one million 16 KB files (seekwatcher graphs for Ext3, XFS, and Btrfs)]
[Figure: Read one million 16 KB files (seekwatcher graphs for XFS and Btrfs)]

Compilebench
- Simulates operations on kernel trees
- File sizes and names collected from real kernel trees
- Randomly create, compile, patch, clean, delete
- Intended to age and fragment the FS
- Created 90 initial trees and ran 150 operations
- http://oss.oracle.com/~mason/compilebench

Compilebench Results

                   Btrfs         Ext3          XFS
  Initial Create   33.78 MB/s    10.31 MB/s    15.30 MB/s
  Create           14.39 MB/s    12.47 MB/s     8.94 MB/s
  Patch             3.77 MB/s     4.48 MB/s     3.19 MB/s
  Compile          20.56 MB/s    15.28 MB/s    21.61 MB/s
  Clean            69.96 MB/s    43.84 MB/s   134.15 MB/s
  Read              5.37 MB/s     8.66 MB/s     7.05 MB/s
  Read Compiled    11.31 MB/s    13.22 MB/s    14.40 MB/s
  Delete           15.77 s       18.34 s       16.63 s
  Delete Compiled  22.96 s       33.35 s       21.38 s
  Stat             15.86 s       11.35 s        7.25 s
  Stat Compiled    21.82 s       14.17 s        8.99 s

[Figure: Create 20 kernel trees (seekwatcher graphs for Ext3, XFS, and Btrfs)]
[Figure: Simulated make -j on 20 kernel trees (seekwatcher graphs for Ext3, XFS, and Btrfs)]
[Figure: Delete 20 kernel trees (seekwatcher graphs for Ext3, XFS, and Btrfs)]

TODO
- Concurrency (tree locking)
- Multiple extent allocation trees
- O_SYNC and fsync performance
- Online fsck
- Many, many more
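The reference-counted copy-on-write behavior in the snapshot figures can be sketched in a few lines. This is illustrative Python, not Btrfs code; the names (`Node`, `snapshot`, `cow_write`) are invented. A snapshot clones only the root and bumps the reference counts of the shared children; the first write through a shared leaf copies it, so the snapshot keeps seeing the old data.

```python
# Toy two-level COW tree with reference-counted nodes, mirroring the
# "Before COW / After COW" figures. Not Btrfs code; names are invented.

class Node:
    def __init__(self, data=None, children=None):
        self.data = data
        self.children = children or []
        self.ref = 1

def snapshot(root):
    """Clone only the root; shared children just gain a reference."""
    clone = Node(children=list(root.children))
    for child in root.children:
        child.ref += 1
    return clone

def cow_write(root, index, data):
    """Write through a possibly shared leaf: copy it, drop the old ref."""
    leaf = root.children[index]
    if leaf.ref > 1:                  # shared with a snapshot: must copy
        leaf.ref -= 1
        leaf = Node(data=leaf.data)   # private copy, ref == 1
        root.children[index] = leaf
    leaf.data = data
    return leaf

root = Node(children=[Node("a"), Node("b"), Node("c")])
snap = snapshot(root)                 # every leaf now has ref == 2
cow_write(root, 0, "a2")              # leaf 0 is copied; snap still sees "a"
```

Deleting a tree root then just decrements its children's counts, freeing only nodes that reach zero, which is why transactions can be implemented as snapshots that are deleted after commit.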
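The "small files inline, large files in extents" split from the Btrfs Files slide can be sketched as a write path that packs short payloads directly into a tree item and records an (offset, length) extent for everything else. This is a hedged sketch, not on-disk logic: the dict stands in for the btree and `INLINE_MAX` is an invented threshold.

```python
# Sketch of extent-based storage with small files packed inline in the
# tree. The dict stands in for the btree; INLINE_MAX is invented.
INLINE_MAX = 2048

class FS:
    def __init__(self):
        self.tree = {}          # (inode, kind) -> item, i.e. "the btree"
        self.disk = bytearray() # backing store for large extents

    def write(self, inode, data):
        if len(data) <= INLINE_MAX:
            # Small file: the bytes live inside the btree item itself,
            # so no separate data block is allocated at all.
            self.tree[(inode, "inline")] = bytes(data)
        else:
            start = len(self.disk)            # "allocate" one big extent
            self.disk += data
            self.tree[(inode, "extent")] = (start, len(data))

    def read(self, inode):
        if (inode, "inline") in self.tree:
            return self.tree[(inode, "inline")]
        start, length = self.tree[(inode, "extent")]
        return bytes(self.disk[start:start + length])

fs = FS()
fs.write(1, b"x" * 100)    # packed inline
fs.write(2, b"y" * 5000)   # stored as an extent
```

The payoff is the one shown in the 512-byte benchmarks: reading a small file never seeks away from the btree leaf that holds its metadata.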
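The double directory index can be sketched as two maps over the same entries: one keyed by name for lookups, one keyed by a monotonic sequence number for readdir. Plain dicts stand in for btree items here, and the class is illustrative, not a Btrfs structure.

```python
# Sketch of a double-indexed directory: one index for lookup by name,
# one in allocation order for readdir. Dicts stand in for btree items.
class Directory:
    def __init__(self):
        self.by_name = {}   # name -> inode            (stat/open path)
        self.by_seq = {}    # insertion seq -> name    (readdir path)
        self.next_seq = 0

    def link(self, name, inode):
        self.by_name[name] = inode
        self.by_seq[self.next_seq] = name
        self.next_seq += 1

    def unlink(self, name):
        # Space is reclaimed as names are removed: both items disappear.
        self.by_name.pop(name)
        self.by_seq = {s: n for s, n in self.by_seq.items() if n != name}

    def readdir(self):
        # Sequence order tracks allocation order, so a backup walks the
        # disk mostly sequentially instead of jumping around by name.
        return [self.by_seq[s] for s in sorted(self.by_seq)]

d = Directory()
for i, name in enumerate(["foo", "bar", "baz"]):
    d.link(name, 100 + i)
d.unlink("bar")
```

This is what the directory benchmarks above are exercising: find and backup iterate in the sequence index, while open() hits the name index.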
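The data checksumming from the design slide follows a simple pattern: record a checksum at write time, verify it on every read, and fail the read rather than return silently corrupted data. The sketch below uses `zlib.crc32` as a stand-in (Btrfs itself uses crc32c), and the dict plays the role of the checksum items in the btree.

```python
# Sketch of per-block data checksumming: record on write, verify on
# read. zlib.crc32 stands in for crc32c; the dict stands in for the
# checksum items stored in the btree.
import zlib

BLOCK = 4096
checksums = {}   # block offset -> stored checksum

def write_block(dev, off, data):
    dev[off:off + BLOCK] = data.ljust(BLOCK, b"\0")
    checksums[off] = zlib.crc32(bytes(dev[off:off + BLOCK]))

def read_block(dev, off):
    data = bytes(dev[off:off + BLOCK])
    if zlib.crc32(data) != checksums[off]:
        raise IOError(f"checksum mismatch at offset {off}")
    return data

dev = bytearray(BLOCK * 4)
write_block(dev, 0, b"hello")
read_block(dev, 0)               # verifies cleanly
dev[1] ^= 0xFF                   # simulate silent media corruption
# read_block(dev, 0) would now raise IOError instead of returning junk
```

Because the checksums live in the (itself checksummed and COW-updated) metadata rather than next to the data, a torn or misdirected write cannot corrupt both the data and its checksum consistently.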