<<

O

The Filesystem

Chris Mason

Oracle

Sept 2007

Chris Mason The Btrfs Filesystem O Topics

Chris Mason The Btrfs Filesystem O Storage

Drives are getting bigger, but not much faster File sizes are often small times can be very large Filesystem scaling issues

I Performance I Management I Space utilization

Chris Mason The Btrfs Filesystem O A New Filesystem?

Existing filesystems not well suited Disk format compatibility Upstream integration

Chris Mason The Btrfs Filesystem O Btrfs Design

Extent based file storage Space efficient packing of small files Space efficient indexed directories Integrity checksumming (data and metadata) Writable snapshots Efficient incremental Object level striping and mirroring Online filesystem check

Chris Mason The Btrfs Filesystem O Btrfs Btree Basics

Simple copy on write for filesystem trees Everything stored as a key/value pair Large file data extents stored outside the tree Defragments before writeback and via an Btree blocks allocated from metadata-only groups

Chris Mason The Btrfs Filesystem O Btrfs Trees

Super

Root Tree items . Root Subvolume 'default' Snapshot 'snap' .. Pointer Pointer Pointer default snap

Extent Tree Subvolume Tree Block allocation Files and Information and Directories Reference Counts

Chris Mason The Btrfs Filesystem O Btrfs Files

Small files stored inside the btree (inline) Large files stored in large extents Written via delayed allocation Variable sized stored in the btree

Chris Mason The Btrfs Filesystem O Btrfs Directories

Double indexed by name and number Compact: space reclaimed as names are removed Backpointers from the file to the directory Used to implement xattrs

Chris Mason The Btrfs Filesystem O Directory Layout Benchmarks

Create one million files in a single directory (512 or 16k) Time find, and delete operations

Chris Mason The Btrfs Filesystem Disk IO 28005 Ext3 Write XFS Read XFS Write 21004 Btrfs Read Btrfs Write 14002 7001

Disk offset (MB) 0

120 Seek Count

90 Ext3 XFS Btrfs 60 30 Seeks / sec 0

45 Throughput

33 Ext3 XFS Btrfs 22 MB/s 11 0 0 68 137 206 275 344 412 481 550 Time (seconds) Figure: Create one million 512 files 24004 Disk IO Ext3 Read 18003 Ext3 Write Btrfs Read Btrfs Write 12002 6001

Disk offset (MB) 0

160 Seek Count 120 Ext3 Btrfs 80 40 Seeks / sec 0

50 Throughput 37 Ext3 Btrfs 25 MB/s 12 0 0 14 28 42 56 70 84 98 112 Time (seconds) Figure: Create one million 512 byte files Disk IO 27330 Ext3 Read Ext3 Write XFS Read 20497 XFS Write Btrfs Read 13665 6832

Disk offset (MB) 0

250 Seek Count

187 Ext3 XFS Btrfs 125 62 Seeks / sec 0

30 Throughput

22 Ext3 XFS Btrfs 15 MB/s 7 0 0 6 12 19 25 32 38 44 51 Time (seconds) Figure: Find one million 512 byte files 27330 Disk IO XFS Read 20497 XFS Write Btrfs Read Btrfs Write 13665 6832

Disk offset (MB) 0

350 Seek Count 262 XFS Btrfs 175 87 Seeks / sec 0

20 Throughput 15 XFS Btrfs 10 MB/s 5 0 0 44 89 134 179 224 269 314 359 Time (seconds) Figure: Read one million 512 byte files O Btrfs Snapshots

Subvolume trees and file extents are reference counted Snapshots are writable and can be snapshotted again

Root Node Root Node Cow Root Ref: 1 Ref: 1 Ref: 1

Leaf 1 Leaf 2 Leaf 3 Leaf 1 Leaf 2 Leaf 3 Ref: 1 Ref: 1 Ref: 1 Ref: 2 Ref: 2 Ref: 2

Figure: Before COW Figure: After COW

Chris Mason The Btrfs Filesystem O Btrfs Snapshots

Transactions are snapshots that are deleted after commit

Root Node Cow Root New Root Ref: 1 Ref: 1 Ref: 1

Leaf 1 Leaf 2 Leaf 3 Leaf 2 Leaf 3 Ref: 1 Ref: 2 Ref: 2 Ref: 1 Ref: 1

Figure: Modification after COW Figure: After Deletion

Chris Mason The Btrfs Filesystem O Btrfs Allocation Trees

Will help model underlying storage (DM integration) Record reference counts for all extents Contain resizable block groups Btrees can pull from one or more allocation trees

Chris Mason The Btrfs Filesystem O Btrfs Benchmarking

Single threaded, focus on layout mkfs. -d agcount=3 -l version=2,logsize=128m mount -t xfs -o logbsize=256k mount -t ext3 -o data=writeback Single SATA drive, no barriers used atimes left on Graphs from http://oss.oracle.com/∼mason/seekwatcher

Chris Mason The Btrfs Filesystem Disk IO Ext3 Read 30510 Ext3 Write XFS Read XFS Write 22882 Btrfs Read Btrfs Write 15255 7627

Disk offset (MB) 0

120 Seek Count

90 Ext3 XFS Btrfs 60 30 Seeks / sec 0

50 Throughput

37 Ext3 XFS Btrfs 25 MB/s 12 0 0 110 220 331 441 552 662 772 883 Time (seconds) Figure: Create one million 16 KB files 30510 Disk IO Ext3 Read 22882 Ext3 Write Btrfs Read Btrfs Write 15255 7627

Disk offset (MB) 0

45 Seek Count 33 Ext3 Btrfs 22 11 Seeks / sec 0

50 Throughput 37 Ext3 Btrfs 25 MB/s 12 0 0 54 108 162 216 270 324 378 432 Time (seconds) Figure: Create one million 16 KB files Disk IO 27425 Ext3 Read Ext3 Write XFS Read 20569 XFS Write Btrfs Read 13712 6856

Disk offset (MB) 0

250 Seek Count

187 Ext3 XFS Btrfs 125 62 Seeks / sec 0

20 Throughput

15 Ext3 XFS Btrfs 10 MB/s 5 0 0 6 12 18 24 30 36 42 48 Time (seconds) Figure: Find one million 16 KB files 29756 Disk IO XFS Read 22317 XFS Write Btrfs Read Btrfs Write 14878 7439

Disk offset (MB) 0

160 Seek Count 120 XFS Btrfs 80 40 Seeks / sec 0

35 Throughput 26 XFS Btrfs 17 MB/s 8 0 0 114 228 342 456 570 684 799 913 Time (seconds) Figure: Read one million 16 KB files O Compilebench

Simulates operations on kernel trees File sizes and names collected from real kernel trees Randomly create, compile, patch, clean, delete Intended to age and fragment the FS Created 90 initial trees and ran 150 operations http://oss.oracle.com/∼mason/compilebench

Chris Mason The Btrfs Filesystem O Compilebench

Btrfs Ext3 XFS Initial Create 33.78 MB/s 10.31 MB/s 15.30 MB/s Create 14.39 MB/s 12.47 MB/s 8.94 MB/s Patch 3.77 MB/s 4.48 MB/s 3.19 MB/s Compile 20.56 MB/s 15.28 MB/s 21.61 MB/s Clean 69.96 MB/s 43.84 MB/s 134.15 MB/s Read 5.37 MB/s 8.66 MB/s 7.05 MB/s Read Compiled 11.31 MB/s 13.22 MB/s 14.40 MB/s Delete 15.77s 18.34s 16.63s Delete Compiled 22.96s 33.35s 21.38s 15.86s 11.35s 7.25s Stat Compiled 21.82s 14.17s 8.99s

Chris Mason The Btrfs Filesystem Disk IO Ext3 Read 34614 Ext3 Write XFS Read XFS Write 25960 Btrfs Read Btrfs Write 17307 8653

Disk offset (MB) 0

300 Seek Count

225 Ext3 XFS Btrfs 150 75 Seeks / sec 0

50 Throughput

37 Ext3 XFS Btrfs 25 MB/s 12 0 0 37 74 111 148 185 223 260 297 Time (seconds) Figure: Create 20 kernel trees Disk IO Ext3 Read 40934 Ext3 Write XFS Read XFS Write 30700 Btrfs Read Btrfs Write 20467 10233

Disk offset (MB) 0

250 Seek Count

187 Ext3 XFS Btrfs 125 62 Seeks / sec 0

35 Throughput

26 Ext3 XFS Btrfs 17 MB/s 8 0 0 96 192 288 384 480 576 672 769 Time (seconds) Figure: Make -j simulated 20 kernel trees Disk IO Ext3 Read 40933 Ext3 Write XFS Read XFS Write 30700 Btrfs Read Btrfs Write 20466 10233

Disk offset (MB) 0

180 Seek Count

135 Ext3 XFS Btrfs 90 45 Seeks / sec 0

25 Throughput

18 Ext3 XFS Btrfs 12 MB/s 6 0 0 61 122 184 245 307 368 429 491 Time (seconds) Figure: Delete 20 kernel trees O TODO

Concurrency (tree locking) Multiple extent allocation trees OSYNC and fsync performance Online fsck Many many more

Chris Mason The Btrfs Filesystem