Duy Le (Dan) - The College of William and Mary
Hai Huang - IBM T. J. Watson Research Center
Haining Wang - The College of William and Mary
[Figure: virtualization consolidates varied workloads: games, videos, programming, web server, database server, file server, mail server]
◦ [Jujjuri-LinuxSym'10] Performance implications of VirtFS (file system pass-through)
◦ [Boutcher-Hotstorage'09] Different I/O scheduler combinations
◦ [Tang-ATC'11] Storage space allocation and dirty-block tracking optimization

"Selected file systems are based on workloads"
◦ Only true in physical systems
File systems for a guest virtual machine depend on:
◦ Workloads
◦ Deployed file systems (at the host level)
Investigation needed!

Guest File Systems
Ext2, Ext3, Ext4, ReiserFS, XFS, and JFS
Host File Systems
Ext2, Ext3, Ext4, ReiserFS, XFS, and JFS
For the best performance?
Best and worst Guest/Host File System combinations?
Guest and Host File System Dependency
◦ Varied I/Os and interaction
◦ File disk images and physical disks
. Experimentations
. Macro-level throughput analysis
. Micro-level analysis
. Findings and Advice
Guest: QEMU 0.9.1; 512 MB RAM; Linux 2.6.32; virtual disk of 9 × 10^6 blocks
Guest file systems: vdc1 (Ext2), vdc2 (Ext3), vdc3 (Ext4), vdc4 (ReiserFS), vdc5 (XFS), vdc6 (JFS)
VirtIO
Host file systems: sdb1 (Ext2), sdb2 (Ext3), sdb3 (Ext4), sdb4 (ReiserFS), sdb5 (XFS), sdb6 (JFS), sdb7 (BD)
Host: Pentium D 3.4 GHz; 2 GB RAM; Linux 2.6.32 + Qemu-KVM 0.12.3; raw partition used as block device (BD); guest disks stored as sparse disk images; each partition 60 × 10^6 blocks; physical disk: 1 TB, SATA 6 Gb/s, 64 MB cache (a setup sketch follows)
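As a rough illustration of how one guest/host pair on this testbed could be prepared, here is a sketch only: the device name, mount point, and image size are assumptions, the mkfs call is destructive, and a real guest would also need a boot disk or kernel.

```python
# Sketch: prepare one host file system and a sparse guest disk image,
# then attach the image to a guest via VirtIO.
# Device names, paths, and sizes are illustrative assumptions.
import os
import subprocess

def sh(*cmd):
    subprocess.run(cmd, check=True)

os.makedirs("/mnt/host-ext3", exist_ok=True)
sh("mkfs.ext3", "-q", "/dev/sdb2")            # host file system (destructive!)
sh("mount", "/dev/sdb2", "/mnt/host-ext3")

# A raw image on a file system starts sparse: blocks are allocated on write.
sh("qemu-img", "create", "-f", "raw", "/mnt/host-ext3/guest.img", "8G")

# Boot the guest with the image attached through VirtIO.
sh("qemu-system-x86_64", "-m", "512",
   "-drive", "file=/mnt/host-ext3/guest.img,if=virtio")
```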
Filebench
◦ File server, web server, database server, and mail server
Metrics: throughput and latency
I/O performance
◦ Different abstractions considered:
Via block device (BD)
Via nested file systems
◦ Relative performance variation, with BD as the baseline (a computation sketch follows)
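A minimal sketch of that normalization, assuming per-combination throughputs have already been measured (the values below are placeholders, not the talk's data):

```python
# Sketch: express each guest/host combination's throughput relative to
# the same guest running directly on the raw block device (BD).
# The throughput values are illustrative placeholders.

throughput_mb_s = {
    # (guest_fs, host_fs) -> measured throughput in MB/s
    ("ext3", "BD"): 25.0,        # baseline: no host file system
    ("ext3", "ext3"): 21.5,
    ("ext3", "reiserfs"): 23.0,
}

def relative_performance(guest_fs, host_fs):
    """Throughput as a percentage of the guest's BD baseline."""
    baseline = throughput_mb_s[(guest_fs, "BD")]
    return 100.0 * throughput_mb_s[(guest_fs, host_fs)] / baseline

for guest_fs, host_fs in throughput_mb_s:
    print(f"{guest_fs}/{host_fs}: {relative_performance(guest_fs, host_fs):.1f}%")
```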
[Matrix: relative performance of each guest file system over every host file system, normalized to the BD baseline]
ReiserFS guest file system
[Chart: throughput (MB/s, left axis) and relative performance (%, right axis) for the ReiserFS guest file system over host configurations BD, Ext2, Ext3, Ext4, ReiserFS, XFS, JFS]
Ext4 guest file system
Guest file system → host file systems
◦ Varied performance
Host file system → guest file systems
◦ Impacted differently
Right and wrong combinations
◦ Bidirectional dependency
I/Os behave differently
◦ Writes are more critical than reads (mail server)
[Chart: throughput (MB/s, left axis) and relative performance (%, right axis) for the Ext2 and Ext3 guest file systems over host configurations BD, Ext2, Ext3, Ext4, ReiserFS, XFS, JFS]
[Chart: throughput (MB/s, left axis) and relative performance (%, right axis) for the Ext2 guest file system over host configurations BD, Ext2, Ext3, Ext4, ReiserFS, XFS, JFS]
[Chart: throughput (MB/s, left axis) and relative performance (%, right axis); with more WRITES in the workload, relative performance stays lower than 100% across host file systems]
Guest file system → host file systems
◦ Varied performance
Host file system → guest file systems
◦ Impacted differently
Right and wrong combinations
◦ Bidirectional dependency
I/Os behave differently
◦ WRITES are more critical than READS
Latency is sensitive to nested file systems
. Experimentations
. Macro-level throughput analysis
. Micro-level analysis
. Findings and Advice
Same testbed
Primitive I/Os
◦ Reads or writes
◦ Random or sequential
FIO benchmark (an invocation sketch follows):

Description       Parameters
Total I/O size    5 GB
I/O parallelism   255
Block size        8 KB
I/O pattern       Random/Sequential
I/O mode          Native async I/O
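A sketch of driving FIO with these parameters; the job name and target device are assumptions, while the option names are standard FIO ones:

```python
# Sketch: run one FIO job per I/O pattern using the parameters above.
# The target device and job name are illustrative assumptions.
import subprocess

FIO_ARGS = {
    "size": "5g",          # total I/O size
    "iodepth": "255",      # I/O parallelism
    "bs": "8k",            # block size
    "ioengine": "libaio",  # native async I/O
    "direct": "1",         # bypass the guest page cache
}

def run_fio(rw, target="/dev/vdc1"):
    """rw: read, write, randread, or randwrite."""
    cmd = ["fio", "--name=nestedfs", f"--filename={target}", f"--rw={rw}"]
    cmd += [f"--{key}={val}" for key, val in FIO_ARGS.items()]
    subprocess.run(cmd, check=True)

for pattern in ("read", "randread", "write", "randwrite"):
    run_fio(pattern)
```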
[Charts: random (top) and sequential (bottom) throughput (MB/s, left axis) and relative performance (%, right axis) for the Ext3 guest file system over host configurations BD, Ext2, Ext3, Ext4, ReiserFS, XFS, JFS]

Read-dominated workloads
◦ Performance unaffected by nested file systems
Write-dominated workloads
◦ Performance heavily affected by nested file systems
[Charts: random (top) and sequential (bottom) throughput (MB/s, left axis) and relative performance (%, right axis) over host configurations BD, Ext2, Ext3, Ext4, ReiserFS, XFS, JFS]
Read-dominated workloads
◦ Performance unaffected by nested file systems
Write-dominated workloads
◦ Performance heavily affected by nested file systems
[Charts: throughput (MB/s, left axis) and relative performance (%, right axis) for the Ext3 and JFS guest file systems over host configurations BD, Ext2, Ext3, Ext4, ReiserFS, XFS, JFS]
Sequential reads: Ext3/JFS vs. Ext3/BD
Sequential writes:
◦ Ext3/ReiserFS vs. JFS/ReiserFS (same host file system)
◦ JFS/ReiserFS vs. JFS/XFS (same guest file system)
I/O analysis using blktrace (a parsing sketch follows)
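One way to reproduce this kind of analysis (a sketch, not the authors' tooling): capture a trace with blktrace, render it with blkparse, and match queue (Q) events to completion (C) events per starting sector. The field positions assume blkparse's default output format.

```python
# Sketch: derive per-request queue-to-completion (Q2C) latencies from
# blkparse's default text output (blktrace -d /dev/sdb; blkparse sdb).
import sys

def q2c_latencies(lines):
    queued = {}        # starting sector -> timestamp of the Q event
    latencies = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue                      # summary or blank lines
        try:
            timestamp = float(fields[3])
        except ValueError:
            continue                      # header or footer text
        action, sector = fields[5], fields[7]
        if action == "Q":                 # request entered the queue
            queued[sector] = timestamp
        elif action == "C" and sector in queued:
            latencies.append(timestamp - queued.pop(sector))
    return latencies

lat = q2c_latencies(sys.stdin)
if lat:
    print(f"{len(lat)} requests, mean Q2C {1000 * sum(lat) / len(lat):.2f} ms")
```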
Findings:
◦ Readahead at the hypervisor when nesting file systems (a tuning sketch follows)
◦ Long idle times for queueing
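For context, a sketch of inspecting and adjusting host-side readahead through sysfs. This is the block-device-level window (file-level readahead on the image is a separate mechanism); the device name is an assumption and writing requires root.

```python
# Sketch: inspect and adjust the host block device's readahead window,
# which affects how the guest's sequential reads are serviced.
# Device name is an illustrative assumption.

def get_readahead_kb(dev="sdb"):
    with open(f"/sys/block/{dev}/queue/read_ahead_kb") as f:
        return int(f.read())

def set_readahead_kb(dev, kb):
    with open(f"/sys/block/{dev}/queue/read_ahead_kb", "w") as f:
        f.write(str(kb))

print(f"current readahead: {get_readahead_kb('sdb')} KB")
set_readahead_kb("sdb", 256)   # e.g. narrow the window for random workloads
```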
Different guests (Ext3, JFS), same host (ReiserFS)
◦ I/O scheduler and block allocation scheme
Long waiting in the queue
Well-merged requests
Low I/O counts
. Ext3 causes multiple journaling back merges
. JFS coalesces multiple log entries
Different guests (Ext3, JFS), same host (ReiserFS)
◦ I/O scheduler and block allocation scheme
◦ Findings:
I/O schedulers are NOT effective for ALL nested file systems
An I/O scheduler's effectiveness depends on the block allocation scheme (a scheduler-switching sketch follows)
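Testing such pairings only needs the per-device sysfs knob below (a sketch; device names are assumptions, and the available schedulers depend on the kernel build):

```python
# Sketch: read and switch a block device's I/O scheduler via sysfs.
# Requires root; device and scheduler names are illustrative.

def available_schedulers(dev="sdb"):
    with open(f"/sys/block/{dev}/queue/scheduler") as f:
        return f.read().strip()    # e.g. "noop anticipatory deadline [cfq]"

def set_scheduler(dev, name):
    with open(f"/sys/block/{dev}/queue/scheduler", "w") as f:
        f.write(name)

print(available_schedulers("sdb"))
set_scheduler("sdb", "deadline")   # pair a different host scheduler with the guest's
```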
Different guests (Ext3, JFS), same host (ReiserFS)
Same guest (JFS), different hosts (ReiserFS, XFS)
◦ Block allocation schemes
Longer waiting in the queue
[Chart: CDF of disk I/O vs. normalized seek distance for ReiserFS and XFS hosts; the curves are fairly similar, with repeated long-distance seeks caused by journal logging]
Different guests (Ext3, JFS), same host (ReiserFS)
Same guest (JFS), different hosts (ReiserFS, XFS)
◦ Block allocation schemes
◦ Findings:
The effectiveness of the guest file system's block allocation is NOT guaranteed
Journal logging on disk images lowers performance (a CDF sketch follows)
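The seek-distance CDF above can be rebuilt from a trace's request start sectors (a sketch; the sample trace is fabricated purely to illustrate the repeated long journal seeks):

```python
# Sketch: CDF of normalized seek distance from the ordered start
# sectors of serviced requests (e.g. extracted from blkparse output).

def seek_distance_cdf(sectors):
    seeks = sorted(abs(b - a) for a, b in zip(sectors, sectors[1:]))
    max_seek = seeks[-1] or 1          # avoid dividing by zero
    n = len(seeks)
    # (normalized seek distance, fraction of I/Os at or below it)
    return [(d / max_seek, (i + 1) / n) for i, d in enumerate(seeks)]

# Illustrative pattern: sequential data writes interleaved with jumps
# to a distant journal area, as with logging on a disk image.
trace = [100, 108, 116, 900000, 124, 132, 900008, 140]
for distance, fraction in seek_distance_cdf(trace):
    print(f"{distance:.3f}\t{fraction:.2f}")
```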
. Experimentations
. Macro-level throughput analysis
. Micro-level analysis
. Findings and Advice
Advice 1 – Read-dominated workloads
◦ Minimal impact on I/O throughput
◦ Sequential reads: nesting can even improve performance
Advice 2 – Write-dominated workloads
◦ Nested file systems should be avoided
One more pass-through layer
Extra metadata operations
◦ Journaling degrades performance
Advice 3 – I/O-sensitive workloads
◦ I/O latency increased by 10–30%
Advice 4 – Data allocation scheme
◦ Data and metadata I/Os of nested file systems are not differentiated at the host
◦ A pass-through host file system is even better!
Advice 5 – Tuning file system parameters
◦ Treat the virtual disk as a "non-smart" disk
◦ noatime and nodiratime (a remount sketch follows)
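A sketch of applying the noatime/nodiratime advice inside the guest; the mount point is an assumption, while the options are standard Linux mount options:

```python
# Sketch: remount a guest file system without access-time updates,
# removing metadata writes that would otherwise pass through both
# file system layers. Mount point is illustrative; requires root.
import subprocess

def remount_noatime(mountpoint="/mnt/data"):
    subprocess.run(
        ["mount", "-o", "remount,noatime,nodiratime", mountpoint],
        check=True,
    )

remount_noatime()
```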
Devices   Blocks (×10^6)   Speed (MB/s)   Type
sdb2      60.00            127.64         Ext2
sdb3      60.00            127.71         Ext3
sdb4      60.00            126.16         Ext4
sdb5      60.00            125.86         ReiserFS
sdb6      60.00            123.47         XFS
sdb7      60.00            122.23         JFS
sdb8      60.00            121.35         Block Device