
Duy Le (Dan) - The College of William and Mary
Hai Huang - IBM T. J. Watson Research Center
Haining Wang - The College of William and Mary

[Title figure: virtualization and its typical workloads: games, videos, programming, web servers, database servers, and mail servers.]

• Prior work on storage performance in virtualized environments
◦ Performance implications of VirtFS, a pass-through file system [Jujjuri-LinuxSym’10]
◦ Different I/O scheduler combinations [Boutcher-Hotstorage’09]
◦ Storage space allocation and dirty-block tracking optimizations [Tang-ATC’11]

• “Selected file systems are based on workloads”
◦ Only true in physical systems
• File systems for guest virtual machines depend on
◦ Workloads
◦ Deployed file systems (at the host level)
• Investigation needed!

Guest File Systems

• Ext2, Ext3, Ext4, ReiserFS, XFS, and JFS

Host File Systems

• Ext2, Ext3, Ext4, ReiserFS, XFS, and JFS

• For the best performance?

• Best and worst Guest/Host File System combinations?

• Guest and Host File System Dependency

◦ Varied I/Os and interaction
◦ File disk images and physical disks

. Experimentations
. Macro level
. Thorough analysis
. Micro level
. Findings and Advice


Testbed:

Guest: Qemu 0.9.1; 512MB RAM; Linux 2.6.32; each virtual disk is 9 × 10^6 blocks
Guest file systems: Ext2 (vdc1), Ext3 (vdc2), Ext4 (vdc3), ReiserFS (vdc4), XFS (vdc5), JFS (vdc6)

Guest disks are exposed through VirtIO.

Host: Pentium D 3.4 GHz, 2GB RAM; Linux 2.6.32 + Qemu-KVM 0.12.3
Host file systems (holding sparse disk images): Ext2 (sdb1), Ext3 (sdb2), Ext4 (sdb3), ReiserFS (sdb4), XFS (sdb5), JFS (sdb6); raw partition used as block device (BD) on sdb7
Disk: 1 TB, SATA 6Gb/s, 64MB cache; each partition is 60 × 10^6 blocks
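To make the setup concrete, the following is a minimal sketch of how such a guest could be launched with qemu-kvm, with the virtio disk backed either by an image file on a host file system (the nested case) or by the raw partition (the block-device baseline). This is not the authors' exact command; paths, image names, and the cache setting are hypothetical placeholders.

```python
# Minimal sketch (not the authors' exact scripts) of launching a KVM guest whose
# virtio disk is backed either by an image file on a host file system (the
# nested case) or by a raw partition (the block-device baseline, "BD").
# Paths, image names, and the cache setting are hypothetical placeholders.

def qemu_cmd(backing, memory_mb=512):
    """Build a qemu-kvm command line for one backing store."""
    return [
        "qemu-kvm",
        "-m", str(memory_mb),                       # 512MB guest RAM, as in the testbed
        "-drive", "file=guest_root.img,if=virtio",  # hypothetical root image for the guest OS
        "-drive", f"file={backing},if=virtio,format=raw,cache=none",
    ]

# Nested case: the guest disk is an image file on a host file system (e.g., Ext3).
nested = qemu_cmd("/mnt/ext3/disk.img")
# Baseline case: the guest disk is the raw partition itself.
baseline = qemu_cmd("/dev/sdb7")

print(" ".join(nested))
print(" ".join(baseline))
```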

• Filebench
◦ File server, web server, database server, and mail server

• Throughput
• Latency

• I/O performance
◦ Considered at different abstraction levels
· Via block device (BD)
· Via nested file systems
◦ Relative performance variation
· BD as baseline (computed as sketched below)
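The relative-performance numbers in the following figures are simply each guest/host combination's throughput divided by the throughput of the same guest file system running directly on the block device. A minimal sketch, with made-up placeholder numbers rather than measured results:

```python
# Minimal sketch of the relative-performance metric: throughput of a guest/host
# combination divided by the throughput of the same guest file system running
# directly on the block device (BD). The numbers below are made-up placeholders,
# not measured results.
baseline_mb_s = 25.0                      # hypothetical Ext3-guest-on-BD throughput
nested_mb_s = {                           # hypothetical Ext3 guest on various host FSs
    "Ext2": 24.0, "Ext3": 22.5, "Ext4": 23.1,
    "ReiserFS": 20.4, "XFS": 23.8, "JFS": 21.9,
}

def relative_performance(nested, baseline):
    """Return nested-FS throughput as a percentage of the BD baseline."""
    return {host_fs: 100.0 * tput / baseline for host_fs, tput in nested.items()}

print(relative_performance(nested_mb_s, baseline_mb_s))
```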

[Figure: relative performance of each guest file system to the block-device baseline, plotted per host file system; example panel: ReiserFS guest file system.]

[Figure: throughput (MB/s) and relative performance (%) across host file systems (BD, Ext2, Ext3, Ext4, ReiserFS, XFS, JFS); ReiserFS and Ext4 guest file systems.]

• Guest file system → host file systems
◦ Varied performance
• Host file system → guest file systems
◦ Impacted differently
• Right and wrong combinations
◦ Bidirectional dependency
• I/Os behave differently
◦ Writes are more critical than reads (mail server)

[Figure: throughput (MB/s) and relative performance (%) across host file systems; Ext2 and Ext3 guest file systems.]


[Figure: throughput (MB/s) and relative performance (%) across host file systems; Ext2 guest file system.]


[Figure: throughput (MB/s) and relative performance (%) across host file systems; relative performance stays below 100%, and the workload issues more WRITES.]


• Guest file system → host file systems
◦ Varied performance
• Host file system → guest file systems
◦ Impacted differently
• Right and wrong combinations
◦ Bidirectional dependency
• I/Os behave differently
◦ WRITES are more critical than READS
• Latency is sensitive to nested file systems

. Experimentations
. Macro level
. Thorough analysis
. Micro level
. Findings and Advice

• Same testbed
• Primitive I/Os
◦ Reads or writes
◦ Random or sequential
• FIO benchmark

FIO parameters:
Total I/O size: 5 GB
I/O parallelism: 255
Block size: 8 KB
I/O pattern: Random / Sequential
I/O mode: Native async I/O
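A minimal sketch of how these primitive-I/O runs could be driven with FIO using the parameters listed above; the job name and target file are hypothetical placeholders, and the flags assume a standard fio build (libaio being native async I/O on Linux).

```python
# Minimal sketch of driving the primitive-I/O runs with FIO using the parameters
# above. The job name and target file are hypothetical placeholders; the flags
# assume a standard fio build (libaio = native async I/O on Linux).
import subprocess

def fio_cmd(pattern, target="/mnt/guestfs/testfile"):
    """pattern is one of: read, write, randread, randwrite."""
    return [
        "fio",
        "--name=nested-fs-micro",   # hypothetical job name
        f"--filename={target}",
        f"--rw={pattern}",          # sequential/random x read/write
        "--size=5g",                # total I/O size: 5 GB
        "--bs=8k",                  # block size: 8 KB
        "--iodepth=255",            # I/O parallelism: 255
        "--ioengine=libaio",        # native async I/O
        "--direct=1",               # bypass the page cache (an assumption, not stated on the slide)
    ]

for pattern in ("read", "write", "randread", "randwrite"):
    subprocess.run(fio_cmd(pattern), check=True)
```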

[Figure: random and sequential throughput (MB/s) and relative performance (%) across host file systems (BD, Ext2, Ext3, Ext4, ReiserFS, XFS, JFS); Ext3 guest file system.]

• Read-dominated workloads
◦ Performance unaffected by nested file systems
• Write-dominated workloads
◦ Performance heavily affected by nested file systems

[Figure: random and sequential throughput (MB/s) and relative performance (%) across host file systems (BD, Ext2, Ext3, Ext4, ReiserFS, XFS, JFS).]


[Figure: throughput (MB/s) and relative performance (%) across host file systems; Ext3 and JFS guest file systems.]

• Sequential reads: Ext3/JFS vs. Ext3/BD
• Sequential writes:
◦ Ext3/ReiserFS vs. JFS/ReiserFS (same host file system)
◦ JFS/ReiserFS vs. JFS/XFS (same guest file system)
• I/O analysis using blktrace

• Findings:
◦ Readahead at the hypervisor when nesting file systems (benefits sequential reads)
◦ Long idle times for queuing
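One way to see the queuing behavior behind these findings is to pair the Q (queued) and D (dispatched) events in a blktrace/blkparse log. The sketch below assumes blkparse's default text output, ignores merged requests for simplicity, and uses a hypothetical trace file name.

```python
# Minimal sketch of quantifying queuing delays from a blktrace/blkparse log by
# pairing Q (queued) and D (dispatched) events for the same sector. It assumes
# blkparse's default text output, ignores merged requests for simplicity, and
# the trace file name is a hypothetical placeholder.
from collections import defaultdict

def queue_wait_times(trace_path="blkparse_output.txt"):
    queued = {}                  # sector -> timestamp when the request was queued
    waits = defaultdict(list)    # rwbs flags (e.g. 'R', 'W') -> Q->D gaps in seconds
    with open(trace_path) as trace:
        for line in trace:
            fields = line.split()
            if len(fields) < 8 or fields[5] not in ("Q", "D") or not fields[7].isdigit():
                continue         # skip summary lines and events without a sector
            timestamp, action, rwbs, sector = float(fields[3]), fields[5], fields[6], int(fields[7])
            if action == "Q":
                queued[sector] = timestamp
            elif action == "D" and sector in queued:
                waits[rwbs].append(timestamp - queued.pop(sector))
    return waits

for rwbs, gaps in queue_wait_times().items():
    print(rwbs, "average queue wait (ms):", 1000 * sum(gaps) / len(gaps))
```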

• Different guests (Ext3, JFS), same host (ReiserFS)
◦ I/O scheduler and block allocation scheme

[Figure: blktrace comparison of Ext3 and JFS guest file systems on a ReiserFS host. Annotations: long waiting in the queue; well merged; low I/Os for journaling.]
. Ext3 causes multiple journaling back merges
. JFS coalesces multiple log entries

• Different guests (Ext3, JFS), same host (ReiserFS)
◦ I/O scheduler and block allocation scheme
◦ Findings:
· I/O schedulers are NOT effective for ALL nested file systems (see the scheduler-switching sketch below)
· The I/O scheduler's effectiveness depends on the block allocation scheme
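For experiments like these, the host's I/O scheduler can be switched per device through sysfs. A minimal sketch, assuming the host disk is sdb (a placeholder) and a 2.6.32-era kernel where noop, deadline, and cfq are available:

```python
# Minimal sketch of switching the host disk's I/O scheduler through sysfs when
# testing how schedulers interact with a given nested file system combination.
# The device name (sdb) is a hypothetical placeholder; on a 2.6.32 kernel the
# available schedulers typically include noop, deadline, and cfq. Requires root.
SCHEDULER_FILE = "/sys/block/sdb/queue/scheduler"

def set_io_scheduler(name):
    with open(SCHEDULER_FILE, "w") as f:
        f.write(name)

def current_io_scheduler():
    with open(SCHEDULER_FILE) as f:
        return f.read().strip()   # the active scheduler is shown in brackets

set_io_scheduler("deadline")
print(current_io_scheduler())
```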

• Different guests (Ext3, JFS), same host (ReiserFS)
• Same guest (JFS), different hosts (ReiserFS, XFS)
◦ Block allocation schemes

[Figure: CDF of disk I/O vs. normalized seek distance for ReiserFS and XFS hosts; the two distributions are fairly similar. Annotations: longer waiting in the queue; repeated long-distance logging seeks. A sketch of building such a CDF follows the list below.]

• Different guests (Ext3, JFS), same host (ReiserFS)
• Same guest (JFS), different hosts (ReiserFS, XFS)
◦ Block allocation schemes
◦ Findings:
· Effectiveness of the guest file system's block allocation is NOT guaranteed
· Journal logging on disk images lowers performance
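The seek-distance CDF mentioned above can be reproduced from the sector addresses of dispatched requests (for example, the D events extracted with blktrace). A minimal sketch with hypothetical sector sequences:

```python
# Minimal sketch of building a seek-distance CDF like the ReiserFS vs. XFS plot
# from the sector addresses of dispatched requests (e.g., the D events extracted
# with blktrace). The sector sequences below are hypothetical placeholders.
import numpy as np

def seek_distance_cdf(sectors):
    """Return (sorted normalized seek distances, cumulative fraction)."""
    sectors = np.asarray(sectors, dtype=np.int64)
    seeks = np.abs(np.diff(sectors))                    # distance between consecutive requests
    normalized = np.sort(seeks / max(seeks.max(), 1))   # normalize to the largest observed seek
    cdf = np.arange(1, len(normalized) + 1) / len(normalized)
    return normalized, cdf

for name, sectors in [("ReiserFS", [1000, 1008, 5000, 5008, 1016, 5016]),
                      ("XFS", [1000, 1008, 1016, 1024, 9000, 1032])]:
    distances, fraction = seek_distance_cdf(sectors)
    print(name, list(zip(distances.round(2), fraction.round(2))))
```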

. Experimentations
. Macro level
. Thorough analysis
. Micro level
. Findings and Advice

• Advice 1 – Read-dominated workloads
◦ Minimal impact on I/O throughput
◦ Sequential reads: nesting can even improve performance

• Advice 2 – Write-dominated workloads
◦ Nested file systems should be avoided
· One more pass-through layer
· Extra metadata operations
◦ Journaling degrades performance

• Advice 3 – I/O-sensitive workloads
◦ I/O latency increased by 10-30%
• Advice 4 – Data allocation scheme
◦ Data and metadata I/Os of nested file systems are not differentiated at the host
◦ A pass-through host file system is even better!

• Advice 5 – Tuning file system parameters
◦ “Non-smart” disk
◦ noatime and nodiratime (see the remount sketch below)
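A minimal sketch of applying the noatime/nodiratime advice by remounting the file system that holds the disk images; the mount point is a hypothetical placeholder and the command requires root privileges.

```python
# Minimal sketch of Advice 5's noatime/nodiratime tuning: remount the file
# system that holds the disk images so reads no longer trigger access-time
# metadata writes. The mount point is a hypothetical placeholder; requires root.
import subprocess

def remount_noatime(mount_point="/mnt/guest_images"):
    subprocess.run(
        ["mount", "-o", "remount,noatime,nodiratime", mount_point],
        check=True,
    )

remount_noatime()
```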

Device   Blocks (×10^6)   Speed (MB/s)   Type
sdb2     60.00            127.64         Ext2
sdb3     60.00            127.71         Ext3
sdb4     60.00            126.16         Ext4
sdb5     60.00            125.86         ReiserFS
sdb6     60.00            123.47         XFS
sdb7     60.00            122.23         JFS
sdb8     60.00            121.35         Block Device
