Red Hat Customer Convergence #rhconvergence

RED HAT CONFIDENTIAL | DOUGLAS SHAKSHOBER #rhconvergence

Red Hat Enterprise Performance Engineering: Performance Update RHEL 6/7

Douglas Shakshober, Senior Consulting Software Engineer
February 6, 2014

Red Hat Performance Engineering

● Benchmarks – code path coverage
− CPU – lmbench
− Memory – lmbench, McCalpin STREAM
− Disk I/O – IOzone, aio-stress – SCSI, FC, iSCSI
− Filesystems – IOzone, Postmark – ext3/4, XFS, GFS2, Gluster
− Network – Netperf – 10 Gbit, 40 Gbit IB, PCIe 3
● Bare metal, RHEL6/7 KVM
● White box AMD/Intel, with our OEM partners

Red Hat Performance Engineering

● Application performance
− Linpack MPI, SPEC CPU (OMP) – single systems, clusters
− AIM 7 – single systems, large SMP
− Databases – DB2, Oracle 11g, Sybase 15.x, MySQL, Postgres, MongoDB
− OLTP – metal/KVM/RHEV-M clusters – TPC-/virt
− DSS – metal/KVM/RHEV-M, IQ, TPC-H/virt
− SPECsfs NFS, Postmark
− SAP – SLCS, SD
− STAC (FSI) – trading: AMQP, Reuters, TIBCO, etc.

Red Hat Performance: R7 beta vs R6.5

● RHEL7 partner beta
− Intel: intel_idle driver, control C-states (limit to 1 or 0)
− NUMA (numa_balance), scheduler with large memory (12 TB)
Testing:
− CPU performance: Linpack/STREAM, SPECjbb
− IOzone performance with various filesystems: +/- 3, EXT4 write issue
− Databases (Oracle, Sybase, DB2, MySQL, Postgres, SAP)

● Advanced performance tools
− Tuna / Tuned / Perf
− ISV support/requests

● KVM new virtualization features

RHEL NUMA Scheduler

● RHEL6
− numactl, numastat enhancements
− numad: user-mode tool that dynamically monitors and auto-tunes

● RHEL7 beta: numabalance
− 3.10-35, checked in by Rik van Riel
− Derived from work by Andrea Arcangeli, Mel Gorman, Peter Zijlstra, Ingo Molnar

● Enable / disable:
− echo NUMA > /sys/kernel/debug/sched_features
− echo NO_NUMA > /sys/kernel/debug/sched_features
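Before toggling the flag, you can read sched_features back to see its current state. The following is a minimal Python sketch. It assumes the debugfs format, in which flags are space-separated and a disabled flag carries a NO_ prefix. The function names and the sample flag string are illustrative and are not part of any Red Hat tool.

```python
def numa_sched_feature_enabled(sched_features_text: str) -> bool:
    """Return True if the NUMA scheduler feature is on, False if it is off.

    Assumes the debugfs format: space-separated flag names, with disabled
    flags shown as NO_<NAME>.
    """
    flags = sched_features_text.split()
    if "NUMA" in flags:
        return True
    if "NO_NUMA" in flags:
        return False
    # Kernels built without the NUMA balancing feature do not list the flag.
    raise ValueError("NUMA flag not found in sched_features")


def read_numa_sched_feature(path: str = "/sys/kernel/debug/sched_features") -> bool:
    """Read the live file (needs root and a mounted debugfs)."""
    with open(path) as f:
        return numa_sched_feature_enabled(f.read())


if __name__ == "__main__":
    # Illustrative sample only; the real flag list varies by kernel build.
    sample = "GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY NUMA NUMA_FAVOR_HIGHER"
    print(numa_sched_feature_enabled(sample))
```

Splitting into exact tokens matters here: a substring test for "NUMA" would also match NUMA_FAVOR_HIGHER or NO_NUMA.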

Non-Uniform Memory Access (NUMA)

● The Linux system scheduler is very good at maintaining responsiveness and optimizing for CPU utilization

● It tries to use idle CPUs regardless of where process memory is located, but using remote memory degrades performance!

● Red Hat is working with the upstream community to increase the NUMA awareness of the scheduler and to implement automatic NUMA balancing.

● Remote memory latency matters most for long-running, significant processes, e.g., HPTC, VMs, etc.

How to Manage NUMA Manually: Checklist

● Research the NUMA topology of each system

● Make a resource plan for each system

● Bind both CPUs and memory
− Also consider devices and IRQs

● Use numactl for native jobs:
− numactl -N <nodes> -m <nodes> <command>

● Use numatune for libvirt-started guests:
− Edit the guest XML

● Use cgroups with applications to bind CPU/memory to NUMA nodes
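The following minimal Python sketch shows the two binding styles from the checklist. The first is a numactl command line that binds both CPUs (-N) and memory (-m) to one node. The second is a libvirt <numatune> element for a guest. The helper names, the node number, and ./myapp are hypothetical. The XML follows libvirt's <numatune><memory mode=... nodeset=.../></numatune> form; guest vCPU pinning would be configured separately.

```python
from __future__ import annotations

import shlex
import xml.etree.ElementTree as ET


def numactl_argv(node: int, command: list[str]) -> list[str]:
    """numactl argv binding both CPUs (-N) and memory (-m) to one NUMA node."""
    return ["numactl", "-N", str(node), "-m", str(node), *command]


def numatune_xml(nodeset: str, mode: str = "strict") -> str:
    """Render a libvirt <numatune> element to place in a guest's domain XML."""
    numatune = ET.Element("numatune")
    ET.SubElement(numatune, "memory", mode=mode, nodeset=nodeset)
    return ET.tostring(numatune, encoding="unicode")


if __name__ == "__main__":
    # ./myapp is a placeholder for the native job being bound.
    print(shlex.join(numactl_argv(1, ["./myapp", "--threads", "8"])))
    print(numatune_xml("1"))
```

Returning an argv list rather than a single string keeps the command safe to pass to subprocess without shell quoting issues.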

Know Your Hardware (hwloc)

[Figure: hwloc topology diagram; labeled device: Solarflare SFN6322]

NUMA Performance – SPECjbb

Multi-instance Java peak SPECjbb2005
Multi-instance Java loads fit within 1 node

[Chart: total bops (left axis) and % gain vs. noauto (right axis) for 3.10-54 nonuma, 3.10-54 numa, and numactl]

Use numastat to see memory layout

● Rewritten for RHEL to show per-node system and process memory information

● 100% compatible with the prior version by default, displaying /sys...node/numastat memory allocation statistics

● Any command-line option invokes the new functionality
− -m for per-node system memory info
− for per-node process memory info
− See numastat(8)

numastat: Java processes with NUMA balancing on

# numastat -c java    (default scheduler – non-optimal)

Per-node process memory usage (in MBs)
PID              Node 0 Node 1 Node 2 Node 3 Total
---------------  ------ ------ ------ ------ -----
57501 (java)        755   1121    480    698  3054
57502 (java)       1068    702    573    723  3067
57503 (java)        649   1129    687    606  3071
57504 (java)       1202    678   1043    150  3073
---------------  ------ ------ ------ ------ -----
Total              3674   3630   2783   2177 12265

# numastat -c java    (numabalance – close to optimal)

Per-node process memory usage (in MBs)
PID              Node 0 Node 1 Node 2 Node 3 Total
---------------  ------ ------ ------ ------ -----
56918 (java)         49   2791     56     37  2933
56919 (java)       2769     76     55     32  2932
56920 (java)         19     55     77   2780  2932
56921 (java)         97     65   2727     47  2936
---------------  ------ ------ ------ ------ -----
Total              2935   2987   2916   2896 11734
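Per-process locality is one way to quantify the difference between the two numastat runs. It is the share of each process's memory that sits on its busiest node. The following Python sketch assumes the whitespace-separated `numastat -c` layout: PID, (name), one column per node, then Total. It also assumes process names contain no spaces. The function names are hypothetical.

```python
from __future__ import annotations


def parse_numastat_c(text: str) -> dict[str, list[int]]:
    """Parse per-process rows of `numastat -c` into {"PID (name)": [MB per node]}.

    Header, separator, and Total rows are skipped; the per-row Total column
    is dropped so only the node columns remain.
    """
    rows: dict[str, list[int]] = {}
    for line in text.splitlines():
        parts = line.split()
        # Process rows look like: 56918 (java) 49 2791 56 37 2933
        if len(parts) >= 4 and parts[0].isdigit() and parts[1].startswith("("):
            values = [int(v) for v in parts[2:]]
            rows[f"{parts[0]} {parts[1]}"] = values[:-1]
    return rows


def locality(per_node_mb: list[int]) -> float:
    """Fraction of a process's memory that lives on its busiest node."""
    total = sum(per_node_mb)
    return max(per_node_mb) / total if total else 0.0
```

Applied to the two runs, every java process in the default-scheduler run has under 40% of its memory on any single node. Every process in the NUMA-balancing run has over 90%.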

NUMA Performance – Single Large Database

Postgres Sysbench OLTP
2-socket Westmere EP, 24p/48 GB

[Chart: trans/sec vs. threads (10, 20, 30) for 3.10-54 base, 3.10-54 numa, and NumaD, with a % gain axis]

NUMA Performance – Single Oracle Database

RHEL7 vs RHEL6 Oracle OLTP performance: minimize impact on a large single application

[Chart series: RHEL6.4, RHEL6.4 – numad, 3.10-54 numa, 3.10-54 no numa]

RHEL7 beta Performance Tuning

● RHEL 7 beta potential tuning
− tuned-adm profile throughput-performance
− tuned-adm profile latency-performance (locks C-states at 1)

● Disable the NUMA balancing scheduler via:
− echo NO_NUMA > /sys/kernel/debug/sched_features

● Adjust dirty ratios back to the RHEL6 values of 40 and 10:
− vm.dirty_ratio = 40
− vm.dirty_background_ratio = 10
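A rough calculation shows what the two ratios mean in bytes. This Python sketch assumes the kernel's documented semantics. Background writeback starts when dirty data reaches dirty_background_ratio percent of dirtyable memory (free plus reclaimable pages, not total RAM). A writing process is throttled into doing writeback itself at dirty_ratio percent. The 48 GiB figure and the function name are illustrative, so treat the result as an estimate.

```python
from __future__ import annotations

GiB = 1024 ** 3


def dirty_thresholds(dirtyable_bytes: int, dirty_ratio: int = 40,
                     dirty_background_ratio: int = 10) -> tuple[int, int]:
    """Approximate (background_writeback_start, writer_throttle_point) in bytes.

    The kernel applies the ratios to dirtyable memory (free plus reclaimable
    page cache), so passing total RAM overestimates both thresholds.
    """
    background = dirtyable_bytes * dirty_background_ratio // 100
    throttle = dirtyable_bytes * dirty_ratio // 100
    return background, throttle


if __name__ == "__main__":
    bg, hard = dirty_thresholds(48 * GiB)
    print(f"background writeback starts near {bg / GiB:.1f} GiB dirty")
    print(f"writers are throttled near {hard / GiB:.1f} GiB dirty")
```

With 48 GiB of dirtyable memory and the RHEL6 values, background writeback starts near 4.8 GiB of dirty data. Writers are throttled near 19.2 GiB.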

RHEL7 Network Features

• Overview of new networking features in RHEL7
• Adaptive Tickless (dynticks) patchset
• BUSY_POLL socket option
• Power management
• Tunable workqueues

RHEL7 Networks 1/3

● IPv4 routing cache: bye-bye
− Reduces overhead for route lookups
● Socket BUSY_POLL (aka low-latency sockets)
− Performance numbers later
● 127/8 is (optionally) routable now, for cloud stuff
● 40G NIC support; the bottleneck moves back to the CPU :-)
● RFS, aRFS, XPS, etc.
● ipset is included; accelerates complex iptables rules
● netsniff-ng included ... ifpps is awesome

RHEL7 Networks 2/3

● SO_REUSEPORT socket option
− Multiple sockets listen on the same port, TCP & UDP

● Bufferbloat avoidance for non-LAN-latency situations
− TCP Small Queues (tcp_limit_output_bytes)
− CoDel and FQ_CoDel packet schedulers

● TCP Proportional Rate Reduction (PRR)
− Improves how the congestion window is reduced during loss recovery, 3-10% range

● TCP connection repair
− To support LXC: stop a TCP connection and restart it on another host
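SO_REUSEPORT lets several sockets share one port. This minimal Python sketch binds a few UDP sockets to the same loopback port. It assumes Linux 3.9 or later, where every socket must set the option before bind() and all must belong to the same effective user. The function name and socket count are hypothetical. TCP listeners work the same way with listen() after bind().

```python
from __future__ import annotations

import socket


def reuseport_udp_sockets(count: int, host: str = "127.0.0.1") -> list[socket.socket]:
    """Bind `count` UDP sockets to one port with SO_REUSEPORT (Linux 3.9+).

    Every socket sets the option before bind(); the kernel then spreads
    incoming datagrams across the sockets, so each worker can own one.
    """
    socks: list[socket.socket] = []
    port = 0  # 0 lets the kernel choose a free port for the first socket
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
            socks.append(s)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
            s.bind((host, port))
            port = s.getsockname()[1]  # later sockets bind the same port
    except OSError:
        for s in socks:
            s.close()
        raise
    return socks
```

A typical design gives one socket to each worker thread or process. This avoids all workers contending on a single shared socket.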

RHEL7 Networks 3/3

● Performance Co-Pilot support
− pmatop awesome, also pmcollectl
● Per-cgroup TCP buffer limits
− Memory pressure controls for TCP
● Stacked VLANs: 802.1ad QinQ support
− Frame header includes > 1 VLAN tag
● PTP full support in 6.5 and 7.0
− Requires NIC driver enablement
● Chrony offered instead of ntpd (ntpd still included)

New Networking Features in RHEL7

● Linux Containers (LXC) network namespaces
− Per-namespace sysctl tunables
● TCP Fast Open socket option
− Lets data ride on the SYN, saving one round trip on repeat connections
● TCP Tail Loss Probe
− Reduces the impact of lost packets (RTO ~ 15%)

RHEL “tuned” package

# yum install tune*
# tuned-adm profile latency-performance
# tuned-adm list
Available profiles:
- latency-performance
- default
- enterprise-storage
- virtual-guest
- throughput-performance
- virtual-host
Current active profile: latency-performance
# tuned-adm profile default     (to disable)

“tuned” Profile Summary
(profiles: default, enterprise-storage, virtual-host, virtual-guest, latency-performance, throughput-performance)

− kernel.sched_min_granularity_ns: 4ms, 10ms, 10ms, 10ms, 10ms
− kernel.sched_wakeup_granularity_ns: 4ms, 15ms, 15ms, 15ms, 15ms
− vm.dirty_ratio: 20% RAM, 40%, 10%, 40%, 40%
− vm.dirty_background_ratio: 10% RAM, 5%
− vm.swappiness: 60, 10, 30
− I/O scheduler (elevator): CFQ, deadline, deadline, deadline, deadline, deadline
− Filesystem barriers: On, Off, Off, Off
− CPU governor: ondemand, performance, performance, performance
− Disk read-ahead: 4x
− Disable THP: Yes
− CPU C-states: Locked @ 1

Impact of Power Management on Latency and High Context-Switching Workloads (storage/network)

[Chart: latency (microseconds) by C-state: C6, C3, C1, C0]

● Current status
● Network off +/-3%
● Storage +/-5%
● Future plans
● Impact on customers

[Chart: netperf RR trans/s for R7 and R6, UDP and TCP, baseline vs. lat-perf]

Adaptive Tickless (DynTicks) Patchset

● Goal of this patchset: stop interrupting userspace when nr_running=1 (see /proc/sched_debug)

● The idea: if the runqueue depth is 1, the scheduler should have nothing to do on that core

● Move all timekeeping to non-latency-sensitive cores

● Mark certain cores as nohz_full cores

● Use the kernel command-line options nohz_full and rcu_nocbs
− You also need to move the RCU threads yourself (pgrep, taskset, tuna)
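The adaptive-tick boot options take a CPU list in the kernel's cpulist syntax. This Python sketch builds the command-line fragment for a set of latency-sensitive CPUs. It assumes the parameter names nohz_full= and rcu_nocbs= as documented in Documentation/timers/NO_HZ.txt. It also assumes CPU 0 is the boot CPU, which that document says cannot run in adaptive-ticks mode. The function names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable


def format_cpulist(cpus: Iterable[int]) -> str:
    """Format CPU numbers in cpulist syntax, e.g. {1, 2, 3, 5} -> "1-3,5"."""
    ordered = sorted(set(cpus))
    ranges: list[str] = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next CPU is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


def nohz_full_cmdline(isolated_cpus: set[int]) -> str:
    """Boot-parameter fragment: adaptive ticks plus RCU callback offload."""
    if 0 in isolated_cpus:
        # Assumes CPU 0 is the boot CPU, which must keep the tick for timekeeping.
        raise ValueError("keep the boot CPU (CPU 0) out of nohz_full")
    cpus = format_cpulist(isolated_cpus)
    return f"nohz_full={cpus} rcu_nocbs={cpus}"
```

For example, `nohz_full_cmdline({1, 2, 3})` gives `nohz_full=1-3 rcu_nocbs=1-3`. After booting with this fragment, you still need to move the remaining threads and IRQs off those CPUs.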

Precision Time Protocol (IEEE-1588v2)

● Tech Preview in RHEL 6.4, full support in 6.5
− Limited driver enablement in 6.4
− 6.5: bnx2x, tg3, e1000e, igb, ixgbe, and sfc
● Improved synchronization accuracy over NTP
− PTP hardware timestamping is the most accurate
• Query your NIC's PTP capabilities: ethtool -T p1p1
● Improve time sync by disabling the tickless kernel
− nohz=off
− Increased power consumption

Precision Time Protocol (IEEE-1588v2)

[Chart panels: nohz=on, nohz=off]

Adaptive Tickless (DynTicks) Patchset

● Reading:
− https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
− http://lwn.net/Articles/549580/
− http://www.youtube.com/watch?v=G3jHP9kNjwc

Timeline of a tick...tick...tick...: RHEL5

[Figure: timeline from jiffies to jiffies+4; legend: userspace task, timer interrupt]

Timeline of a tick...tick...tick...: RHEL6 and 7 CONFIG_NO_HZ

[Figure: timeline from jiffies to jiffies+4; legend: userspace task, timer interrupt, idle]

Timeline of a tick...tick...tick...: RHEL7 CONFIG_NO_HZ_FULL

● Tickless doesn't require idle

[Figure: timeline from jiffies to jiffies+4; legend: userspace task, timer interrupt]

Examining the tick 1/3

# egrep 'CPU|LOC' /proc/interrupts
# perf list | grep local_timer
  irq_vectors:local_timer_entry                      [Tracepoint event]
  irq_vectors:local_timer_exit                       [Tracepoint event]
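The LOC row of /proc/interrupts counts local timer interrupts per CPU. Two snapshots taken a second apart therefore give a rough ticks-per-second figure for each CPU. This minimal Python sketch assumes the usual layout: a header row of CPU names, then a `LOC:` row with one count per CPU followed by a description. The function names are hypothetical.

```python
from __future__ import annotations


def local_timer_counts(proc_interrupts: str) -> dict[str, int]:
    """Map each CPU column header (CPU0, CPU1, ...) to its LOC count."""
    lines = proc_interrupts.splitlines()
    cpus = lines[0].split()  # first line is the CPU header row
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "LOC:":
            counts = fields[1:1 + len(cpus)]  # trailing text is the description
            return {cpu: int(n) for cpu, n in zip(cpus, counts)}
    raise ValueError("no LOC: row found")


def ticks_per_cpu(before: str, after: str) -> dict[str, int]:
    """Local timer interrupts taken on each CPU between two snapshots."""
    a = local_timer_counts(before)
    b = local_timer_counts(after)
    return {cpu: b[cpu] - a[cpu] for cpu in a}
```

To use it on a live system, read /proc/interrupts, call `time.sleep(1)`, read the file again, and pass both strings to `ticks_per_cpu`. A busy CPU that still takes the tick should show a count close to the 1,002 that perf stat reports on the following slide.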

Examining the tick 2/3

# perf stat -C 1 -e irq_vectors:local_timer_entry sleep 1
             9 irq_vectors:local_timer_entry

# perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 /root/pig -s 1
         1,002 irq_vectors:local_timer_entry

Reboot with nohz_full=1 rcu_nocbs=1, then:
# tuna -c 1 -i ; tuna -q \* -c 1 -i

# perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 /root/pig -s 1
             5 irq_vectors:local_timer_entry

Examining the tick 3/3 (debugfs)

# mount -t debugfs nodev /sys/kernel/debug
# cd /sys/kernel/debug/tracing
# echo 1 > events/irq_vectors/enable
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 432/432   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
          <idle>-0     [007] dNh. 22793.558298: reschedule_entry: vector=253
          <idle>-0     [007] dNh. 22793.558299: reschedule_exit: vector=253
          <idle>-0     [000] d.h. 22793.558969: local_timer_entry: vector=239
          <idle>-0     [000] d.h. 22793.558977: local_timer_exit: vector=239
          <idle>-0     [000] d.H. 22793.558980: irq_work_entry: vector=246
          <idle>-0     [000] dNH. 22793.558983: irq_work_exit: vector=246
          <idle>-0     [000] d.h. 22793.559970: local_timer_entry: vector=239
          <idle>-0     [000] d.h. 22793.559977: local_timer_exit: vector=239
...
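You can summarize the raw trace by counting one event per CPU, for example how many local_timer_entry events each CPU logged. This Python sketch assumes the default ftrace line layout shown in the trace output: task-PID, [CPU], an irq-flags column, a timestamp, then `event:`. If irq-info is turned off, the flags column disappears and the regex would need adjusting. The function name is hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches trace lines such as:
#   <idle>-0     [000] d.h. 22793.558969: local_timer_entry: vector=239
TRACE_LINE = re.compile(r"\[(\d+)\]\s+\S+\s+[\d.]+:\s+(\w+):")


def count_events_per_cpu(trace_text: str, event: str = "local_timer_entry") -> Counter:
    """Count occurrences of `event` per CPU in ftrace `trace` output."""
    counts: Counter = Counter()
    for line in trace_text.splitlines():
        if line.startswith("#"):
            continue  # header and column-legend lines
        m = TRACE_LINE.search(line)
        if m and m.group(2) == event:
            counts[int(m.group(1))] += 1
    return counts
```

On the eight sample lines, CPU 0 logs two local_timer_entry events and CPU 7 logs none. CPU 7 only logs reschedule events.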

NUMA Topology and PCI Bus

● Servers may have more than one PCI bus.

● Install adapters “close” to the CPU that will run the performance-critical application.

● When the BIOS reports locality, irqbalance handles NUMA/IRQ affinity automatically.

42:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

# cat /sys/devices/pci0000\:40/0000\:40\:03.0/0000\:42\:00.0/local_cpulist

1,3,5,7,9,11,13,15

# dmesg | grep "NUMA node"
pci_bus 0000:00: on NUMA node 0 (pxm 1)
pci_bus 0000:40: on NUMA node 1 (pxm 2)
pci_bus 0000:3f: on NUMA node 0 (pxm 1)
pci_bus 0000:7f: on NUMA node 1 (pxm 2)
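Scripts that pick a CPU set "close" to an adapter need to parse the two outputs above. The local_cpulist file uses cpulist syntax, and dmesg reports which node each PCI bus is attached to. This Python sketch assumes those exact formats. It also assumes the /sys/bus/pci/devices/<domain:bus:slot.fn> path, which links to the same device directory as the full /sys/devices path shown. The function names are hypothetical.

```python
from __future__ import annotations

import os
import re


def parse_cpulist(text: str) -> list[int]:
    """Expand kernel cpulist syntax such as "1,3,5-7" into CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


# Matches boot messages like "pci_bus 0000:40: on NUMA node 1 (pxm 2)".
PCI_BUS_NODE = re.compile(r"pci_bus (\S+): on NUMA node (\d+)")


def pci_bus_nodes(dmesg_text: str) -> dict[str, int]:
    """Map each PCI bus (domain:bus) to the NUMA node the kernel reported."""
    return {bus: int(node) for bus, node in PCI_BUS_NODE.findall(dmesg_text)}


def device_local_cpus(pci_addr: str) -> list[int]:
    """CPUs local to a PCI device, e.g. pci_addr="0000:42:00.0"."""
    path = os.path.join("/sys/bus/pci/devices", pci_addr, "local_cpulist")
    with open(path) as f:
        return parse_cpulist(f.read())
```

For the ConnectX-3 at 0000:42:00.0, bus 0000:40 maps to node 1. The adapter's local CPUs are the odd-numbered ones, so a job using this adapter would be bound to node 1 and those CPUs.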

Performance Projects / Tooling

● RHEL6.5: “numad”, “tuna”, and “tuned”

● Tuna: bind IRQs, real-time-like isolation

● Profiling challenges
− Data address profiling (cache-to-cache detection), providing:
  • the hottest contended cachelines
  • the process names, addresses, PIDs, and TIDs causing that contention
  • the CPUs they ran on
  • how the cacheline is being accessed (read or write)


IOzone Performance: Effect of tuned

[Chart, two panels: RHEL6.4 File System In Cache Performance and RHEL6.4 File System Out of Cache Performance; Intel Large File I/O (iozone); throughput in MB/sec, not tuned vs. tuned, for ext3, ext4, xfs, gfs2]

System Tuning Tool - tuna

• Tool for fine-grained control
• Displays applications / processes
• Displays CPU enumeration
• Socket (useful for NUMA tuning)
• Dynamic control of tuning
− Process affinity
− Parent & threads
− Scheduling policy
− Device IRQ priorities, etc.

Tuna (RHEL6.4/RHEL7)

Network Tuning: IRQ affinity

● irqbalance for the common case; disable it to tune manually

● New irqbalance automates NUMA affinity for IRQs

● Flow-steering technologies

● Move 'p1p1*' IRQs to socket 1:
# service irqbalance stop
# tuna -q p1p1* -S1 -m -x
# tuna -Q | grep p1p1

● Manual IRQ pinning for the last X percent/determinism

● Guide on Red Hat Customer Portal
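Manual IRQ pinning ultimately means writing a CPU bitmask to /proc/irq/<N>/smp_affinity. The kernel also offers smp_affinity_list, which takes a cpulist such as 1,3,5,7 directly. This Python sketch builds the hex mask. It assumes the layout the kernel prints: comma-separated 32-bit hex words, most significant first. The IRQ number would come from the p1p1 rows of /proc/interrupts. The function names and the proc_root parameter, which allows testing against a scratch directory, are hypothetical.

```python
import os


def cpus_to_affinity_mask(cpus) -> str:
    """Encode CPU numbers as an smp_affinity hex bitmap.

    Uses the layout the kernel prints: 32-bit hex words, most significant
    first, separated by commas.
    """
    cpus = list(cpus)
    if not cpus:
        raise ValueError("an IRQ needs at least one CPU in its affinity mask")
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    words = []
    while True:
        words.append(f"{mask & 0xFFFFFFFF:08x}")  # lowest 32 bits first
        mask >>= 32
        if mask == 0:
            break
    return ",".join(reversed(words))


def pin_irq(irq: int, cpus, proc_root: str = "/proc/irq") -> None:
    """Write the mask for one IRQ. Needs root; stop irqbalance first,
    or it may later rewrite the affinity."""
    with open(os.path.join(proc_root, str(irq), "smp_affinity"), "w") as f:
        f.write(cpus_to_affinity_mask(cpus))
```

For example, CPUs 1, 3, 5, and 7 encode as `000000aa`.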

Tuna IRQ/CPU affinity context menus

● CPU affinity for IRQs
● CPU affinity for PIDs
● Scheduler policy
● Scheduler priority

RHEL6.5 and RHEL7 Virt Performance

RHEL 6.5
● Virtio dataplane, 4TB memory limit

RHEL 7
● NUMA balance code
● KVM pvticketed_spinlocks, APICv: large guest performance
● NUMA in a guest, APICv, new 4TB memory limit

RHEV 3.3 (based on RHEL 6.5)
● New memory overcommit manager (MOM)
● Network QoS, native Gluster (libgfapi)

RHEL7 with ticketed spinlocks: 3.10.0-12.el7 pvticketlocks.x86_64 (note: R6 unfair locks)

Linpack NxN 20000x20016
Westmere 12-core, 64 GB memory, pvticketlocks

[Chart: Gflops (left axis) and % diff (right axis) at 1, 2, 4, 8, 12; series: bare metal, noticketed, pvticketed]

RH 3.10 OLTP Performance

R7 / F17 OLTP with spinlock backoff
(perf74, 4-socket, 512 GB, 2 FC CLARiiON)

[Chart: TPM (left axis) and delta (right axis) for 80U and 100U; kernels: RHEL63 – all nodes, 3.6.0-0.24.autonuma28fast.test.x86_64, 3.6.10-2.tlw16upstream.fc17.x86_64]

RH/IBM Top virtualized benchmarks

● SPECvirt2010/2012

● IBM SAP SD 2-tier bare-metal / virtualized results
− IBM System x3850 X5, 4-socket, 40-core, 80-thread system
− Bare metal: 12,560 SD users; KVM (80-CPU guest): 10,700
− 85% of bare metal
● IBM TPC-C world record with DB2
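As a quick arithmetic check of the 85% figure, using only the two user counts on this slide:

```latex
\frac{10{,}700\ \text{SD users (KVM, 80-CPU guest)}}{12{,}560\ \text{SD users (bare metal)}} \approx 0.852 \approx 85\%
```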

Virtualization Benchmarks

SPECvirt_sc2013
− Increased workload injection rates
− Multi-vCPU guests
  • All one-vCPU guests in SPECvirt_sc2010
− Up to four tiles using the same database VM

TPC-VMS
− Three independent TPC-C, TPC-H, TPC-E, or TPC-DS benchmarks
  • running simultaneously
− Metric is the lowest of the three scores
− Large vCPU count guests
− Large disk I/O requirements

SPECvirt_sc2010: RHEL 6 KVM Posts Industry-Leading Results

> 1 SPECvirt tile/core

Key enablers:
● SR-IOV
● Huge pages
● NUMA node binding

[Diagram: system under test (SUT) with virtualization layer and hardware, plus client hardware; blue = disk I/O, green = network I/O]

http://www.spec.org/virt_sc2010/results/

Best SPECvirt_sc2010 Scores by CPU Cores

(As of May 30, 2013)

[Chart: best SPECvirt_sc2010 scores for VMware (ESX/ESXi 4.1, ESXi 5.0) and Red Hat (RHEL 6 KVM, RHEV 3.1) systems, grouped as 2-socket 12-core, 2-socket 16-core, 2-socket 20-core, 4-socket 40-core, and 8-socket 64/80-core; scores shown range from 1,221 to 8,956]

Comparison based on the best-performing Red Hat and VMware solutions by CPU core count published at www.spec.org as of May 17, 2013. SPEC® and the name SPECvirt_sc® are registered trademarks of the Standard Performance Evaluation Corporation. For more information about SPECvirt_sc2010, see www.spec.org/virt_sc2010/.

KVM / RHS Tuning

● gluster volume set <volume> group virt

● XFS: mkfs -n size=8192; mount with inode64, noatime

● RHS server: tuned-adm profile rhs-virtualization
− Increases read-ahead, lowers dirty ratios

● KVM host: tuned-adm profile virtual-host

● For better response time, shrink the guest block device queue:
− /sys/block/vda/queue/nr_requests (16 or 8)

● For best sequential read throughput, raise the VM read-ahead:
− /sys/block/vda/queue/read_ahead_kb (4096/8192)
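Both guest-side settings are plain sysfs writes. This Python sketch applies whichever values are given and returns what the kernel reports back, since the kernel may clamp a requested value. It assumes the /sys/block/<dev>/queue/{nr_requests,read_ahead_kb} files named on the slide. The function names and the sysfs_block parameter, which allows testing against a scratch directory, are hypothetical. Writing to sysfs needs root, and the values do not persist across reboots.

```python
from __future__ import annotations

import os
from typing import Optional


def set_queue_attr(dev: str, attr: str, value: int,
                   sysfs_block: str = "/sys/block") -> str:
    """Write one block-queue attribute, then return what the kernel reports."""
    path = os.path.join(sysfs_block, dev, "queue", attr)
    with open(path, "w") as f:
        f.write(str(value))
    with open(path) as f:
        return f.read().strip()


def tune_guest_block_device(dev: str = "vda",
                            nr_requests: Optional[int] = None,
                            read_ahead_kb: Optional[int] = None,
                            sysfs_block: str = "/sys/block") -> dict[str, str]:
    """Apply the guest queue settings that are given (None = leave as is).

    Smaller nr_requests (e.g. 16 or 8) favors response time; larger
    read_ahead_kb (e.g. 4096 or 8192) favors sequential reads.
    """
    applied: dict[str, str] = {}
    if nr_requests is not None:
        applied["nr_requests"] = set_queue_attr(dev, "nr_requests", nr_requests, sysfs_block)
    if read_ahead_kb is not None:
        applied["read_ahead_kb"] = set_queue_attr(dev, "read_ahead_kb", read_ahead_kb, sysfs_block)
    return applied
```

For example, `tune_guest_block_device("vda", nr_requests=16)` targets response time. `tune_guest_block_device("vda", read_ahead_kb=8192)` targets sequential read throughput.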

IOzone Performance Comparison: RHS2.1/XFS with RHEV

[Chart: out-of-the-box vs. tuned rhs-virtualization for rnd-write, rnd-read, seq-write, seq-read; y-axis 0 to 7000]

RHEL6 Performance Tuning Summary

● Use “Tuned”, “NumaD”, and “Tuna” in RHEL6.x
− Tuned selects the deadline I/O elevator
− Power-savings mode (performance), locked (latency)
− Transparent Hugepages for anon memory (monitor it)
− For multi-instance workloads, consider numad
− Virtualization: virtio drivers, consider SR-IOV

● Manually tune
− NUMA: via numactl; monitor with numastat -c pid
− Huge Pages: static hugepages for pinned shared memory
− Managing VM: dirty ratio and swappiness tuning
− Use cgroups for further access control

● Perf and Tuna examples in appendix

Helpful Links

● Red Hat Low Latency Performance Tuning Guide

● Optimizing RHEL Performance by Tuning IRQ Affinity

● Red Hat Performance Tuning Guide

● Red Hat Virtualization Tuning Guide

● STAC Network I/O SIG

● Finteligent Low Latency Tuning w/KVM

Questions
