Red Hat Customer Convergence #rhconvergence
1 RED HAT CONFIDENTIAL | DOUGLAS SHAKSHOBER #rhconvergence RED HAT ENTERPRISE LINUX: PERFORMANCE ENGINEERING PERFORMANCE UPDATE RHEL 6/7
Douglas Shakshober Senior Consulting Software Engineer February 6, 2014
Red Hat Performance Engineering
Benchmarks – code path coverage
CPU – linpack, lmbench
Memory – lmbench, McCalpin Streams
Disk I/O – Iozone, aiostress – SCSI, FC, iSCSI
Filesystem – IOzone, Postmark – ext3/4, XFS, GFS2, Gluster
Network – netperf – 10 Gbit, 40 Gbit, InfiniBand, PCIe Gen3
Bare Metal, RHEL6/7 KVM
White box AMD/Intel, with our OEM partners
Application Performance
Linpack MPI, SPECcpu (omp) – single systems, clusters
AIM 7 – single systems, large smp
Database DB2, Oracle 11G, Sybase 15.x , MySQL, Postgres, Mongo
OLTP – metal/kvm/RHEV-M clusters - TPC-C/virt
DSS – metal/kvm/RHEV-M, IQ, TPC-H/virt
SPECsfs NFS, Postmark
SAP – SLCS, SD
STAC – FSI trading: AMQP, Reuters, Tibco, etc.
Red Hat Performance: RHEL7 Beta vs. RHEL6.5
● RHEL7 partner beta
− Intel intel_idle driver – control C-state to 1 or 0
− NUMA (numa_balance), scheduler with large memory – 12 TB testing
− CPU performance: Linpack/Stream, Java – SPECjbb
− Iozone performance with various filesystems (+/- 3%), ext4 write issue
− Databases (Oracle, Sybase, DB2, MySQL, Postgres), SAP
• Advanced performance tools
− Tuna / tuned / perf
− ISV support/requests
● KVM new virtualization features
RHEL NUMA Scheduler
● RHEL6
  ● numactl, numastat enhancements
  ● numad – user-mode tool; dynamically monitors and auto-tunes
● RHEL7 beta – NUMA balancing
  ● 3.10-35: checked in by Rik van Riel
  ● Derived from work by Andrea Arcangeli, Mel Gorman, Peter Zijlstra, Ingo Molnar
  ● Enable / disable:
    ● echo NUMA > /sys/kernel/debug/sched_features
    ● echo NO_NUMA > /sys/kernel/debug/sched_features
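The toggle above can be wrapped in a small helper; a sketch (the parsing assumes the space-separated flag list that sched_features prints, and the contents are passed as an argument so the function can be exercised without debugfs mounted):

```shell
# Report whether automatic NUMA balancing is enabled, given the contents of
# /sys/kernel/debug/sched_features as $1.  On a live system:
#   numa_state "$(cat /sys/kernel/debug/sched_features)"
numa_state() {
    case " $1 " in
        *" NO_NUMA "*) echo disabled ;;     # check NO_NUMA first
        *" NUMA "*)    echo enabled ;;
        *)             echo unsupported ;;  # kernel without the feature flag
    esac
}
numa_state "GENTLE_FAIR_SLEEPERS START_DEBIT NUMA"      # -> enabled
numa_state "GENTLE_FAIR_SLEEPERS START_DEBIT NO_NUMA"   # -> disabled
```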
Non-Uniform Memory Access (NUMA)
● The Linux system scheduler is very good at maintaining responsiveness and optimizing for CPU utilization
● It tries to use idle CPUs regardless of where process memory is located... and using remote memory degrades performance!
● Red Hat is working with the upstream community to increase the scheduler's NUMA awareness and to implement automatic NUMA balancing.
● Remote memory latency matters most for long-running, significant processes, e.g., HPTC, VMs, etc.
How to Manage NUMA Manually – Checklist
● Research NUMA topology of each system
● Make a resource plan for each system
● Bind both CPUs and Memory
● Might also consider devices and IRQs
● Use numactl for native jobs:
  ● numactl -N <node> -m <node> <command>
● For KVM guests, edit the guest XML:
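For a KVM guest, the same CPU and memory binding can be expressed in the libvirt guest XML; a minimal sketch (the guest name, node numbers, vCPU count, and cpuset below are illustrative, not a recommendation):

```xml
<!-- Sketch: pin an 8-vCPU guest and its memory to NUMA node 0.
     Name, node number, vCPU count, and cpuset are illustrative. -->
<domain type='kvm'>
  <name>guest0</name>
  <vcpu placement='static' cpuset='0-7'>8</vcpu>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <!-- remaining os/devices elements omitted -->
</domain>
```

With mode='strict' the guest's memory allocations are confined to the listed nodeset, matching the "bind both CPUs and memory" step in the checklist above.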
Know Your Hardware (hwloc)
[hwloc topology diagram; NIC shown: Solarflare SFN6322]
NUMA Performance – SPECjbb
Multi-instance Java peak SPECjbb2005
Multi-instance Java loads fit within 1-node
[Chart: multi-instance SPECjbb bops (total) and gain vs. no-auto (0.9–1.2x) for 3.10-54 without NUMA balancing, 3.10-54 with NUMA balancing, and manual numactl binding]
Use numastat to See Memory Layout
● Rewritten for RHEL to show per-node system and process memory information
● 100% compatible with the prior version by default, displaying /sys...node
● Any command option invokes the new functionality
  ● -m for per-node system memory info
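The per-process view on the next slide is easy to post-process; a sketch that scores how well each Java process is localized, reading "numastat -c java"-style output on stdin (the field layout is assumed from the slide's sample output):

```shell
# For each "(java)" row, print the PID and the share of its memory that sits
# on its busiest node -- close to 100% means the process is well localized.
localize() {
    awk '/\(java\)/ {
        max = 0; total = $NF                       # last field is the Total column
        for (i = 3; i < NF; i++) if ($i > max) max = $i
        printf "%s %.0f%%\n", $1, 100 * max / total
    }'
}
localize <<'EOF'
PID            Node 0 Node 1 Node 2 Node 3 Total
56918 (java)       49   2791     56     37  2933
EOF
# prints: 56918 95%
```

On a live system, pipe the real tool through it: `numastat -c java | localize`.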
numastat – Java Processes with NUMA Balancing On
# numastat -c java   (default scheduler – non-optimal)
Per-node process memory usage (in MBs)
PID            Node 0  Node 1  Node 2  Node 3  Total
-------------  ------  ------  ------  ------  -----
57501 (java)      755    1121     480     698   3054
57502 (java)     1068     702     573     723   3067
57503 (java)      649    1129     687     606   3071
57504 (java)     1202     678    1043     150   3073
-------------  ------  ------  ------  ------  -----
Total            3674    3630    2783    2177  12265
# numastat -c java   (numabalance – close to optimal)
Per-node process memory usage (in MBs)
PID            Node 0  Node 1  Node 2  Node 3  Total
-------------  ------  ------  ------  ------  -----
56918 (java)       49    2791      56      37   2933
56919 (java)     2769      76      55      32   2932
56920 (java)       19      55      77    2780   2932
56921 (java)       97      65    2727      47   2936
-------------  ------  ------  ------  ------  -----
Total            2935    2987    2916    2896  11734
NUMA Performance – Single Large Database
Postgres Sysbench OLTP
2-socket Westmere EP 24p/48 GB
[Chart: transactions/sec at 10, 20, and 30 threads for 3.10-54 base, 3.10-54 with NUMA balancing, and numad, with relative gain (0.95–1.2x) on the secondary axis]
NUMA Performance – Single Oracle Database
RHEL7 vs. RHEL6 Oracle OLTP performance – minimize impact on a large single application
[Chart legend: RHEL6.4, RHEL6.4 with numad, 3.10-54 with NUMA balancing, 3.10-54 without NUMA balancing]
RHEL7 Beta Performance Tuning
● RHEL 7 beta potential tuning
  ● tuned-adm profile throughput-performance
  ● tuned-adm profile latency-performance (to lock cstate=1)
● Disable the NUMA-balancing scheduler via:
  ● echo NO_NUMA > /sys/kernel/debug/sched_features
● Adjust dirty ratios back to the RHEL6 values of 40 and 10:
  ● vm.dirty_ratio = 40
  ● vm.dirty_background_ratio = 10
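Those two sysctls can be made persistent with a drop-in file; a sketch (the file name is illustrative):

```
# /etc/sysctl.d/99-dirty-ratios.conf  (file name is illustrative)
# Restore the RHEL6 dirty-ratio defaults on a RHEL7 beta host.
# Apply without rebooting:  sysctl -p /etc/sysctl.d/99-dirty-ratios.conf
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
```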
RHEL7 Network Features
• Overview of new networking features in RHEL7
• Adaptive tickless (dynticks) patchset
• BUSY_POLL socket option
• Power management
• Tunable workqueues
RHEL7 Networks 1/3
● IPv4 routing cache: bye-bye
  − Reduced overhead for route lookups
● Socket BUSY_POLL (aka low-latency sockets)
  − Performance numbers later
● 127/8 is (optionally) routable now – for cloud use
● 40G NIC support; the bottleneck moves back to the CPU :-)
● RFS, aRFS, XPS, etc.
● ipset is included; accelerates complex iptables rules
● netsniff-ng included ... ifpps awesome
RHEL7 Networks 2/3
● SO_REUSEPORT socket option
  − Multiple sockets can listen on the same port, TCP & UDP
● Bufferbloat avoidance – non-LAN-latency situations
  − TCP small queues (tcp_limit_output_bytes)
  − CoDel and FQ CoDel packet schedulers
● TCP Proportional Rate Reduction (PRR) − Improves reaction time of window scaling, 3-10% range
● TCP connection repair
  − To support LXC: stop a TCP connection and restart it on another host
RHEL7 Networks 3/3
● Performance Co-Pilot support
  − pmatop awesome; also pmcollectl
● Per-cgroup TCP buffer limits
  − Memory-pressure controls for TCP
● Stacked VLANs: 802.1ad QinQ support
  − Frame header includes > 1 VLAN tag
● PTP: full support in 6.5 and 7.0
  − Requires NIC driver enablement
● chrony offered instead of ntpd (ntpd still included)
New Networking Features in RHEL7
● Linux Containers (LXC) network namespaces
  − Per-namespace sysctl tunables
● TCP Fast Open socket option
  − Combines the first two steps of the handshake
● TCP tail loss probe
  − Reduces the impact of lost packets (RTO ~ 15%)
RHEL “tuned” Package
# yum install tune*
# tuned-adm profile latency-performance
# tuned-adm list
Available profiles:
- latency-performance
- default
- enterprise-storage
- virtual-guest
- throughput-performance
- virtual-host
Current active profile: latency-performance
# tuned-adm profile default   (to disable)
“tuned” Profile Summary
Tunable                             default   enterprise-  virtual-     virtual-  latency-     throughput-
                                              storage      host         guest     performance  performance
kernel.sched_min_granularity_ns     4ms       10ms         10ms         10ms                   10ms
kernel.sched_wakeup_granularity_ns  4ms       15ms         15ms         15ms                   15ms
vm.dirty_ratio                      20% RAM   40%          10%          40%                    40%
vm.dirty_background_ratio           10% RAM                5%
vm.swappiness                       60                     10           30
I/O scheduler (elevator)            CFQ       deadline     deadline     deadline  deadline     deadline
Filesystem barriers                 On        Off          Off                                 Off
CPU governor                        ondemand  performance                         performance  performance
Disk read-ahead                               4x
Disable THP                                                                       Yes
CPU C-states                                                                      Locked @ 1
(blank cells inherit the default)
Impact of Power Management on Latency and High Context-Switching Workloads (storage/network)
[Chart: latency in microseconds across C-states C6, C3, C1, C0. Current status: network +/-3%, storage +/-5%. Also covers future plans and impact on customers.]
[Chart: netperf request/response transactions/sec for R6 and R7, UDP and TCP, baseline vs. the latency-performance profile]
Adaptive Tickless (DynTicks) Patchset
● The goal of this patchset is to stop interrupting userspace when nr_running=1 (see /proc/sched_debug)
● The idea being that if the runqueue depth is 1, the scheduler should have nothing to do on that core
● Move all timekeeping to non-latency-sensitive cores
● Mark certain cores as nohz_full cores
● In addition to the cmdline options nohz_full and rcu_nocbs
  − You also need to move RCU threads yourself (pgrep, taskset, tuna)
Precision Time Protocol (IEEE 1588v2)
● Tech Preview in RHEL 6.4; full support in 6.5
  − Limited driver enablement in 6.4
  − 6.5: bnx2x, tg3, e1000e, igb, ixgbe, and sfc
● Improved synchronization accuracy over NTP
  − PTP hardware timestamping is most accurate
  • Query your NIC's PTP capabilities: ethtool -T p1p1
● Improve time sync by disabling the tickless kernel
  − nohz=off
  − Increases power consumption
Precision Time Protocol (IEEE 1588v2)
[Charts: clock offset with nohz=on vs. nohz=off]
Adaptive Tickless (DynTicks) Patchset
● Reading:
  − https://www.kernel.org/doc/Documentation/timers/NO_HZ.txt
  − http://lwn.net/Articles/549580/
  − http://www.youtube.com/watch?v=G3jHP9kNjwc
Timeline of a tick...tick...tick... – RHEL5
[Diagram: timer interrupt fires every jiffy (jiffies through jiffies+4) while the userspace task runs]
Timeline of a tick...tick...tick... – RHEL6 and 7, CONFIG_NO_HZ
[Diagram: ticks are skipped while the core is idle; legend: userspace task, timer interrupt, idle]
Timeline of a tick...tick...tick... – RHEL7, CONFIG_NO_HZ_FULL
[Diagram: tickless no longer requires idle; legend: userspace task, timer interrupt]
Examining the Tick 1/3
# egrep 'CPU|LOC' /proc/interrupts
# perf list | grep local_timer
irq_vectors:local_timer_entry [Tracepoint event]
irq_vectors:local_timer_exit  [Tracepoint event]
Examining the Tick 2/3
# perf stat -C 1 -e irq_vectors:local_timer_entry sleep 1
        9  irq_vectors:local_timer_entry

# perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 /root/pig -s 1
    1,002  irq_vectors:local_timer_entry

Reboot with nohz_full=1 rcu_nocbs=1
# tuna -c 1 -i ; tuna -q \* -c 1 -i

# perf stat -C 1 -e irq_vectors:local_timer_entry taskset -c 1 /root/pig -s 1
        5  irq_vectors:local_timer_entry
Examining the Tick 3/3 (debugfs)
# mount -t debugfs nodev /sys/kernel/debug
# cd /sys/kernel/debug/tracing
# echo 1 > events/irq_vectors/enable
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 432/432   #P:8
#
#                            _-----=> irqs-off
#                           / _----=> need-resched
#                          | / _---=> hardirq/softirq
#                          || / _--=> preempt-depth
#                          ||| /     delay
#        TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#           | |       |   ||||       |         |
NUMA Topology and PCI Bus
● Servers may have more than one PCI bus.
● Install adapters “close” to the CPU that will run the performance-critical application.
● When the BIOS reports locality, irqbalance handles NUMA/IRQ affinity automatically.
42:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
# cat /sys/devices/pci0000\:40/0000\:40\:03.0/0000\:42\:00.0/local_cpulist
1,3,5,7,9,11,13,15
# dmesg | grep "NUMA node"
pci_bus 0000:00: on NUMA node 0 (pxm 1)
pci_bus 0000:40: on NUMA node 1 (pxm 2)
pci_bus 0000:3f: on NUMA node 0 (pxm 1)
pci_bus 0000:7f: on NUMA node 1 (pxm 2)
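That dmesg output is easy to turn into a bus-to-node table, so you can check which socket an adapter (e.g. the ConnectX-3 at 42:00.0 above) hangs off; a sketch reading the grep output on stdin:

```shell
# Map PCI bus numbers to NUMA nodes from 'dmesg | grep "NUMA node"' lines.
# Field layout assumed from the output above:
#   pci_bus 0000:40: on NUMA node 1 (pxm 2)
bus_to_node() {
    awk '/pci_bus/ { sub(":$", "", $2); print $2, "node", $6 }'
}
bus_to_node <<'EOF'
pci_bus 0000:00: on NUMA node 0 (pxm 1)
pci_bus 0000:40: on NUMA node 1 (pxm 2)
EOF
# prints:
# 0000:00 node 0
# 0000:40 node 1
```

On a live system: `dmesg | grep "NUMA node" | bus_to_node`.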
Performance Projects / Tooling
● RHEL6.5: “numad”, “tuna”, and “tuned”
● Tuna is used to bind IRQs / real-time-like isolation
● Profiling challenges
  − Data-address profiling (cache-to-cache detection), providing:
    • the hottest contended cachelines
    • the process names, addresses, PIDs, and TIDs causing that contention
    • the CPUs they ran on
    • and how the cacheline is being accessed (read or write)
Iozone Performance: Effect of tuned
RHEL6.4 File System In Cache Performance RHEL6.4 File System Out of Cache Performance
Intel Large File I/O (iozone) Intel Large File I/O (iozone)
[Charts: iozone throughput in MB/sec for ext3, ext4, xfs, and gfs2, not tuned vs. tuned, shown for in-cache and out-of-cache workloads]
System Tuning Tool – tuna
• Tool for fine grained control
• Display applications / processes
• Displays CPU enumeration
• Socket (useful for NUMA tuning)
• Dynamic control of tuning
• Process affinity
• Parent & threads
• Scheduling policy
• Device IRQ priorities, etc.
Tuna (RHEL6.4 / RHEL7)
[Screenshot: Tuna GUI panels]
Network Tuning: IRQ Affinity
● irqbalance for the common case – disable to tune
● New irqbalance automates NUMA affinity for IRQs
● Flow-Steering Technologies
● Move 'p1p1*' IRQs to socket 1:
# service irqbalance stop
# tuna -q p1p1* -S1 -m -x
# tuna -Q | grep p1p1
● Manual IRQ pinning for the last X percent/determinism
● Guide on Red Hat Customer Portal
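The manual-pinning step can be sketched as a tiny helper that writes a hex CPU mask to smp_affinity. The PROC_ROOT variable is an invention of this sketch so the function can be dry-run against a scratch directory; on a real host it defaults to /proc, and irqbalance should be stopped first:

```shell
# Pin an IRQ to a single CPU by writing a one-bit hex mask to
# ${PROC_ROOT:-/proc}/irq/<irq>/smp_affinity.
pin_irq() {  # usage: pin_irq IRQ CPU
    irq=$1; cpu=$2
    printf '%x\n' $((1 << cpu)) > "${PROC_ROOT:-/proc}/irq/$irq/smp_affinity"
}
# e.g. pin_irq 77 3   writes mask 8 (CPU 3) to /proc/irq/77/smp_affinity
```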
Tuna IRQ/CPU Affinity Context Menus
[Screenshots: CPU affinity for IRQs; CPU affinity for PIDs; scheduler policy; scheduler priority]
RHEL6.5 and RHEL7 Virt Performance
RHEL 6.5
● Virtio data plane, 4 TB memory limit
RHEL 7
● NUMA balance code
● KVM pvticketed spinlocks, APICv, large-guest performance
● NUMA in a guest, APICv, new 4 TB memory limit
RHEV 3.3 (Based on RHEL 6.5)
● New memory overcommit manager – MOM
● Network QOS, Native Gluster (libgfapi)
RHEL7 with Ticketed Spinlocks – 3.10.0-12.el7 pvticketlocks.x86_64 (note: RHEL6 unfair locks)
Linpack NxN 20000x20016
Westmere 12-core, 64 GB memory, pvticketlocks
[Chart: GFLOPS at 1, 2, 4, 8, and 12 threads for bare metal, non-ticketed, and pvticketed guests, with % difference on the secondary axis]
RH 3.10 OLTP Performance
R7 / F17 OLTP with spinlock backoff (perf74, 4-socket, 512 GB, 2 FC CLARiiON)
[Chart: TPM at 80 and 100 users for RHEL6.3 (all nodes), 3.6.0-0.24.autonuma28fast.test.x86_64, and 3.6.10-2.tlw16upstream.fc17.x86_64, with delta on the secondary axis]
RH/IBM Top Virtualized Benchmarks
● SPECvirt2010/2012
● IBM SAP SD 2-tier, bare metal / virtualized results
  − IBM System x3850 X5: 4-socket, 40-core, 80-thread system
  − Bare metal: 12,560 SD users; KVM (80-CPU guest): 10,700
  − 85% of bare metal
● IBM TPC-C – world record with DB2
Virtualization Benchmarks
SPECvirt_sc2013
− Increased workload injection rates
− Multi-vCPU guests
  • All guests were one vCPU in SPECvirt_sc2010
− Up to four tiles using the same database VM
TPC-VMS
− Three independent TPC-C, TPC-H, TPC-E, or TPC-DS benchmarks
  • running simultaneously
− Metric is the lowest of the three scores
− Large vCPU-count guests
− Large disk I/O requirements
SPECvirt2010: RHEL 6 KVM Posts Industry-Leading Results
> 1 SPECvirt tile/core
Key enablers:
● SR-IOV
● Huge pages
● NUMA node binding
[Diagram: system under test (SUT) with virtualization layer and hardware, driven by client hardware; blue = disk I/O, green = network I/O]
http://www.spec.org/virt_sc2010/results/
Best SPECvirt_sc2010 Scores by CPU Cores
(As of May 30, 2013)
[Bar chart: best SPECvirt_sc2010 scores by CPU core count, from 2-socket 12-core systems up to 8-socket 64/80-core systems (top score 8,956 on an 8-socket system); platforms compared include VMware ESX/ESXi, RHEL 6 (KVM), and RHEV 3.1 on HP and IBM servers, with VM counts from 78 to 552]
Comparison based on best-performing Red Hat and VMware solutions by CPU core count published at www.spec.org as of May 17, 2013. SPEC® and the benchmark name SPECvirt_sc® are registered trademarks of the Standard Performance Evaluation Corporation. For more information about SPECvirt_sc2010, see www.spec.org/virt_sc2010/.
KVM / RHS Tuning
● gluster volume set
● XFS: mkfs -n size=8192; mount with inode64, noatime
● RHS server: tuned-adm profile rhs-virtualization
● Increase read-ahead, lower dirty ratios
● KVM host: tuned-adm profile virtual-host
● For better response time, shrink the guest block device queue:
  ● /sys/block/vda/queue/nr_requests (16 or 8)
● For best sequential read throughput, raise VM read-ahead:
  ● /sys/block/vda/queue/read_ahead_kb (4096/8192)
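The two guest-side knobs above can be applied with a small helper. The SYSFS_ROOT variable is an invention of this sketch so the function can be dry-run against a scratch tree; on a real guest it defaults to /sys, and the function is run once per virtio disk:

```shell
# Tune a guest block device's queue: smaller nr_requests for latency,
# larger read_ahead_kb for sequential read throughput.
tune_vdisk() {  # usage: tune_vdisk DEVICE NR_REQUESTS READ_AHEAD_KB
    q="${SYSFS_ROOT:-/sys}/block/$1/queue"
    echo "$2" > "$q/nr_requests"
    echo "$3" > "$q/read_ahead_kb"
}
# e.g. inside the guest:  tune_vdisk vda 16 4096
```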
Iozone Performance Comparison: RHS 2.1/XFS with RHEV
[Chart: iozone throughput for random write, random read, sequential write, and sequential read – out-of-the-box vs. the rhs-virtualization tuned profile]
RHEL6 Performance Tuning Summary
● Use “tuned”, “numad”, and “tuna” in RHEL6.x
  ● tuned selects the deadline I/O elevator
  ● Power-savings mode (performance profile), locked C-states (latency profile)
  ● Transparent hugepages for anonymous memory (monitor it)
  ● For multi-instance workloads, consider numad
  ● Virtualization – virtio drivers; consider SR-IOV
● Manually tune
  ● NUMA – via numactl; monitor with numastat -c pid
  ● Huge pages – static hugepages for pinned shared memory
  ● Managing VM: dirty-ratio and swappiness tuning
  ● Use cgroups for further access control
● perf and tuna examples in the appendix
Helpful Links
● Red Hat Low Latency Performance Tuning Guide
● Optimizing RHEL Performance by Tuning IRQ Affinity
● Red Hat Performance Tuning Guide
● Red Hat Virtualization Tuning Guide
● STAC Network I/O SIG
● Finteligent Low Latency Tuning w/KVM
Questions