2011 IBM Power Systems Technical University, October 10-14 | Fontainebleau Miami Beach | Miami, FL
Title: Running Oracle DB on IBM AIX and Power7 - Best Practices for Performance & Tuning
Session: 17.11.2011, 16:00-16:45, Raum Krakau (Deutsche Oracle Anwenderkonferenz DOAG 2011)
Speakers: Frank Kraemer, IBM Systems Architect, mailto:[email protected]
          Francois Martin, IBM Oracle Center, mailto:[email protected]
Materials may not be reproduced in whole or in part without the prior written permission of IBM.
© Copyright IBM Corporation 2011

Agenda
• CPU: Power7
• Memory: AIX VMM tuning
• IO: storage considerations, AIX LVM striping, disk/Fibre Channel driver optimization, virtual disk/Fibre Channel driver optimization, AIX mount options, asynchronous IO
• NUMA optimization

IBM Power7 (Socket/Chip/Core/Threads)
• Each Power7 socket = 1 chip.
• Each Power7 chip = 8 Power7 cores at 4 GHz.
• Each Power7 core = 4 hardware SMT threads.
• In AIX, when SMT is enabled, each SMT thread is seen as a logical CPU.
• 32 sockets = 32 chips = 256 cores = 1024 SMT threads = 1024 AIX logical CPUs.

Scalability with IBM POWER Systems and Oracle
• Oracle certifies for the O/S, and all Power servers run the same AIX, giving complete flexibility for Oracle workload deployment.
• Server range: Power 710 (1S/2U), Power 720 (1S/4U), Power 730 (2S/2U), Power 740 (2S/4U), Power 750, Power 770, Power 780, Power 795, and blade servers.
• Consistency: binary compatibility.
• Mainframe-inspired reliability.
• Support for full hardware-based virtualization.
• AIX, Linux and IBM i operating systems.

POWER7 Server RAS Features
Virtualization:
• PowerVM is core firmware: a thin, bare-metal hypervisor, free of device drivers.
• Hardware-enforced virtualization between LPARs and Virtual I/O Servers.
• AIX v6/v7 with LVM and JFS2; SMIT reduces human errors.
• Integrated HW / OS / FW error reporting.
• IBM integrated stack: hot AIX kernel patches, dynamic LPAR operations, WPAR and WPAR mobility, redundant VIOS support, configurable error logs, resource monitor & control, concurrent remote support, Live Partition Mobility (LPM), EAL 4+ security certification.
General:
• First Failure Data Capture
• Hot-node add/repair
• Concurrent firmware updates
• Redundant clocks & service processors
• Dynamic clock failover
• ECC on fabric busses
• Light path diagnostics
• Electronic Service Agent
• CPU CoD and Memory CoD
CPU:
• Dynamic CPU deallocation
• Dynamic processor sparing
• Processor instruction retry & alternate processor recovery
• Partition availability priority
• Processor-contained checkstop
• Uncorrectable error handling
Memory/Cache:
• Redundant bit steering
• Dynamic page deallocation
• Error checking channels & spare signal lines
• Memory protection keys (AIX)
• Dynamic cache deallocation & cache line delete
• HW memory & L3 scrubbing
I/O:
• Integrated I/O product development with stable drivers
• Redundant I/O drawer links
• Independent PCIe busses
• PCIe bus error recovery
• Hot-add GX bus adapter
• I/O drawer hot add / concurrent repair
• Hot swap of disks, media, & PCIe adapters

POWER7 SMT Performance
• Requires POWER7 mode (POWER6 mode supports only SMT1 and SMT2).
• Operating system support: AIX 6.1 and 7.1, IBM i 6.1 and 7.1, Linux.
• Dynamic runtime SMT scheduling: work is spread among cores to execute in the appropriate threaded mode; the system can dynamically shift between SMT1 / SMT2 / SMT4 as required.
• LPAR-wide SMT controls: ST, SMT2 and SMT4 modes, via the smtctl and mpstat commands.
• Mixed SMT modes are supported within the same LPAR (requires use of "Resource Groups").

POWER7-Specific Tuning
• Use SMT4: gives a CPU performance boost by handling more concurrent threads in parallel.
• Disable HW prefetching: usually improves performance for database workloads on big SMP Power systems.
  # dscrctl -n -b -s 1   (dynamically disables HW memory prefetch and keeps this configuration across reboots)
  # dscrctl -n -b -s 0   (resets HW prefetching to the default value)
• Use terabyte segment aliasing: improves CPU performance by reducing SLB misses (segment address resolution).
  # vmo -p -o esid_allocator=1   (on by default in AIX 7)
• Use Large Pages (LP), i.e. 16 MB memory pages: improves CPU performance by reducing TLB misses (page address resolution).
  # vmo -r -o lgpg_regions=xxxx -o lgpg_size=16777216   (xxxx = number of 16 MB segments you want)
  (check large-page usage with: svmon -mwU oracle)
  # vmo -p -o v_pinshm=1
  Enable the Oracle user ID to use Large Pages:
  # chuser capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle
  Export ORACLE_SGA_PGSZ=16M before starting Oracle (as the oracle user).

Virtual Memory Management
1. System startup: AIX starts and applications load computational pages into memory. As a UNIX system, AIX takes advantage of free memory by using it as a file cache to reduce I/O on the physical drives.
2. DB activity (needs more memory): activity increases and the DB needs more memory, but no free pages are available. LRUD (the AIX page stealer) starts to free pages in memory.
3. Paging activity: with the default settings, LRUD pages out some computational pages instead of removing only pages from the file-system cache, causing performance degradation.
Objective: tune the VMM to protect computational pages (programs, SGA, PGA) from being paged out, and force LRUD to steal pages from the FS cache only.
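As a rough illustration of the lgpg_regions sizing above, here is a minimal shell sketch; the 8 GB SGA target is a hypothetical example, not a value from the deck.

```shell
#!/bin/sh
# Sketch: compute lgpg_regions for a hypothetical 8 GB SGA backed by
# 16 MB large pages (lgpg_size=16777216), rounding up to whole pages.
sga_mb=8192    # hypothetical SGA target, in MB
page_mb=16     # 16 MB large page size
lgpg_regions=$(( (sga_mb + page_mb - 1) / page_mb ))
echo "lgpg_regions=$lgpg_regions"
# On AIX you would then apply it (requires a reboot because of -r):
#   vmo -r -o lgpg_regions=$lgpg_regions -o lgpg_size=16777216
```

In practice you would also leave a little headroom above the exact SGA size, since other pinned users of large pages draw from the same pool.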
Memory: VMM Tuning for Filesystems (lru_file_repage)
This VMM tuning tip applies to AIX 5.2 ML4+, AIX 5.3 ML1+ and AIX 6.1+.
• lru_file_repage: this parameter prevents computational pages from being paged out. By setting lru_file_repage=0 (1 is the default value), you tell the VMM that your preference is to steal pages from the FS cache only.
• "New" vmo tuning approach, with recommendations based on VMM developer experience. Use "vmo -p -o" to set the following parameters:
  lru_file_repage = 0
  page_steal_method = 1   (reduces the scan:free ratio)
  minfree = max(960, 120 x #lcpus*) / #memory pools**
  maxfree = minfree + (j2_maxPageReadAhead*** x #lcpus*) / #memory pools**
    * for #lcpus, use "bindprocessor -q"
    ** for #memory pools, use "vmstat -v"
    *** for j2_maxPageReadAhead, use "ioo -L j2_maxPageReadAhead"
  Increase minfree and recalculate maxfree if "Free Frame Waits" increases over time (vmstat -s).
  minperm% = from 3 to 10
  maxperm% = maxclient% = from 70 to 90
  strict_maxclient = 1

Memory: Use AIX dynamic LPAR jointly with Oracle dynamic memory allocation (AMM)
Scenario (initial configuration: memory_max_size = 18 GB, real memory = 12 GB, memory_target = 8 GB; final configuration: memory_max_size = 18 GB, real memory = 15 GB, memory_target = 11 GB):
The Oracle tuning advisor indicates that SGA+PGA need to be increased to 11 GB. memory_target can be increased dynamically to 11 GB, but real memory is only 12 GB, so it needs to be increased as well.
• Step 1: increase the physical memory allocated to the LPAR to 15 GB (add 3072 MB):
  ssh hscroot@hmc "chhwres -r mem -m <system> -o a -p <LPAR name> -q 3072"
• Step 2: increase the SGA+PGA allocated to the database instance to 11 GB:
  alter system set memory_target=11G;
→ Memory allocated to the system has been increased dynamically, using AIX DLPAR.
→ Memory allocated to Oracle (SGA and PGA) has been increased on the fly.

IO: Database Layout
A good storage configuration is a key point, because:
• disk is the slowest part of an infrastructure;
• reconfiguration can be difficult and time-consuming.
Stripe And Mirror Everything (S.A.M.E.) approach:
• The goal is to balance I/O activity across all disks, loops, adapters, etc.
• Avoid/eliminate I/O hotspots.
• Manual file-by-file data placement is time-consuming, resource-intensive and iterative.
• Additional advice for implementing SAME: apply the SAME strategy to data and indexes; if possible, separate the redo logs (and archive logs).

IO: RAID Policy with ESS, DS6K/DS8K
RAID-5 vs. RAID-10 performance comparison:

  I/O profile        RAID-5     RAID-10
  Sequential read    Excellent  Excellent
  Sequential write   Excellent  Good
  Random read        Excellent  Excellent
  Random write       Fair       Excellent

• Use RAID-5 or RAID-10 to create striped LUNs.
• If possible, minimize the number of LUNs per RAID array to avoid contention on the physical disks.
• With enterprise-class storage (with a huge cache), RAID-5 performance is comparable to RAID-10 for most customer workloads.
• Consider RAID-10 for workloads with a high percentage of random write activity (> 25%) and
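The RAID rule of thumb above can be sketched as a trivial shell check; the thresholds come from the slide, while the workload figures passed in are made-up examples.

```shell
#!/bin/sh
# Sketch: pick a RAID level per the slide's rule of thumb. Prefer RAID-10
# when random writes exceed 25% of the I/O mix; otherwise RAID-5 is
# generally adequate on enterprise-class storage with a large cache.
pick_raid() {
  random_write_pct=$1   # hypothetical measured random-write percentage
  if [ "$random_write_pct" -gt 25 ]; then
    echo "RAID-10"
  else
    echo "RAID-5"
  fi
}
pick_raid 30   # write-heavy OLTP-like mix -> RAID-10
pick_raid 10   # read-mostly mix           -> RAID-5
```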
high I/O access densities (peak > 50%).

IO: Second-Level Striping (LVM)
1. LUNs are striped across physical disks (stripe size of the physical RAID: ~64k, 128k, 256k).
2. LUNs are seen as hdisk devices on the AIX server.
3. Create AIX Volume Group(s) (VG) with LUNs from multiple arrays.
4. Create a Logical Volume (raw or with JFS2) striped across the hdisks (stripe size: 8M, 16M, 32M, 64M), so that each read/write access to the LV is well balanced across the LUNs and uses the maximum number of physical disks for best performance.

IO: LVM Spreading / Striping
Two different ways to stripe at the LVM layer:
1. Stripe using a Logical Volume (LV): create the Logical Volume with the striping option: mklv -S <stripe-size> ..
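To illustrate why a striped LV balances load across LUNs, here is a small sketch of the round-robin chunk placement; the disk count and stripe size are example values, matching the "mklv -S <stripe-size>" layout the slide describes.

```shell
#!/bin/sh
# Sketch: round-robin mapping of LV stripe chunks onto 4 hdisks with a
# 32 MB stripe size. Consecutive chunks land on different hdisks, so a
# sequential scan of the LV drives all underlying LUNs in parallel.
ndisks=4       # example: VG built from 4 LUNs (hdisk1..hdisk4)
stripe_mb=32   # example LV stripe size
i=0
while [ "$i" -lt 8 ]; do
  disk=$(( i % ndisks + 1 ))              # chunks rotate across hdisk1..hdisk4
  offset=$(( (i / ndisks) * stripe_mb ))  # MB offset within each hdisk
  echo "chunk $i -> hdisk$disk @ ${offset}MB"
  i=$(( i + 1 ))
done
```

Chunks 0-3 land at offset 0 on hdisk1 through hdisk4, then chunks 4-7 wrap back to hdisk1 at the next 32 MB stripe, which is exactly the balancing behavior the slide attributes to LVM striping.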