Maximizing Performance: An Introduction to Performance Tuning

Written by Mattias Sundling, Evangelist, Dell

Introduction

VM performance is ultimately determined by the underlying physical hardware that serves as the foundation for your virtual infrastructure. The construction of this foundation has become simpler over the years, but there are still several areas that should be fine-tuned in order to maximize VM performance in your environment. While some of the content of this paper is generic to any hypervisor, this document focuses on VMware vSphere 5.0.

This is an introduction to performance tuning and is not intended to cover everything in detail. Most topics include links to sites that contain deep-dive information if you wish to learn more.

Requirements for top VM performance

System requirements
To ensure top performance from your VMs, your system must have the following:
• VMware vSphere 5.0 or later—If you are running an older version, you must upgrade. Performance and scalability have increased significantly since versions 3 and 4.
• Virtual machine hardware version 8—This hardware version introduces features to increase performance. If you are not running virtual hardware version 8, upgrade VMware Tools first, and then shut down the VM's guest OS. In the vSphere client, right-click the VM and select Upgrade Virtual Hardware.

Warning: Once you upgrade the virtual hardware version to 8, you will lose backward compatibility with versions prior to vSphere 5.0. Therefore, if you have a mixed environment, make sure to upgrade all vSphere hosts first.
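If you manage many VMs, you can check and upgrade hardware versions in bulk with VMware PowerCLI. The following is a minimal sketch with placeholder names (vcenter.example.com, MyVM), not a verbatim procedure from this paper; upgrade VMware Tools and shut down the guest first, as described above:

Connect-VIServer -Server "vcenter.example.com"   # placeholder vCenter name

Get-VM | Select-Object Name, Version             # hardware versions, e.g. v4, v7, v8

$vm = Get-VM -Name "MyVM"                        # placeholder VM name
Shutdown-VMGuest -VM $vm -Confirm:$false         # graceful guest shutdown
# ...once the VM is powered off:
Set-VM -VM $vm -Version v8 -Confirm:$false       # upgrade virtual hardware to version 8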

Virtual hardware and guest OS configuration
The sections below make recommendations for configuring the various hardware components for best performance, as well as for optimizations that can be done inside the guest OS.

CPU

Start with one vCPU.
Start with one vCPU; most applications work well with that. If you start with multiple vCPUs and then realize that you have over-provisioned, it may be cumbersome to remove the unnecessary vCPUs, depending on your OS. Therefore, start with one vCPU, and later evaluate CPU utilization and application performance. If the application response is poor, you can add vCPUs as needed.

Select the correct hardware abstraction layer in the guest OS.
Make sure you select the correct hardware abstraction layer (HAL) in the guest OS. The HAL drives the OS for the CPU; the choices are uni-processor (UP, single processor) and symmetric multiprocessing (SMP, multiple processors). Windows 2008 uses the same HAL for both UP and SMP, which makes it easy to reduce the number of CPUs. Note the following:
• Windows 2003 and earlier have different HAL drivers for UP versus SMP. Windows automatically changes the HAL driver when going from UP to SMP, but it can be very complicated to go from SMP to UP, depending on the OS and version.
• If you have a VM running Windows 2003 SP2 or later that has been reduced from two vCPUs to one vCPU, you will still have the multiprocessor HAL in the OS. This results in slower performance than a system with the correct HAL. The HAL driver can be manually updated; however, Windows versions prior to Windows 2003 SP2 cannot be easily corrected. I have personally experienced systems with an incorrect HAL driver: they consume more CPU, which can peak to unnecessarily high CPU-utilization percentages when the system gets stressed.
• Make sure your multi-processor VMs have an OS and application that support multi-threading and take advantage of it. If they don't, you'll be wasting resources.

Figure 1. Foglight vOPS Enterprise from Dell looks beyond the hypervisor into the application layer.

Be aware that CPU co-scheduling varies depending on the version of VMware ESX or ESXi.
VMware ESX 2 used strict co-scheduling, which required a two-vCPU VM to have two physical CPUs (pCPUs) available at the same time. pCPUs had a single or dual core, leading to slow performance when hosting too many VMs.

ESXi 3 introduced relaxed co-scheduling, which allows a two-vCPU VM to be scheduled even when there are not two pCPUs available at the same time.

ESXi 4 refined the relaxed co-scheduler even further, increasing performance and scalability. Intel SMT was introduced, which exposes two hardware contexts from a single core. This can increase performance 10–30 percent, depending on the workload.

ESXi 5 has some enhancements around Intel SMT to ensure high efficiency and performance for mission-critical applications. The number of cores in each pCPU can be up to 10, which makes CPU scheduling easier.

Watch CPU % Ready; a value of 5–10 percent indicates CPU congestion.
The best indication that a VM is suffering from CPU congestion on a vSphere host is when CPU % Ready reaches 5–10 percent over time. In this range, further analysis might be needed; values higher than 10 percent definitely show critical contention. This means the VM has to wait for the vSphere host to schedule its CPU requests, due to CPU resource conflicts with other VMs. This performance metric is one of the most important ones to monitor in order to understand the overall performance in a virtual environment. It can be seen only at the hypervisor level, not inside the guest OS.
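Because % Ready is visible only at the hypervisor level, one way to pull it is through PowerCLI's performance counters. The following minimal sketch (placeholder VM name) converts the raw cpu.ready.summation counter, which reports milliseconds of ready time per 20-second real-time sample, into a percentage:

$vm = Get-VM -Name "MyVM"                        # placeholder VM name
$samples = Get-Stat -Entity $vm -Stat "cpu.ready.summation" -Realtime -MaxSamples 30

$samples | ForEach-Object {
    # % Ready = ready time (ms) / sample interval (ms) * 100
    $readyPct = $_.Value / ($_.IntervalSecs * 1000) * 100
    "{0}  {1:N1} %" -f $_.Timestamp, $readyPct
}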

Figure 2. This example shows a VM with almost the same CPU utilization across all vCPUs, which means the OS and application are multi-threaded.

Figure 3. CPU % Ready is an important metric for understanding VM performance.

Virtual NUMA can improve performance.
Virtual NUMA (vNUMA) exposes the host NUMA topology to the guest OS. If the guest OS and applications are NUMA-aware, they can benefit by using the underlying NUMA architecture more efficiently, which will improve performance. This requires virtual hardware version 8.

Memory

The memory limit setting often hurts more than it helps.
When you create a VM, you allocate it a certain amount of memory. There is a feature in the VM settings known as the memory limit, and it often hurts more than it helps. This setting limits the hypervisor memory allocation to a value other than what is actually assigned. The guest OS will still see the full amount of the memory allocation; however, the hypervisor will allow use of physical memory only up to the memory limit amount.

The only situation I have found for using the memory limit is an application that requires, say, 16 GB of memory to install or start, but only 4 GB in operation. In a case like that, you can create a memory limit at a much lower value than the actual memory allocation. The guest OS and application will see the full 16 GB of memory, but the vSphere host limits the physical memory to 4 GB.

In reality, the memory limit often gets set on VMs that were not intended to be limited. This can happen when you move VMs across different resource pools or perform a P2V of a physical system. The worst-case scenario, which I have seen in the field multiple times, is setting this memory limit to a low value (such as 512 MB) on VM templates, since all VMs deployed from the templates will inherit the memory limit setting.
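To find VMs that have inherited an unintended limit, here is a minimal PowerCLI sketch that resets any fixed memory limit to unlimited, per the best practice discussed below; run it report-only first:

Get-VM | Get-VMResourceConfiguration |
    Where-Object { $_.MemLimitMB -ne -1 } |          # -1 means unlimited
    ForEach-Object {
        "{0}: memory limit {1} MB" -f $_.VM.Name, $_.MemLimitMB
        # Remove the limit; comment out the next line for a report-only run.
        $_ | Set-VMResourceConfiguration -MemLimitMB $null
    }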

Figure 4. Foglight vOPS Enterprise enables you to detect, diagnose and resolve VM problems.

Memory definitions
• Granted: physical memory granted to the VM by the ESX(i) host
• Active: physical memory actively being used by the VM
• Ballooned: memory being used by the VMware memory control driver to allow the VM OS to selectively swap memory
• Swapped: memory being swapped to disk

Figure 5. Memory utilization (active memory) in this example is very low over time, making it safe to decrease the memory setting without affecting VM and application performance.

For example, if you allocate 2 GB of memory to a VM and there is a limit of 512 MB, the guest OS will see 2 GB of memory, but the vSphere host will allow only 512 MB of physical memory. If the guest OS requires more than 512 MB of memory, the memory balloon driver will start to inflate to let the guest OS decide which pages are actively being used. If the balloon can't reclaim any more memory, the guest OS will start to swap. If the balloon can't deflate, or if memory usage is too high on the vSphere host, the host will start to use memory compression, then VMkernel swapping as a last resort. Ballooning is a first warning signal; guest OS and VMkernel swapping will definitely hurt VM performance and put extra load on the vSphere host and the storage subsystem that have to serve the swapping. The best practice is to set the memory limit to unlimited. For a more in-depth explanation, see "Memory Behavior when VM Limits are Set - Revisited."

Memory sizing: avoid having either too much or too little memory.
When configuring the amount of VM memory, consider the following:
• Too much memory will increase the VM memory overhead. Consequently, your VM density (number of VMs per host) will not be as high as it could be.
• Too little memory can result in guest OS swapping, hurting performance.

To determine the correct amount of memory, you'll need to monitor active-memory utilization over at least 30–90 days in order to see patterns. Some systems might be in use only during a certain part of that period, but used very heavily during that time. A minimal monitoring sketch follows the reclamation list below.

Understand vSphere's memory reclamation techniques.
It's widely considered a best practice to right-size memory allocation in order to avoid placing extra load on vSphere hosts due to memory reclamation. You'll want to run as many VMs as possible and will probably over-commit memory (allocate more than you have).

There are several techniques that vSphere uses to reclaim VM memory:
• Ballooning*—Reclaiming memory by increasing memory pressure inside the VM (this requires VMware Tools). Do not disable ballooning, since that will hurt performance. If you experience a lot of ballooning, try to vMotion the VM to another host, as that will allocate all memory back to the VM. Also, make sure you don't have a fixed memory limit configured on the VM.
• Swapping*—Reclaiming memory by having the vSphere host swap out VM memory to disk.
• Memory compression*—Reclaiming memory by compressing pages before they are swapped out to disk. Up to 10 percent of the VM memory allocation can be used as compression cache.
• Transparent page sharing—Reclaiming memory by removing redundant pages with the same content (in the same VM or across VMs).

* Active only when the vSphere host is experiencing memory contention
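As mentioned above, right-sizing starts with watching active memory over time. A minimal PowerCLI sketch (placeholder VM name; mem.active.average is reported in KB, and your vCenter statistics level must retain these samples):

$vm = Get-VM -Name "MyVM"
$stats = Get-Stat -Entity $vm -Stat "mem.active.average" -Start (Get-Date).AddDays(-30)

$avgKB  = ($stats | Measure-Object -Property Value -Average).Average
$peakKB = ($stats | Measure-Object -Property Value -Maximum).Maximum
$report = "Active memory, last 30 days: avg {0:N0} MB, peak {1:N0} MB (configured: {2} MB)"
$report -f ($avgKB / 1KB), ($peakKB / 1KB), $vm.MemoryMB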

Figure 6. Foglight Storage Monitor from Dell allows you to monitor your physical storage beyond the datastore.

For more details, see:
• The VMware performance study "Understanding Memory Resource Management in VMware ESX 4.1"
• "Memory reclamation, when and how?"

Configure VMkernel swap to use an SSD swap cache instead of the datastore where the VM resides.
In vSphere 5.0 you can configure VMkernel swap to use an SSD swap cache instead of the datastore where the VM resides, which is typically a much slower disk type than SSD. The biggest performance increase comes from the much lower disk latency of SSD. This means that performance won't suffer as much if swapping does occur.

Disk
Now, let's move on to the most complex building block of the foundation: the disk configuration.

Size LUNs properly.
Create the datastores with the correct size (500–1000 GB). LUNs that are too big result in too many VMs, SCSI reservation conflicts, and potentially lower disk I/O due to metadata locking (for example, during vMotion, VM power-on and snapshots).

vStorage API for Array Integration (VAAI) is a new API in vSphere 4.1 that removes some of the heavy lifting from the hypervisor and transfers it to the storage hardware. If your hardware supports it, you will be able to run bigger datastores without performance problems. This also helps reduce metadata locking, as mentioned previously. For more details, see "vStorage APIs for Array Integration aka VAAI."

If you are upgrading from VMFS-3 to VMFS-5, pay attention to block size issues.
Prior to VMFS-5, it was recommended to use an 8 MB block size when creating datastores, since that block size has no negative impact on performance and can hold larger VMDK files. Having the same block size on all datastores is required in order to leverage VAAI.

VMFS-5, however, uses a unified block size of 1 MB and can handle VMDK files up to 2 TB. If you upgrade your datastores from VMFS-3 to VMFS-5, the old block size will be retained.

Best practice is therefore to create new VMFS-5 datastores and use Storage vMotion to move VMs from the VMFS-3 to the VMFS-5 datastores. It's very important to have a unified block size to be able to leverage VAAI. For more details about VMFS-5, see "vSphere 5.0 Storage Features Part 1 – VMFS-5."
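A minimal PowerCLI sketch of that migration (placeholder datastore and VM names); Move-VM performs a Storage vMotion when the VM is running:

Get-Datastore | Select-Object Name, FileSystemVersion, CapacityGB   # spot VMFS-3 datastores

$vm = Get-VM -Name "MyVM"
Move-VM -VM $vm -Datastore (Get-Datastore -Name "NewVMFS5-DS")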

Storage DRS simplifies the process of VM provisioning and datastore load balancing.
vSphere 5.0 introduced Storage DRS to simplify the process of VM provisioning and datastore load balancing. It looks at datastore latency and capacity to choose the best datastore for your VMs. This used to be an overlooked step when deploying new VMs and could lead to serious performance problems.

In order for Storage DRS to work, you need to create a datastore cluster and add your datastores to it. It works for both VMFS- and NFS-based datastores, but you can't mix them in the same datastore cluster. It also allows you to create VMDK affinity or anti-affinity rules that can be used to make sure VMDK files are placed on the same or different datastores, depending on your choice.

Storage DRS is a big step forward when it comes to capacity and disk I/O balancing, but there are more challenges that need to be addressed in order to maximize the storage subsystem. Those are outside the scope of this whitepaper, but it's recommended to consult with your storage vendor.

For more information about vSphere 5.0 storage, see "What's New in VMware vSphere 5.0 – Storage."

Use paravirtualized SCSI (PVSCSI).
PVSCSI provides higher throughput and requires less CPU on the vSphere host. Studies demonstrate a 12 percent increase in throughput with PVSCSI implemented, plus an 18 percent decrease in CPU utilization, when compared with the LSI Logic-based controller. For VMware benchmarking of PVSCSI versus LSI Logic, see "PVSCSI Storage Performance."

vSphere 4.1 improved PVSCSI to be able to handle low disk I/O, where older vSphere versions had problems with queuing that could result in latency. For more details, see "PVSCSI and Low IO Workloads." For more information on how to configure PVSCSI, see the VMware Knowledge Base article "Configuring disks to use VMware Paravirtual SCSI (PVSCSI) adapters."

Pay attention to VMFS and guest OS alignment.
If you create new VMFS volumes from the vSphere client, the volumes will be aligned correctly. If you create the VMFS volumes during ESXi installation, your volumes will be unaligned. The only way to fix this is to use Storage vMotion to move all VMs in the affected datastore to a new datastore, and then recreate the datastore from the vSphere client. You can put the datastore in maintenance mode and all VMs will be moved to other datastores automatically.

Windows 2008, 7 and Vista align NTFS volumes by default; all prior Windows OSs misalign the disks. You can align a disk only when you create it. Most Linux distributions have this misalignment tendency as well.

On average, properly aligning the disks can increase performance by 12 percent and decrease latency by 10 percent. For more information, see the VMware performance study "Recommendations for Aligning VMFS Partitions." vOptimizer from Dell can detect and resolve alignment problems on existing disks for Windows and Linux.

Separate OS, swap and data disks into separate VMDK files.
This improves performance and data protection (primarily by excluding the swap data). Consider creating a separate virtual disk controller for each disk, which allows higher disk I/O than a single controller. Power off the VM and change the SCSI IDs to 1:0, 2:0 and so on, and you'll get additional controllers.
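A minimal PowerCLI sketch of that layout (placeholder names): it adds a data disk and puts it on its own paravirtual SCSI controller, so the disk lands on SCSI 1:0 instead of sharing controller 0 with the OS disk. Power off the VM first.

$vm = Get-VM -Name "MyVM"
$disk = New-HardDisk -VM $vm -CapacityGB 100        # new data VMDK

# Attach the new disk to its own PVSCSI controller.
$disk | New-ScsiController -Type ParaVirtual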

Use storage I/O control (SIOC) if needed.
From vSphere 4.1, SIOC can be enabled for iSCSI and FC on a per-datastore basis; vSphere 5.0 added support for NFS. SIOC is helpful if you fear that some mission-critical VMs are not getting the required disk I/O during times of congestion.

You can also configure disk shares per VM. In the event of disk congestion, VMs with higher disk shares have priority for more disk I/O (shares are used only when there's contention). This works in the same way as memory shares.

Network

Physical network
Make sure you have multiple redundant physical NICs at 1 Gbit/s or 10 Gbit/s speeds connected to the VM virtual network switches.

VMXNET3
The network driver in the guest OS can be updated from the default E1000 to VMXNET3, a paravirtualized network driver. VMXNET3 has the same kind of enhancements as the paravirtualized storage driver described above and can leverage 10 Gbit/s network speeds. vSphere 4.1 supports fault tolerance on the VMXNET3 guest OS driver. For more detailed information and performance testing, see the VMware performance study "Performance Evaluation of VMXNET3 Virtual Network Device." (A sketch for making this change follows the Storage vMotion section below.)

Network I/O control (NetIOC)
NetIOC allows you to control the network bandwidth used by vMotion, NFS, iSCSI, Fault Tolerance, VMs and management. This is done by configuring shares or limits, which allow you to control quality of service, making sure critical components always get the network bandwidth they require.

vMotion
vSphere 5.0 improved vMotion performance and also added support for multiple network adapters, making vMotion even faster.

Storage vMotion
Storage vMotion allows you to move VMs between datastores while they are running. This can be useful if you are upgrading to vSphere 5.0 and don't want to upgrade your VMFS version, but instead want to create new VMFS-5 datastores and move your VMs to the newly created datastores. vSphere 5.0 improves migration time and also supports open snapshots.
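Coming back to the VMXNET3 recommendation above, here is a minimal PowerCLI sketch (placeholder VM name) that swaps any E1000 adapters to VMXNET3 on a powered-off VM; note the caution that follows:

$vm = Get-VM -Name "MyVM"
$vm | Get-NetworkAdapter |
    Where-Object { $_.Type -eq "e1000" } |
    Set-NetworkAdapter -Type Vmxnet3 -Confirm:$false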

Caution: When you change the adapter to VMXNET3, the IP address will reset to DHCP and a new MAC address will be generated, so make sure to capture your old settings first. The following command will capture your settings into ip.txt on Windows:

ipconfig /all > c:\ip.txt

You can also enable jumbo frames on the network to maximize the size of the packets that traverse the environment. Set the MTU to 9,000 in the guest OS driver, the vSwitch and the physical network ports (end to end); your network infrastructure must also support jumbo frames.

Delete unnecessary devices from your virtual hardware and guest OS.
Unnecessary devices in the virtual hardware and inside the guest OS will require more CPU and memory resources to emulate. If you don't use them, be sure to delete them. Devices to clean up may include floppy drives, CD drives, USB ports, serial ports, COM ports and sound cards. Having fewer devices means less overhead on your VM.

Clean up deleted hardware in the guest OS as well. This cleanup will not gain much performance; it's more of a housekeeping issue.

For Windows, take the following steps:
1. At the command prompt, enter the following command: set devmgr_show_nonpresent_devices=1
2. Start Device Manager (devmgmt.msc).
3. In Device Manager, show hidden devices.
4. Delete all devices that are no longer present.
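On the virtual hardware side, a minimal PowerCLI sketch that strips floppy drives from powered-off VMs, as one example of removing unused devices:

Get-VM | Where-Object { $_.PowerState -eq "PoweredOff" } |
    Get-FloppyDrive |
    Remove-FloppyDrive -Confirm:$false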

Conclusion
A well-tuned foundation enables you to make better use of your virtual infrastructure. Optimizing your building blocks—CPU, memory, disk and network—will improve performance and make your virtual environment more efficient to manage.

Acknowledgements
Thanks to my colleagues Tommy Patterson, Chris Walker, Paul Martin, Thomas Bryant, Scott Herold and Scott Polly for reviewing and providing valuable feedback.

A special thanks to VMware trainer and blogger Eric Sloof at ntpro.nl for additional review and for finding some errors.

For More Information

© 2013 Dell, Inc. ALL RIGHTS RESERVED. This document contains proprietary information protected by copyright. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording for any purpose without the written permission of Dell, Inc. ("Dell").

Dell, Dell Software, the Dell Software logo and products—as identified in this document—are registered trademarks of Dell, Inc. in the U.S.A. and/or other countries. All other trademarks and registered trademarks are property of their respective owners.

The information in this document is provided in connection with Dell products. No license, express or implied, by estoppel or otherwise, to any intellectual property right is granted by this document or in connection with the sale of Dell products. EXCEPT AS SET FORTH IN DELL'S TERMS AND CONDITIONS AS SPECIFIED IN THE LICENSE AGREEMENT FOR THIS PRODUCT, DELL ASSUMES NO LIABILITY WHATSOEVER AND DISCLAIMS ANY EXPRESS, IMPLIED OR STATUTORY WARRANTY RELATING TO ITS PRODUCTS INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. IN NO EVENT SHALL DELL BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, SPECIAL OR INCIDENTAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION OR LOSS OF INFORMATION) ARISING OUT OF THE USE OR INABILITY TO USE THIS DOCUMENT, EVEN IF DELL HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Dell makes no representations or warranties with respect to the accuracy or completeness of the contents of this document and reserves the right to make changes to specifications and product descriptions at any time without notice. Dell does not make any commitment to update the information contained in this document.

About Dell Dell Inc. (NASDAQ: DELL) listens to customers and delivers worldwide innovative technology, business solutions and services they trust and value. For more information, visit www.dell.com.

If you have any questions regarding your potential use of this material, contact:

Dell Software 5 Polaris Way Aliso Viejo, CA 92656 www.dell.com Refer to our Web site for regional and international office information.
