Maximizing Virtual Machine Performance: An Introduction to Performance Tuning
Written by Mattias Sundling, Evangelist, Dell

Introduction
VM performance is ultimately determined by the underlying physical hardware and the hypervisor that serves as the foundation for your virtual infrastructure. Building this foundation has become simpler over the years, but several areas still need to be fine-tuned in order to maximize VM performance in your environment. While some of this content is generic to any hypervisor, this document focuses on VMware vSphere 5.0.

This is an introduction to performance tuning and is not intended to cover everything in detail. Most topics include links to sites that contain deep-dive information if you wish to learn more.

Requirements for top VM performance

System requirements
To ensure top performance from your VMs, your system must have the following:
• VMware vSphere 5.0 or later—If you are running an older version, you must upgrade. Performance and scalability have increased significantly since versions 3 and 4.
• Virtual machine hardware version 8—This hardware version introduces features that increase performance. If you are not running virtual hardware version 8, upgrade VMware Tools first, and then shut down the VM's guest OS. In the vSphere client, right-click the VM and select Upgrade Virtual Hardware.

Warning: Once you upgrade the virtual hardware version to 8, you lose backward compatibility with versions prior to vSphere 5.0. Therefore, if you have a mixed environment, make sure to upgrade all vSphere hosts first.

Virtual hardware and guest OS configuration
The sections below make recommendations for configuring the various hardware components for best performance, as well as for optimizations that can be made inside the guest OS.

CPU
Start with one vCPU.
Start with one vCPU; most applications work well with that. If you start with multiple vCPUs and then realize that you have over-provisioned, it may be cumbersome to remove the unnecessary vCPUs, depending on your OS. Therefore, start with one vCPU; later you can evaluate CPU utilization and application performance. If the application response is poor, you can add vCPUs as needed.

Select the correct hardware abstraction layer in the guest OS.
Make sure you select the correct hardware abstraction layer (HAL) in the guest OS. The HAL drives the OS for the CPU; the choices are "Uni-Processor (UP), single processor" and "Symmetric Multiprocessing (SMP), multiple processors." Windows 2008 uses the same HAL for both UP and SMP, which makes it easy to reduce the number of CPUs. Note the following:
• Windows 2003 and earlier have different HAL drivers for UP versus SMP. Windows automatically changes the HAL driver when going from UP to SMP, but going from SMP to UP can be very complicated, depending on the OS and version.
• If you have a VM running Windows 2003 SP2 or later that has been reduced from two vCPUs to one vCPU, you will still have the multiprocessor HAL in the OS. This results in slower performance than a system with the correct HAL. The HAL driver can be manually updated; however, Windows versions prior to Windows 2003 SP2 cannot be easily corrected. I have personally seen systems with an incorrect HAL driver; they consume more CPU, which can peak to unnecessarily high CPU-utilization percentages when the system gets stressed.
• Make sure your multi-processor VMs have an OS and application that support multi-threading and take advantage of it. If they don't, you'll be wasting resources.

Figure 1. Foglight vOPS Enterprise from Dell looks beyond the hypervisor into the application layer.

Be aware that CPU scheduling varies depending on the version of VMware ESX or ESXi.
VMware ESX 2 used strict co-scheduling, which required a two-vCPU VM to have two physical CPUs (pCPUs) available at the same time. pCPUs had a single or dual core, leading to slow performance when hosting too many VMs.

ESXi 3 introduced relaxed co-scheduling, which allows a two-vCPU VM to be scheduled even when two pCPUs are not available at the same time.

ESXi 4 refined the relaxed co-scheduler even further, increasing performance and scalability. Intel SMT support was introduced, which exposes two hardware contexts from a single core. This can increase performance 10–30 percent, depending on the workload.

ESXi 5 has some enhancements around Intel SMT to ensure high efficiency and performance for mission-critical applications. The number of cores in each pCPU can be up to 10, which makes CPU scheduling easier.

Watch CPU % Ready; a value of 5–10 percent indicates CPU congestion.
The best indication that a VM is suffering from CPU congestion on a vSphere host is when CPU % Ready reaches 5–10 percent over time. In this range, further analysis might be needed. Values higher than 10 percent definitely show critical contention: the VM has to wait for the vSphere host to schedule its CPU requests, due to CPU resource conflicts with other VMs. This performance metric is one of the most important ones to monitor in order to understand the overall performance in a virtual environment. This metric can be seen only at the hypervisor level, not inside the guest OS.

Figure 2. This example shows a VM with almost the same CPU utilization across all vCPUs. That means the OS and application are multi-threaded.
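To illustrate the thresholds above, here is a minimal sketch of how a raw CPU ready value can be turned into a percentage. It assumes vSphere's real-time chart behavior, where ready time is reported as milliseconds accumulated over a 20-second sampling interval; the function names and thresholds labels are my own, not part of any VMware API.

```python
def cpu_ready_percent(ready_ms, interval_s=20.0):
    """Convert a CPU ready summation value (milliseconds accumulated
    over one sampling interval) into a percentage of the interval."""
    return (ready_ms / (interval_s * 1000.0)) * 100.0

def classify(ready_pct):
    # Thresholds from the text: 5-10 % warrants analysis, above 10 % is critical.
    if ready_pct > 10:
        return "critical contention"
    if ready_pct >= 5:
        return "possible congestion - investigate"
    return "ok"

# 2,000 ms of ready time within a 20 s real-time sample -> 10 %
pct = cpu_ready_percent(2000)
print(f"{pct:.1f}% -> {classify(pct)}")
```

For example, a vCPU that accumulated 2,000 ms of ready time in a 20-second sample spent 10 percent of that interval waiting to be scheduled, which falls right at the edge of the congestion range described above.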
Figure 3. CPU % Ready is an important metric for understanding VM performance.

Virtual NUMA can improve performance.
Virtual NUMA (vNUMA) exposes the host NUMA topology to the guest OS. If the guest OS and applications are NUMA-aware, they can benefit by using the underlying NUMA architecture more efficiently, which will improve performance. This requires virtual hardware version 8.

Memory
The memory limit setting often hurts more than it helps.
When you create a VM, you allocate it a certain amount of memory. There is a feature in the VM settings known as the memory limit; it often hurts more than it helps. This setting limits the hypervisor's memory allocation to a value other than what is actually assigned. The guest OS will still see the full amount of allocated memory; however, the hypervisor will allow use of physical memory only up to the memory limit amount. The best practice is to set the memory limit to unlimited.

The only situation I have found for using the memory limit is an application that requires, for example, 16 GB of memory to install or start, but only 4 GB in operation. In a case like that, you can set a memory limit at a much lower value than the actual memory allocation. The guest OS and application will see the full 16 GB of memory, but the vSphere host limits the physical memory to 4 GB.

In reality, the memory limit often gets set on VMs that were not intended to be limited. This can happen when you move VMs across different resource pools or perform a P2V of a physical system. The worst-case scenario, which I have seen in the field multiple times, is setting the memory limit to a low value (such as 512 MB) on VM templates, since all VMs deployed from the templates will inherit the memory limit setting.

Figure 4. Foglight vOPS Enterprise enables you to detect, diagnose and resolve VM problems.
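Because inherited limits are easy to miss, it can be worth auditing the inventory for VMs whose limit is set below their allocation. The sketch below works on hypothetical inventory records; a real audit would pull `memory_mb` and the limit from the vSphere API (for instance via pyVmomi, not shown), where a limit of -1 means unlimited.

```python
# Hypothetical inventory records; real values would come from the
# vSphere API, where a resource limit of -1 means "unlimited".
vms = [
    {"name": "web01", "memory_mb": 2048,  "limit_mb": 512},  # inherited limit
    {"name": "db01",  "memory_mb": 16384, "limit_mb": -1},   # unlimited
]

def limited_vms(inventory):
    """Return the names of VMs whose memory limit is set below
    their configured memory allocation."""
    return [vm["name"] for vm in inventory
            if vm["limit_mb"] != -1 and vm["limit_mb"] < vm["memory_mb"]]

print(limited_vms(vms))  # ['web01']
```

In this sketch, web01 is flagged because its 512 MB limit is far below its 2 GB allocation — exactly the template-inheritance scenario described above.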
Memory definitions
• Granted: Physical memory granted to the VM by the ESX(i) host
• Active: Physical memory actively being used by the VM
• Ballooned: Memory being used by the VMware Memory Control Driver to allow the VM OS to selectively swap memory
• Swapped: Memory being swapped to disk

Figure 5. Memory utilization (active memory) in this example is very low over time, making it safe to decrease the memory setting without affecting VM and application performance.

For example, if you allocate 2 GB of memory to a VM and there is a limit of 512 MB, the guest OS will see 2 GB of memory, but the vSphere host will allow only 512 MB of physical memory. If the guest OS requires more than 512 MB of memory, the memory balloon driver will start to inflate to let the guest OS decide which pages are actively being used. If the balloon can't reclaim any more memory, the guest OS will start to swap. If the balloon can't deflate, or if memory usage is too high on the vSphere host, the host will start to use memory compression, then VMkernel swapping as a last resort. Ballooning is a first warning signal. Guest OS and VMkernel swapping will definitely hurt VM performance, and they also load the vSphere host and the storage subsystem, which have to serve as virtual memory.

To determine the correct amount of memory, you'll need to monitor active-memory utilization over at least 30–90 days in order to see patterns. Some systems might be in use only during a certain part of the 90-day period, but used very heavily during that time.

Understand vSphere's memory reclamation techniques.
It's widely considered a best practice to right-size memory allocation in order to avoid placing extra load on vSphere hosts due to memory reclamation. You'll want to run as many VMs as possible and will probably over-commit memory (allocate more than you have). There are several techniques that