Running Critical Application Workloads on Microsoft Azure Virtual Machines

Silvano Coriani, Principal Program Manager, Microsoft AzureCAT
Paolo Salvatori, Principal Program Manager, Microsoft AzureCAT

December 10, 2014

Executive summary

Many organizations want to port an application designed to run in a traditional data center to Microsoft Azure Virtual Machines (VMs). This scenario is one of the most popular we encounter on the Azure Customer Advisory Team (AzureCAT). Yet some critical application workloads may experience performance issues when moved as-is from custom, high-performance hardware configurations to general-purpose environments where energy and cost management play an important role in the overall economic model. Fortunately, a new generation of hardware components is gradually being introduced[i] into our public cloud offerings. Called the D-Series, it offers several key performance advantages over the earlier A-Series VMs.

This article describes critical performance improvements we developed while working with four organizations on their projects. We highlight the performance differences between Azure A-Series and the latest D-Series VMs and offer suggestions for improving application performance:

- Case 1: How persistent disk latency can directly impact application response times.
- Case 2: How limited throughput from persistent disks can impact application performance when SQL Server tempdb use is significant.
- Case 3: How SSD-based storage in the application tier can speed processing.
- Case 4: How to reduce compile and startup time for a large ASP.NET web application by moving the %temp% folder to a temporary drive in a D-Series VM.


Contents

- Overview of D-Series benefits for critical workloads
  - Comparing IO performance
- Case studies: Critical application workloads on Azure VMs
  - Case 1: How persistent disk latency can directly impact application response times
    - Using BPE on SSD drives with D-Series to speed up access to data pages in a larger pool
    - Improving performance using BPEs
    - Case 1 summary
  - Case 2: How limited throughput from persistent disks can impact application performance when SQL Server tempdb use is significant
    - Local storage with D-Series VMs
    - Case 2 summary
  - Case 3: How SSD-based storage in the application tier can speed temporary file processing
  - Case 4: How to reduce compile and startup time for a large ASP.NET web application by moving the %temp% folder to a temporary drive in a D-Series VM
    - ASP.NET precompilation and website performance
    - Performance after moving the temporary directory
    - Performance for dynamic compilation
- Summary

Information in this document, including URL and other Internet website references, is subject to change without notice. Unless otherwise noted, the example companies, organizations, products, domain names, email addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place, or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

© 2014 Microsoft Corporation. All rights reserved.

Microsoft, MS-DOS, Windows, Windows NT, Windows Server, and the other product names listed on the trademarks page of the Microsoft website are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are property of their respective owners.


Overview of D-Series benefits for critical workloads

Microsoft recently announced support for a new series of VM sizes for Azure VMs and Web/Worker Roles called the D-Series. Thanks to the hardware characteristics of the underlying host systems, D-Series VMs help increase performance when running critical applications. In particular, applications can take advantage of the following two key features, which do not require specific application changes:

- Local storage (temporary) based on solid-state drives (SSDs)
- A higher number of attached data disks (up to 32 for D14 VMs)

The D-Series provides great performance for workloads that require low-latency, high-throughput access to storage. With the earlier VM series, local storage was based on traditional hard disk drive (HDD) technologies and was subject to highly variable performance; the best practice was to store performance-sensitive assets on attached data disks. The D-Series changes that.

Comparing IO performance

To give you a quick example of the difference in executing the same IO operation on a SQL Server workload, compare Figure 1 to Figure 2. Placing the tempdb file on local storage on a D13 VM gave approximately 4.5 times the throughput, at a fraction of the latency, compared to an attached data disk on an A7 VM.

Figure 1 64k IO block performance on Azure data disk (E:)

Figure 2 64k IO block write performance on local SSD drive (D:)

D-Series offers clear improvements for the data tier. For example, you can move Microsoft SQL Server tempdb or Buffer Pool Extensions (BPE) files to the D: drive for workloads that require a better-performing IO subsystem. Other examples include data technologies such as MongoDB or Cassandra that can take advantage of nonpersistent (although replicated) high-performance storage.

Another D-Series advantage is the ability to attach up to 32 Azure data disks to a single instance to reach a maximum of 32 terabytes (TB) of disk space. In addition, those disks can now be striped into a single volume, achieving better throughput and bandwidth for the storage subsystem. The earlier A-Series VMs had a limit of 16 attached disks. For details about performance characteristics, see Performance Guidance for SQL Server in Azure Virtual Machines.[ii]
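Striping attached data disks into a single volume is typically done with Windows Server Storage Spaces. The following PowerShell sketch shows one way to do it; the pool, disk, and interleave values are illustrative choices, not prescriptive settings:

```powershell
# Pool every data disk that is eligible for pooling
$disks = Get-PhysicalDisk -CanPool $true
New-StoragePool -FriendlyName "DataPool" `
    -StorageSubSystemFriendlyName "Storage Spaces*" -PhysicalDisks $disks

# Create a simple (striped) virtual disk across all pooled disks;
# a 64 KB interleave lines up with SQL Server extent-sized IO
New-VirtualDisk -StoragePoolFriendlyName "DataPool" -FriendlyName "DataVDisk" `
    -ResiliencySettingName Simple -UseMaximumSize `
    -NumberOfColumns $disks.Count -Interleave 65536

# Initialize, partition, and format the new volume
Get-VirtualDisk -FriendlyName "DataVDisk" | Get-Disk |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS
```

The number of columns controls how many disks each write is striped across; setting it to the disk count spreads IO over every disk in the pool.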

In addition to the performance tests we ran on the earlier A-Series VMs, we conducted the same set of tests based on the SQLIO Disk Subsystem Benchmark Tool[iii] using the new D-Series.


The results were interesting, as the following table shows.

             Random I/O (8 KB pages)           Sequential I/O (64 KB extents)
             Reads           Writes            Reads            Writes
IOPS         10,500          15,000            2,500            7,700
Bandwidth    82 MB/second    117 MB/second     160 MB/second    480 MB/second

NOTE Because Azure Infrastructure Services is a multi-tenant environment, performance results may vary. These results are an indication of what can be achieved but are not a guarantee. We suggest you repeat these tests and measurements based on your specific workload.

It’s important to remember that single data disks are limited to 500 input/output operations per second (IOPS) or 60 MB/second, yet D14 VMs with 32 attached disks can provide up to 85 percent more write IOPS and bandwidth compared to an A7 VM with 16 attached disks. The advantage is less for small IO blocks (8 KB pages—typical for SQL Server data files with an OLTP workload), but even so the D-Series offers approximately 30 percent more IOPS and bandwidth in MB/second.

Most application-tier frameworks also benefit when high-performance local storage is available. For example, ASP.NET can generate many local files during dynamic compilation steps,[iv] and accessing faster local storage for these steps can improve page response time for users. Other scenarios include temporary files that .NET or Java applications create to cache or process information on the data tier. High-performance computing (HPC) solutions take these scenarios to the extreme, where local SSD storage is critical.

Moreover, these improvements don't affect costs much, as Figure 3 shows: costs are comparable to previous-generation VMs for similar configurations.

Figure 3 Comparing costs of A-Series (top) and D-Series (bottom) VMs


Case studies: Critical application workloads on Azure VMs

The following case studies describe performance-critical aspects that can affect application experience and are based on real-world projects from organizations that have engaged with AzureCAT.

Case 1: How persistent disk latency can directly impact application response times

Organization 1’s web application is based on several ASP.NET front-end nodes and a SQL Server database. When they migrated this application from a hosting environment with a dedicated, high-performing hardware configuration to Azure VMs, they noticed a significant increase in page response times, from approximately 150 ms to 350 ms, for a portion of the overall pages managed by the application. SQL Server is hosted in an A7 VM (8 cores, 56 GB RAM) with 16 attached data disks striped into a single volume that uses Windows Server Storage Spaces. Their main database is around 250 GB.

During a previous proof-of-concept test based on a subset of the data, performance was on par with the on-premises environment. Moreover, the development team applied all the migration best practices. So the team decided to investigate the source of the increased response time.

They profiled the most-accessed ASP.NET pages and realized that most of the time spent on those high response-time executions was related to database calls. The application is chatty: each page invokes between 10 and 20 different stored procedures (SPs) before returning to a user. When an SP’s input parameters point to data pages already loaded in the buffer pool, each stored procedure takes around 1 ms to execute. However, when data pages are not in cache, execution times jump to 25–30 ms, increasing overall page response time to approximately 350 ms.

The team knew how to improve execution for a single SP by using indexing and T-SQL query structure, but they discovered that the time SQL Server needs to load a data page from disk is bound by the time the IO subsystem requires to complete the read.

Using BPE on SSD drives with D-Series to speed up access to data pages in a larger pool

For Organization 1, a large portion of the database, approximately 250 GB, is actively touched by user queries, so they can’t maintain the entire database in memory. Even in today’s largest VMs (A9 and D14), the maximum amount of available RAM is 112 GB. Figure 4 shows a sample TPCC database[v] where a single table is larger than the maximum VM memory size.


Figure 4 Database with large table sizes

However, D-Series VMs make it possible to use local, temporary SSD-based storage by enabling SQL Server 2014 BPE and extending the available buffer pool with direct-attached, low-latency storage to cache a bigger chunk of the active portion of the database. Here is the command to use:
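In SQL Server 2014, BPE is enabled with an ALTER SERVER CONFIGURATION statement along these lines. The file name matches the SSD.BPE file shown in Figure 5; the 50 GB size is an illustrative value, and in practice it should be tuned relative to the max server memory setting:

```sql
-- Enable the buffer pool extension on the local SSD-based temporary drive
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'D:\SSD.BPE', SIZE = 50 GB);
```

The extension can later be turned off with ALTER SERVER CONFIGURATION SET BUFFER POOL EXTENSION OFF, which removes the file.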

When BPE is enabled, the SQL Server engine starts loading clean data pages into it as it accesses them. Figure 5 shows BPE reading data pages from data files (on the E: volume, on 16 attached disks) and populating the SSD.BPE file hosted on the D: volume (temporary disk).

Figure 5 BPE reads data pages from files on E: to populate SSD.BPE


BPE activities can be monitored through a set of performance counters that show current use, reads and writes, throughput, plus other details, as Figure 6 shows.

Figure 6 Monitoring BPE performance counters
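The same counters can also be read from T-SQL instead of Performance Monitor. A sketch, assuming the SQL Server 2014 Buffer Manager counter names:

```sql
-- BPE-related counters exposed by the Buffer Manager object
SELECT RTRIM(object_name)  AS object_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Extension%';
```

This returns values such as extension page reads and writes per second, which is convenient for logging BPE activity alongside query workloads.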

A number of Dynamic Management View (DMV) queries are also available to determine the total amount of data available in the buffer pool, and specifically the pages loaded into BPE, as figures 7 and 8 show.


Figure 7 Total cached pages by database

Figure 8 Amount of data pages cached in BPE
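Queries along the lines of those behind figures 7 and 8 can be built on sys.dm_os_buffer_descriptors, which in SQL Server 2014 exposes an is_in_bpool_extension flag. A sketch:

```sql
-- Cached data per database, split between RAM and the BPE file
SELECT DB_NAME(database_id) AS database_name,
       CASE WHEN is_in_bpool_extension = 1
            THEN 'BPE (SSD)' ELSE 'RAM' END AS cache_location,
       COUNT(*) * 8 / 1024 AS cached_mb   -- pages are 8 KB each
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id, is_in_bpool_extension
ORDER BY database_name, cache_location;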


Improving performance using BPEs

To demonstrate the performance improvement that can be achieved through BPE, we can select a specific page that is already loaded and compare query execution times with a page loaded from disk. Figure 9 shows a randomly selected database file (database_ID = 7 in our TPCC sample database).

Figure 9 Sample file from the TPCC database


Using the DBCC PAGE statement, we can identify a specific record loaded in that page.

Figure 10 DBCC PAGE statement for the sample file
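A sketch of such a statement follows; the file and page IDs here are placeholders for illustration, and the real values would come from a DMV such as sys.dm_os_buffer_descriptors:

```sql
-- Trace flag 3604 routes DBCC output to the client session
DBCC TRACEON (3604);

-- Arguments: database ID, file ID, page ID, print option
-- (3 = page header plus row-level details)
DBCC PAGE (7, 1, 1000000, 3);
```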

If we query the record after it loads in BPE, we see very good response times, as Figure 11 shows, because the SQL Server engine does not need to access the Azure attached data disks.

Figure 11 Response times when page is loaded in BPE

For comparison, executing the same query when the page is not yet loaded in BPE takes longer by orders of magnitude, as Figure 12 shows.

Figure 12 Response times when page is not loaded in BPE

Case 1 summary

In this scenario, Organization 1 took advantage of BPE on local SSDs to increase the amount of data loaded in the buffer pool, an approach that significantly reduced overall application response time by eliminating the persistent disk’s higher latency.

NOTE One application’s solution may not work for all, so it’s important to understand the data access patterns of an application before trying this approach and to verify that the proposed changes will provide a benefit. When using caching strategies, make sure that the data you need will not only fit in cache but also get loaded there before it’s used. Also, make sure that the cache doesn’t interfere with data integrity requirements.

Case 2: How limited throughput from persistent disks can impact application performance when SQL Server tempdb use is significant

In the data access pattern for Organization 2’s application, most of the queries generated by the application use a similar approach:

- Load a large but variable amount of data into a temporary table based on a number of parameters.
- Execute aggregations and calculations based on these temporary tables.
- Return results to users.

After the organization moved the application to Azure VMs, performance decreased significantly compared to their earlier system, despite their efforts to apply best practices and optimizations. Their previous on-premises environment included a specialized, tiered IO subsystem based on SSDs and traditional spindles. When the team analyzed the application code and query syntax, they saw that tempdb was heavily used by this particular data access pattern.

AzureCAT tested this pattern using the same TPCC sample database as in Case 1 and noticed similar behavior during query execution. Data was loaded into tempdb files hosted on the same volume as the main database files. The tempdb files experienced the same high disk response times, caused by the remote networked storage (attached disks), as Figure 13 shows.


Figure 13 Response times for TPCC data pattern that relies heavily on tempdb

Consequently, the response time for procedures increased dramatically, as Figure 14 shows. The delays occurred mostly when reading data from data files and writing it into tempdb.


Figure 14 An increase in response times for procedures

Local storage with D-Series VMs

In Organization 2’s scenario, D-Series VMs offer the ability to move tempdb to low-latency, high-throughput local storage that can speed up access to this shared and highly contended resource (see Figure 15). Data file access still depends on the current performance of the persistent disks; however, in our test, tempdb experienced five times greater write performance and a significant reduction in latency, from roughly 100 ms to approximately 6 ms. As a result, overall query execution times sped up dramatically, as Figure 16 shows.


Figure 15 The same test when run using D-Series VMs


Figure 16 Write performance improves and latency is reduced with D-Series VMs
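Relocating tempdb is a metadata-only change followed by a service restart. A sketch, assuming the default logical file names tempdev and templog:

```sql
-- Point tempdb data and log files at the local SSD drive (D:);
-- the change takes effect the next time the SQL Server service starts
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'D:\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'D:\templog.ldf');
```

One caveat: the temporary drive is wiped when the VM is redeployed. SQL Server recreates tempdb at startup, but the target folder must exist and the service account needs write permission to it, so a startup task is commonly used to guarantee both.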

Case 2 summary

In Organization 2’s scenario, we found that query response time could be reduced by around 40 percent, with no application changes, by moving tempdb data and log files to the SSD-based temporary disk on D-Series Azure VMs. Although tempdb files and BPE compete for the same shared resource (local storage), the two options can be combined to benefit mixed workloads that include both pure OLTP and some large scan queries.

Case 3: How SSD-based storage in the application tier can speed temporary file processing

Like many other similar application scenarios, Organization 3 uses an ASP.NET front end to receive batches of file uploads from end users’ devices and then executes a number of processing activities before submitting the information to the solution’s back-end tier. Files are temporarily stored on the local disk until processed, and then deleted. A batch typically contains between 20 and 50 files, which are uploaded and processed.

At first Organization 3 migrated their web front-end layer to A5 VMs (2 cores, 14 GB RAM, approximately $246 per month at the time of writing[vi]), using the temporary (D:) drive to store and process files locally. Compared to their previous on-premises environment, however, they experienced a significant reduction in overall performance. Although file posts are processed asynchronously, the organization saw an increase of approximately 30 to 35 percent in processing time for bigger batches.

They decided to test the new D11 (2 cores, 14 GB RAM, approximately $300 per month at the time of writing[vii]) D-Series VMs. Without any change in their application configuration, they took advantage of the faster processors and the new SSD-based temporary drive to store and process incoming files. Results have been positive: processing time for a single incoming file of approximately 64 KB decreased from 960 ms on average to around 650 ms, a 32 percent reduction in file processing time. As in Case 2, a large part of this improvement came from reduced latency when writing files to disk, reading content, and applying computation. Maximum latency decreased to approximately 1 ms or less for the D-Series. By contrast, when using the temporary disk in A-Series VMs, latency was around 4–5 ms.

Figure 17 Comparing file storage and processing on new D-Series (top) with A-Series (bottom)

Case 4: How to reduce compile and startup time for a large ASP.NET web application by moving the %temp% folder to a temporary drive in a D-Series VM

For their ASP.NET website running on A-Series VMs, Organization 4 explored various compilation options to balance their need for flexibility (the ability to modify site content quickly) against delivering great performance to customers who first access a particular section of their website. However, the load time for customers first accessing the website was still slower than they wanted. The issue turned out to be an easily changed default setting.


ASP.NET precompilation and website performance

With ASP.NET dynamic compilation, developers can modify source code without having to explicitly compile the code before deploying a web application. If a source file is modified, ASP.NET automatically recompiles the file and updates all linked resources. ASP.NET also provides precompilation options so that developers can compile a website before it has been deployed, or compile it after it has been deployed but before a user requests it. Precompilation has several advantages. For example, it can improve the performance of a website on first request because there is no lag time while ASP.NET compiles the site.

When a web application is compiled, the compiled code is placed by default in the Temporary ASP.NET Files folder, a subdirectory of the location where the .NET Framework is installed. Typically, the location is:

%SystemRoot%\Microsoft.NET\Framework\versionNumber\Temporary ASP.NET Files

This default location introduced an issue for Organization 4, because the %SystemRoot% folder resides on the persisted OS disk (C:) by default, causing compilation operations to be affected by that disk’s level of performance. These compilation operations are write-intensive, so we suggested that the organization move their ASP.NET temporary folder to the temporary drive (D:) to achieve better performance.

ASP.NET creates a discrete subfolder under the Temporary ASP.NET Files folder for each application. The root location can be configured using the tempDirectory attribute of the <compilation> section of the configuration file. This optional attribute enables developers to specify the directory to use for temporary file storage during compilation, as shown:
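A minimal web.config fragment with the attribute applied might look like the following; the D:\TempAspNet path is an illustrative choice, and the folder must exist with write permission for the application pool identity:

```xml
<!-- web.config: redirect ASP.NET temporary compilation output
     to the SSD-based temporary drive -->
<configuration>
  <system.web>
    <compilation tempDirectory="D:\TempAspNet\" />
  </system.web>
</configuration>
```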

Regardless of the compilation approach used, moving tempDirectory to an SSD-based D: drive provides a great advantage compared to an OS disk drive. Figure 18 shows the different disk response times.


Figure 18 Comparing the disk response times for the OS disk (top) and temporary disk (bottom)

Performance after moving the temporary directory

In this precompiled website scenario, Organization 4 reduced the compilation time of a large ASP.NET solution with thousands of items by about 30 percent, from 33 minutes (Figure 19) down to 21 minutes (Figure 20). We noticed that the performance gain is proportional to the average size of the ASP.NET artifacts (source files, intermediate files, .NET assemblies, and other items) that the compilation process handles.

Figure 19 Precompilation time with tempDirectory on OS disk


Figure 20 Precompilation time with tempDirectory on the temporary drive

Performance for dynamic compilation

Looking at dynamic compilation, we see similar improvements. When a user accesses the application’s main page for the first time (without precompilation, as discussed earlier), ASP.NET must first parse and compile the code of the web application into one or more assemblies. When compiled, the code is translated into a language-independent and CPU-independent representation called Microsoft Intermediate Language (MSIL). This compile operation directly impacts first-page delivery time. Figure 21 shows the difference between running tempDirectory in the default OS disk location versus the temporary disk. As with precompiled scenarios, delivery time is reduced by about 30 percent.

Figure 21 Comparing display speed when tempDirectory runs on the default OS disk location (top) versus temporary disk (bottom)


And the reason is quite clear when we compare the disk access for ASP.NET temporary folders, as Figure 22 shows.

Figure 22 Comparing disk access paths on default OS disk location (top) to temporary disk (bottom)

More importantly, when the organization ran the same set of ASP.NET pages on comparable A-Series (A7) and D-Series (D13) test VM instances, they experienced much better average response times overall—an improvement of up to approximately 40 percent, as Figures 23 and 24 show.

Figure 23 Average page response time on A7 test instance

Figure 24 Average page response time on D13 test instance


Summary

New D-Series VMs in Azure can help run performance-critical workloads on both the data tier and the application tier, offering better overall storage and networking performance at a price-to-performance ratio that compares favorably to other VM series. Certain application scenarios, such as OLTP database servers, benefit mainly from local SSD-based temporary storage for extending buffer pools and hosting temporary operations. Application servers benefit from faster, low-latency local storage and also from the increased CPU performance provided by this new generation of VMs.

References

[i] “New D-Series Virtual Machine Sizes,” on the Azure Blog at azure.microsoft.com/blog/2014/09/22/new-d-series-virtual-machine-sizes/.
[ii] “Performance Guidance for SQL Server in Azure Virtual Machines,” on the MSDN website at http://msdn.microsoft.com/en-us/library/azure/dn248436.aspx.
[iii] Download the SQLIO Disk Subsystem Benchmark Tool at www.microsoft.com/en-us/download/details.aspx?id=20163.
[iv] “Understanding ASP.NET Dynamic Compilation,” on the MSDN website at msdn.microsoft.com/en-us/library/ms366723(v=vs.100).aspx.
[v] Based on a Transaction Processing Council–Type C (TPCC) benchmark database created using the HammerDb tool. For details, see www.hammerdb.com/.
[vi] For details about A-Series pricing, see azure.microsoft.com/en-us/pricing/details/virtual-machines/.
[vii] For details about D-Series pricing, see azure.microsoft.com/en-us/pricing/details/virtual-machines/.
