IBM FlashSystem A9000 and A9000R – Single-Threaded SQL Backup Best Practices

A technical report

Itay Raviv
IBM Systems Performance Group

January 2018

© Copyright IBM Corporation, 2018

Table of Contents

Abstract
Scope
Prerequisites and Target Audience
The SQL Backup Job
Single Thread vs. Multi-Thread Performance
Plan for Better Performance
Database Host – HBA Intro
Database Host – HBA Qdepth Configuration
Changing Qlogic HBA Qdepth in Linux
Changing Qlogic HBA Qdepth in Windows Server
Changing Emulex HBA Qdepth in Linux
Changing Emulex HBA Qdepth in Windows Server
Qdepth Effect on an SQL Database
Database Host – SQL Server Configuration
Best Practice Example
SQL Backup – Intro
SQL Backup Environment
SQL Server Default Values Configuration
SQL Server Modified Values Configuration
Acknowledgement
About the Author
Appendix A: Online Resources
Trademarks and Special Notices

Abstract

This white paper demonstrates how the bandwidth rate and overall performance of a single-threaded application or backup job can be improved when used with the IBM FlashSystem A9000 and A9000R storage systems. IBM FlashSystem A9000 and A9000R provide embedded real-time compression that significantly and efficiently decreases the consumption of physical storage capacity. Real-time compression can be applied to file systems and databases that occasionally use single-threaded processes, mainly for backup jobs. This applies to databases such as Oracle, SQL, and others. The target audience for this paper is technical leads, system administrators, and database and storage administrators planning to implement IBM FlashSystem A9000 or A9000R within database and backup environments.

Scope

This technical report discusses the enhanced data reduction features of IBM FlashSystem A9000 and A9000R, as well as their configuration with SQL backup. This technical report does not discuss the configuration aspects of the IBM XIV storage system, and does not replace any official documentation provided by Microsoft for deploying SQL databases. The screen captures and graphs in this paper show the bandwidth rate and some performance data captured in the lab setup. The output might differ in other environments depending on data, workload type, and resources available.

Prerequisites and Target Audience

This paper assumes that an SQL database host is available, with the SQL database software and the latest patch set installed. For planning the SQL installation, refer to the SQL documentation website at: https://docs.microsoft.com/en-us/sql/sql-server/install/planning-a-sql-server-installation. For the database software installation guide, refer to https://docs.microsoft.com/en-us/sql/database-engine/install-windows/install-sql-server.

The prerequisite technological skills for this paper’s target audience are:

• Familiarity with Linux and Windows operating systems
• Familiarity with SQL database installation and administration
• Familiarity with storage terminology

The SQL Backup Job

Many enterprises today run SQL databases whose backup jobs must start and finish within a specific timeslot. The challenge is to meet this timeslot. In some cases the process must be sped up because the primary stored data is growing rapidly and more data requires backup. In other cases the backup timeslot itself is narrowed. Although a backup job might be considered a simple process, its rate is influenced by many parameters and configurations in the production environment. These include host resources and OS configuration, the FC infrastructure (such as FC cables and switches), and the target storage system with its own configuration.

Single Thread vs. Multi-Thread Performance

In a nutshell, a thread is the smallest subset of a process that could be independently handled by the OS. It is the smallest piece of code that could be executed independently from the rest of the code.

Single-threaded: Threads can only run sequentially, so only one piece of the code can be executed at a time.

Thread #1 Thread #2 Thread #3 Thread #4

Figure 1: Single Threaded process example

Multi-threaded: Threads can run in parallel, so multiple pieces of code can be executed simultaneously.

Thread #1 | Thread #2 | Thread #3 | Thread #4 (running side by side)

Figure 2: Multi-Threaded process example

Accordingly, multi-threaded processes can split up the work, allowing the computer to process several tasks asynchronously. This allows the computer to run at maximum efficiency by utilizing all the processing time available, rather than locking up when a process is waiting for a resource.
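The difference between the two models can be sketched in a few lines of Python. This is a minimal illustration and is not taken from SQL Server: the `read_chunk` function and the four chunk numbers are hypothetical stand-ins for independent pieces of work.

```python
from concurrent.futures import ThreadPoolExecutor


def read_chunk(chunk_id: int) -> str:
    """Hypothetical unit of work, such as reading one part of a database."""
    return f"chunk-{chunk_id}"


def run_single_threaded(chunk_ids):
    # One piece of work at a time, in order.
    return [read_chunk(c) for c in chunk_ids]


def run_multi_threaded(chunk_ids, workers=4):
    # Up to `workers` pieces of work can be in flight at once.
    # map() still returns the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_chunk, chunk_ids))


chunks = [1, 2, 3, 4]
sequential = run_single_threaded(chunks)
parallel = run_multi_threaded(chunks)
```

Both calls return the same results. Only the scheduling differs, which is why parallelism helps most when each unit of work spends time waiting for I/O.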

Plan for Better Performance

Among their many advantages, such as ease of management, reliability, and stability, IBM FlashSystem A9000 and A9000R aim to deliver the highest flash-storage performance in the market.

To harness the performance capabilities of IBM FlashSystem A9000 or A9000R and its highly parallel grid architecture, plan the other environment components to work in parallel as well. Start with the host that generates the I/Os (input/output operations) using multiple threads, and continue with multiple physical links (FC links, number of paths, and so on) and multiple volumes/LUNs on the storage side. This increases the parallelism of the overall environment and results in higher performance.

When it is not possible to use multi-threaded applications (for instance, with the SQL backup job process), it is still possible to configure some environment variables for maximizing the bandwidth rate from the database host side.

The next few sections discuss the default configuration and the required changes for increasing the bandwidth rate of an SQL backup job.

Database Host – HBA Intro

The host HBA (Host Bus Adapter) is the hardware component that connects the host to the SAN environment through Fibre Channel connections.

In today’s market, Qlogic and Emulex HBAs are in wide use. Each HBA provides 1 to 4 FC ports (most commonly 2), usually at link speeds of 4 Gb/s, 8 Gb/s, or 16 Gb/s. As a rough guide, each 8 Gb/s port can deliver approximately 750 MB/s to 800 MB/s. Multiplying this value by the total number of connected ports on your host gives what is referred to as “wire speed”, or, in other words, the maximum bandwidth that the host can produce.
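As a rough worked example of the wire speed estimate, the following Python sketch multiplies the per-port figures quoted above by a port count. The two-port host is an assumption chosen for illustration, and the function name is hypothetical.

```python
def wire_speed_mb_s(connected_ports: int, per_port_mb_s: float) -> float:
    """Estimate the maximum host bandwidth as ports x per-port rate."""
    if connected_ports < 1:
        raise ValueError("at least one connected port is required")
    return connected_ports * per_port_mb_s


# Assumed host: two connected 8 Gb/s ports, each good for 750-800 MB/s.
low_estimate = wire_speed_mb_s(2, 750)
high_estimate = wire_speed_mb_s(2, 800)
print(f"Estimated wire speed: {low_estimate:.0f} to {high_estimate:.0f} MB/s")
```

A backup job cannot exceed this figure regardless of how the other parameters are tuned, so it is a useful upper bound to compare measured rates against.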

Database Host – HBA Qdepth Configuration

The HBA queue depth (Qdepth) setting throttles the maximum number of I/O operations that can flow simultaneously from the HBA port to the SAN.

How the Qdepth setting is modified depends on the brand and model of your HBA. Because the two most common HBAs are Qlogic and Emulex, the following sections provide examples of how to change the parameter for each of them in Linux and Windows.

Changing Qlogic HBA Qdepth in Linux

For SuSE, the “Max Queue Depth” parameter is set in the modprobe.conf file.

The option line has the following form:

options qla2xxx ql2xmaxqdepth=X

The ql2xmaxqdepth parameter defines the maximum queue depth reported to the SCSI mid-level per device. The Qdepth setting specifies the number of outstanding requests per LUN. The default is 32 and the maximum is 2048 (although 2048 is too high and would probably saturate the line).

In some particular cases it is recommended to adjust this option to a lower or higher value. The Qdepth can be adjusted by creating a dedicated file for qla2xxx in the /etc/modprobe.d/ directory with the following line:

options qla2xxx ql2xmaxqdepth=new_queue_depth

A reboot of the Linux host is required for the change to take effect.
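The following Python sketch shows one way the options line could be generated and sanity-checked before it is placed in a file under /etc/modprobe.d/. The helper name is hypothetical, the upper bound of 2048 follows the maximum quoted above, the lower bound of 1 is an assumption, and the value 64 is only an example rather than a recommendation.

```python
def qla2xxx_options_line(queue_depth: int) -> str:
    """Return the modprobe options line for a per-LUN queue depth."""
    # Upper bound taken from the maximum quoted in the text;
    # the lower bound of 1 is an assumption.
    if not 1 <= queue_depth <= 2048:
        raise ValueError("queue depth must be between 1 and 2048")
    return f"options qla2xxx ql2xmaxqdepth={queue_depth}"


line = qla2xxx_options_line(64)
print(line)
```

Generating the line from a checked value avoids typos in the module or parameter name, which would otherwise only surface after the reboot.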

Changing Qlogic HBA Qdepth in Windows Server

Use the following procedure to change the Qdepth parameter in Windows Server:

1. Click Start → Run, and then open the Windows Registry Editor (REGEDIT or REGEDT32) program.

2. Go to HKEY_LOCAL_MACHINE and drill down the tree to the QLogic driver as follows:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ql2300\Parameters\Device

3. In Device, double-click:

DriverParameter:REG_SZ:qd=32

4. If the string "qd=" does not exist, append the following to the end of the string:

;qd=32

5. Enter a value up to 254 (0xFE). The default value is 32 (0x20).

6. Click OK.

7. Exit the Registry Editor and then reboot the Windows Server host.

Another available method in Windows Server:

Install the Qlogic SANSurfer software, and then:

1. Run the SANSurfer HBA manager utility.

2. Click HBA port → Settings.

3. Select Advanced HBA port settings from the drop-down list.

4. Update the Execution Throttle parameter.

Changing Emulex HBA Qdepth in Linux

For Emulex HBAs, the Max Queue Depth parameter is set in the modprobe.conf file.

Options include: “lpfc_lun_queue_depth” and “lpfc_hba_queue_depth”.

The queue depth setting specifies the number of outstanding requests per LUN and per HBA. The default is 32 and the maximum is 8192 (8192 is far too high and will probably saturate the line). In some particular situations it is recommended to adjust this option to a lower or higher value.

The queue depth can be adjusted by creating a dedicated file for the HBA in the /etc/modprobe.d/ directory with the following lines:

options lpfc lpfc_hba_queue_depth=32
options lpfc lpfc_lun_queue_depth=32

HBAnyware or hbacmd can also be used to change the parameters.

A reboot of the Linux host is required for the change to take effect.

Changing Emulex HBA Qdepth in Windows Server

Option 1:

1. Run the LPUTILNT utility located in: “C:\WINNT\system32”.

2. Select Driver Parameters from the drop-down menu on the right.

3. Scroll down and double-click QueueDepth.

Note: If you are setting QueueDepth to a value greater than 150, the following Windows Registry value also needs to be increased appropriately: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\lpxnds\Parameters\Device\NumberOfRequests

Option 2:

1. Click Start → Run, and then open the Windows Registry Editor (REGEDIT or REGEDT32) program.

2. Select HKEY_LOCAL_MACHINE and drill down the tree to the Emulex driver as follows: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\elxstor\Parameters\Device

3. Double-click DriverParameter to edit the string. Example: DriverParameter:REG_SZ:qd=0x20

4. Add “qd=” to the value field. If the string “qd=” does not exist, append it to the end of the string with a semicolon: “;qd=value”

5. Enter a value up to 254 (0xFE). The default value is 32 (0x20). The provided value must be in hexadecimal format. Set this value to the queue depth value on the HBA.

6. Click OK.

7. Exit the Registry Editor, then shut down and reboot the system.
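Because the value must be entered in hexadecimal, it can help to compute the string before editing the registry. The following Python sketch only builds the text of the DriverParameter value and does not touch the registry. The helper names are hypothetical, the lower bound of 1 is an assumption, and `LinkTimeOut=30` stands in for whatever other settings the string may already contain.

```python
def qd_token(queue_depth: int) -> str:
    """Format a queue depth as the hexadecimal 'qd=' token."""
    # The text allows values up to 254 (0xFE); the lower bound is assumed.
    if not 1 <= queue_depth <= 254:
        raise ValueError("queue depth must be between 1 and 254")
    return f"qd=0x{queue_depth:X}"


def set_qd(driver_parameter: str, queue_depth: int) -> str:
    """Replace an existing 'qd=' entry or append one after a semicolon."""
    parts = [p for p in driver_parameter.split(";") if p]
    kept = [p for p in parts if not p.strip().lower().startswith("qd=")]
    return ";".join(kept + [qd_token(queue_depth)])


default_value = set_qd("", 32)
appended = set_qd("LinkTimeOut=30", 254)  # hypothetical existing setting
replaced = set_qd("qd=0x20", 64)
```

The same helper covers both cases described in the procedure: updating an existing “qd=” entry and appending a new one when none exists.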

Qdepth Effect on an SQL Database

Setting the HBA Qdepth value too low or too high has a direct effect on performance. If the Qdepth value is too low, your SQL Server instance I/O throughput might suffer, as the HBA will “hold” the I/Os on the way out.

If the Qdepth value is too high, this might impact the performance of the SAN as a whole, particularly when there are multiple servers and all of them “open the floodgates” by concurrently increasing their HBA Qdepths past a recommended level.

As with any complex system, the correct queue depth value depends on several factors, such as the number of concurrent hosts connecting to the SAN, the LUNs involved, HBA brand and throughput capabilities, the number of HBAs on the host, and more. Therefore, HBA vendors cannot easily provide a one-size-fits-all Qdepth recommendation, and the right value depends on the specific environment.

To summarize, SQL Server DBAs should work with the company’s SAN administrator to make sure they find a healthy balance between the SQL Server instance’s overall I/O throughput and the SAN’s overall capacity.

Database Host – SQL Server Configuration

SQL Server Management Studio (SSMS) is an integrated environment for accessing, configuring, managing, administering, and developing all components of an SQL Server database. A few backup parameters and options can be changed to increase the backup parallelism and thereby increase the bandwidth and speed up the backup job.

• BUFFERCOUNT: The BUFFERCOUNT parameter specifies the total number of I/O buffers to be used for the backup operation.

The default value of the BUFFERCOUNT parameter can be calculated using the following formula: "NumberofBackupDevices x 3 + (2 x NumberofVolumesInvolved)". For example, with one destination and two source drives (one backup device and three volumes involved), this parameter gets a low value of 1 x 3 + (2 x 3) = 9.

* You can also check these values for any backup command you run by checking the SQL error log after enabling trace flags 3605 and 3213.

• MAXTRANSFERSIZE: The MAXTRANSFERSIZE parameter specifies the largest unit of transfer in bytes to be used between the SQL Server and the backup media. The possible values are multiples of 64 KB, ranging up to 4194304 bytes (4 MB). The default is 1 MB.

The total space that will be used by the buffers is: BUFFERCOUNT x MAXTRANSFERSIZE. The output can be verified in the “Total buffer space:” field on the “Log File Viewer”.

• BLOCKSIZE: The BLOCKSIZE parameter specifies the physical block size. Supported sizes are: 512, 1024, 2048, 4096, 8192, 16384, 32768, and 65536 (64 KB) bytes. The default is 65536 for tape devices and 512 for other devices.

• BACKUP to DISK = 'NUL': This optional parameter can be used when testing only read performance. Use it to estimate how fast the data can be read from a database or a file group. Simply set the backup destination to: DISK = 'NUL'

BACKUP DATABASE [X] TO DISK = 'nul'
WITH NOFORMAT, NOINIT,
NAME = N'Full Database Backup',
SKIP, NOREWIND, NOUNLOAD, STATS = 10,
BUFFERCOUNT = 1000,
BLOCKSIZE = 65536,
MAXTRANSFERSIZE = 2097152
GO

Figure 3: SQL Parameters Example

• Avoid compression of backup destination: FlashSystem A9000/A9000R has a built-in data reduction mechanism (including deduplication and compression), so all data in the storage system is compressed by default. Accordingly, there is no need to use the SQL compression mechanism, and it is not recommended to do so.

Setting from the CLI: Add “NO_COMPRESSION” to the WITH options of the BACKUP command.

Setting from the GUI:

Figure 4: Disable SQL Compression

• Use multiple destinations: This opens parallel workers for the backup. Each destination can be a directory on the same volume (not necessarily a different volume). This change is probably the most significant, because it automatically raises the calculated default BUFFERCOUNT value. Keep in mind that a higher BUFFERCOUNT value and a larger number of destinations will probably require a higher Qdepth to fill up and push all the data towards the SAN. As mentioned earlier, each production environment is different, so there is no “one-size-fits-all” value. Some trial and error might be needed to tune the parameters for your specific environment.
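The effect of multiple destinations on the default BUFFERCOUNT can be checked with a short Python sketch of the formula given in this paper. The function name is hypothetical, and the second case assumes that all 32 destinations are directories on the same volume, so the number of volumes involved stays at three.

```python
def default_buffercount(backup_devices: int, volumes_involved: int) -> int:
    """NumberofBackupDevices x 3 + (2 x NumberofVolumesInvolved)."""
    return backup_devices * 3 + 2 * volumes_involved


# One destination and two source drives: 1 device, 3 volumes involved.
single_destination = default_buffercount(1, 3)

# Assumed case: 32 destinations on the same destination volume,
# still two source drives, so 3 volumes are involved.
many_destinations = default_buffercount(32, 3)

print(single_destination, many_destinations)
```

Under these assumptions the default grows from 9 to 102 buffers, which illustrates why adding destinations changes the backup behavior even when BUFFERCOUNT is not set explicitly.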

Best Practice Example

SQL Backup – Intro

The following example describes a typical MS SQL DB backup job when the source is located on a FlashSystem A9000R (2 grid elements) and backup is performed by a Windows Server 2008 R2 host running SQL Server 2008 to an XIV Gen3 as a destination. The database total size is 3 TB, to be backed up in a single backup job.

An infrastructure description will be followed by a comparison of the MS SQL Server default parameter values and optimized values to show the improvement of bandwidth in MB/s.

SQL Backup Environment

The infrastructure is connected with enough FC ports so that the SAN is not the limiting factor in terms of maximum achievable bandwidth.

Figure 5: SQL Backup Environment Connectivity

SQL Server Default Values Configuration

Example 1: As mentioned above, the default values for the relevant parameters are:

HBA Qdepth: 32
Number of Destinations: 1
MAXTRANSFERSIZE: 1024 KB
BLOCKSIZE: 512
Backup to DISK = 'nul'
Compression: Disabled
BUFFERCOUNT: Calculated. In this example:

NumberofBackupDevices = 1 destination
NumberofVolumesInvolved = 1 destination + 2 sources = 3
BUFFERCOUNT = NumberofBackupDevices x 3 + (2 x NumberofVolumesInvolved)

BUFFERCOUNT = 1 x 3 + (2 x 3) = 9

BACKUP DATABASE [X] TO DISK = 'nul'
WITH NOFORMAT, NOINIT, NO_COMPRESSION,
NAME = N'Full Database Backup',
SKIP, NOREWIND, NOUNLOAD, STATS = 10
GO

Figure 6: Query Example – Default Values

Executing this query results in a backup job with an average bandwidth rate of ~480MB/s.

Figure 7: BW Rate Statistics from the Hyper-Scale Manager UI

SQL Server Modified Values Configuration

Now we will modify the values to improve the rate and make better use of the infrastructure resources. Keep in mind that the best values might differ between production environments: some trial and error might be needed, and other processes on the SAN should be taken into consideration.

Example 2: First, modify only the BUFFERCOUNT value:

HBA Qdepth: 32
Number of Destinations: 1
MAXTRANSFERSIZE: 1024 KB
BLOCKSIZE: 512
Backup to DISK = 'nul'
Compression: Disabled
BUFFERCOUNT: Manually set to 1000*

BACKUP DATABASE [X] TO DISK = 'nul'
WITH NOFORMAT, NOINIT, NO_COMPRESSION,
NAME = N'Full Database Backup',
SKIP, NOREWIND, NOUNLOAD, STATS = 10,
BUFFERCOUNT = 1000
GO

Figure 8: Query Example – Modified Values

Executing this modified query results in a backup job with an average bandwidth rate of ~740MB/s.

Figure 9: Higher BW Rate Statistics from the Hyper-Scale Manager UI

* BUFFERCOUNT=1000 was chosen as an example. A relatively lower value will do the job and require less memory allocation. We suggest starting from 100 and then going higher.
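The memory cost mentioned in this note follows from the total buffer space relation given earlier (BUFFERCOUNT x MAXTRANSFERSIZE). The Python sketch below tabulates it for a few candidate values. The function names and the list of candidates are illustrative assumptions, and the validation reflects only the MAXTRANSFERSIZE limits stated in this paper.

```python
KIB = 1024
MIB = 1024 * KIB


def check_maxtransfersize(size_bytes: int) -> int:
    """Accept multiples of 64 KB up to 4 MB, as described in this paper."""
    if size_bytes % (64 * KIB) != 0 or not 64 * KIB <= size_bytes <= 4 * MIB:
        raise ValueError("MAXTRANSFERSIZE must be a multiple of 64 KB, up to 4 MB")
    return size_bytes


def total_buffer_space(buffercount: int, maxtransfersize: int) -> int:
    """Total buffer space in bytes: BUFFERCOUNT x MAXTRANSFERSIZE."""
    return buffercount * check_maxtransfersize(maxtransfersize)


# Candidate BUFFERCOUNT values, starting from the suggested 100.
for count in (100, 250, 500, 1000):
    space_mib = total_buffer_space(count, 1 * MIB) / MIB
    print(f"BUFFERCOUNT={count}: {space_mib:.0f} MiB at the default 1 MB transfer size")

largest = total_buffer_space(1000, 2097152)
```

With the default 1 MB transfer size, 100 buffers need about 100 MiB, while 1000 buffers at a 2 MB transfer size need about 2000 MiB, so raising both values together multiplies the memory requirement.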

Example 3: To increase the bandwidth even further, we will now add more destinations. This uses more worker threads and increases the parallelism dramatically.

Note that increasing the number of destinations requires more resources, because the backup is no longer a single-threaded process. Instead, parallel threads work together (as described earlier).

The default BUFFERCOUNT value is automatically set to a higher value (see the calculation formula). In addition, we also increase the Qdepth, MAXTRANSFERSIZE, and BLOCKSIZE values to ensure that the host sends more data into the SAN.

Set the values to:

HBA Qdepth: 254
Number of Destinations: 32
MAXTRANSFERSIZE: 2048 KB (2097152 bytes)
BLOCKSIZE: 65536
Backup to DISK = 'nul'
Compression: Disabled
BUFFERCOUNT: Manually set to 1000*

BACKUP DATABASE [X] TO DISK = 'nul'
WITH NOFORMAT, NOINIT, NO_COMPRESSION,
NAME = N'Full Database Backup',
SKIP, NOREWIND, NOUNLOAD, STATS = 10,
BUFFERCOUNT = 1000,
BLOCKSIZE = 65536,
MAXTRANSFERSIZE = 2097152
GO

Figure 10: Query Example – Modified Values

Executing this modified query results in a backup job with an average bandwidth rate of ~1,280MB/s.

Figure 9: Extreme BW Rate Statistics from the Hyper-Scale Manager UI

Examples comparison:

Bandwidth [MB/s]: Example 1: 479; Example 2: 740; Example 3: 1,280

Figure 11: SQL Backup Job - Examples Comparison
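To put the three measured rates in perspective, the following Python sketch computes the relative speedup and an idealized duration for the 3 TB database. It is an estimate only: it assumes decimal units (3 TB = 3,000,000 MB) and a constant rate for the whole job, and the names are hypothetical. Because the examples back up to 'nul', the rates describe read performance only.

```python
DATABASE_MB = 3_000_000  # assumption: 3 TB in decimal megabytes

rates_mb_s = {"Example 1": 480, "Example 2": 740, "Example 3": 1280}
baseline = rates_mb_s["Example 1"]


def speedup(rate: float) -> float:
    """Bandwidth relative to the default-values run."""
    return rate / baseline


def duration_seconds(rate: float) -> float:
    """Idealized time to move the whole database at a constant rate."""
    return DATABASE_MB / rate


for name, rate in rates_mb_s.items():
    minutes = duration_seconds(rate) / 60
    print(f"{name}: {speedup(rate):.2f}x baseline, about {minutes:.0f} minutes")
```

Under these assumptions, the default configuration would need roughly 104 minutes, while the configuration of Example 3 would need roughly 39 minutes for the same amount of data.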

As seen in the examples above, the more the backup job is parallelized through any of the given parameters, the higher the throughput. Keep in mind that the parameters are logically correlated: one can impact another, or one can compensate for another.

DBAs should work with their system and storage administrators to identify the throughput bottlenecks and remove them by tuning the values described in this paper.

Acknowledgement

The benchmark testing was carried out by Tomer London, a database administrator (DBA) at the IBM Systems lab in Tel Aviv, Israel. Tomer has more than 10 years of experience as a DBA and system administrator.

About the Author

Itay Raviv is a senior team leader in the IBM Storage Performance Group, located at the IBM Systems lab in Tel Aviv, Israel. Itay and his team are responsible for the performance of the IBM Spectrum Accelerate family of storage systems, including XIV Gen3, FlashSystem A9000 and FlashSystem A9000R, making sure that each version release provides optimal performance in terms of bandwidth, throughput, latency, and more. These performance levels must constantly meet the growing high standards in the market of high-end storage systems.

Itay has more than 10 years of experience in storage, networking, and IT infrastructure. He has been with IBM since the Storwize company acquisition in 2010, and holds IBM certifications in the administration of FlashSystem A9000, FlashSystem A9000R, and XIV Gen3. Itay holds a Bachelor of Science (BSc) in Computer Science from the College Of Management in Israel.

Appendix A: Online Resources

The following websites provide useful references to supplement the information contained in this paper:

• IBM FlashSystem A9000 and A9000R Redbooks: https://www.redbooks.ibm.com/redbooks.nsf/portals/Storage

• IBM Knowledge Center: https://www.ibm.com/support/knowledgecenter

• IBM FlashSystem A9000 and A9000R developerWorks community: https://ibm.biz/BdjURM

• SQL Documentation: https://docs.microsoft.com/en-us/sql/sql-hub-menu

• Qlogic Support Center: http://support.qlogic.com/SupportCenter/Customer_Support_main

• Emulex Support Center: https://www.broadcom.com/support/emulex

Trademarks and Special Notices

© Copyright IBM Corporation 2018. References in this document to IBM products or services do not imply that IBM intends to make them available in every country. IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. UNIX is a registered trademark of The Open Group in the United States and other countries. VMware, the VMware logo, VMware Cloud Foundation, VMware Cloud Foundation Service, VMware vCenter Server, and VMware vSphere are registered trademarks or trademarks of VMware, Inc. or its subsidiaries in the United States and/or other jurisdictions. Other company, product, or service names may be trademarks or service marks of others.

Information is provided "AS IS" without warranty of any kind. All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer. Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products. All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized reseller for the full text of the specific Statement of Direction.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements.

The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning. Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here. Photographs shown are of engineering prototypes. Changes may be incorporated in production models. Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.
