Monitoring and Building for SharePoint Farm Performance

Sean P. McDonough, MVP
Chief Technology Officer, Bitstream Foundry LLC
SPTechCon 2018

What We’ll Be Covering

1. Farm Environments
2. Getting a Solid Start
3. Tools and Monitoring Servers
4. Page Performance Monitoring
5. Questions & Answers
6. (LOTS of) References

Farm Environments

Yes, I said farm, not stamp
• Subtle distinction, but it means we’re likely on-premises …
  • No SharePoint Online / Office 365
  • Unless you’re on a “farm in the cloud”
• Why on-premises?
  • Much smaller monitoring surface in the cloud
  • It’s “someone else’s” problem (i.e., a value-add for consumers)
  • Administrative APIs are very limited vs. on-premises
  • Limited tools (no perfmon, Developer Dashboard, etc.)
• In short: we can’t get at the counters and logs we need!

Some on-premises assumptions I’m making for this session:
• The big question: are you virtualizing?
  • Virtualization affords many options
  • Virtualization provides many ways to destroy performance
  • My assumption: you are virtualizing your environment
• Within the datacenter and beyond it
  • Easier than ever to build farm interdependencies and distributed environments
  • Application and customization options push connections beyond the farm
  • My assumption: we are focusing on basic (end-user) SharePoint performance

Getting a Solid Start

“An ounce of prevention is worth a pound of cure.”

When you have the luxury of starting from scratch, you can get the basics right:
• VM Product Configuration
• Virtual Machine (VM) Setup
• Operating System Configuration
• SQL Server Installation

VM Product Configuration
• Treatment of memory
  • The amount of memory actively used and swapped is configurable
  • Depending on the product, oversubscription is possible
  • Your goal should be 1:1 (every GB of memory assigned to VMs is backed by a GB of physical host memory)
• What if I can’t do 1:1?
  • Minimize swapping
  • Swapping degrades performance
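The sketch below puts a number on the 1:1 goal. It computes a host’s memory oversubscription ratio, which is the total memory assigned to VMs divided by the host’s physical memory. The VM names and sizes are hypothetical. The calculation also ignores hypervisor overhead, so a result close to 1.0 already means memory is tight.

```python
def oversubscription_ratio(vm_memory_gb: dict[str, float], host_memory_gb: float) -> float:
    """Return total assigned VM memory divided by physical host memory.

    1.0 or below: every VM's memory can be backed by physical RAM (the 1:1 goal).
    Above 1.0: the host is oversubscribed and swapping becomes possible.
    """
    if host_memory_gb <= 0:
        raise ValueError("host memory must be positive")
    return sum(vm_memory_gb.values()) / host_memory_gb


# Hypothetical farm: three SharePoint servers and one SQL Server on a 128 GB host.
farm = {"WFE1": 24, "WFE2": 24, "APP1": 24, "SQL1": 64}
ratio = oversubscription_ratio(farm, 128)
print(f"Oversubscription ratio: {ratio:.2f}:1")
if ratio > 1.0:
    print("Oversubscribed: expect swapping under load")
```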

VM Product Configuration
• Virtualization options
  • Avoid oversubscribing processors and cores – nothing is free!
  • Understand what you’re enabling, and don’t go overboard; every option can carry performance implications
  • Do you really know what Intel’s VT-x/EPT extensions are?
  • VMs inside of VMs …

VM Setup
• Creating virtual disks
  • SharePoint is an I/O monster, especially when it comes to SQL Server
  • Big question: what storage technology is in use? Traditional hard drives? SSDs? A combination?
  • Next question: where are the bottlenecks likely to occur? Drives? Controllers? Elsewhere?
  • Our goal: avoid too many (abstraction) layers; each layer adds overhead

VM Setup
• Set up a new 100 GB hard drive: a common example
  • Create a new 100 GB drive
  • Place it on traditional hard drive(s)
  • The drives are in a RAID-5 configuration
  • Space is allocated on demand
• What you don’t see …
  • Cost of parity calculations
  • Cost of virtual translation
  • Cost to “auto-grow”/allocate real drive space
  • Latency at every step

VM Setup: ways to improve the scenario
• Understand common RAID configurations
  • RAID-5: cheaper, but costly parity calculations
  • RAID-10: better performance, more expensive
• Go “direct to disk” with pass-through options
  • Hyper-V – “pass-through” storage
  • VMware – “mapping” a disk
• Pre-allocate virtual disk space
  • Initialize drive space before it’s needed
  • This will chew up real drive space!
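Write penalties make the RAID-5 vs. RAID-10 trade-off easiest to see. The sketch below uses the commonly cited rule of thumb: each front-end write costs 4 back-end I/Os on RAID-5 and 2 on RAID-10. From that, it solves for the front-end IOPS an array can sustain at a given read/write mix. The disk count, per-disk IOPS, and 50/50 mix are hypothetical. Controller write caches can change real-world results considerably.

```python
# Commonly cited rule-of-thumb write penalties: back-end disk I/Os per front-end write.
WRITE_PENALTY = {"RAID-5": 4, "RAID-10": 2}


def effective_iops(disks: int, iops_per_disk: float, read_fraction: float, raid_level: str) -> float:
    """Front-end IOPS an array can sustain for a given read/write mix.

    Solves F * (read_fraction + (1 - read_fraction) * penalty) = disks * iops_per_disk
    for F: each read costs one back-end I/O, each write costs `penalty` back-end I/Os.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    raw_iops = disks * iops_per_disk
    penalty = WRITE_PENALTY[raid_level]
    return raw_iops / (read_fraction + (1 - read_fraction) * penalty)


# Hypothetical array: eight spinning disks at ~150 IOPS each, 50% reads / 50% writes.
for level in WRITE_PENALTY:
    print(f"{level}: {effective_iops(8, 150, 0.5, level):.0f} IOPS")
```

Under these assumptions, the same eight disks deliver noticeably more mixed-workload throughput as RAID-10. The cost is that half of their raw capacity goes to mirrors.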

VM Setup: ways to improve the scenario (summary)
• Remove unnecessary sources of latency
  • Allocation at run time hurts a lot. Pre-allocate your storage space.
  • Stay off USB drives in performance-critical scenarios. USB latency hurts.
  • Avoid VMDKs and VHDs by passing through to dedicated storage.
  • Software RAID kills performance in so many ways. Use hardware RAID.
  • Realize that placing multiple virtual drives on a single drive array degrades performance!

VM Setup: ways to improve the scenario (summary)
• The great equalizer – solid state drives (SSDs)
  • Dramatically better performance, hands down
  • No moving parts
• Performance varies among SSDs
  • Choose your protocol: SATA (AHCI) vs. NVMe
  • M.2 is a new form factor, not a new standard
• Theoretical limits
  • SATA: 300 MB/s (SATA 2) or 600 MB/s (SATA 3)
  • NVMe: 2 GB/s (PCIe Gen 2) or 4 GB/s (PCIe Gen 3)
• The role of so-called “hybrids”
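The theoretical limits above translate directly into best-case transfer times. The sketch below divides a hypothetical 100 GB of data by each interface ceiling, using decimal units and assuming a purely sequential transfer. Real drives fall short of these ceilings, so read the results as lower bounds.

```python
# Theoretical interface ceilings from the slide, in MB/s (decimal units).
INTERFACE_LIMITS_MB_S = {
    "SATA 2": 300,
    "SATA 3": 600,
    "NVMe (PCIe Gen 2)": 2_000,
    "NVMe (PCIe Gen 3)": 4_000,
}


def best_case_seconds(size_gb: float, limit_mb_s: float) -> float:
    """Lower bound on sequential transfer time: data size divided by interface ceiling."""
    return size_gb * 1_000 / limit_mb_s


for name, limit in INTERFACE_LIMITS_MB_S.items():
    print(f"{name:>18}: {best_case_seconds(100, limit):6.1f} s to read 100 GB")
```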

Operating System Options: Paging File
• Various strategies for managing it
  • System managed
  • Manual allocation(s)
  • Remove entirely
• I prefer to assign an allocation manually
  • Create a dedicated paging drive
  • Pre-allocate its space in the VM environment
  • Size it at 1.5x the amount of memory
• Alter Windows paging options
  • (note: be sure to set “No paging file” for C:)
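Here is a quick sketch of the 1.5x sizing rule. The function converts a server’s memory into the megabyte value Windows asks for when you set a custom paging file size. The sketch also sets the initial and maximum sizes to the same value so the file never grows at run time. That choice is an assumption in keeping with the pre-allocation advice, not something stated on the slide.

```python
import math


def page_file_size_mb(memory_gb: float, multiplier: float = 1.5) -> int:
    """Return a manual paging file size in MB (rounded up) for the given amount of RAM."""
    return math.ceil(memory_gb * 1024 * multiplier)


# Example: a hypothetical 24 GB SharePoint server.
size = page_file_size_mb(24)
# Assumption: initial and maximum sizes match, so the file is never grown at run time.
print(f"Initial size (MB): {size}")
print(f"Maximum size (MB): {size}")
```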

Operating System Options: Miscellaneous
• Visual Effects
  • “Adjust for best performance”
  • Turns off animations and other CPU-wasting eye candy
• If your host drives are removable …
  • Review the removal policy for each drive
  • There will likely be a policy that optimizes for performance, typically by enabling caching

SQL Server is the performance linchpin in nearly all SharePoint environments. Luckily, there are some pretty basic adjustments that will help improve performance:
• Instant File Initialization …
• Storage Selection …
• Drive Formatting …
• Data and Log Assignment …
• TempDB Configuration …
• DB Sizing and Autogrowth …

Tools and Monitoring Servers

Reasons

Why do we monitor performance? Reasons typically fall into one of the following three categories:
• We are seeking to understand why our SharePoint environment is underperforming
  • Troubleshooting!
• We want to ensure that we have enough headroom to scale and grow as desired
  • Capacity!
• We want to quantify, in terms of performance, the changes we’ve made to our farm
  • Improvements!

Troubleshooting

We’re looking for the source of a performance problem. Where should we start?

Performance issues typically originate in at least one general subsystem:
• Memory
• Network
• Processor (CPU)
• Storage (Disk)

Of course, SharePoint problems often muddy the waters by spanning more than one category.

Tools

Recommendation: start by monitoring the server(s) over time to gain an understanding:
• First understand “the normal state” of a server
• Then observe the server when a problem occurs
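A statistical baseline is one simple way to turn “the normal state” into something you can compare against. The sketch below summarizes normal readings for a single counter as a mean and standard deviation. It then flags any reading more than k standard deviations above the mean. The sample values and the choice of k = 3 are assumptions. For counters where low values are bad (such as Available MBytes), you would flip the comparison.

```python
import statistics


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Summarize 'normal' readings for one counter as (mean, sample standard deviation)."""
    return statistics.mean(samples), statistics.stdev(samples)


def is_anomalous(value: float, baseline: tuple[float, float], k: float = 3.0) -> bool:
    """Flag a reading more than k standard deviations above the baseline mean."""
    mean, stdev = baseline
    return value > mean + k * stdev


# Hypothetical % Processor Time samples captured during a normal week.
normal = [22.0, 25.5, 19.8, 31.2, 27.4, 24.1, 23.9, 28.6]
baseline = build_baseline(normal)
print(is_anomalous(29.0, baseline))
print(is_anomalous(88.0, baseline))
```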

Many different tools at our disposal:
• Farm Health Analyzer
• Event Viewer
• ULS Viewer
• Fiddler
• Developer Dashboard
• Wireshark

Performance Counters

Today’s focus for performance monitoring is on counters
• Specific performance counters that can help direct further investigation and keep us out of the weeds

How do we view performance counters?
• Windows Performance Monitor (perfmon.exe)
• Windows Resource Monitor (resmon.exe)
• More specialized tools (e.g., SysKit’s tools)

Performance Counter Basics

The operating system exposes counters
• Memory, CPU, network, and more

Applications oftentimes expose their own counters
• For instance, SharePoint alone exposes over 20 categories and hundreds of counters

Bottom line: unless you know what to watch, you’ll suffer a cruel and horrible death at the hands of the Performance Counter Gods.

Server Roles and Counters

What should I be watching?

That depends on the role of the server:
• Web Front-End
• Application Server
• SQL Server

Web Front-Ends

WFEs serve up pages through IIS, so we want low values for all of these counters:
• ASP.NET: Requests Queued (should be “low”)
• ASP.NET: Requests Rejected (should be 0)
• ASP.NET: Request Wait Time (should be near 0)
• ASP.NET: Worker Process Restarts (should be 0)

WFEs also use their memory for caching to accelerate web requests:
• ASP.NET Applications: Cache API Trims (should be near 0)
• ASP.NET Applications: Cache API Hit Ratio (should be “high”)
• SharePoint Publishing Cache: Total Number of Cache Compactions (should be near 0)
• SharePoint Publishing Cache: Publishing Cache Hit Ratio (should be “high”)
• SharePoint Publishing Cache: Publishing Cache Flushes / Second (should be 0)
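A hit ratio is just hits divided by total lookups. The raw hit and miss counts are cumulative, though, so a long-running process can report a comfortable lifetime ratio while the last few minutes look much worse. The sketch below computes both figures. It assumes you captured two snapshots of cumulative (hits, misses) totals, for example from the ASP.NET Applications Cache API Hits and Cache API Misses counters. The numbers themselves are hypothetical.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups served from the cache; 0.0 when there were no lookups."""
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


def interval_hit_ratio(before: tuple[int, int], after: tuple[int, int]) -> float:
    """Hit ratio over a sampling interval, from two (hits, misses) snapshots of cumulative totals."""
    return hit_ratio(after[0] - before[0], after[1] - before[1])


# Hypothetical cumulative totals captured five minutes apart.
start = (120_000, 4_000)
end = (126_000, 8_000)
print(f"Lifetime:      {hit_ratio(*end):.1%}")
print(f"Last interval: {interval_hit_ratio(start, end):.1%}")
```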

WFEs use disks for BLOB caching:
• SharePoint Publishing Cache: BLOB Cache % Full (maintain headroom)

Application Servers

Unless an application server is experiencing issues specific to its function (which might require monitoring specialized counters), consider monitoring the following:
• Processor: % Processor Time (>75%–85% is bad)
• Memory: Available Mbytes (<2 GB is bad)
• Memory: Cache Faults/sec (>1 is bad)
• Memory: Pages/sec (>10 is bad)
• Disk: Avg. Disk Queue Length (depends)
• Disk: % Idle Time (<90% is bad)
• Disk: % Free Space (<30% is bad)
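The application server thresholds above lend themselves to a simple automated check. The sketch below encodes them as data and reports which counters in a sample cross their limit. A few interpretation choices are built in: it uses the lower end (75%) of the processor range, it treats 2 GB as 2048 MB, and it skips Avg. Disk Queue Length because that threshold “depends.” The sample reading is hypothetical.

```python
import operator

# (counter, comparison that indicates a problem, threshold), taken from the list above.
THRESHOLDS = [
    ("Processor: % Processor Time", operator.gt, 75),  # slide gives 75%-85%; lower bound used
    ("Memory: Available Mbytes", operator.lt, 2048),   # 2 GB expressed in MB
    ("Memory: Cache Faults/sec", operator.gt, 1),
    ("Memory: Pages/sec", operator.gt, 10),
    ("Disk: % Idle Time", operator.lt, 90),
    ("Disk: % Free Space", operator.lt, 30),
]


def find_problems(sample: dict[str, float]) -> list[str]:
    """Return a description of each counter in `sample` that crosses its threshold."""
    problems = []
    for counter, is_bad, limit in THRESHOLDS:
        value = sample.get(counter)  # counters missing from the sample are skipped
        if value is not None and is_bad(value, limit):
            problems.append(f"{counter} = {value} (threshold {limit})")
    return problems


reading = {
    "Processor: % Processor Time": 91.0,
    "Memory: Available Mbytes": 3500,
    "Memory: Pages/sec": 42,
    "Disk: % Idle Time": 95.0,
}
for line in find_problems(reading):
    print(line)
```

In practice, you would feed it values averaged over a monitoring interval rather than single instantaneous readings, because momentary spikes are normal.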

These are valid for WFEs as well!

SQL Servers

Consider watching the following:
• SQLServer:Buffer Manager: Buffer Cache Hit Ratio
• SQLServer:Databases: Transactions/sec
• SQLServer:General Statistics: User Connections
• SQLServer:Latches: Average Latch Wait Time (ms)
• SQLServer:Latches: Latch Waits/sec
• SQLServer:Locks: Average Wait Time (ms)
• SQLServer:Locks: Lock Wait Time (ms)
• SQLServer:Locks: Number of Deadlocks/sec
• SQLServer:Plan Cache: Cache Hit Ratio
• SQLServer:SQL Statistics: SQL Compilations/sec
• SQLServer:SQL Statistics: SQL Re-Compilations/sec

Page Performance Monitoring

We’ve been looking at server-side performance monitoring thus far. It represents only half of the overall equation.

We need to put ourselves in the role of the end user to monitor and diagnose a number of other issues, including page performance issues.

What can we do from the other end of the wire?

The answer is “quite a bit”

Your browser is an amazingly capable performance tool – if you understand how to use it.

Requests and their responses are recorded chronologically – including all sorts of information such as HTTP headers, response codes, cookies, and much more.

X-SharePointHealthScore
• A measure of the front-end’s general load or stress. Values range from 0 (no stress) to 10 (max stress). We want this low.

SPRequestDuration
• The amount of time your request spends processing on the server (in ms). Ideally less than three seconds (3000 ms).

SPIisLatency
• The amount of time your request spends waiting on the server (in ms). Should be near zero.
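When you capture a response in your browser’s developer tools or in Fiddler, you can pull out the three headers above and check them programmatically. The sketch below assumes you already have the response headers as a name/value mapping; how you obtain them is out of scope. It matches header names case-insensitively and applies the 3000 ms guideline for SPRequestDuration. The SPRequestDuration and SPIisLatency values mirror the example later in this section; the health score is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SharePointTiming:
    health_score: int         # X-SharePointHealthScore: 0 (no stress) to 10 (max stress)
    request_duration_ms: int  # SPRequestDuration
    iis_latency_ms: int       # SPIisLatency


def parse_timing(headers: dict[str, str]) -> SharePointTiming:
    """Extract SharePoint's timing headers from one response (raises KeyError if one is missing)."""
    lower = {name.lower(): value.strip() for name, value in headers.items()}
    timing = SharePointTiming(
        health_score=int(lower["x-sharepointhealthscore"]),
        request_duration_ms=int(lower["sprequestduration"]),
        iis_latency_ms=int(lower["spiislatency"]),
    )
    if not 0 <= timing.health_score <= 10:
        raise ValueError(f"unexpected health score: {timing.health_score}")
    return timing


def too_slow(timing: SharePointTiming, limit_ms: int = 3000) -> bool:
    """True when server-side processing exceeds the guideline (3000 ms by default)."""
    return timing.request_duration_ms > limit_ms


response_headers = {
    "X-SharePointHealthScore": "0",
    "SPRequestDuration": "51",
    "SPIisLatency": "0",
}
timing = parse_timing(response_headers)
print(timing, "too slow:", too_slow(timing))
```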

Round Trip Time – (SPRequestDuration + SPIisLatency) = Time lost “Elsewhere”
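The formula above is simple arithmetic. As a sketch, here it is in Python, checked against the example numbers that follow. The round-trip time is assumed to come from your browser’s network timing for the same request whose headers you read.

```python
def time_lost_elsewhere(round_trip_ms: float, request_duration_ms: float, iis_latency_ms: float) -> float:
    """Round Trip Time - (SPRequestDuration + SPIisLatency): time spent outside SharePoint's server-side work."""
    return round_trip_ms - (request_duration_ms + iis_latency_ms)


# Values from the example: 76.04 ms round trip, 51 ms SPRequestDuration, 0 ms SPIisLatency.
lost = time_lost_elsewhere(76.04, 51, 0)
print(f"Time lost elsewhere: {lost:.2f} ms")
```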

For example:
• Round Trip Time = 76.04 ms
• SPRequestDuration = 51 ms
• SPIisLatency = 0 ms
• Time Lost Elsewhere = 25.04 ms

This is a high-performance SharePoint farm that is not under load.
• It may not reflect real-world conditions

SharePoint On-Premises

This will work for …
• SharePoint 2013 on-prem
• SharePoint 2016 on-prem

The Common Outcomes

I’ve got consistently high SPRequestDuration values
• This is often where we find questionable dev practices
• May be related to server (over)load or other factors
• X-SharePointHealthScore can corroborate (or not)

I’m seeing a lot of “time lost elsewhere”
• Network congestion or failure
• Web proxies inserting themselves between you and SharePoint
• DNS resolution issues
• Routing problems

Other Questions?

References

1. What does virtualize Intel VT-x/EPT or AMD-V/RVI do? https://communities.vmware.com/thread/525101
2. What Are VMware Virtual CPU Performance Counters (vPMCs)? https://www.vladan.fr/what-are-vmware-virtual-cpu-performance-monitoring-counters-vpm1cs/
3. The road to IOMMU (directed video card memory access) https://communities.vmware.com/thread/399066
4. Advantages and Disadvantages of Various RAID Levels https://10gbps.io/blog/advantages-disadvantages-various-raid-levels/
5. Configuring Pass-Through Disks in Hyper-V https://blogs.technet.microsoft.com/askcore/2008/10/24/configuring-pass-through-disks-in-hyper-v/
6. NVMe vs. SSD vs. HDD Performance: Is it Time to Switch? https://photographylife.com/nvme-vs-ssd-vs-hdd-performance
7. Why Storage Drive Speeds Don’t Hit Their Theoretical Limits http://www.tested.com/tech/pcs/457172-why-storage-drive-speeds-dont-hit-their-theoretical-limits/

8. What is SSHD (Solid State Hybrid Drive) https://www.lifewire.com/solid-state-hybrid-drive-833451
9. Disable Automatic Updates in 2016 https://social.technet.microsoft.com/Forums/lync/en-US/d3a2694c-32da-4158-943a-81c2904ffb3d/disable-automatic-updates-in-2016?forum=WinServerPreview
10. Storage and SQL Server Capacity Planning and Configuration (SharePoint Server) https://technet.microsoft.com/en-us/library/cc298801(v=office.16).aspx
11. Best Practices for SQL Server in a SharePoint Server Farm https://technet.microsoft.com/en-us/library/hh292622(v=office.16).aspx
12. Storage and SQL Server Capacity Planning and Configuration (SharePoint Server) https://technet.microsoft.com/en-us/library/a96075c6-d315-40a8-a739-49b91c61978f(v=office.16)#Section6_3
13. Diskspd Utility: A Robust Storage Testing Tool (superseding SQLIO) https://gallery.technet.microsoft.com/DiskSpd-a-robust-storage-6cd2f223
14. GitHub repository for diskspd https://github.com/microsoft/diskspd

15. Using Microsoft DiskSpd to Test Your Storage Subsystem https://sqlperformance.com/2015/08/io-subsystem/diskspd-test-storage
16. CrystalDiskMark 6.0.0 https://crystalmark.info/download/index-e.html
17. The Ultimate SharePoint Performance Guide https://leanpub.com/SharePointPerformanceGuide/c/SysKit
18. SysInternals Suite https://docs.microsoft.com/en-us/sysinternals/downloads/sysinternals-suite
19. AutoSPInstaller on GitHub https://github.com/brianlala/AutoSPInstaller
20. AutoSPInstaller GUI https://autospinstaller.com
21. Monitoring and maintaining SharePoint Server 2013 https://technet.microsoft.com/en-us/library/ff758658(v=office.16).aspx

22. Performance Testing for SharePoint Server 2013 https://technet.microsoft.com/en-us/library/ff758659(v=office.16).aspx
23. Capacity management and sizing overview for SharePoint Server 2013 https://technet.microsoft.com/en-us/library/ff758647(v=office.16).aspx
24. SharePoint Performance Monitoring – How and Why? http://blog.syskit.com/sharepoint-performance-monitoring
25. Performance Counters for ASP.NET https://msdn.microsoft.com/en-us/library/fxk122b4.aspx
26. Monitor Cache Performance in SharePoint Server 2016 https://technet.microsoft.com/en-us/library/ff934623(v=office.16).aspx
27. ASP.NET Performance Monitoring, and When to Alert Administrators https://msdn.microsoft.com/en-us/library/ms972959.aspx
28. MOSS Object Cache Memory Tuning is not an Intuitive Process https://sharepointinterface.com/2009/08/30/moss-object-cache-memory-tuning-is-not-an-intuitive-process/

29. High Avg Disk Queue Length and Finding the Cause http://www.ithacks.com/2008/09/12/high-avg-disk-queue-length-and-finding-the-cause/
30. SharePoint Performance: Best Practices from the Field https://www.slideshare.net/jasonhimmelstein/sharepoint-performance
31. ULS Viewer https://www.microsoft.com/en-us/download/details.aspx?id=44020
32. Fiddler https://www.telerik.com/download/fiddler
33. Using the Developer Dashboard https://msdn.microsoft.com/en-us/library/office/ff512745(v=office.16).aspx
34. The Five-Minute Page Performance Troubleshooting Guide for SharePoint Online https://sharepointinterface.com/2017/07/07/the-five-minute-page-performance-troubleshooting-guide-for-sharepoint-online/

35. Akamai Reveals 2 Seconds As The New Threshold Of Acceptability For Ecommerce Web Page Response Times https://www.akamai.com/us/en/about/news/press/2009-press/akamai-reveals-2-seconds-as-the-new-threshold-of-acceptability-for-ecommerce-web-page-response-times.jsp
36. How Loading Time Affects Your Bottom Line https://blog.kissmetrics.com/loading-time/

Sean P. McDonough
SharePoint and Office 365 Gearhead, Tinkerer, Microsoft MVP

Twitter: @spmcdonough
Blog: http://SharePointInterface.com
About: http://about.me/spmcdonough