Windows Performance Troubleshooting and Analysis
Daniel Pearson
David Solomon Expert Seminars

Daniel Pearson

• Started working with Windows NT 3.51
• Three years at Digital Equipment Corporation
  • Supporting Intel and Alpha systems running Windows NT
• Seven years at
  • Senior Escalation Lead in Windows base team
  • Worked in the Mobile Internet sustained engineering team
• Instructor for David Solomon, co-author of the Windows Internals book series

Agenda

• Components of performance analysis
• Understanding the tools for troubleshooting and analyzing performance issues
• Troubleshooting CPU and memory issues using various Windows tools

* Portions of this session are based on material developed by Mark Russinovich and David Solomon

Components of Performance Analysis

• Event Tracing for Windows
  • Core component of the
• Kernel mode data structures
  • Used to store information about the system and system objects that can be read by various tools
  • e.g. dt nt!_KTHREAD KernelTime
• CPU performance monitoring events
  • Refer to the Intel 64 and IA-32 Architectures Software Developer's Manual
  • http://developer.intel.com/products/processor/manuals

Event Tracing for Windows

• Built in to the system
• High performance, low overhead and scalable
  • 2.5% CPU usage for a sustained rate of 10,000 events/sec on a 2 GHz CPU [1]
• Operations throughout the system that are of interest to performance are fully instrumented
  • e.g. process and thread activity, registry I/O, disk I/O
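
The overhead figure above can be turned into a rough per-event cost. The following is a back-of-the-envelope sketch only: it assumes the quoted 2.5% is spread evenly over the 10,000 events, and the function name is made up for illustration.

```python
# Per-event cost implied by the quoted figure:
# 2.5% CPU at a sustained 10,000 events/sec on a 2 GHz CPU.
CPU_HZ = 2_000_000_000      # 2 GHz, from the slide
CPU_FRACTION = 0.025        # 2.5% CPU usage, from the slide
EVENTS_PER_SEC = 10_000     # sustained event rate, from the slide


def cycles_per_event(cpu_hz, cpu_fraction, events_per_sec):
    """Average CPU cycles spent logging one event (illustrative helper)."""
    return cpu_hz * cpu_fraction / events_per_sec


cycles = cycles_per_event(CPU_HZ, CPU_FRACTION, EVENTS_PER_SEC)
microseconds = cycles / CPU_HZ * 1_000_000
print(f"{cycles:.0f} cycles per event, about {microseconds:.1f} microseconds")
```

Under these assumptions each logged event costs on the order of a few thousand cycles, which is why tracing can stay enabled on a busy system.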

1. Milirud, Michael. 2008. Windows Performance Analysis: Using Windows Performance Tools. Presented at Microsoft's WinHEC conference, November 5-7, Los Angeles, CA.

Event Tracing for Windows

• Uses a buffering and logging mechanism implemented in the kernel
  • Per-processor buffers that are written to disk by an asynchronous writer thread
• Ability to enable and disable tracing dynamically
• Supports a managed code provider
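
The buffering scheme described above can be illustrated with a toy model. This is a user-mode sketch, not how the kernel implements it: the class name, the buffer size and the in-memory list standing in for the log file are all assumptions made for illustration.

```python
import queue
import threading


class ToyTraceSession:
    """Toy model: one buffer per processor, flushed by a single writer thread."""

    def __init__(self, processors, buffer_size):
        self.buffer_size = buffer_size
        self.buffers = [[] for _ in range(processors)]  # one buffer per CPU
        self.full_buffers = queue.Queue()               # handed to the writer
        self.log = []                                   # stands in for the log file
        self.writer = threading.Thread(target=self._write_loop)
        self.writer.start()

    def log_event(self, cpu, event):
        """Append to the buffer of the logging CPU; hand it off when full."""
        buf = self.buffers[cpu]
        buf.append(event)
        if len(buf) >= self.buffer_size:
            self.full_buffers.put(buf)
            self.buffers[cpu] = []

    def _write_loop(self):
        """Writer thread: drain full buffers until the stop marker arrives."""
        while True:
            buf = self.full_buffers.get()
            if buf is None:
                break
            self.log.extend(buf)

    def stop(self):
        """Flush partially filled buffers, then stop the writer thread."""
        for cpu, buf in enumerate(self.buffers):
            if buf:
                self.full_buffers.put(buf)
                self.buffers[cpu] = []
        self.full_buffers.put(None)
        self.writer.join()


session = ToyTraceSession(processors=2, buffer_size=3)
for i in range(7):
    session.log_event(cpu=i % 2, event=f"event-{i}")
session.stop()
print(len(session.log), "events written")
```

The point of the design is that the code path logging an event only touches its own processor's buffer, while the slower write to disk happens on a separate thread.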

Sysinternals Utilities

• Process Explorer
  • Useful for displaying which files, registry keys and other objects processes have open and which DLLs they have loaded
• Process Monitor
  • Useful for showing real-time file system, registry and process & thread activity
• Available for download from the TechNet site
  • http://technet.microsoft.com/sysinternals

Resource Monitor

• Included with Windows Vista and greatly enhanced in Windows 7 and Windows Server 2008 R2
• Allows the viewing of CPU, memory, disk and network resources as well as handles and modules in real time
• Ability to end, suspend and resume processes as well as to start, stop and restart Windows services
• Useful for identifying the highest resource consumers by individual resource type, e.g. CPU
• Able to list the wait chain tree of a process to determine if a process is waiting on another
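
The idea behind a wait chain can be sketched as following "is waiting on" links until something is found that is not waiting. This is a simplified illustration, not Resource Monitor's implementation: it models a single chain rather than a tree, and the process names and the `waits_on` mapping are hypothetical.

```python
def wait_chain(waits_on, start):
    """Follow 'is waiting on' links from start.

    Returns (chain, has_cycle); a cycle means the waiters block each other.
    """
    chain = [start]
    current = start
    while current in waits_on:
        current = waits_on[current]
        if current in chain:
            return chain + [current], True
        chain.append(current)
    return chain, False


# Hypothetical example: app.exe waits on helper.exe, which waits on service.exe.
waits_on = {"app.exe": "helper.exe", "helper.exe": "service.exe"}
chain, has_cycle = wait_chain(waits_on, "app.exe")
print(" -> ".join(chain), "| cycle:", has_cycle)
```

The last entry in a chain without a cycle is the place to start looking, since everything before it is only waiting.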

Using Resource Monitor

Performance Monitor

• Queries performance counters that measure system state or activity
  • Current values are read at specific intervals
  • Performance counters are included in the operating system and can be included as part of applications
• Able to collect event trace data from trace providers that report actions or events
  • Can combine multiple trace providers into a single session
• Configuration information can be collected from registry keys at a specific time or interval
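
Reading counter values at fixed intervals can be sketched as below. This is an illustration under stated assumptions, not the Performance Monitor API: time is simulated rather than slept, and the counter function and its values are invented for the example.

```python
def sample_counter(read_value, interval_sec, samples):
    """Read a counter `samples` times, `interval_sec` apart (time is simulated)."""
    readings = []
    for n in range(samples):
        timestamp = n * interval_sec
        readings.append((timestamp, read_value(timestamp)))
    return readings


def rates(readings):
    """Turn cumulative readings into per-second rates between adjacent samples."""
    result = []
    for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
        result.append((v1 - v0) / (t1 - t0))
    return result


def fake_context_switches(t):
    """Hypothetical cumulative counter: activity rises after 10 seconds."""
    return 1000 * t if t < 10 else 10000 + 4000 * (t - 10)


readings = sample_counter(fake_context_switches, interval_sec=5, samples=5)
print(rates(readings))
```

Because only the values at the sample points are seen, anything that happens and ends between two samples is averaged into the interval, which is one reason event tracing complements counters.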

Using Performance Monitor

Windows Performance Analyzer

• Part of the Windows Performance Toolkit
• Support for x86, x64, and IA64 architectures
• Consists of three primary programs
  • xperf.exe
    • Used for controlling tracing and processing trace data
  • xbootmgr.exe
    • Automates on and off state transitions and captures traces during those transitions
  • xperfview.exe
    • A graphical trace visualization tool to represent data in the form of interactive graphs and summary tables

Windows Performance Analyzer

• Primarily uses the Event Tracing for Windows infrastructure built in to the system
  • Can be enabled or disabled at any time without requiring a system or process restart
• Supports symbol decoding, sample profiling, and recording of call stacks on kernel events
• Designed to be used during automation
  • All the functions of the tools are available via the command line tool xperf.exe

Support for Earlier Systems

• The Windows Performance Toolkit will fail to install on Windows XP and on Windows Server 2003, although data collection is supported
  • Copy xperf.exe and perfctrl.dll
• Trace analysis is only supported on Windows Vista and later systems

Capturing a Performance Trace

• Kernel options are divided into two parts
  • Kernel Flags
    • Identified by the use of uppercase characters
    • e.g. PROC_THREAD, LOADER, PROFILE
  • Kernel Groups
    • Identified by the use of title case characters
    • e.g. Base, Diag, Latency, FileIO
• Kernel Groups are made up of a collection of Kernel Flags
  • e.g. SysProf = PROC_THREAD+LOADER+PROFILE
• Flags and groups are separated by the '+' token
  • e.g. xperf.exe -on FileIO+DISK_IO_INIT

Merging of Performance Trace Data

• Traces can be copied to another system for analysis
• The trace file should be "merged" on the collection system before analysis to include additional system information
  • xperf -d trace.etl
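
The capture and merge command lines can be assembled programmatically, for example when scripting trace collection. The sketch below only builds argument lists and does not run xperf. The helper names are made up, and the group table lists only SysProf, because that is the only group whose flags are spelled out in these slides.

```python
KERNEL_GROUPS = {
    # Only the group whose expansion appears in the slides is listed here.
    "SysProf": ["PROC_THREAD", "LOADER", "PROFILE"],
}


def classify(token):
    """Uppercase tokens are kernel flags; title case tokens are kernel groups."""
    return "flag" if token.isupper() else "group"


def expand(spec):
    """Expand a '+'-separated spec into flags; groups not in the table stay as-is."""
    result = []
    for token in spec.split("+"):
        if classify(token) == "group" and token in KERNEL_GROUPS:
            result.extend(KERNEL_GROUPS[token])
        else:
            result.append(token)
    return result


def start_command(spec):
    """Argument list that starts kernel tracing, e.g. xperf.exe -on FileIO+DISK_IO_INIT."""
    return ["xperf.exe", "-on", spec]


def stop_and_merge_command(output):
    """Argument list that stops tracing and writes the merged trace file."""
    return ["xperf.exe", "-d", output]


print(expand("SysProf+DISK_IO_INIT"))
print(" ".join(start_command("FileIO+DISK_IO_INIT")))
print(" ".join(stop_and_merge_command("trace.etl")))
```

Each list could be passed to `subprocess.run` on the collection system, which is where the merge has to happen.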

[Diagram: XPerf combines the kernel trace with system and symbol information to produce the merged trace]

Using the Windows Performance Toolkit

Understanding CPU Activity

• Windows uses 32 priority levels
• The system implements a preemptive, priority-driven scheduler
• Priority adjustments can be applied to threads in the "dynamic" range
• At least one runnable thread with the highest priority will be running

[Diagram: priority levels from 31 down to 0, with 31 to 16 labeled Real time and 15 and below labeled Dynamic]

Context Switching

• A switch from one thread to another is known as a context switch
• Switching involves saving the hardware state of a thread and restoring the state of another
• When a thread is scheduled, that thread's context switch count is also incremented
• The context switch count represents how often a thread begins running, not how long it ran

Time Accounting Quirks

• Looking at total CPU time for each process may not reveal where the system has spent its time
• CPU time accounting is driven by an interrupt timer which is set by the Hardware Abstraction Layer
  • Usually at either 10 or 15 msec intervals
• Thread execution and context switches that happen between clock intervals are not accounted for
  • e.g. a thread runs and enters a wait before the clock fires
  • Thus threads may run but never get charged

Time Accounting Prior to Windows Vista

• Windows accounted for CPU time based on the interval clock timer
• Thread quantum expiration was not always fair
  • A thread might get almost no turn
• Threads were also charged for interrupts that occurred while they were running
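
Interval-based accounting can be simulated to show why a thread may run without being charged. This is a simplified model, not the scheduler's code: it assumes one processor and a 15 msec clock, charges a whole interval to whichever thread is running when the clock fires, and uses an invented timeline.

```python
def charge_by_clock_tick(segments, tick_ms, total_ms):
    """Charge one full interval to the thread running when each clock tick fires.

    segments: (thread, start_ms, end_ms) run periods covering the timeline.
    """
    charged = {}
    for tick in range(tick_ms, total_ms + 1, tick_ms):
        for name, start, end in segments:
            if start < tick <= end:
                charged[name] = charged.get(name, 0) + tick_ms
    return charged


def actual_run_time(segments):
    """Time each thread really spent on the processor."""
    actual = {}
    for name, start, end in segments:
        actual[name] = actual.get(name, 0) + (end - start)
    return actual


# Hypothetical timeline: T1 runs 13 ms and waits just before the tick at 15 ms.
segments = [("Idle", 0, 1), ("T1", 1, 14), ("T2", 14, 16), ("Idle", 16, 60)]
charged = charge_by_clock_tick(segments, tick_ms=15, total_ms=60)
actual = actual_run_time(segments)
print("charged:", charged)
print("actual: ", actual)
```

In this timeline T1 does most of the work but is never charged, while T2 is charged a full interval for 2 ms of execution.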

[Diagram: timeline of clock intervals labeled Idle, Idle, T1, T2, T2]

Time Accounting Since Windows Vista

• Windows Vista and later reads the Time Stamp Counter during every context switch
  • The actual CPU cycles consumed are charged to a thread
  • Any interrupt time is not charged to the interrupted thread
• Allows for more accurate quantum accounting
  • A thread gets at least one turn and at most will be given one turn plus an additional tick
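
Cycle-based accounting can be sketched as charging the difference between counter readings taken at each context switch, minus any interrupt time. This is a simplified single-processor model with invented counter values, not the kernel's implementation.

```python
def charge_by_cycles(switches, end_tsc, interrupts=()):
    """Charge each thread the cycles between the switch to it and the next switch.

    switches:   list of (tsc, thread switched in), in time order.
    interrupts: (start_tsc, end_tsc) periods that are not charged to any thread.
    """
    charged = {}
    boundaries = list(switches) + [(end_tsc, None)]
    for (start, thread), (stop, _) in zip(boundaries, boundaries[1:]):
        cycles = stop - start
        for irq_start, irq_stop in interrupts:
            overlap = min(stop, irq_stop) - max(start, irq_start)
            if overlap > 0:
                cycles -= overlap  # interrupt time is not the thread's work
        charged[thread] = charged.get(thread, 0) + cycles
    return charged


# Hypothetical counter readings at each context switch.
switches = [(0, "Idle"), (1000, "T1"), (14000, "T2"), (16000, "Idle")]
charged = charge_by_cycles(switches, end_tsc=60000, interrupts=[(14500, 15000)])
print(charged)
```

Every thread that ran is charged for exactly the cycles it consumed, and the interrupt that arrived while T2 was running is left out of T2's total.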

[Diagram: timeline of clock intervals labeled Idle, Idle, T1, T1, T2]

Troubleshooting High CPU Utilization

Understanding Memory Management

• Windows provides two system memory pools
  • Nonpaged Pool and Paged Pool
  • Used for system-wide persistent data
• Prior to Windows Vista, pool sizes were a function of memory size and whether or not the system was configured as a server or a workstation
• Windows Vista introduced the concept of a dynamic system address space

Dynamic System Address Space

• In 32-bit Windows Vista and later, virtual memory is assigned as needed
  • Permits larger paged, nonpaged, and session pools
  • Components still cannot exceed 2 GB on 32-bit systems
• On 64-bit systems, address space regions are configured to their current maximum limits for all memory sizes

Troubleshooting Memory Leaks

Additional Information

• Windows Internals 5th edition
• Windows Performance Analysis Developer Center
  • http://msdn.microsoft.com/performance
• Windows Server Performance Team Blog
  • http://blogs.technet.com/winserverperformance
• Ask the Performance Team Blog
  • http://blogs.technet.com/askperf

Additional Information

• David Solomon Expert Seminars offers training on Windows Internals as public and private workshops and as public webinars via the Internet
• Currently scheduled upcoming classes
  • Public workshop in London, April 12th – April 16th
  • Public webinar, April 26th & April 28th
  • Public workshop in New York, May 3rd – May 7th
  • Public workshop in San Francisco, November 8th – November 12th
• Visit http://www.solsem.com for further course descriptions and up-to-date information