JIT Compilation Module Overview

JIT Compilation
- Native vs. Managed Compilation
- Managed Execution Phases
- Assembly Loading & Initialization
- JIT Compilation
- JIT Optimizations
- What's new in NGEN 4.0?
- When to use NGEN?

Running Code

Behavior on Windows 2000
- The legacy entry point mscoree!_CorExeMain is used

Behavior on Windows XP and later
- The operating system loader checks for managed modules by examining a bit in the common object file format (COFF) header
- A set bit denotes a managed module
- If the loader detects a managed module, it loads mscoree.dll; _CorValidateImage and _CorImageUnloading notify the loader when managed module images are loaded and unloaded

_CorValidateImage performs the following:
- Ensures that the code is valid managed code
- Changes the entry point in the image to an entry point in the runtime
- On 64-bit Windows, modifies the image in memory by transforming it from PE32 to PE32+
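To see the managed-module marker from the outside, the sketch below reads the PE headers of its own assembly. It assumes .NET 5 or later with C# top-level statements (System.Reflection.PortableExecutable ships with the modern runtime) and an ordinary assembly file on disk rather than a single-file bundle. In the PE file the marker shows up as a CLI (COR20) header, referenced from a data directory of the optional header; the sketch also prints whether the image is PE32 or PE32+, the two formats mentioned above.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Reflection.PortableExecutable;

// Inspect the file this program was loaded from (assumed: a normal managed .dll/.exe on disk).
string path = Assembly.GetExecutingAssembly().Location;
using var pe = new PEReader(File.OpenRead(path));
PEHeaders headers = pe.PEHeaders;

// A managed image carries a CLI (COR20) header; PEReader exposes it as CorHeader,
// which is null for a purely native image.
bool isManaged = headers.CorHeader != null;
Console.WriteLine($"{Path.GetFileName(path)}: managed = {isManaged}");

if (headers.CorHeader is CorHeader cor && headers.PEHeader is PEHeader peHeader)
{
    // PE32 or PE32Plus, as stored on disk.
    Console.WriteLine($"  PE format      : {peHeader.Magic}");
    // Flags from the CLI header, e.g. ILOnly or Requires32Bit.
    Console.WriteLine($"  CorFlags       : {cor.Flags}");
    // Data directory entry that points at the CLI header.
    Console.WriteLine($"  CLI header RVA : 0x{peHeader.CorHeaderTableDirectory.RelativeVirtualAddress:x}");
}
```

Pointing the same code at a native DLL instead of the executing assembly should report `managed = False`, because such a file has no CLI header.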

Native Compile vs. Managed Compile

Simplified view of native code compilation

Native Code

.CPP or .C file (C or C++ code) → Compile → .OBJ file (machine language) → Link → .EXE or .DLL file (machine language code)
(Diagram annotation: type describing information)

Native Compile vs. Managed Compile

Simplified view of managed code compilation

Managed Code

.CS file (C# code) → Compile → Assembly (.EXE or .DLL) containing MSIL and metadata → Execute → machine language generated in memory at runtime by the JIT compiler

Managed Execution Phases

Phase: .NET
- Compile time: source code → Compile → IL and metadata
- Run time (CLR): IL and metadata → JIT → native code

JIT Compilation

What does the JIT do?
- Checks whether a method is being called for the first time; if so, compiles its IL code to native code
- Stores the native code in memory
- Updates the MethodDescriptor (MethodDesc): its reference is updated to point to the memory location of the native code, so later calls go straight to it

Managed Execution

JIT Compilation

MyModule::Main
1. First call?
   - Yes: CILJit::compileMethod (x86) or PreJit::compileMethod (x64) verifies and compiles the IL, producing native code, GCInfo, EH data, etc. The native code is stored in memory and its address is stored in the MethodDesc.
   - No: retrieve the address of the native code from the MethodDesc.
2. Execute the native code.
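The flow above boils down to "a slot that starts out pointing at a compile stub and gets patched on the first call". The following is a toy model of that idea only, not how the CLR implements it: the dictionary stands in for MethodDesc code slots, and `Compile` and `MakeStub` are hypothetical names. It assumes C# top-level statements (.NET 5 or later).

```csharp
using System;
using System.Collections.Generic;

// Toy model: counts how often the "JIT" runs.
int compileCount = 0;

// One code slot per method, standing in for the address stored in a MethodDesc.
var slots = new Dictionary<string, Func<int, int>>();

// Stand-in for the JIT's compileMethod: turns a method name into "native code".
Func<int, int> Compile(string name)
{
    compileCount++;
    if (name == "Square") return x => x * x;
    throw new ArgumentException($"unknown method {name}");
}

// What a slot holds before the first call: compile, patch the slot, then run the new code.
Func<int, int> MakeStub(string name) => x =>
{
    Func<int, int> code = Compile(name);
    slots[name] = code;   // "store the address in MethodDesc"
    return code(x);       // "execute the native code"
};

slots["Square"] = MakeStub("Square");

int first = slots["Square"](3);   // first call: goes through the stub and compiles
int second = slots["Square"](4);  // later call: the slot already holds compiled code
Console.WriteLine($"{first} {second} (compiled {compileCount} time)"); // 9 16 (compiled 1 time)
```

Patching the slot, rather than checking a "first call" flag on every invocation, is what keeps the steady-state cost of a call at a single indirect jump.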

MethodDescriptor (MethodDesc)

- Runtime data structure that describes a managed method and where its code currently lives
- Generated as part of the class loading procedure
- Initially refers to the IL code: no native code exists until the method is JIT-compiled (IsJitted: no, CodeAddr: ffffffff in the output below)
- Can be inspected during debugging with SOS: !DumpMD

```
0:004> !dumpmd 009969a8
Method Name:  MyApp.MainForm.menu_Click(System.Object, System.EventArgs)
Class:        00cd5c0c
MethodTable:  00996ad4
mdToken:      0600028d
Module:       00992c3c
IsJitted:     no
CodeAddr:     ffffffff
Transparency: Critical
```
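The same MethodDesc is reachable from managed code: on the CLR, `RuntimeMethodHandle.Value` is the MethodDesc address that `!dumpmd` takes, and `RuntimeHelpers.PrepareMethod` forces JIT compilation before the first call. A minimal sketch, assuming .NET 5 or later with C# top-level statements; `Square` is a hypothetical demo method. The addresses differ on every run, and whether `GetFunctionPointer` returns a different value after `PrepareMethod` depends on the runtime's stub scheme, so no expected output is given.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical demo method (not from the deck).
static int Square(int x) => x * x;

Func<int, int> square = Square;
MethodInfo method = square.Method;
RuntimeMethodHandle handle = method.MethodHandle;

// On the CLR this is the MethodDesc address, i.e. the argument you would pass to "!dumpmd".
IntPtr methodDesc = handle.Value;
Console.WriteLine($"MethodDesc          : 0x{methodDesc.ToInt64():x}");

// Current entry point; before JIT compilation this is typically a stub, not native code.
IntPtr entryBefore = handle.GetFunctionPointer();
Console.WriteLine($"Entry point (before): 0x{entryBefore.ToInt64():x}");

// Force JIT compilation now instead of waiting for the first call.
RuntimeHelpers.PrepareMethod(handle);
IntPtr entryAfter = handle.GetFunctionPointer();
Console.WriteLine($"Entry point (after) : 0x{entryAfter.ToInt64():x}");

int result = square(7);
Console.WriteLine($"Square(7) = {result}");
```

Attaching WinDbg with SOS while this process is paused and running `!dumpmd` on the printed MethodDesc address should show the IsJitted state change after `PrepareMethod`.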

Managed Execution

Assembly Loading and Initialization
1. The process starts and loads the .NET Framework: mscoree.dll is loaded.
2. The process's main thread starts executing by calling _CorExeMain.
3. The CLR is initialized.
4. The ClassLoader is called: it reads the metadata tables and builds the in-memory representation (MethodTable & EEClass).
5. The JIT compiles the Main method.
6. Main is executed.

Anatomy of a Managed Non-Value Instance

(very schematic)
- GC heap: the instance, which holds a MethodTable* pointer
- Loader heap:
  - MethodTable ("hot" metadata): EEClass* pointer, pointer to VTables, MethodDesc* entries
  - EEClass (full, "cold" metadata): field layout
  - MethodDesc: points to a pre-JITted stub, JITted code, or an "IL" stub

(* = pointer)

!DumpClass & !DumpMT

JIT Optimizations

Summary
Types of optimization:
- JIT inlining
- JIT tail calls

JIT Optimizations

Tail Calls
A tail call occurs when the last thing a function does is call another function.

```csharp
static public void Main()
{
    Helper();
}

static public void Helper()
{
    One();
    Two();
    Three();   // last call in Helper: candidate for a tail call
}

static public void Three()
{
    // ...
}
```

- Without the optimization, Helper compiles to: call One, call Two, call Three
- With the optimization: call One, call Two, jump Three
  - Three then uses the same stack space as its caller
  - This improves data locality, memory usage, and cache usage

JIT Optimizations

Tail Calls
- The tail-call feature set differs between x86 and x64
  - This can lead, for example, to a StackOverflowException in an x86 Debug build while the same code works on x64; on x86 the stack may be just large enough in Release with tail calls, but overflow without the optimization
- No tail call is possible if:
  - The caller doesn't return immediately after the call
  - Stack arguments between caller and callee are incompatible in a way that would require shifting things around in the caller's frame before the callee could execute
  - Caller and callee return different types
  - The JIT inlines the call instead (inlining is far better than tail calling, and opens the door to many more optimizations)
  - Security issues
  - The debugger or profiler turned off JIT optimizations
- For the full list, see ".NET 2.0 Tail limits" and ".NET 4.0 Tail Limits"
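The eligibility rules can be made concrete with a few methods. This is a minimal sketch assuming .NET 5 or later with C# top-level statements; the method names are hypothetical. The C# compiler does not emit the IL `tail.` prefix, so whether a candidate call actually becomes a jump is left to the JIT (and depends on platform and optimization settings); the comments only mark which calls are candidates under the rules above.

```csharp
using System;

// Candidate: the recursive call is the last action and its result is returned unchanged.
static long SumTail(long n, long acc) => n == 0 ? acc : SumTail(n - 1, acc + n);

// Not a tail call: the caller still has to add n after the call returns,
// so every level needs its own stack frame.
static long SumNonTail(long n) => n == 0 ? 0 : n + SumNonTail(n - 1);

// Not a tail call either: the callee returns int but the caller returns long,
// so the result must be widened after the call.
static int Twice(int x) => 2 * x;
static long Widen(int x) => Twice(x);

long sumTail = SumTail(100, 0);
long sumNonTail = SumNonTail(100);
long widened = Widen(21);
Console.WriteLine($"{sumTail} {sumNonTail} {widened}"); // 5050 5050 42
```

With small inputs both sums behave identically; the difference only shows with very deep recursion, where the non-tail version is the one that can overflow the stack.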

JIT Optimizations – Inlining

Without inlining:

```csharp
class Test
{
    static int And(int i1, int i2)
    {
        return i1 & i2;
    }

    static int i;

    static public void Main()
    {
        i = And(i, 0);
    }
}
```

With inlining (effective result):

```csharp
class Test
{
    static int i;

    static public void Main()
    {
        i = 0;   // xor edx,edx
    }
}
```

JIT Optimizations – Inlining

Main() without inlining:

```asm
mov  ecx,dword ptr ds:[183368h]    ; set up first argument (i)
xor  edx,edx                       ; set up second argument (0)
call dword ptr ds:[183818h]        ; call And(...) (Inline.Program+Test.And(Int32, Int32), mdToken: 06000002)
mov  dword ptr [ebp-4],eax         ; save return value
mov  eax,dword ptr [ebp-4]         ; assign result to static
mov  dword ptr ds:[00283368h],eax  ; assign result to static
ret                                ; return
```

And(Int32, Int32):

```asm
mov  eax,dword ptr [ebp-4]   ; move argument 1 to eax
and  eax,dword ptr [ebp-8]   ; AND argument 2 into eax (return register)
ret                          ; return to caller
```

JIT Optimizations – Inlining

With inlining:
- And(.,.) is now inlined
- No and reg,reg is needed, because the second argument is 0 (x & 0 is always 0)

MethodDesc table:

```
Entry     MethodDesc  JIT     Name
53dda7e0  53bb4934    PreJIT  System.Object.ToString()
53dde2e0  53bb493c    PreJIT  System.Object.Equals(System.Object)
53dde1f0  53bb495c    PreJIT  System.Object.GetHashCode()
53e61600  53bb4970    PreJIT  System.Object.Finalize()
001dc019  001d3828    NONE    Inline.Program+Test..ctor()
001dc011  001d3810    NONE    Inline.Program+Test.And(Int32, Int32)
00270070  001d381c    JIT     Inline.Program+Test.Main()
```

And(Int32, Int32) shows JIT: NONE because its body was inlined into Main and the method was never compiled on its own.

Main():

```asm
xor  edx,edx                    ; generate final result
mov  dword ptr ds:[1D3368h],edx ; move result to static
ret                             ; return
```
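To reproduce either listing on purpose, `System.Runtime.CompilerServices.MethodImplAttribute` lets you steer the inliner: `MethodImplOptions.NoInlining` keeps the call (as in the "without inlining" listing), and `MethodImplOptions.AggressiveInlining` (.NET 4.5 and later) asks the JIT to inline where its heuristics might not. A minimal sketch assuming .NET 5 or later with C# 9 top-level statements, which allow attributes on local functions; the variant names are hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;

// Same body as the deck's And(...); the JIT decides on its own whether to inline it.
static int And(int i1, int i2) => i1 & i2;

// Never inlined: the call stays visible in the disassembly.
[MethodImpl(MethodImplOptions.NoInlining)]
static int AndNoInline(int i1, int i2) => i1 & i2;

// Asks the JIT to inline even where its heuristics might not (a request, not a guarantee).
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static int AndAggressive(int i1, int i2) => i1 & i2;

int i = 0x5A;
int r1 = And(i, 0);
int r2 = AndNoInline(i, 0);
int r3 = AndAggressive(i, 0);
Console.WriteLine($"{r1} {r2} {r3}"); // 0 0 0
```

All three variants return the same value; only the generated machine code differs, which you can compare in the debugger's disassembly window of an optimized build.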

Demo: JIT Compilation
- !dumpmt -md
- bp cmdStartJit_Click

JIT Optimizations

Additional Config
Instruct the CLR not to optimize the code (during JIT compilation) without recompiling the DLL:
- Use an .ini file (and symbols), e.g. MyDll.ini:

```ini
[.NET Framework Debugging Control]
GenerateTrackingInfo=1
AllowOptimize=0
```

  • GenerateTrackingInfo=1 is on by default from .NET 2.0 onward
  • Also usable for assemblies in the GAC
Instruct the CLR to ignore (optimized) NGen images:
- Use an environment variable: set COMPLUS_ZapDisable=1

JIT Performance Counters - % Time in Jit

- Percentage of elapsed time spent in JIT compilation since the last JIT compilation phase
- Updated at the end of every JIT compilation phase; a JIT compilation phase occurs when a method and its dependencies are compiled
- A value above 5% can indicate a problem:
  - Is NGen an option?
    • http://msdn.microsoft.com/en-us/magazine/cc163610.aspx
  - Do you use multiple AppDomains?
    • Loading assemblies as domain-neutral can help
  - Minimize the classes and assemblies on the code path
    • Use code coverage to identify these components
  - See .NET Framework Usage Performance Rules / DA0009

JIT Performance Counters - Summary

Performance counter: description
- # of IL Bytes JITted: Displays the total number of intermediate language (MSIL) bytes compiled by the just-in-time (JIT) compiler since the application started. This counter is equivalent to the Total # of IL Bytes Jitted counter.
- # of Methods JITted: Displays the total number of methods JIT-compiled since the application started. This counter does not include pre-JIT-compiled methods.
- % Time in Jit: Displays the percentage of elapsed time spent in JIT compilation since the last JIT compilation phase. This counter is updated at the end of every JIT compilation phase. A JIT compilation phase occurs when a method and its dependencies are compiled.
- IL Bytes Jitted / sec: Displays the number of MSIL bytes that are JIT-compiled per second. This counter is not an average over time; it displays the difference between the values observed in the last two samples divided by the duration of the sample interval.
- Standard Jit Failures: Displays the peak number of methods the JIT compiler has failed to compile since the application started. This failure can occur if the MSIL cannot be verified or if there is an internal error in the JIT compiler.
- Total # of IL Bytes Jitted: Displays the total MSIL bytes JIT-compiled since the application started. This counter is equivalent to the # of IL Bytes JITted counter.
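On .NET 6 and later (outside this deck's .NET Framework scope, so treat it as an assumption), `System.Runtime.JitInfo` reports in-process counterparts of the counters listed above. The sketch computes only a rough analogue of % Time in Jit, JIT time relative to process lifetime, not the counter's per-phase definition; `SumOfRoots` is a hypothetical workload. It uses C# top-level statements.

```csharp
using System;
using System.Diagnostics;
using System.Runtime;

// Hypothetical workload so that a few more methods get JIT-compiled.
static double SumOfRoots(int n)
{
    double sum = 0;
    for (int k = 1; k <= n; k++) sum += Math.Sqrt(k);
    return sum;
}

double total = SumOfRoots(100_000);

// Process-wide totals, comparable to "# of IL Bytes JITted" and "# of Methods JITted".
long ilBytes = JitInfo.GetCompiledILBytes();
long methods = JitInfo.GetCompiledMethodCount();
TimeSpan jitTime = JitInfo.GetCompilationTime();

// Rough analogue of "% Time in Jit": JIT time relative to how long the process has run.
TimeSpan uptime = DateTime.Now - Process.GetCurrentProcess().StartTime;
double percentInJit = 100.0 * jitTime.TotalMilliseconds / Math.Max(1.0, uptime.TotalMilliseconds);

Console.WriteLine($"IL bytes JITted : {ilBytes}");
Console.WriteLine($"Methods JITted  : {methods}");
Console.WriteLine($"Time in JIT     : {jitTime.TotalMilliseconds:F1} ms ({percentInJit:F1}% of uptime)");

// Same rule of thumb as the slide: above 5% is worth investigating (e.g. is NGen an option?).
if (percentInJit > 5)
    Console.WriteLine("More than 5% of elapsed time spent in JIT compilation.");
```

Short-lived processes such as this one tend to show a high percentage, because startup JIT work dominates a short uptime; the check is more meaningful when sampled in a long-running application.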


.NET 4.5: Enabling Multi-Core Background JIT

Call two static methods of System.Runtime.ProfileOptimization early in Main:
- ProfileOptimization.SetProfileRoot: sets the application profile path
- ProfileOptimization.StartProfile: starts JIT on multicore systems and starts the process of recording current method use, which later overwrites the specified profile file
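A minimal sketch of those two calls, assuming C# top-level statements on a runtime that ships `System.Runtime.ProfileOptimization` (.NET Framework 4.5+, or .NET Core 3.0+ for this top-level form). The folder name and the profile name "MyApp.Startup" are hypothetical placeholders.

```csharp
using System;
using System.IO;
using System.Runtime;

// Hypothetical profile folder; the application needs write access to it.
string profileRoot = Path.Combine(Path.GetTempPath(), "MyAppProfiles");
Directory.CreateDirectory(profileRoot);

// Sets the application profile path: the folder where profile files are stored.
ProfileOptimization.SetProfileRoot(profileRoot);

// Starts background (multicore) JIT from the "MyApp.Startup" profile of a previous run,
// if one exists, and records current method use; the recording later overwrites that file.
ProfileOptimization.StartProfile("MyApp.Startup");

Console.WriteLine($"Multicore JIT profile root: {profileRoot}");
```

The first run only records; the benefit shows up on later runs, when the methods listed in the profile are compiled on other cores before the main thread needs them. On single-core machines the calls have no effect.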

.NET 4.5: Multi-Core Background JIT

```csharp
ProfileOptimization.StartProfile("MyApp.Scenario1");
…
ProfileOptimization.StartProfile("MyApp.Scenario2");
…
ProfileOptimization.StartProfile("MyApp.Scenario3");
```

http://support.microsoft.com/kb/2715214/en-us

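.NET 4.6 - RyuJIT

Better throughput (25% for Bing)

Disable (fall back to the legacy JIT): add <useLegacyJit enabled="1" /> under <runtime> in the application configuration file, or set the environment variable COMPLUS_useLegacyJit=1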

Demo: JIT Optimizations
- AllowOptimize

ETW Architecture

(Diagram: Controllers manage Event Tracing Sessions (Session 1, 2, … 64); Providers send Events to the sessions; sessions write Log Files or deliver Events to Consumers.)

JIT ETW Tracing in .NET Framework 4

Gives information about inlining or tail-calling a certain method.
- Register the .NET events:

```
wevtutil im ….\v4.0.21006\clr-etw.man
```

- Start ETW tracing (JIT events on: 0x1010):

```
xperf -on base
xperf -start JitTracing -on e13c0d23-ccbc-4e12-931b-d9cc2eee27e4:0x1010:5 -f JIT.etl
```

- Stop ETW and view the trace with xperf:

```
xperf -stop JitTracing
xperf -d base.etl
xperf -merge JIT.etl base.etl merge.etl
wpa merge.etl
```

.NET ETW Events - Summary
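The enable string passed to xperf is the provider GUID, a keyword mask, and a level. The sketch below (C# top-level statements; constant names mirror the runtime keyword table) shows that 0x1010 is simply JITKeyword | JITTracingKeyword and that 5 is the ETW Verbose level. The GUID is the one from the xperf command, which belongs to the Microsoft-Windows-DotNETRuntime provider.

```csharp
using System;

// Keyword values from the runtime keyword table.
const ulong JITKeyword = 0x00000010;
const ulong JITTracingKeyword = 0x00001000;

// ETW levels: 1 Critical, 2 Error, 3 Warning, 4 Informational, 5 Verbose.
const int VerboseLevel = 5;

// CLR runtime provider GUID, as used in the xperf command.
const string ClrProvider = "e13c0d23-ccbc-4e12-931b-d9cc2eee27e4";

// Combine keywords with bitwise OR and format them as xperf expects: GUID:mask:level.
ulong mask = JITKeyword | JITTracingKeyword;
string enableSpec = $"{ClrProvider}:0x{mask:x}:{VerboseLevel}";

Console.WriteLine(enableSpec); // e13c0d23-ccbc-4e12-931b-d9cc2eee27e4:0x1010:5
```

Adding another keyword (for example LoaderKeyword, 0x00000008) is just one more OR in the mask.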

Runtime keywords (name, value: purpose)
- GCKeyword, 0x00000001: Enables the collection of garbage collection events.
- LoaderKeyword, 0x00000008: Enables the collection of loader events.
- JITKeyword, 0x00000010: Enables the collection of just-in-time (JIT) events.
- NGenKeyword, 0x00000020: Enables the collection of events for native image methods (methods processed by the Native Image Generator, Ngen.exe); used with StartEnumerationKeyword and EndEnumerationKeyword.
- StartEnumerationKeyword, 0x00000040: Enables the enumeration of all the methods in the runtime; used in conjunction with NGenKeyword.
- EndEnumerationKeyword, 0x00000080: Enables the enumeration of all the methods destroyed in the runtime; used in conjunction with JITKeyword and NGenKeyword.
- SecurityKeyword, 0x00000400: Enables the collection of security events.
- AppDomainResourceManagementKeyword, 0x00000800: Enables the collection of resource monitoring events at an application domain level.
- JITTracingKeyword, 0x00001000: Enables the collection of JIT tracing events.
- InteropKeyword, 0x00002000: Enables the collection of interop events.
- ContentionKeyword, 0x00004000: Enables the collection of contention events.
- ExceptionKeyword, 0x00008000: Enables the collection of exception events.
- ThreadingKeyword, 0x00010000: Enables the collection of threadpool events.
- StackKeyword, 0x40000000: Enables the collection of CLR stack trace events.

Demo: JIT ETW Events
- wevtutil im CLR-ETW.man
- xperf

.NET Decompilation
- Managed decompilers
  - Output source code and IL
  - Source output available in multiple languages
- Obfuscation can make it more difficult to decompile source code
- Many different options

PreJit / NGEN

.NET 2.0 - 3.5
- Ngen calls a service (running as LocalSystem), which compiles the image in the background.

(Diagram: Ngen MyNiceExe.EXE → mscorsvw.exe → native image in C:\WINDOWS\\ … NativeImages_v2.0.50215_32\MyExe\MyExe.exe, used instead of JIT compilation)

- Supports compilation of all dependent DLLs, and update functionality
- Metadata is now included in the created DLL/EXE

NGEN 4.0

- Side-by-side support
- Ngen.exe now compiles assemblies with full trust; CAS policy is no longer evaluated
- Native images generated with Ngen.exe can no longer be loaded into applications running in partial trust
- Located in the folder Framework\v4.0.xxxxx
- Supports .NET 4.0 and .NET 2.0 assemblies
  - Generates a 2.0 image:
    • ngen.exe install <2.0 assembly>
  - Generates a 4.0 image:
    • ngen.exe install <2.0 assembly> /ExeConfig:
    OR
    • ngen.exe install <2.0 EXE with a config file that indicates 4.0 as the preferred runtime>

NGEN 4.0

Target Patching
In .NET 2.0 - 3.5, if assembly Y depends on X, Y must be re-NGENed for any change in X, because:
• Y may inline methods from X
• Y may use fields of X's classes (the class layout might change)
• Y may derive from X's classes (the class layout might change)
BUT roughly half of all changes only modify the bodies of large methods:
• Large methods are not inlined across assemblies
• No need to re-NGEN if only function bodies changed (unless a function prototype changed)
• Works great for QFEs and GDRs (small security fixes)
• Unlikely to work for a service pack

NGEN 4.0

Prioritization of NGEN
- Priority 1 images compile on all cores:
  • ngen.exe install <assembly> /queue:1
- Priority 3 images compile at idle time:
  • ngen.exe install <assembly> /queue:3

NGEN 4.5

Starting with Windows 8 and .NET Framework 4.5:
- Native images are created automatically by the Auto NGen Maintenance Task
- Images are created based on "Assembly Usage Logs" that the application writes to the AppData directory
- The Auto NGen Maintenance Task builds on Windows Automatic Maintenance, which runs in the background when the machine is idle
- The Auto NGen Maintenance Task also reclaims native images that are no longer in use

NGEN 4.5 Notes

- The assembly must target the .NET Framework 4.5
- Auto NGen runs only on Windows 8 and later
- For desktop apps, Auto NGen applies only to GAC assemblies
- For Modern Style apps, Auto NGen applies to all assemblies
- Auto NGen does not remove unused rooted native images (images NGENed by developers)

.NET 4.5: Managed Profile Guided Optimization

MPGO co-locates frequently used image data within a native image to:
- Reduce the number of pages loaded from disk
  • Fewer page faults
- Reduce the number of copy-on-write pages
Improves startup time and memory usage (all apps).

(Diagram: IL assembly → MPGO → optimized IL assembly with embedded training profile → NGEN → precompiled native image)

.NET 4.5: How to MPGO

1. Run the MPGO tool (as an administrator) with the necessary parameters. The optimized IL assemblies are created in the C:\Optimized folder.

```
MPGO -scenario MyLargeApp.exe -AssemblyList *.* -OutDir C:\Optimized\
```

2. Run the NGen tool (as an administrator) with the necessary parameters for each application DLL:

```
ngen.exe install C:\Optimized\MyLargeApp.exe
```

.NET 4.6 - Ngen

Better throughput (25% for Bing)

Disable for specific assemblies

When to use Ngen?

- If a DLL is used in several loaded processes
- If multiple instances of the application are started, e.g. on Terminal Server
- Be sure to set the base address of your assemblies correctly
  - Rebasing a DLL during load impacts performance and prevents sharing the image

ASLR

/DYNAMICBASE: Address Space Layout Randomization
- Available from Windows Vista and Windows Server 2008 onward
  • Comes with .NET 3.5 SP1
  • C++: use /DYNAMICBASE
Rebasing
- Done within the kernel
- Pages are still shareable
- Backed by the image, not the page file
- The base address is no longer an issue
- Rebuild your app with .NET 3.5 SP1!

Review

1. What is the benefit of using NGEN?
2. What is tail call optimization?
3. What tool can be used for viewing ETW files?

Reference

- .NET Code Generation Blog: http://blogs.msdn.com/b/clrcodegeneration/
- The Performance Benefits of NGen: http://msdn.microsoft.com/en-us/magazine/cc163610.aspx
- JIT ETW tracing in .NET Framework 4: http://blogs.msdn.com/b/clrcodegeneration/archive/2009/05/11/jit-etw-tracing-in-net-framework-4.aspx
