Revealing the performance aspects in your code
Three cornerstones of HPC
• The parallelism can be exploited at three levels: message passing, fork/join, SIMD
• Hyperthreading is not quite threading
• A popular strategy is to choose one of the first two and let the compiler do the vectorization
• Different instructions may be available for different types (SP vs. DP)
• Alignment requirements can be strict and contradict with other constraints
• Caches have many levels of hierarchy with own latencies and rules
• Memory is typically non-uniform
• Nodes can have different frequency, arch, # of cores and SIMD width
• Latency and bandwidth are different between different nodes
• This view is functional, but performance agnostic
Software and Services Group
First Things First!
Get Base Line Performance (I/III)
Single node: Xeon E5 and Xeon Phi™ separate
Get Base Line Performance (II/III)
Heterogeneous: Xeon E5 + Xeon Phi™
Get Base Line Performance (III/III)
Heterogeneous Cluster: N*(Xeon E5 + Xeon Phi™)
Base Line Results Analysis
• For 1 and 2 nodes we get a 2X speedup for the heterogeneous version vs. Xeon E5.
• For higher node numbers Xeon E5 shows a superlinear speedup while Xeon E5 + Xeon Phi™ saturates.
• Potential reasons may be:
  - message passing performance
  - changing load balance
  - suboptimal vectorization for Xeon Phi™
A few BKMs to Try!
BKM: Tune programs with affinity set
• Problem: results are not stable between runs
• Solution: use KMP_AFFINITY to bind threads
Compact
Scatter
Balanced
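To make the three KMP_AFFINITY types named above more concrete, here is a minimal sketch of a simplified placement model. It is not the OpenMP runtime's actual algorithm: the function name `placeThreads`, the machine description (`cores`, `hwThreadsPerCore`), and the assumption that the thread count fits the machine are all hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified model (an assumption, not the runtime's implementation):
// returns the core index that each OpenMP thread id would be bound to.
std::vector<int> placeThreads(const std::string& type, int numThreads,
                              int cores, int hwThreadsPerCore) {
    assert(cores > 0 && hwThreadsPerCore > 0);
    assert(numThreads >= 0 && numThreads <= cores * hwThreadsPerCore);
    std::vector<int> core(numThreads, 0);
    if (type == "compact") {
        // Fill all hardware threads of one core before moving to the next.
        for (int t = 0; t < numThreads; ++t) core[t] = (t / hwThreadsPerCore) % cores;
    } else if (type == "scatter") {
        // Spread threads round-robin over the cores.
        for (int t = 0; t < numThreads; ++t) core[t] = t % cores;
    } else {
        // "balanced": spread evenly, but keep neighbouring thread ids together.
        const int base = numThreads / cores;
        const int extra = numThreads % cores;
        int t = 0;
        for (int c = 0; c < cores && t < numThreads; ++c) {
            const int share = base + (c < extra ? 1 : 0);
            for (int k = 0; k < share; ++k) core[t++] = c;
        }
    }
    return core;
}
```

For a hypothetical machine with 3 cores and 4 hardware threads per core, calling `placeThreads("compact", 6, 3, 4)` puts four threads on core 0 and two on core 1, while "scatter" and "balanced" both use all three cores and differ only in which thread ids share a core.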
Check Scalability: Xeon & Xeon Phi
Using timer functions is quick and easy!
Xeon performance doesn’t scale beyond 1 socket. Needs to be investigated!
100T run on 60C KNC gives Best Performance. Why?
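As an illustration of the "timer functions" approach on this slide, the following is a minimal sketch using the standard `std::chrono` clock. The helper names `timeSeconds` and `speedup` are hypothetical and not taken from the original application.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in seconds.
template <typename Work>
double timeSeconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Speedup of a run relative to a baseline run (e.g. 1 thread vs. N threads).
double speedup(double baselineSeconds, double parallelSeconds) {
    assert(parallelSeconds > 0.0);
    return baselineSeconds / parallelSeconds;
}
```

A scalability check would time the same kernel at several thread counts and compare `speedup(t1, tN)` against N; a steady (monotonic) clock is used so that system clock adjustments do not distort the measurement.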
Software & Services Group, Developer Products Division
Copyright © 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.
Loop Profiler - Identify Time Consuming Loops / Functions to Optimize
Enables targeting parallelization/optimization efforts to most significant code areas (hotspot identification)
• Easy to use:
  – Use compiler switches to add instrumentation to the application
  – Compiler instruments entry and exits of all loops and functions
icc -O1 -profile-functions -profile-loops=all -profile-loops-report=2…
  – Running the application generates a report file with resulting counts
  – Both a human-readable text file (a table) and an XML file are generated
  – Analyze data by looking at the raw text file, or use the GUI viewer shipped with compiler
• Report file contains information such as:
  – Call count of routines
  – Self-time of functions / loops
  – Total-time of functions / loops
  – Average, minimum, maximum iteration count of loops
Loop Profile Data Viewer GUI
Original Parallel Version: Not vectorized…Check Compiler VEC Reports!
10 static inline size_t posToIdx(const size_t width, const Position& pos){
11   return (pos.y * width) + pos.x;
12 }
13
14 static void subtractPSF(const float* psf, ...) {
   ...
25   #pragma omp parallel for default(shared)
26   for (int y = starty; y <= stopy; ++y) {
27     for (int x = startx; x <= stopx; ++x) {
28       residual[posToIdx(residualWidth, Position(x, y))] -= gain * absPeakVal
29         * psf[posToIdx(psfWidth, Position(x - diffx, y - diffy))];
30     }
31   }
32 }
33 ...

Compiler can’t identify the index pattern.
Compiler unable to vectorize the loop at line 27. Index compute complex!
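One way to expose a simpler index pattern for the loop nest above is to hoist the row offsets out of the inner loop so that it walks two unit-stride pointers. The sketch below illustrates that idea under stated assumptions and is not the fix used in the original application: the function name `subtractPSFRows`, the flat argument list, and the single `scale` factor (standing in for `gain * absPeakVal`) are hypothetical. Whether a given compiler then vectorizes the inner loop still has to be confirmed in its vectorization report.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical flat-argument variant of the loop nest above.
// `scale` stands in for gain * absPeakVal.
void subtractPSFRows(float* residual, std::size_t residualWidth,
                     const float* psf, std::size_t psfWidth,
                     int startx, int stopx, int starty, int stopy,
                     int diffx, int diffy, float scale) {
    assert(startx >= 0 && starty >= 0);
    assert(startx >= diffx && starty >= diffy);
    if (stopx < startx) return;
    const std::size_t n = static_cast<std::size_t>(stopx - startx) + 1;

    // The pragma is ignored (serial execution) when OpenMP is not enabled.
    #pragma omp parallel for default(shared)
    for (int y = starty; y <= stopy; ++y) {
        // Row base pointers are computed once per row ...
        float* resRow = residual + static_cast<std::size_t>(y) * residualWidth + startx;
        const float* psfRow =
            psf + static_cast<std::size_t>(y - diffy) * psfWidth + (startx - diffx);
        // ... so the inner loop is a plain unit-stride update.
        for (std::size_t i = 0; i < n; ++i) {
            resRow[i] -= scale * psfRow[i];
        }
    }
}
```

The arithmetic per element is unchanged; only the address computation moves out of the inner loop, which is the part the compiler reported as too complex.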
Replacing OpenMP critical section: With a serial reduction loop…Better Scalability!
Original Code:

  #pragma omp parallel
  {
    // Compute per thread peak
    float threadAbsMaxVal = 0.0;
    int threadAbsMaxPos = 0;
    #pragma omp for schedule(static)
    for (int i = 0; i < size; ++i) {
      if (abs(image[i]) > (threadAbsMaxVal)) {
        threadAbsMaxVal = abs(image[i]);
        threadAbsMaxPos = i;
      }
    }
    // Find global peak across all threads
    #pragma omp critical
    if (threadAbsMaxVal > maxVal) {
      maxVal = threadAbsMaxVal;
      maxPos = threadAbsMaxPos;
    }
  }
  maxVal = image[maxPos];

Optimized Code:

  #pragma omp parallel
  {
    // Compute per thread peak
    float threadAbsMaxVal = 0.0;
    size_t threadAbsMaxPos = 0;
    #pragma omp for schedule(static)
    for (int i = 0; i < size; ++i) {
      if (abs(image[i]) > (threadAbsMaxVal)) {
        threadAbsMaxVal = abs(image[i]);
        threadAbsMaxPos = i;
      }
    }
    // Store per thread peak
    int t_num = omp_get_thread_num();
    temp_Peak[t_num] = threadAbsMaxVal;
    temp_Pos[t_num] = threadAbsMaxPos;
  }
  // Find global peak across all threads
  for (int k = 0; k < num_t; k++) {
    if ((temp_Peak[k]) > (maxVal)) {
      maxVal = temp_Peak[k];
      maxPos = temp_Pos[k];
    }
  }
  maxVal = image[maxPos];

Performance Gain: 1.12X (KNC 48C)
Profiling MPI + OpenMP Heterogeneous Execution…
Profile using Intel® Cluster Studio XE
10 MPI Xeon + 22 MPI Xeon Phi x 11 OpenMP
Unbalanced Load
Profile using Intel® Cluster Studio XE
12 MPI Xeon + 20 MPI Xeon Phi x 12 OpenMP
Balanced Load
Profiling using… Intel® VTune™ Amplifier XE
Intel® VTune™ Amplifier XE Performance Profiler
Where is my application…

Spending Time?
• Focus tuning on functions taking time
• See call stacks
• See time on source

Wasting Time?
• See cache misses on your source
• See functions sorted by # of cache misses

Waiting Too Long?
• See locks by wait time
• Red/Green for CPU utilization during wait
• Windows & Linux • Low overhead • No special recompiles
Advanced Profiling For Scalable Multicore Performance
Intel® VTune™ Amplifier XE
Tune Applications for Scalable Multicore Performance
• Fast, Accurate Performance Profiles
  – Hotspot (Statistical call tree)
  – Call counts (Statistical)
  – Hardware-Event Sampling
• Thread Profiling
  – Visualize thread interactions on timeline
  – Balance workloads
• Easy set-up
  – Pre-defined performance profiles
  – Use a normal production build
• Find Answers Fast
  – Filter extraneous data
  – View results on the source / assembly
• Compatible
  – Microsoft, GCC, Intel compilers
  – C/C++, Fortran, Assembly, .NET, Java
  – Latest Intel® processors and compatible processors (1)
• Windows or Linux
  – Visual Studio Integration (Windows)
  – Standalone user i/f and command line
  – 32 and 64-bit

(1) IA32 and Intel® 64 architectures. Many features work with compatible processors. Event based sampling requires a genuine Intel® Processor.
Intel® VTune™ Amplifier XE
Get a quick snapshot
4 cores
CPU Usage
Thread Concurrency
Intel® VTune™ Amplifier XE
Identify hotspots
Hottest Functions Hottest Call Stack
Quickly identify what is important
Intel® VTune™ Amplifier XE
Identify threading inefficiency
Coarse Grain Locks
High Lock Contention Low Concurrency
Load Imbalance
Intel® VTune™ Amplifier XE
Find Answers Fast
Adjust Data Grouping
… (Partial list shown) Click [+] for Call Stack
Double Click Function to View Source
Filter by Timeline Selection (or by Grid selection)
Filter by Module & Other Controls
Intel® VTune™ Amplifier XE
Timeline Visualizes Thread Behavior
Timeline rows: Transitions, CPU Time, Locks & Waits, Hotspots, Lightweight Hotspots
Hovers:
• Optional: Use API to mark frames and user tasks
• Optional: Add a mark during collection
Intel® VTune™ Amplifier XE
See Profile Data On Source / Asm
Time on Source / Asm
Quick Asm navigation: Select source to highlight Asm
Right click for instruction reference manual
Quickly scroll to hot spots.
Click jump to scroll Asm
High-level Features
Intel® VTune™ Amplifier XE
Feature Highlights
• Basic Hot Spot Analysis (Statistical Call Graph)
  – Locates the time consuming regions of your application
  – Provides associated call-stacks that let you know how you got to these time consuming regions
  – Call-tree built using these call stacks
• Advanced Hotspot and architecture analysis
  – Based on Hardware Event-based Sampling (EBS)
  – Pre-defined tuning experiments
• Thread Profiling
  – Visualize thread activity and lock transitions in the timeline
  – Provides lock profiling capability
  – Shows CPU/Core utilization and concurrency information
• GPU Compute Performance Analysis
  – Collect GPU data for tuning OpenCL applications. Correlate GPU and CPU activities
• CPU Power Efficiency Analysis
  – Wake-up rate and frequency measurement per Core
Intel® VTune™ Amplifier XE
Feature Highlights
• Attach to running processes
  – Hotspot and Concurrency analysis modes can attach to running processes
• System wide data collection
  – EBS modes allow system wide data collection and the tool provides the ability to filter this data
• GUI
  – Standalone GUI available on Windows* and Linux
  – Microsoft* Visual Studio integration
• Command Line
  – Comprehensive support for regression analysis and remote collection
• Platform & application support
  – Windows* and Linux (Android, Tizen, Yocto – in the ISS)
  – Microsoft* .NET/C# applications
  – Java* and mixed applications
  – Fortran applications
Intel® VTune™ Amplifier XE
Feature Highlights
• Event multiplexing
  – Gather more information with each profiling run
• Timeline correlation of thread and event data
  – Populates thread active time with event data collected for that thread
  – Ability to filter regions on the timeline
• Advanced Source / Assembler View
  – See event data graphed on the source / assembler
  – View and analyze assembly as basic blocks
  – Review the quality of vectorization in the assembly code display of your hot spot
• Provides pre-defined tuning experiments
  – Predefined profiles for quick analysis configuration
  – A user profile can be created on the basis of a predefined profile
• User API
  – Rich set of user APIs for collection control, event highlighting, code instrumentation, and visualization enhancement.
Data Collectors and Analysis Types
Intel® VTune™ Amplifier XE
Analysis Types (based on technology)
Software Collector: Any x86 processor, any virtual, no driver
• Basic Hotspots
  – Which functions use the most time?
• Concurrency
  – Tune parallelism. Colors show number of cores used.
• Locks and Waits
  – Tune the #1 cause of slow threaded performance: waiting with idle cores.

Hardware Collector: Higher res., lower overhead, system wide
• Advanced Hotspots
  – Which functions use the most time? Where to inline? (Statistical call counts)
• General Exploration
  – Where is the biggest opportunity? Cache misses? Branch mispredictions?
• Advanced Analysis
  – Dig deep to tune bandwidth, cache misses, access contention, etc.
Intel® VTune™ Amplifier XE
Pre-defined Analysis Types
Advanced Hotspot analysis based on the underlying architecture
User mode sampling, Threading, IO, Signaling API instrumentation
2nd Generation Core Architecture (a.k.a. Sandy Bridge) analysis types
4th Generation Core Architecture (a.k.a Haswell) analysis types
GUI Layout
Creating a Project GUI Layout
Selecting type of data collection GUI Layout
All available analysis types
Different ways to start the analysis
Helps create new analysis types
Copy the command line to clipboard
Profile a Running Application
No need to stop and re-launch the app when profiling
Two Techniques:
• Attach to Process:
  - Any type of analysis
• Profile System:
  - Advanced Hotspots & Custom EBS
  - Optional: Filter by process after collection
Summary View GUI Layout
Clicking on the Summary tab shows a high level summary of the run
Timing for the whole application run
List of 5 Hotspot functions
CPU Usage
Bottom-Up View GUI Layout
Menu and Tool bars
Analysis Type
Viewpoint currently being used
Tabs within each result
Current grouping
Grid area
Stack Pane
Filter area
Timeline area
Top-Down View GUI Layout
Clicking on the Top-Down Tree tab changes stack representation in the Grid
Top-level function and its tree
Total Time (self + children’s)
Self Time
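The relation between the two columns (Total Time is a function's Self Time plus the Total Time of everything it calls) can be sketched as a recursive sum over a call tree. This is an illustrative model only: the `CallNode` type and the `totalTime` function are hypothetical names, not part of any VTune API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical call-tree node: a function with its own (self) time
// and the functions it called.
struct CallNode {
    std::string name;
    double selfTime;
    std::vector<CallNode> children;
};

// Total time = self time + total time of all callees.
double totalTime(const CallNode& node) {
    assert(node.selfTime >= 0.0);
    double total = node.selfTime;
    for (const CallNode& child : node.children) {
        total += totalTime(child);
    }
    return total;
}
```

A function with a small Self Time but a large Total Time is therefore a caller worth expanding in the Top-Down Tree, since the time is being spent somewhere below it.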
Caller/Callee View GUI Layout
Select a function in the Bottom-Up and find the caller/callee
List of functions sorted by CPU Time
List of callers and their stacks
List of callees and their stacks
Results Comparison
Intel® VTune™ Amplifier XE
Terminology
• Elapsed Time: The total time your target application ran, computed as wall clock time at end of application minus wall clock time at start of application.

• CPU Time: The amount of time a thread spends executing on a logical processor. For multiple threads, the CPU time of the threads is summed.

• Wait Time: The amount of time that a given thread waited for some event to occur, such as synchronization waits and I/O waits.
Intel® VTune™ Amplifier XE
CPU Usage
[Figure: timeline of Thread1, Thread2 and Thread3 over six 1-second intervals, showing when each thread is running and when it is waiting, next to a CPU Usage histogram with levels 0 to 4]

• Elapsed Time: 6 seconds
• CPU Time: T1 (4s) + T2 (3s) + T3 (3s) = 10 seconds
• Wait Time: T1 (2s) + T2 (2s) + T3 (2s) = 6 seconds
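The arithmetic on this slide can be reproduced with a few lines of code. The sketch below is illustrative only; `ThreadTimes`, `cpuTime`, `waitTime` and `averageCpuUsage` are hypothetical names, and the per-thread numbers are the ones given above.

```cpp
#include <cassert>
#include <vector>

// Per-thread totals, in seconds (hypothetical helper type).
struct ThreadTimes {
    double runningSeconds;
    double waitingSeconds;
};

// CPU Time: running time summed over all threads.
double cpuTime(const std::vector<ThreadTimes>& threads) {
    double sum = 0.0;
    for (const ThreadTimes& t : threads) sum += t.runningSeconds;
    return sum;
}

// Wait Time: waiting time summed over all threads.
double waitTime(const std::vector<ThreadTimes>& threads) {
    double sum = 0.0;
    for (const ThreadTimes& t : threads) sum += t.waitingSeconds;
    return sum;
}

// Average number of logical CPUs kept busy over the run.
double averageCpuUsage(double cpuSeconds, double elapsedSeconds) {
    assert(elapsedSeconds > 0.0);
    return cpuSeconds / elapsedSeconds;
}
```

With the slide's values ({4, 2}, {3, 2}, {3, 2}) this gives 10 seconds of CPU Time and 6 seconds of Wait Time; dividing the CPU Time by the 6-second Elapsed Time shows that, on average, fewer than two CPUs were busy.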
CPU Usage
Summary View: CPU Usage Histogram
Only CPU Time is measured. Wait Time is not counted in Hotspots.
Bottom-Up View: CPU Time

Function     CPU Time (by CPU Utilization)
My_Func()    10 s
Overhead and spin
[Figure: timeline of Thread1, Thread2 and Thread3 over six 1-second intervals, with each thread's time split into running user code, threading library internals (lib running or spin), spin wait, and waiting, next to a CPU Usage histogram with levels 0 to 4]

• Elapsed Time: 6 seconds
• CPU Time: T1 (4s) + T2 (3s) + T3 (3s) = 10 seconds
• Wait Time: T1 (2s) + T2 (2s) + T3 (2s) = 6 seconds
• Overhead and Spin Time: T1 (1s) + T2 (1s) + T3 (1s) = 3 seconds
CPU Usage
Summary View: CPU Usage Histogram
Overhead and Spin Time is not counted for CPU Usage
Bottom-Up View: CPU Time
Function     CPU Time (by CPU Utilization)    Overhead and Spin Time
My_Func()    10 s                             4 s
Hotspot analysis
• Displays hot functions in your application
• Shows most time consuming call sequences
  – Statistical Call Graph
• Includes timeline view of threads in your application

Start the Analysis
Basic Hotspot Analysis
Hotspot analysis Summary
Note Elapsed Time and CPU Time
Hotspot analysis Summary (Continued)
Note overall CPU Usage
Note # of CPUs Available on the platform
Hotspots analysis
Hotspot functions
Hotspot Functions
Change Viewpoint
Adjust Data Grouping
Function CPU time
… (Partial list shown)
Call stack
Click [+] for Call Stack
Thread timeline
Filter by Timeline Selection (or by Grid Selection)
Filter by Module & Other Controls
Hotspots analysis
Hotspot functions by CPU usage
Double Click Function to View Source
Coloring CPU Time by CPU Utilization
Overhead and Spin Time
Overhead and Spin on Timeline
Hotspots analysis
Source View
Source View / Assembly View
Self and Total Time on Source / Asm
Right click for instruction reference manual
Quick Asm navigation: Select source to highlight Asm
Click jump to scroll Asm
Quickly scroll to hot spots. Scroll Bar “Heat Map” is an overview of hot spots
Advanced Hotspot analysis
• Uses Intel’s CPU hardware performance collectors
• Higher resolution of sampling (~1 /ms)
• System wide analysis (all processes running in a system)
• OS modules and drivers profiling (ring 0 level)
• OS context switches and threads synchronization issues
Start the Analysis
Advanced Hotspot Analysis
Select level of data collected
Parallelism
Methodology of performance profiling and tuning
How to optimize the Hotspots?
• Maximize CPU utilization and minimize elapsed time
• Ensure CPU is busy all the time
• All Cores busy: parallelism (high concurrency)
[Figure: bar chart comparing the serial Elapsed Time, the Elapsed Time with N threads (T1 to T4) and the Gain it achieves, and the optimal 4-thread time Elapsed (Serial) / N with the remaining Potential Gain]
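The quantities in this chart can be written down as simple arithmetic. The sketch below assumes an ideal, perfectly parallel workload for the optimal time, as the chart does; the function names are hypothetical.

```cpp
#include <cassert>

// Optimal elapsed time if the serial work were split perfectly over N threads.
double optimalElapsed(double serialSeconds, int numThreads) {
    assert(numThreads > 0);
    return serialSeconds / numThreads;
}

// Gain already achieved by the threaded version over the serial version.
double achievedGain(double serialSeconds, double threadedSeconds) {
    return serialSeconds - threadedSeconds;
}

// Potential gain still available: distance from the measured threaded
// time to the optimal time.
double potentialGain(double serialSeconds, double threadedSeconds, int numThreads) {
    return threadedSeconds - optimalElapsed(serialSeconds, numThreads);
}
```

For example, a hypothetical 8-second serial run that takes 3 seconds on 4 threads has achieved a gain of 5 seconds, and 1 more second could be recovered by reaching the 2-second optimum.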
Intel® VTune™ Amplifier XE
Terminology
Concurrency is a measurement of the number of active threads.

[Figure: timeline of Thread1, Thread2 and Thread3 over six 1-second intervals, showing running and waiting phases, next to a Concurrency Summary histogram with levels 0 to 4]
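A concurrency summary of this kind can be sketched as a histogram over time intervals: for each interval, count how many threads are active, then count how many intervals fall on each level. The code below is a simplified illustration that treats "active" as "running in that interval"; the function name and the input layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// running[t][i] is true when thread t is running during interval i.
// Returns histogram[k] = number of intervals with exactly k running threads.
std::vector<int> concurrencyHistogram(const std::vector<std::vector<bool>>& running) {
    const std::size_t numThreads = running.size();
    assert(numThreads > 0);
    const std::size_t numIntervals = running[0].size();
    std::vector<int> histogram(numThreads + 1, 0);
    for (std::size_t i = 0; i < numIntervals; ++i) {
        std::size_t active = 0;
        for (const std::vector<bool>& thread : running) {
            assert(thread.size() == numIntervals);
            if (thread[i]) ++active;
        }
        ++histogram[active];
    }
    return histogram;
}
```

As a made-up example, three threads running in intervals {1,1,0,0,1,1}, {0,1,1,1,0,0} and {0,0,1,1,1,0} never all run at once: two of the six intervals have one running thread and four have two, so the histogram peaks at a concurrency of 2 even though three threads exist.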
Intel® VTune™ Amplifier XE
Parallelism/Concurrency Analysis
• For Parallelism / Concurrency analysis,
  – Stack sampling is done just like in Hotspots analysis
  – Wait functions are instrumented (e.g. WaitForSingleObject, EnterCriticalSection)
  – Signal functions are instrumented (e.g. SetEvent, LeaveCriticalSection)
  – I/O functions are instrumented (e.g. ReadFile, socket)

Start the Analysis
Concurrency Analysis
Concurrency Analysis Summary
Concurrency Levels
Adjustable Metrics
Concurrency Analysis Summary: Concurrency vs. CPU Usage Histogram
Threads might be in active state, but not using CPU
11/26/2014
Concurrency View
Concurrency Level
Overhead
Wait
Thread is waiting
Thread is running
Thread Transitions
Concurrency Timeline
Investigate reasons for transitions
Select and Zoom
Hover over a transition line
Source Code View by Concurrency
Concurrency coloring for CPU Time against source lines
Waiting on locks
[Figure: Calculating Wait and Idle time. Timeline of Thread1, Thread2 and Thread3 over six 1-second intervals, from begin to end of the main thread, showing running, waiting and idle phases and the Sync Object signals that release waiting threads]
Intel® VTune™ Amplifier XE
Locks and Waits Analysis
• Identifies those threading items that are causing the most thread block time
  – Synchronization locks
  – Threading APIs
  – I/O
Start the Analysis
Locks & Waits Analysis
Locks and Waits View
Grouping by Sync Object
Waits # Wait Objects CPU Utilization Spinning Stack for the wait object
Locks-and-Waits Source View
Wait count
Critical Section
Waiting time on the Critical Section object
Intel® VTune™ Amplifier XE
User APIs
User APIs
• Collection Control API
• Thread Naming API
• User-Defined Synchronization API
• Task API
• User Event API
• Frame API
• JIT Profiling API
Windows & Linux Versions Available
Stand-alone GUI, Command line, Visual Studio Integration
Microsoft Windows* OS
  – Windows XP*1, Windows 7*, Windows 8 Desktop*
  – Windows Server* 2003, 2008
  – Microsoft Visual Studio* 2008, 2010 and 2012
  – Standalone GUI and command line
  – IA32 and Intel® 64
Linux* OS
  – RHEL*, Fedora*, SUSE*, CentOS*, Ubuntu*
  – Additional distributions may also work
  – Standalone GUI and command line
  – IA32 and Intel® 64
Single user and floating licenses available
Intel® VTune™ Amplifier XE
Command Line Interface - Examples
• Display a list of available analysis types and preset configuration levels
amplxe-cl -collect-list
• Run Hot Spot analysis on target myApp and store result in default-named directory, such as r000hs
amplxe-cl -c hotspots -- myApp
• Run the Parallelism analysis, store the result in directory r001par
amplxe-cl -c parallelism -result-dir r001par -- myApp
Intel® VTune™ Amplifier XE
Command Line Interface - gprof-like output
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Copyright © , Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon, Core, VTune, and Cilk are trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804