I Am Grateful to My Parents, Relatives and Friends Who Help Us in Documentation of Intel


Acknowledgment

I am grateful to my parents, relatives and friends who helped us in the documentation of Intel VTune ISAS.

I am also grateful to our faculty member Mr. Ram Sharma (GL – NIIT ABC) for his keen guidance and support during the preparation of the document in your hands. I also want to thank God, who gave me the knowledge and ability to write this document clearly and present it to you, the readers.

During the documentation I faced many difficulties, such as Windows crashes, damage to computer hardware, and problems running the Intel VTune software. I would also like to mention our friend, Sandeep Chaudhary, who provided his PC for presenting this application.

Finally, I would like to say a special thanks to NIIT, which gave us a wonderful chance to present this document to all of you. Thanks to everybody who directly or indirectly helped in presenting this file.

- Vikram Singh Saini

Contents at Glance

1. Introduction

2. Sampling
2.1 Introduction
2.2 Sampling mechanism
2.3 Differences between TBS and EBS
2.4 What happens during Sampling
2.5 Features of Sampling
2.6 Sampling Over Time

3. Call Graph
3.1 Introduction
3.2 Features of Call Graph

4. Counter Monitor
4.1 Introduction
4.2 Features of Counter Monitor
4.3 Working of Counter Monitor

5. Tuning Assistant
5.1 Tuning Assistant
5.2 Tuning Assistant Concepts
5.3 Features of Tuning Assistant
5.4 Understanding Tuning Methodology
5.5 Strategies for Improving Performance
5.6 Types of Advice
5.7 Information that Tuning Assistant provides

6. References

Introduction

The VTune analyzer provides an integrated performance analysis and tuning environment that helps you analyze your code's performance on systems with IA-32, Intel(R) 64, and IA-64 architecture. The VTune analyzer can plug into the Microsoft Visual Studio and Eclipse integrated development environments. You can work with the VTune analyzer using either the graphical interface or the command line interface. All commands to create and run Activities must be preceded by vtl at the command line.

LINUX SUPPORT

The VTune(TM) Performance Analyzer can analyze the performance of your Linux* application. The VTune analyzer is installed on a controlling system and controls the run of your Linux application on a Remote Agent system. The VTune analyzer then collects data on your Linux application remotely.

JAVA SUPPORT

When the VTune(TM) Performance Analyzer analyzes the performance of your Java* application or applet (.class), the Virtual Machine (VM) and Just-in-Time Compiler (JIT) are enhanced to provide the VTune analyzer with the specific information required to analyze the performance of a Java application. During sampling, the VM and JIT provide the VTune analyzer with information about JIT-compiled Java methods being loaded into memory, such as their memory addresses, sizes, and symbol information.

.NET SUPPORT

The VTune(TM) Performance Analyzer enables you to profile .NET* and ASP.NET web services running on your machine. The VTune analyzer sets the necessary environment variables and restarts the web service before collecting sampling or call graph data. When data collection completes, the environment variables are deleted and the service is restarted. Use the sampling configuration wizard and the call graph configuration wizard to profile ASP.NET/.NET web services.

FEATURES OF INTEL VTUNE PERFORMANCE ANALYZER

1. CALL GRAPH: Provides a graphical view of the application and helps you identify critical functions and timing details in the application.
2. SAMPLING: Calculates the actual performance of an application over a period of time (time-based sampling) and for various processor events (event-based sampling).
3. COUNTER MONITOR: Provides system-level performance data, such as resource consumption, during the execution of an application.
4. TUNING ASSISTANT: Provides tuning advice from an analysis of the performance data. The tuning advice helps you improve the performance of an application.
5. HOTSPOTS VIEW: Helps identify the area of code that takes the most CPU time.

MINIMUM REQUIREMENTS OF SOFTWARE

1. HARDWARE REQUIREMENTS

Processors supported:

Servers:
• Quad-Core Intel(R) Xeon(R) Processor 5300 Series
• Dual-Core Intel(R) Xeon(R) Processor 5100 Series
• Dual-Core Intel(R) Xeon(R) Processor 5000 Sequence
• Dual-Core Intel(R) Xeon(R) Processor 7100 Series
• Dual-Core Intel(R) Xeon(R) Processor 7000 Sequence
• Dual-Core Intel(R) Xeon Processor LV
• Intel(R) Xeon(R) processor MP
• Intel(R) Xeon(R) processor
• Dual-Core Intel(R) Itanium(R) 2 processor 9000 sequence
• Low Voltage Intel(R) Itanium(R) 2 Processor
• Intel(R) Itanium(R) 2 processor

Desktop:
• Intel(R) Core(TM)2 Quad processor
• Intel(R) Core(TM)2 Extreme processor
• Intel(R) Core(TM)2 Duo processor
• Intel(R) Core(TM) Duo processor
• Intel(R) Core(TM) Solo processor
• Intel(R) Pentium(R) D processor 900 sequence
• Intel(R) Pentium(R) D processor
• Intel(R) Pentium(R) 4 processor Extreme Edition
• Intel(R) Pentium(R) processor Extreme Edition
• Intel(R) Pentium(R) 4 processor

Mobile:
• Mobile Intel(R) Pentium(R) 4 Processor - M
• Intel(R) Pentium(R) M processor
• Intel(R) Celeron(R) M processor
• Intel(R) Celeron(R) D processor
• Intel(R) Celeron(R) processor
• Mobile Intel(R) Celeron processor

2. SOFTWARE REQUIREMENTS

32-bit operating systems supporting IA-32 processors:
• Microsoft* Windows XP Professional Service Pack 2
• Microsoft* Windows Server 2003 Enterprise Edition Service Pack 1
• Microsoft* Windows Server 2003 R2 Enterprise Edition
• Microsoft* Windows Vista*
• Microsoft* Windows Server 2008 RC0 (build 6001)

64-bit operating systems supporting Intel(R) processors with Intel(R) EM64T:
• Microsoft* Windows XP Professional x64 Edition
• Microsoft* Windows Server 2003 Enterprise x64 Edition
• Microsoft* Windows Server 2003 R2 Enterprise x64 Edition
• Microsoft* Windows Vista*
• Microsoft* Windows Server 2008 RC0 (build 6001)

64-bit operating systems supporting Intel(R) Itanium(R) architecture processors:
• Microsoft* Windows Server 2003 Enterprise Edition Service Pack 1
• Microsoft* Windows Server 2008 RC0 (build 6001)

3. SYSTEM MEMORY REQUIREMENTS

At least 128 Megabytes of RAM

4. DISK SPACE REQUIREMENTS

• At least 105 Megabytes of available space on a local drive
• 20 Megabytes of disk space for system files on the drive containing the system directory (for example, C:\)

The additional hard disk space is needed for updating and installing the DLLs and OCXs that the VTune analyzer requires to be in the system directory.

Sampling

INTRODUCTION

Sampling is the process of collecting a set of data for analysis and representing the analyzed data in a statistical format. Use the collected data to identify the critical processes, threads, modules, functions, and lines of code running on your system.

During sampling, the VTune(TM) Performance Analyzer monitors all the software executing on your system, including the operating system, JIT-compiled Java* applications, .NET* applications, and device drivers. Sampling does not modify binary files or executables in order to monitor the performance of an application. The VTune analyzer analyzes the collected samples and helps you to identify:

1. Hotspots: A hotspot is a section of code within a module that took a long time to execute. The processor spends a large amount of time executing that section, which generates a lot of samples for that module.
2. Bottlenecks: A bottleneck is an area in the code that slows down the execution of the application. Bottlenecks appear as hotspots in the hotspot view.

Removing bottlenecks and hotspots optimizes the application.
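As an illustration of how samples lead to hotspots, the following minimal Python sketch counts samples per module and function and ranks the result. It is not how the VTune analyzer is implemented; the record layout, the function name `rank_hotspots`, and the sample data are all assumptions made for this example.

```python
from collections import Counter

def rank_hotspots(samples, top=3):
    """Count samples per (module, function) and return the busiest ones.

    `samples` is an iterable of (module, function) pairs, one per sample.
    The record layout is an assumption made for this sketch.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    return [
        (module, function, n, 100.0 * n / total)
        for (module, function), n in counts.most_common(top)
    ]

# Hypothetical sample data: each entry is where one sample landed.
samples = (
    [("app.exe", "render")] * 60
    + [("app.exe", "parse")] * 25
    + [("kernel32.dll", "ReadFile")] * 15
)

for module, function, n, percent in rank_hotspots(samples):
    print(f"{module:14s} {function:10s} {n:4d} samples  {percent:5.1f}%")
```

The function that collects the most samples is the first candidate to inspect when looking for a hotspot.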

TWO TYPES OF SAMPLING MECHANISM TO COLLECT SAMPLING DATA

1. TIME-BASED SAMPLING (TBS): The VTune(TM) analyzer uses the operating system timer to interrupt the processor and collect samples of all active instruction addresses at a regular time interval (1 ms by default). The collected samples provide the performance data of all the processes running on the system. Processes that took the longest time to execute have the highest number of samples.

2. EVENT-BASED SAMPLING (EBS): Use EBS to identify system-wide software performance problems caused by processor events, such as cache misses and mispredicted branches.

From the EBS data, you can determine which process, thread, module, function, and source line in your program generated the most processor events, and whether any of those events affected the performance of the program. The VTune analyzer provides predefined event ratios recommended for use by performance analysts at Intel.

FIGURE 1: Event based sampling
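The idea behind event-based sampling can be sketched in a few lines of Python. The sketch assumes that a sample is taken every time a fixed number of events (for example, cache misses) has been counted, and that the sample is attributed to the function running at that moment. The name `sample_after` and the trace format are assumptions for illustration, not VTune analyzer settings taken from this document.

```python
from collections import Counter

def event_based_samples(trace, sample_after):
    """Attribute one sample to the running function each time the event
    counter reaches `sample_after` events.

    `trace` is a list of (function, events_generated) steps in execution
    order. Both the format and the names are hypothetical.
    """
    samples = Counter()
    pending = 0  # events counted since the last sample
    for function, events in trace:
        pending += events
        # One step may generate enough events for several samples.
        taken, pending = divmod(pending, sample_after)
        if taken:
            samples[function] += taken
    return samples

# Made-up trace of cache-miss counts per execution step.
trace = [
    ("load_table", 2500),
    ("lookup", 300),
    ("load_table", 1500),
    ("lookup", 700),
]
print(dict(event_based_samples(trace, sample_after=1000)))
```

Functions that generate most of the events collect most of the samples, which is what lets the EBS data point to the code responsible for a processor event.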

DIFFERENCES BETWEEN TBS AND EBS

EBS: Data is collected using clocktick events. When the processor executes an HLT instruction, the processor clock stops and clocktick events stop occurring. As a result, no samples are collected while the processor is in the halt state, and the VTune analyzer reports fewer samples than you expected.

TBS: Data is collected using the OS timer. The OS timer is not affected by HLT instructions, so the samples are collected accurately. TBS can potentially give more accurate data.
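The effect of the halt state on the two mechanisms can be shown with a small simulation. This is a simplified model under stated assumptions: both mechanisms are given the same nominal rate of one sample per millisecond so that the counts are comparable, and the timeline values are invented.

```python
def count_samples(timeline, interval_ms=1):
    """Return (tbs_samples, ebs_samples) for a timeline of (state, duration_ms).

    TBS is modeled as one sample per `interval_ms` of wall-clock time.
    EBS on clockticks is modeled as one sample per `interval_ms` of time in
    which the processor is not halted, since clockticks stop during HLT.
    """
    tbs = 0
    ebs = 0
    for state, duration_ms in timeline:
        n = duration_ms // interval_ms
        tbs += n
        if state != "HLT":
            ebs += n
    return tbs, ebs

# Hypothetical run: 40 ms of work, 50 ms halted, 10 ms of work.
timeline = [("compute", 40), ("HLT", 50), ("compute", 10)]
tbs, ebs = count_samples(timeline)
print(f"TBS samples: {tbs}, EBS clocktick samples: {ebs}")
```

In this model the EBS run reports only the samples taken while the processor was active, which matches the lower sample count described above.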

WHAT HAPPENS DURING SAMPLING

When you run an Activity configured with the sampling collector, the VTune analyzer does the following:

• Waits for the delay sampling time (if specified) to elapse and then starts collecting samples.
• Interrupts the processor at the specified sampling interval and collects samples of instruction addresses. For every interrupt, the VTune analyzer collects one sample.
• Stores the execution context of the software currently executing on the system.

FEATURES OF SAMPLING

The following are the main features of the sampling collector and views:

1. Collection

• Multiple event sampling. Perform event-based sampling with multiple events in one run. Depending on the type of processor you are using, the VTune analyzer can monitor and collect samples on two or more events in one run.
• Remote sampling. Collect sampling data for an application running on a remote system. Your remote system can be a machine running any operating system supported by the VTune analyzer.
• Hyper-Threading Technology. Collect sampling data for applications running on systems enabled with Hyper-Threading Technology.

2. Views

The following sampling views help you analyze the data:

• Thread view. View the threads running within a process and select one or more threads to drill down to specific hotspots.
• Summary view. Opens by default for clocktick events.
• Process view. Display a system-wide view of all the processes running on your system when sampling data was collected.
• Module view. Display all the modules within selected threads.
• Hotspot view. Display function names associated with selected modules. Group hotspots by function, relative virtual address (RVA), source file, or class.

3. Accessories

The following panels and toolbar options are available from the sampling view:

• Sampling toolbar. A sampling toolbar is available at the top of each sampling view. This toolbar includes buttons labeled Process, Thread, Module, Hotspot, and Source. Select items within a view and click one of the buttons to drill down.
• Tabbed windows. When you open a specific sampling view, a tab is created at the bottom of the window labeled with the name of the view, for example, Process, Thread, Module or Hotspot. If you open several views, a tab for each open view is created at the bottom of the window. You can use the tabs to quickly move from one view to another.
• Microsoft Excel. Display your sampling data in a Microsoft Excel 2000 spreadsheet. You can customize the appearance of the spreadsheet report as needed.
• Selection Summary panel. View or hide a panel displaying the events configured in an Activity and the number of samples collected per event for the items you select in a view.
• Legend. Display a detailed legend for all sampling views. Each Activity result, event, and event ratio is color-coded. The legend explains what each color represents.
• Event summary panel. Display the total number of events collected for items you select in a view.
• Multi-processor. Display the workload as distributed across multiple processors.

SAMPLING OVER TIME

1. The Over Time view displays the samples collected for a single event.

2. It enables you to identify which threads are running serially and which are running in parallel at any point in time.

3. The Sampling Over Time view can be invoked from the Thread, Process, and Module views.

4. Sampling over time view consists of two panels. The left panel displays the names of the selected items and the right panel displays the samples collected over time. The right panel is divided into squares, each square representing a unit of time in seconds.

5. The color of the squares indicates the number of samples collected for that unit of time. A red square indicates a large number of samples, and a green square indicates a small number of samples.

FIGURE 2: Sampling Over Time

The Over Time view can be used to gather the following information:

• Context switching: You can determine whether there is excessive context switching.

• Processor utilization: You can see whether the processor is idle. If the system process receives samples, there is scope for improving processor utilization at that time.

• Temporal location of hotspots: You can see the specific periods of time when a large number of events occur.

• Thread interaction: You can view the pattern of thread behavior and thread interaction.

• Footprint of each thread: You can view the footprint of each thread on processors enabled with Hyper-Threading Technology.
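The squares of the Over Time view described above can be imitated with a short Python sketch that groups sample timestamps into one-second units and marks each unit by its sample count. The threshold of 50 samples for a red square, the `.` marker for an empty second, and the sample data are assumptions chosen for this illustration.

```python
def over_time_row(timestamps, duration_s, red_threshold=50):
    """Build one row of squares for an item: one square per second.

    'R' marks a square with at least `red_threshold` samples, 'G' a square
    with fewer samples, and '.' a second with no samples at all.
    """
    buckets = [0] * duration_s
    for t in timestamps:
        second = int(t)
        if 0 <= second < duration_s:
            buckets[second] += 1
    row = ""
    for n in buckets:
        row += "R" if n >= red_threshold else ("G" if n > 0 else ".")
    return row

# Hypothetical sample timestamps (in seconds) for two threads.
thread_a = (
    [0 + i / 100 for i in range(60)]    # 60 samples in second 0
    + [1 + i / 100 for i in range(10)]  # 10 samples in second 1
    + [3 + i / 100 for i in range(80)]  # 80 samples in second 3
)
thread_b = [2 + i / 100 for i in range(70)]  # 70 samples in second 2

print("thread A", over_time_row(thread_a, 4))
print("thread B", over_time_row(thread_b, 4))
```

Reading the two rows side by side shows that thread B is busy only while thread A is idle, which is the kind of serial behavior the view is meant to reveal.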

Call Graphs

INTRODUCTION

The call graph collector of the VTune(TM) Performance Analyzer collects information about the program flow of an application, that is, the number of calls each function makes to other functions and the amount of time each function spent executing its own code and/or calling other functions. Relative to the current function, a function can be a:

1. CALLER: A parent function that calls the current function.
2. CALLEE: A child function that is called by the current function.

In many cases, the caller may call the callee from several places (sites), so call graph also provides call information per site.
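The caller, callee, and call-site relationships can be sketched with plain Python data structures. The record format `(caller, callee, site)`, the function names, and the site labels below are hypothetical and serve only to illustrate the terms.

```python
from collections import Counter, defaultdict

def summarize_calls(calls):
    """Summarize (caller, callee, site) call records.

    Returns the callers of each function, the callees of each function,
    and the number of calls made from each call site.
    """
    callers = defaultdict(set)
    callees = defaultdict(set)
    calls_per_site = Counter()
    for caller, callee, site in calls:
        callers[callee].add(caller)
        callees[caller].add(callee)
        calls_per_site[(caller, callee, site)] += 1
    return callers, callees, calls_per_site

# Made-up call records; "main" calls "parse" from two different sites.
calls = [
    ("main", "parse", "main.c:10"),
    ("main", "parse", "main.c:22"),
    ("main", "parse", "main.c:22"),
    ("parse", "read_token", "parse.c:5"),
    ("main", "report", "main.c:30"),
]
callers, callees, calls_per_site = summarize_calls(calls)
print("callers of parse:", sorted(callers["parse"]))
print("callees of main:", sorted(callees["main"]))
print("calls from main.c:22:", calls_per_site[("main", "parse", "main.c:22")])
```

Keeping the site in the key is what makes per-site call information available when one caller reaches the same callee from several places.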

FEATURES OF CALL GRAPH

The following are the main features of the call graph collector and views:

1. Collection

• Manual launching mode. Manually launch your application from the desktop and select required modules of interest to analyze.
• DLL-Level Data Collection. Configure the call graph collector to instrument and analyze first-level DLLs even when the application itself cannot be instrumented.
• Instrumentation filtering. Select exactly which functions to instrument, improving the speed of the instrumented application by using improved filtering capabilities.
• Multi-thread, multi-process. Collect data for more than one process with fully automated threading and fiber support.
• COM Tracing. Profile COM interface methods using the call graph collector.

2. Views

After you collect call graph data using the VTune analyzer, you can view the call graph profiling information at the following levels:

• GRAPH: Provides a visual graphical presentation of the application execution. It displays the selected function(s), the function's parents (callers), its child functions (callees), and timing information. Each node (box) in the graph represents a function. Each edge (line with an arrow) connecting two nodes represents the call from the parent to the child function. For every function you can traverse caller and callee functions.

The call graph view uses the following conventions:
• Nodes connected by thick red edges designate functions on the critical path from the root (thread).
• The thicker the edge, the greater the Edge time.

Uses of this view:
• estimate the performance of your application
• find potential performance bottlenecks
• traverse the critical path, which is a path with the maximum Edge time.

FIGURE 3: Graph view of Call Graph

• CALL LIST: Provides full information on the selected (focus) function, its callers (parents), and its callees (children) in table format. The focus function is the function currently being viewed; the view shows the threads and classes associated with it. A caller function is a function that calls the focus function. Its columns include contribution, Edge time, thread, and class. A callee function is a function that is called by the focus function. Its columns are almost the same as those of the caller function.

FIGURE 4: Call List view of Call Graph

• FUNCTION SUMMARY: Provides full information on all the application functions in table format. The rows in the function summary display functions with different background colors according to their hierarchical position. The default view shows the first four types of data as follows:

FIGURE 5: Function Summary view of Call Graph

3. Accessories

The following options are available from the call graph view:

• Filtering options. Gain different perspectives on your data using the wide range of filtering options available.
• Function detail. Conveniently view detailed function information using tooltips and the status bar.
• Unified Java support. View Java function calls and Win32 function calls in the same call graph results.
• Timing options. View enriched timing information with an expanded collection of wait times for functions and calls. Traverse Self Wait time, Total Wait time, Edge time, Edge Wait time, and Max path from node to root and from node to bottom.
• Node state indicators. Adjust the color palette for any graph elements and control node length settings to support long function names. Node state indicators highlight three different types of node status, facilitating orientation within the graph view.
• Command access. Control a wide range of options in the function summary view via the function summary pop-up menu. The toolbar contains enhanced features and provides quick and easy access to the most commonly used commands.
• Multiple undo/redo. Make changes to the way you view data, then return or advance forward through several cycles of changes.
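The critical path mentioned for the graph view can be approximated with a short Python sketch. The text only says that the critical path is a path with the maximum Edge time; the sketch interprets this as following, from the root, the outgoing edge with the largest Edge time at every node. That interpretation, the graph format, and the timing values are assumptions, not a description of the VTune analyzer's algorithm.

```python
def critical_path(graph, root):
    """Follow the outgoing edge with the largest Edge time from `root`.

    `graph` maps a function name to a list of (callee, edge_time) pairs.
    A visited set stops the walk if recursion creates a cycle.
    """
    path = [root]
    visited = {root}
    node = root
    while graph.get(node):
        callee, _ = max(graph[node], key=lambda edge: edge[1])
        if callee in visited:
            break
        path.append(callee)
        visited.add(callee)
        node = callee
    return path

# Hypothetical call graph with made-up Edge times.
graph = {
    "thread": [("main", 100)],
    "main": [("parse", 30), ("render", 65)],
    "render": [("draw", 50), ("flush", 10)],
    "parse": [("read_token", 25)],
}
print(" -> ".join(critical_path(graph, "thread")))
```

The functions on the returned path correspond to the nodes that the graph view would connect with thick red edges.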

Counter Monitor

INTRODUCTION

Counter Monitor identifies system-level issues in applications. It is used to track system activities when the application runs on the system.

Counter Monitor collects data for specific performance counters, such as those of an application, an OS, or a hardware device, at different intervals of time. The counter monitor collector monitors and graphically displays the performance counter data.

A performance counter is a feature that measures and gathers performance-related data representing the state of the system without affecting the performance of the program.

Counter Monitor also helps you to understand the cause-and-effect relationship between an application and the system on which the application is running. If you develop application-specific counters using performance DLLs, the VTune analyzer will also monitor and display these counter values.

FEATURES OF COUNTER MONITOR

The following are the main features of the counter monitor collector and views:

1. Collection

• Trigger mechanism. Create triggers to monitor hardware and software counters at predetermined intervals according to criteria that you set. A trigger is an event that tells the VTune™ Performance Analyzer when to collect counter data. The VTune analyzer uses the system timer as the default trigger. For the system timer, performance data is collected once per second when the default interval (1000 milliseconds) is used.

2. Views

The following counter monitor views help you analyze the data:

• Runtime Data view. During runtime, the VTune analyzer generates a graph that shows changes as they happen. View data as you log it or review data after the run. This is the default view, which opens on completion of an Activity.

• Logged Data view. Displays data logged during an Activity. In the Logged Data view, data from each counter selected for logging is charted with a separate line and color. Each line on the chart represents data for a specific performance counter. The peak indicates the highest counter value. Moving the cursor over a counter on the chart displays a tooltip with the value of the counter at that point in time during data collection.

FIGURE 6: Logged Data View of Counter Monitor

The peaks in each counter indicate the highest counter activity. For example, a peak in the counter that measures Page Faults per second indicates that the most page faults occurred at that point in time during data collection.

• Legend view. Each line includes a distinct legend symbol for the corresponding counter, representing the point at which data was taken. The vertical Y axis represents counter values (scaled or actual), while the corresponding time is displayed on the horizontal X axis.

• Summary Data view. Displays a statistical view of the counter data. The Summary Data view provides statistical information for each counter you selected for display in the Logged Data view. This information includes:
  • minimum value
  • maximum value
  • average value
This enables you to determine which values were the most active, or otherwise interesting, and drill down from a Logged Data view of those values.

FIGURE 7: Summary Data View of Counter Monitor

The summary data for each counter is represented as a bar diagram, where the upper part of the diagram is the maximum value for the counter (in the example: % Total Processor Time counter), the lower part is the minimum value, and the middle part (violet bar in the example) is the average counter value.

3. Accessories

The following are some of the options available from the counter monitor view:

• Control charts. Choose a chart style best suited to the data you want to view using the Chart FX Properties.

WORKING OF COUNTER MONITOR

When you select an Activity with the counter monitor collector in the Tuning Browser and click Run Activity to begin performance data collection, the VTune analyzer does the following:

1. Launches the specified application, if any.
2. Starts monitoring and logging the counter values. The VTune analyzer collects performance data for all the counters of a performance object but displays only the counters you select.
3. Displays the Runtime Data view with a chart showing the counter data as it is being collected, if the runtime display option is selected.
4. If sampling data collection was turned on, also starts collecting time-based or event-based sampling data.
5. At the end of an Activity run, if counter monitor data was logged, the VTune analyzer does the following:
   o Creates an Activity result with the counter monitor data and shows it in the Tuning Browser.
   o Displays the counter monitor Logged Data view if the counter monitor data is the only type of data that was collected, or prompts you to pick a view to open if multiple types of data were collected.
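The timer trigger and the Summary Data statistics described in this chapter can be sketched together in Python. The sketch uses simulated time instead of a real system timer, and the callable `read_counter`, the function names, and the Page Faults/sec readings are hypothetical values chosen for illustration.

```python
def log_counter(read_counter, duration_ms, interval_ms=1000):
    """Log one counter value per trigger interval (simulated time, no sleeping).

    `read_counter(t_ms)` is a hypothetical callable returning the counter
    value at time `t_ms`.
    """
    return [
        (t_ms, read_counter(t_ms))
        for t_ms in range(interval_ms, duration_ms + 1, interval_ms)
    ]

def summarize(log):
    """Return the minimum, maximum, and average of the logged values."""
    values = [value for _, value in log]
    return min(values), max(values), sum(values) / len(values)

# Made-up "Page Faults/sec" readings for a 5-second Activity.
readings = {1000: 120, 2000: 340, 3000: 90, 4000: 510, 5000: 140}
log = log_counter(readings.get, duration_ms=5000)

minimum, maximum, average = summarize(log)
peak_time = max(log, key=lambda entry: entry[1])[0]
print(f"min={minimum} max={maximum} avg={average} peak at {peak_time} ms")
```

With the default interval of 1000 milliseconds the log holds one value per second, and the peak entry corresponds to the highest point of the counter's line in the Logged Data view.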

Tuning Assistant

INTRODUCTION

The Intel(R) Tuning Assistant provides advice on tuning your system resources and application performance. Using its multiple knowledge bases, the Tuning Assistant analyzes the data collected by the VTune(TM) Performance Analyzer, identifies performance issues, and provides insights and tuning advice on the following types of data:

• Sampling data collected on supported processors
• Counter monitor data collected on supported operating systems
• C, C++, Fortran, or Java* source code
• Disassembled assembly code

TUNING ASSISTANT CONCEPTS

The following are some key Tuning Assistant concepts:

• Workload. All the software that was executing when data was collected.
• Insight. An insight is an observation about the performance of your code. It indicates a potential performance problem that could be a bottleneck to your application's performance.
• Advice. Advice is a possible solution or recommended workaround (usually a suggestion to modify the code) to remove or avoid a performance problem.
• Relevance Score. A relevance score is a heuristic to indicate how relevant a particular insight or advice is to the current context. For instance, an extremely high relevance score for an insight may indicate a high probability of a performance bottleneck.

The Tuning Assistant provides tuning advice for code, processes/modules/functions, or time ranges that you select in source, sampling, or counter monitor views. If you provide symbol information, the Tuning Assistant window provides links from your function names directly to the corresponding code section in Source View.

FEATURES OF TUNING ASSISTANT

The Intel(R) Tuning Assistant has the following features to enable analyzing the performance of your application:

• Provides insights and advice on potential performance problems by analyzing sampling data collected on supported processors (see the Release Notes for a complete list of processors for which the Tuning Assistant can provide insights and advice). You can use the insights and advice to make algorithmic changes to your application so the processor can execute your application more efficiently.
• Contains knowledge bases to support Hyper-Threading Technology.
• Enables you to compare two or three Activity results.
• Provides links from function names directly to the corresponding code section in source view when you provide symbol information.
• Provides advice on performance counter data and disassembly code.
• Provides static assembly advice.
• Guides you through the key steps of performance tuning methodology.
• Provides the ability to export the tuning advice report to a .csv (comma separated values) text file for viewing and editing using a different application, such as Microsoft* Excel.

UNDERSTANDING TUNING METHODOLOGY

1. System-Level Tuning: The main objective of system-level tuning is to optimize the utilization of system resources. This tuning speeds up application performance by improving the way the application interacts with the system. It is effective for I/O applications.

2. Application-Level Tuning: The main purpose of application-level tuning is to reduce the execution time of an application. This can be achieved by improving the algorithms of the application, implementing threads, and using APIs.

3. Microarchitecture-Level Tuning: Increases the performance of an application by improving the way the application runs on a processor. This type of tuning is used with processor-intensive applications.

STRATEGIES FOR IMPROVING PERFORMANCE OF APPLICATION

• Balancing input-output: Speeds up the application when processor utilization is low. Processor utilization drops when the processor is waiting for I/O to complete. This strategy requires changes to the application during system-level and application-level tuning.

• Improving the threading model: Adding multithreading to a single-threaded application improves its efficiency by increasing processor utilization.

• Improving the efficiency of computation: Speeds up the application by changing it to accomplish the same amount of work with less computation.

TYPES OF ADVICE

• Sampling-based advice: The Tuning Assistant automatically analyzes the sampling data, identifies performance issues, and provides insights on the issues. When you click an insight, the More Information window provides additional information. This window contains a Relevance scale that can be used to view the relevance of a particular insight to performance issues.

FIGURE 8: Advice Window (Showing Sampling-based advice) of Tuning Assistant Advice

• Counter Monitor-based advice: The Tuning Assistant performs counter analysis based on all the counters measured in an Activity. After the analysis, it displays insights into potential performance bottlenecks.

• Source-based advice: The Tuning Assistant uses compiler technology for source-based advice, which enables you to speed up the execution of code. It is limited to C, C++ and Java applications.

• Static Assembly Penalties: The VTune analyzer analyzes code at the assembly language level. The two categories of information that the Tuning Assistant displays are:
1. Penalty: Indicates a specific problem and the effect of the problem on the performance of the code.
2. Warning: Indicates potential problems that might degrade performance.

INFORMATION THAT TUNING ASSISTANT PROVIDES INCLUDES:

• INSIGHTS: Indicate the problems that could be hindering the performance of the application. The categories of insights are:

1. Top Insights: Insights that are estimated to have a significant impact on performance. They help you identify the maximum optimization that you can achieve for the application.

2. Workload Insights: Performance issues for all modules and processes. (See fig. 8)

3. Module Insights: Focus on performance issues for the modules in an application. (See fig. 8)

4. Hotspots Insights: Insights on performance issues based on functions that are sorted by percentage of CPU time.

5. System Info: Summarizes the features of the system, such as the speed of the processor and the name of the operating system.

6. Static Analysis: Shows information about possible optimizations to improve application performance.

FIGURE 9: More Information Window of Tuning Assistant Advice

• RELEVANCE SCALE: Indicates the relevance of the insight or advice to a particular performance issue. For example, a high relevance score, such as 100%, indicates that the effect of the problem on the application is significant. (See fig. 9)

• TUNING ASSISTANT ADVICE: A possible solution to remove or avoid a problem. You can click the links shown in fig. 8 to get advice.
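The relationship between insights, relevance, and the .csv export mentioned among the Tuning Assistant features can be sketched in Python with the standard csv module. The column names, the insight texts, and the relevance values are invented for this example and do not reproduce the actual report layout.

```python
import csv
import io

def export_advice(insights, out):
    """Write insights to `out` as CSV, most relevant first."""
    writer = csv.writer(out)
    writer.writerow(["relevance", "category", "insight", "advice"])
    for item in sorted(insights, key=lambda i: i["relevance"], reverse=True):
        writer.writerow(
            [item["relevance"], item["category"], item["insight"], item["advice"]]
        )

# Hypothetical insights with relevance scores in percent.
insights = [
    {"relevance": 40, "category": "Module",
     "insight": "Many mispredicted branches",
     "advice": "Simplify the branch logic in the inner loop"},
    {"relevance": 90, "category": "Workload",
     "insight": "High cache miss rate",
     "advice": "Improve data locality"},
]

buffer = io.StringIO()
export_advice(insights, buffer)
print(buffer.getvalue())
```

Sorting by relevance before writing puts the insight most likely to matter at the top of the exported file, so it is the first row seen when the file is opened in a spreadsheet.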

References

The following references were used in preparing this document.

• Help file: Intel VTune software help file

• Books:

1. Intel VTune Performance Analyzer Essentials (Author: James Reinders)

2. 3rd Semester Intel VTune (By NIIT)

• Websites:

1. www.intel.com

2. www.hiperism.com
