Optimizing Processor Performance for Wintel Applications: a Case Study

Demand Technology Software
1020 Eighth Avenue South, Suite 6, Naples, FL 34102
phone: (941) 261-8945
fax: (941) 261-5456
e-mail: markf@demandtech.com
http://www.demandtech.com
1998 Demand Technology, Inc.

Application tuning case study
■ The target application is a C program written to perform interval performance data collection for Windows NT
  ♦ It is important that this program perform well because “If you are not part of the solution, you are part of the problem.”
  ♦ Performance Analysts use our product, and they are very demanding customers

Wintel application tuning: processor optimization

Application tuning case study
■ At a point in development where the code was reasonably mature and stable, I subjected it to performance analysis using several tools.
  ♦ Microsoft Visual C++ version 5
  ♦ Rational Visual Quantify execution profiler
  ♦ Intel vTune version 2.5 optimization tool

Windows NT on Intel hardware
■ In order to use vTune effectively, it helps to understand how Intel processor hardware works
  ♦ extensive Intel processor documentation ships on the product CD
■ Target environment:
  ♦ Microsoft Windows NT 4.0
  ♦ Intel Pentium and Pentium Pro hardware

NT performance monitoring
■ Performance SeNTry™ collection agent

    initialization
    loop until cycle end = TRUE;
        Win32 API calls to retrieve performance data;
        calculate;
        Write data to file;
    end loop;
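The collection agent loop above can be sketched in portable C. This is only an illustration of the loop's shape, not the product's code: `run_collection_cycle`, the two callback types, and the `demo_*` helpers are hypothetical names, and the retrieval callback stands in for the Win32 API calls, which are not shown here. The "calculate" step is reduced to reporting a byte count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: fills buf with one interval's raw performance data
   and returns the number of bytes stored (0 means nothing was collected).
   In the real agent this step would be the Win32 retrieval calls. */
typedef size_t (*retrieve_fn)(void *ctx, unsigned char *buf, size_t cap);

/* Hypothetical callback: blocks until the next interval is due and
   returns true once the collection cycle has ended. */
typedef bool (*wait_fn)(void *ctx);

/* Loop until cycle end: retrieve, calculate, write, then wait.
   Returns the number of intervals written to out. */
unsigned run_collection_cycle(retrieve_fn retrieve, wait_fn wait_for_interval,
                              void *ctx, FILE *out)
{
    unsigned char buf[4096];
    unsigned intervals = 0;
    bool cycle_end = false;

    while (!cycle_end) {
        size_t n = retrieve(ctx, buf, sizeof buf);
        if (n > 0) {
            /* Stand-in for "calculate" and "write data to file". */
            fprintf(out, "interval=%u bytes=%zu\n", intervals, n);
            intervals++;
        }
        cycle_end = wait_for_interval(ctx);
    }
    return intervals;
}

/* Demo helpers (assumptions for illustration only). */
typedef struct { int remaining; } demo_ctx;

size_t demo_retrieve(void *ctx, unsigned char *buf, size_t cap)
{
    (void)ctx;
    size_t n = cap < 16 ? cap : 16;   /* pretend 16 bytes were collected */
    memset(buf, 0, n);
    return n;
}

bool demo_wait(void *ctx)
{
    demo_ctx *d = ctx;
    return --d->remaining <= 0;       /* end the cycle after N intervals */
}
```

Calling `run_collection_cycle(demo_retrieve, demo_wait, &ctx, stdout)` with `ctx.remaining` set to 3 writes three interval lines. Keeping the wait in its own callback mirrors the fact that an interval collector spends most of its wall-clock time blocked between intervals.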
Win32 performance monitoring API
■ Familiar and well-documented interface
  ♦ The only programmatic way to enumerate the Processes running on an NT system
  ♦ NT Performance data is structured as
    ■ Objects (records)
    ■ Counters (fields)
  ♦ Collection agents are associated with Objects
    ■ NT base Objects, including kernel Objects
    ■ extended Objects require a Perflib dll

Data Collection sets
■ Define proper subsets of a Master Collection set
  ♦ Defines all known Objects and Counters
  ♦ Some Objects are instanced: there can be multiple occurrences of instanced Objects
  ♦ Two Parent:Child relationships defined
    ■ Process is the parent of Thread
    ■ Physical Disk is the parent of Logical Disk

Data Collection sets
■ Performance considerations
  ♦ Data collection is performed one Object at a time
    ■ This was necessary due to a bug in the Win32 collection services
    ■ An n:1 correspondence between Objects and their associated collection routines
  ♦ With the exception of Child Objects
    ■ They are collected at the same time as the Parent Objects
  ♦ There are many instances of some Objects
    ■ Process and Thread

Data Collection sets
■ Performance considerations
  ♦ There are compelling reasons why data collection should be done at frequent intervals
    ■ identified by Buzen and Shum, 1996
  ♦ Performance data for processes that terminate before the end of the interval is lost
  ♦ one collection interval used for both Accumulator Counters (processor time) and Instantaneous Counters (e.g., processor Queue length)
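The difference between the two counter kinds can be made concrete with a small C sketch. It assumes, for illustration, that an accumulator counter and its timestamp are both expressed in the same time units (for example 100-nanosecond ticks); `raw_sample`, `accumulator_percent`, and `instantaneous_mean` are hypothetical names, not part of the Win32 interface.

```c
#include <stddef.h>
#include <stdint.h>

/* One raw reading of an accumulator counter (e.g. processor time).
   Assumption: counter and timestamp use the same time units. */
typedef struct {
    uint64_t counter;    /* accumulated busy time so far */
    uint64_t timestamp;  /* time at which the reading was taken */
} raw_sample;

/* An accumulator counter only becomes meaningful as a difference between
   two readings: percent busy = 100 * delta(counter) / delta(time).
   Returns -1.0 when the pair of readings cannot be used. */
double accumulator_percent(raw_sample prev, raw_sample cur)
{
    if (cur.timestamp <= prev.timestamp || cur.counter < prev.counter)
        return -1.0;
    return 100.0 * (double)(cur.counter - prev.counter)
                 / (double)(cur.timestamp - prev.timestamp);
}

/* An instantaneous counter (e.g. queue length) is a point-in-time value,
   so a single reading per interval is just one sample; averaging several
   readings gives a better picture of the interval. */
double instantaneous_mean(const uint32_t *samples, size_t n)
{
    if (n == 0)
        return 0.0;
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return (double)sum / (double)n;
}
```

Because `accumulator_percent` needs both a previous and a current reading, a process that exits before the second reading leaves nothing to difference against, which is the loss described above.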
Data Collection sets
■ Performance considerations
  ♦ Ideally, collection should be performed at least once per minute
  ♦ Possibly, some Objects could be collected even more frequently in order to accumulate samples of Instantaneous Counter values
  ♦ Can our code handle it?

Goals of the tuning exercise
■ Profile our code execution path so that we can understand it better
  ♦ Profilers eliminate a lot of idle speculation about what your code is doing
■ Better understand the Win32 services and their interaction with our code
  ♦ We cannot change these services, but perhaps we can interact with them in better ways

Goals of the tuning exercise
■ Evaluate code optimization strategies
  ♦ optimizing Compiler options
    ■ Pentium and Pentium Pro specific optimizations
  ♦ In-line assembler
  ♦ Code restructuring
  ♦ etc.
■ Feed results forward into the development process

VC++ code profiler
■ Built-in compiler option
■ Times program functions during run time
  ♦ Must run the application under the debugger
■ Creates a text report showing:
  ♦ Function time
  ♦ Function+Child Function time
  ♦ Hit Count
■ Example: DefaultCollectionSet once per second
VC++ code profiler output

    Module Statistics for dmperfss.exe
    ----------------------------------
        Time in module: 283541.261 millisecond
        Percent of time in module: 100.0%
        Functions in module: 155
        Hits in module: 11616795
        Module function coverage: 72.3%

          Func            Func+Child           Hit
          Time     %         Time      %      Count  Function
    ---------------------------------------------------------
    248146.507  87.5   248146.507   87.5        249  _WaitOnEvent (dmwrdata.obj)
      8795.822   3.1     8795.822    3.1     393329  _WriteDataToFile (dmwrdata.
      4413.518   1.6     4413.518    1.6       2750  _GetPerfDataFromRegistry (dm
      3281.442   1.2     8153.656    2.9     170615  _FormatWriteThisObjectCount
      3268.991   1.2    12737.758    4.5      96912  _FindPreviousObjectInstanceC
      2951.455   1.0     2951.455    1.0    3330628  _NextCounterDef (dmwrdata.ob

VC++ profiler: Observations
■ Our program is “sleeping” 87.5% of the time!
■ Can only look at your program’s code
  ♦ If your function is spending all its time making Win32 API calls or calling other dlls, they are not visible
■ Parent-child relationships among modules are not readily apparent

Rational Visual Quantify
■ Add-on product
  ♦ Visual Studio “integration”
■ Select profiling at the level of the function call or the line
■ Adds instrumentation to each module during the runtime session
  ♦ Includes all shareable and relocatable exes and dlls called by your program!

Rational Visual Quantify
■ Reporting
  ♦ graphic view of your program’s critical execution path
    ■ breaks out dlls and some system services
  ♦ parent-child relationships among modules are explicit
  ♦ convenient navigation between views
■ Performs analysis of ∆ between two execution runs
Rational Visual Quantify [screenshot]

Rational VQ: Observations
■ Added instrumentation affects absolute function time values observed
  ♦ We only spent 32% of our time “Sleeping”
  ♦ relative timing relationships between functions appear unaffected
■ App is very intuitive and easy to use
  ♦ e.g., relationships between function calls
■ Ability to trace module execution through 3rd party functions can be very useful!

Intel vTune
■ Standalone execution profiler
■ Relies on system-wide sampling
  ♦ maps the location of the Program Counter to the module in memory
  ♦ catches every program, including the OS
■ Optionally, can also be used to report on the Pentium/Pentium Pro performance metrics during program execution

Intel vTune
■ High percentage of samples showed NT running the Idle Thread!
■ Switched to Master Collection set once per second to generate more activity
  ♦ Rational VQ overhead was too high to perform a comparable test
  ♦ Result: very different profile of activity

Intel vTune
■ Hotspot analysis showed two functions accounted for > 70% of the activity inside our process address space
  ♦ NextInstanceDef
  ♦ IsPreviousAndParentSameInstance
■ vTune analyzes x86 assembler code to assist you in taking advantage of the superscalar features of the P5
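The idea behind a sampling profiler, mapping each sampled Program Counter value to the module that contains it and counting hits, can be sketched in C as follows. This illustrates the general technique only and is not vTune's implementation; `module_range`, `find_module`, and `attribute_samples` are hypothetical names, and the module table is assumed to be sorted by base address with no overlapping ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* One loaded module's address range plus a hit counter. */
typedef struct {
    const char   *name;  /* module name, for reporting */
    uintptr_t     base;  /* load address */
    size_t        size;  /* length of the mapped image in bytes */
    unsigned long hits;  /* samples that fell inside this module */
} module_range;

/* Binary search for the module containing addr.
   Assumes mods[] is sorted by base and ranges do not overlap.
   Returns the index of the module, or -1 if no module contains addr. */
int find_module(const module_range *mods, size_t n, uintptr_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < mods[mid].base)
            hi = mid;                              /* addr is below this module */
        else if (addr - mods[mid].base >= mods[mid].size)
            lo = mid + 1;                          /* addr is above this module */
        else
            return (int)mid;                       /* addr is inside this module */
    }
    return -1;
}

/* Credits every sampled program counter to its module.
   Returns the number of samples that matched no module. */
size_t attribute_samples(module_range *mods, size_t n,
                         const uintptr_t *pcs, size_t count)
{
    size_t unattributed = 0;
    for (size_t i = 0; i < count; i++) {
        int idx = find_module(mods, n, pcs[i]);
        if (idx >= 0)
            mods[idx].hits++;
        else
            unattributed++;
    }
    return unattributed;
}
```

After a run, the modules with the largest `hits` values are the hotspots. The same lookup applied to function address ranges inside one module would give a per-function breakdown.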
Intel processor performance overview
■ Complex Instruction set (CISC)
  ♦ Maintain upward compatibility with original 8-bit 8080 instruction set
■ With improvements in semiconductor fabrication, add
  ♦ pipelining, TLB, cache, branch prediction
    ■ 486
  ♦ elements of RISC superscalar processors
    ■ Pentium, Pentium Pro

Intel processor evolution

    Processor    Year  Clock Speed  Bus Width  Addressable  Transistors
                       (MHz)        (bits)     Memory
    8080         1974  2            8          64K              6,000
    8086         1978  5-10         16         1 MB            29,000
    8088         1979  5-8          8          1 MB            29,000
    80286        1982  8-12         16         16 MB          134,000
    386 DX       1985  16-33        32         4 GB           275,000
    486 DX       1989  25-50        32         4 GB         1,200,000
    Pentium      1993  60-233       32         4 GB         3,100,000
    Pentium Pro  1995  150-200      64         4 GB         5,500,000
    Pentium II   1997  233-333      64         4 GB         7,500,000

Intel processor evolution

    Processor  Highlights
    8080       1 chip microprocessor
    8086       10X performance 8080
    8088       8 bit version of 8086
    80286      Virtual
