Openss>>Expview

How to Analyze the Performance of Parallel Codes with Open|SpeedShop

Jim Galarowicz, Krell Institute
Don Maghrak, Krell Institute
LLNL-PRES-503451
Oct 27-28, 2015

Presenters and Extended Team
• Jim Galarowicz, Krell
• Donald Maghrak, Krell

Open|SpeedShop extended team:
• William Hachfeld, Dave Whitney, Dane Gardner: Argo Navis/Krell
• Jennifer Green, David Montoya, Mike Mason, David Shrader: LANL
• Martin Schulz, Matt Legendre and Chris Chambreau: LLNL
• Mahesh Rajan, Anthony Agelastos: SNL
• Dyninst group (Bart Miller: UW & Jeff Hollingsworth: UMD)
• Phil Roth, Mike Brim: ORNL
• Ciera Jaspan: CMU

Outline
• Welcome
• Introduction to Open|SpeedShop (O|SS) tools
• Overview of performance experiments
  • How to use Open|SpeedShop to gather and display
  • Sampling Experiments
  • Tracing Experiments
  • How to compare performance data for different application runs
• Advanced and New Functionality
  • Component Based Tool Framework (CBTF) based O|SS
    • New lightweight iop and mpip experiments
    • New: memory experiment
    • New: CUDA/GPU experiment
    • New: pthreads experiment
• What is new / Roadmap / Future Plans
• Supplemental, or Interest
  • Command Line Interface (CLI) tutorial and examples

Section 1
Introduction into Tools and Open|SpeedShop

Open|SpeedShop Tool Set
• Open Source Performance Analysis Tool Framework
  • Most common performance analysis steps all in one tool
  • Combines tracing and sampling techniques
  • Extensible by plugins for data collection and representation
  • Gathers and displays several types of performance information
• Flexible and Easy to use
  • User access through: GUI, Command Line, Python Scripting, convenience scripts
• Scalable Data Collection
  • Instrumentation of unmodified application binaries
  • New option for hierarchical online data
    aggregation
• Supports a wide range of systems
  • Extensively used and tested on a variety of Linux clusters
  • Cray, Blue Gene, ARM, Intel MIC, GPU support

Open|SpeedShop Workflow

    srun -n4 -N1 smg2000 -n 65 65 65
    osspcsamp "srun -n4 -N1 smg2000 -n 65 65 65"

[Workflow diagram; labels: MPI Application, O|SS, Post-mortem]
http://www.openspeedshop.org/

Alternative Interfaces
• Scripting language
  • Immediate command interface
  • O|SS interactive command line (CLI)
    • openss -cli
• Python module

Experiment Commands
    expView
    expCompare
    expStatus
List Commands
    list -v exp
    list -v hosts
    list -v src
Session Commands
    openGui

Python module example:

    import openss
    my_filename = openss.FileList("myprog.a.out")
    my_exptype = openss.ExpTypeList("pcsamp")
    my_id = openss.expCreate(my_filename, my_exptype)
    openss.expGo()
    my_metric_list = openss.MetricList("exclusive")
    my_viewtype = openss.ViewTypeList("pcsamp")
    result = openss.expView(my_id, my_viewtype, my_metric_list)

Central Concept: Experiments
• Users pick experiments:
  • What to measure and from which sources?
  • How to select, view, and analyze the resulting data?
• Two main classes:
  • Statistical Sampling
    • Periodically interrupt execution and record location
    • Useful to get an overview
    • Low and uniform overhead
  • Event Tracing
    • Gather and store individual application events
    • Provides detailed per event information
    • Can lead to huge data volumes
• O|SS can be extended with additional experiments

Sampling Experiments in O|SS
• PC Sampling (pcsamp)
  • Record PC repeatedly at user-defined time interval
  • Low-overhead overview of time distribution
  • Good first step, lightweight overview
• Call Path Profiling (usertime)
  • PC sampling and call stacks for each sample
  • Provides inclusive and exclusive timing data
  • Use to find hot call paths and who is calling whom
• Hardware Counters (hwc, hwctime, hwcsamp)
  • Provides profile of hardware counter events like cache & TLB misses
  • hwcsamp:
    • Periodically sample to capture profile of the code against the chosen counter
    • Default events are PAPI_TOT_INS and PAPI_TOT_CYC
  • hwc, hwctime:
    • Sample a hardware counter until a certain number of events (called the threshold) is recorded, then get the call stack
    • Default event is PAPI_TOT_CYC overflows

Tracing Experiments in O|SS
• Input/Output Tracing (io, iot)
  • Record invocation of all POSIX I/O events
  • Provides aggregate and individual timings
  • Store function arguments and return code for each call (iot)
• MPI Tracing (mpi, mpit, mpiotf)
  • Record invocation of all MPI routines
  • Provides aggregate and individual timings
  • Store function arguments and return code for each call (mpit)
  • Create Open Trace Format (OTF) output (mpiotf)
• Floating Point Exception Tracing (fpe)
  • Triggered by any FPE caused by the application
  • Helps pinpoint numerical problem areas

Performance Analysis in Parallel
• How to deal with concurrency?
  • Any experiment can be applied to a parallel application
    • Important step: aggregation or selection of data
  • Special experiments targeting parallelism/synchronization
• O|SS supports MPI and threaded codes
  • Automatically applied to all tasks/threads
  • Default views aggregate across all tasks/threads
  • Data from individual tasks/threads available
  • Thread support (incl. OpenMP) based on POSIX threads
• Specific parallel experiments (e.g., MPI)
  • Wraps MPI calls and reports
    • MPI routine time
    • MPI routine parameter information
  • The mpit experiment also stores function arguments and return code for each call

How to Run a First Experiment in O|SS?
1. Picking the experiment
   • What do I want to measure?
   • We will start with pcsamp to get a first overview
2. Launching the application
   • How do I control my application under O|SS?
   • Enclose how you normally run your application in quotes
   • osspcsamp "mpirun -np 256 smg2000 -n 65 65 65"
3. Storing the results
   • O|SS will create a database
   • Name: smg2000-pcsamp.openss
4. Exploring the gathered data
   • How do I interpret the data?
   • O|SS will print a default report
   • Open the GUI to analyze data in detail (run: "openss")

Example Run with Output (1 of 2)
• osspcsamp "mpirun -np 4 smg2000 -n 65 65 65"

    Bash> osspcsamp "mpirun -np 4 ./smg2000 -n 65 65 65"
    [openss]: pcsamp experiment using the pcsamp experiment default sampling rate: "100".
    [openss]: Using OPENSS_PREFIX installed in /opt/ossoffv2.1u4
    [openss]: Setting up offline raw data directory in /opt/shared/offline-oss
    [openss]: Running offline pcsamp experiment using the command:
    "mpirun -np 4 /opt/ossoffv2.1u4/bin/ossrun "./smg2000 -n 65 65 65" pcsamp"
    Running with these driver parameters:
    (nx, ny, nz) = (65, 65, 65)
    …
    <SMG native output>
    …
    Final Relative Residual Norm = 1.774415e-07
    [openss]: Converting raw data from /opt/shared/offline-oss into temp file X.0.openss
    Processing raw data for smg2000
    Processing processes and threads ...
    Processing performance data ...
    Processing functions and statements ...
    Resolving symbols for /home/jeg/DEMOS/workshop_demos/mpi/smg2000/test/smg2000

Example Run with Output (2 of 2)
• osspcsamp "mpirun -np 4 smg2000 -n 65 65 65"

    [openss]: Restoring and displaying default view for:
    /home/jeg/DEMOS/workshop_demos/mpi/smg2000/test/smg2000-pcsamp.openss
    [openss]: The restored experiment identifier is: -x 1

    Exclusive CPU time  % of CPU Time  Function (defining location)
    in seconds.
    7.870000            43.265531      hypre_SMGResidual (smg2000: smg_residual.c,152)
    4.390000            24.134140      hypre_CyclicReduction (smg2000: cyclic_reduction.c,757)
    1.090000             5.992303      mca_btl_vader_check_fboxes (libmpi.so.1.4.0: btl_vader_fbox.h,108)
    0.510000             2.803738      unpack_predefined_data (libopen-pal.so.6.1.1: opal_datatype_unpack.h,41)
    0.380000             2.089060      hypre_SemiInterp (smg2000: semi_interp.c,126)
    0.360000             1.979109      hypre_SemiRestrict (smg2000: semi_restrict.c,125)
    0.350000             1.924134      __memcpy_ssse3_back (libc-2.17.so)
    0.310000             1.704233      pack_predefined_data (libopen-pal.so.6.1.1: opal_datatype_pack.h,38)
    0.210000             1.154480      hypre_SMGAxpy (smg2000: smg_axpy.c,27)
    0.140000             0.769654      hypre_StructAxpy (smg2000: struct_axpy.c,25)
    0.110000             0.604728      hypre_SMGSetStructVectorConstantValues (smg2000: smg.c,379)
    ....
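A default report like the one above can also be post-processed outside the GUI. The following is a minimal Python sketch, not part of Open|SpeedShop, that assumes each data row has the layout shown in the listing: two numeric columns, a function name, and a parenthesized location. The names ReportRow, parse_row, and parse_report are hypothetical helpers introduced only for illustration.

```python
import re
from typing import List, NamedTuple, Optional


class ReportRow(NamedTuple):
    seconds: float   # exclusive CPU time in seconds
    percent: float   # % of CPU time
    function: str    # function name
    location: str    # text inside the parentheses (linked object, file, line)


# Assumed row layout: two decimal numbers, a function name, "(location)".
ROW_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\S+)\s+\((.*)\)\s*$")


def parse_row(line: str) -> Optional[ReportRow]:
    """Return a ReportRow for a data line, or None for headers and other text."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    return ReportRow(float(m.group(1)), float(m.group(2)), m.group(3), m.group(4))


def parse_report(text: str) -> List[ReportRow]:
    """Parse every data row found in a block of report text."""
    rows = (parse_row(line) for line in text.splitlines())
    return [row for row in rows if row is not None]


# A few rows copied from the report listing.
SAMPLE = """\
7.870000 43.265531 hypre_SMGResidual (smg2000: smg_residual.c,152)
4.390000 24.134140 hypre_CyclicReduction (smg2000: cyclic_reduction.c,757)
0.350000 1.924134 __memcpy_ssse3_back (libc-2.17.so)
"""

if __name__ == "__main__":
    for row in parse_report(SAMPLE):
        print(f"{row.function:28s} {row.seconds:8.2f} s {row.percent:6.2f} %")
```

Lines that do not match the assumed layout, such as the column header or the [openss] status messages, are skipped rather than raising an error.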
• View with GUI: openss -f smg2000-pcsamp.openss

Default Output Report View
[GUI screenshot; callouts:]
• Performance data, default view: by Function (data is sum from all processes and threads)
• Toolbar to switch views
• Select "Functions", click D-icon
• Graphical representation

Statement Report Output View
[GUI screenshot; callouts:]
• Performance data, view choice: Statements
• Select "Statements", click D-icon
• Statement in program that took the most time

Associate Source & Performance Data
[GUI screenshot; callouts:]
• Double click to open source window
• Use window controls to split/arrange windows
• Selected performance data point

Library (LinkedObject) View
[GUI screenshot; callouts:]
• Select LinkedObject View type and click on D-icon
• Shows time spent in libraries. Can indicate imbalance.
• Libraries in the application

Loop View
[GUI screenshot; callouts:]
• Select Loops View type and click on D-icon
• Shows time spent in loops.
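The LinkedObject view groups time by library. As a rough illustration of that kind of aggregation (an assumption about the idea, not a description of how O|SS computes the view internally), the Python sketch below sums the exclusive times from the pcsamp report rows shown earlier, keyed by the name that precedes the colon in each location. The helper names linked_object and time_per_linked_object are hypothetical. Because the report listing is truncated, the totals cover only the rows that were shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (exclusive seconds, location) pairs taken from the rows of the example report.
ROWS = [
    (7.87, "smg2000: smg_residual.c,152"),
    (4.39, "smg2000: cyclic_reduction.c,757"),
    (1.09, "libmpi.so.1.4.0: btl_vader_fbox.h,108"),
    (0.51, "libopen-pal.so.6.1.1: opal_datatype_unpack.h,41"),
    (0.38, "smg2000: semi_interp.c,126"),
    (0.36, "smg2000: semi_restrict.c,125"),
    (0.35, "libc-2.17.so"),
    (0.31, "libopen-pal.so.6.1.1: opal_datatype_pack.h,38"),
    (0.21, "smg2000: smg_axpy.c,27"),
    (0.14, "smg2000: struct_axpy.c,25"),
    (0.11, "smg2000: smg.c,379"),
]


def linked_object(location: str) -> str:
    """Return the executable or library name, i.e. the text before the first colon."""
    return location.split(":", 1)[0].strip()


def time_per_linked_object(rows: Iterable[Tuple[float, str]]) -> Dict[str, float]:
    """Sum exclusive seconds per linked object."""
    totals: Dict[str, float] = defaultdict(float)
    for seconds, location in rows:
        totals[linked_object(location)] += seconds
    return dict(totals)


if __name__ == "__main__":
    totals = time_per_linked_object(ROWS)
    # Print the linked objects from most to least time.
    for name, seconds in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{name:24s} {seconds:6.2f} s")
```

Sorting by descending time mirrors the ordering of the function report, so the application binary and the MPI library appear at the top when they dominate.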
