The Debugger
Debugging, profiling and performance tuning with Arm Forge
03-03-2021
Confidential © Arm 2018

About this Presentation

Who are we?
• Suyash Sharma, Senior AE, based in Manchester UK
• Arm Linux/HPC SW Support Team: [email protected]
• Ryan Hulguin, Senior AE, based in Tennessee US

Presentation objectives
• Overview of the main features of DDT, MAP and Performance Reports
• Introduction to using Arm debugging and profiling tools for HPC application development

Agenda

4:15-4:45pm:
• Forge Overview (5 mins)
• Debugging with DDT & demo (10 mins)
• Profiling with MAP & PR (15 mins)

4:45-5:00pm:
• Forge demo (15 mins)
• Further performance tuning with Performance Reports (5 mins)
• Hands-on, Q&A: debugging, profiling your own application (10 mins)

Arm Forge Overview

About Arm (ex-Allinea) interoperable tools for HPC

Arm (ex-Allinea) Tools: leading toolkit for HPC application developers
• Available on 65% of the top 100 HPC systems
• Help maximise application efficiency with Performance Reports
• Help the HPC community design the best applications with Forge

In December 2016 Allinea joined Arm
• Continue to be the trusted leader in HPC tools across every platform
• Our engineering roadmap is aligned with upcoming architectures from every vendor
• We remain 100% committed to providing cross-platform tools for HPC
• Our product and service team will continue to work with you, our customers and partners, and the wider HPC community

Server & HPC Development Solutions from Arm
Best in class commercially supported tools for Linux and high-performance computing

Code Generation (for Arm servers): COMPILER FOR LINUX
• C/C++ Compiler
• Fortran Compiler
• Performance Libraries

Performance Engineering (for any architecture, at any scale)
• Debugger
• Profiler
• Reporting

Server & HPC Solution (for Arm servers): Commercially Supported Toolkit for applications development on Linux
• C/C++ Compiler for Linux
• Fortran Compiler for Linux
• Performance Libraries
• Performance Reports
• Debugger
• Profiler

Arm Forge (DDT & MAP)
An interoperable toolkit for debugging and profiling

The de-facto standard for HPC development
• Available on the vast majority of the Top500 machines in the world
• Fully supported by Arm on x86, IBM Power, NVIDIA GPUs and Arm v8-A
(Commercially supported by Arm)

State-of-the-art debugging and profiling capabilities
• Powerful and in-depth error detection mechanisms (including memory debugging)
• Low-overhead sampling-based profiler to identify and understand bottlenecks
• Available at any scale (from serial to petascale applications)
(Fully Scalable)

Easy to use by everyone
• Unique capabilities to simplify remote interactive sessions
• Innovative approach to presenting the essential information to users
(Very user-friendly)

Arm Performance Reports
Characterize and understand the performance of HPC application runs

Gathers a rich set of data
• Analyses metrics around CPU, memory, IO, hardware counters, etc.
• Users can add their own metrics
(Commercially supported by Arm)

Build a culture of application performance & efficiency awareness
• Analyses data and reports the information that matters to users
• Provides simple guidance to help improve workloads’ efficiency
(Accurate and astute insight)

Adds value to typical users’ workflows
• Define application behaviour and performance expectations
• Integrate outputs into various systems for validation (e.g. continuous integration)
• Can be automated completely (no user intervention)
(Relevant advice to avoid pitfalls)

Different ways to run Arm Forge…
• Here (remote launch + reverse connect)
• There (interactive mode + reverse connect)
• There (offline OR interactive mode)
Ultimately, that’s where the tools will run. But what about the GUI?

Run and Ensure Application Correctness with DDT
Scalable tool for interactive and automated debugging
• One can:
  • Use DDT manually and interactively to debug the application
  • Generate a debug report in offline mode that can be shared with others for co-development
  • Integrate Arm Forge into CI workflows for automated, non-interactive debugging
Examples:
$ ddt --connect mpirun -n 48 ./example
$ ddt --offline mpirun -n 48 ./example

Optimise the Application with MAP
Identify bottlenecks and rewrite some code for better performance
• One can:
  • Profile the application and measure all performance aspects
  • Generate a profile report that can be analysed later or shared with others
Examples:
$ map --connect mpirun -n 48 ./example
$ map --profile mpirun -n 48 ./example

Understand Application Behaviour with Performance Reports
Set a reference for future work
• One can:
  • Analyse performance with the stats and hints generated by Performance Reports
  • Summarise from a .map file generated by MAP
Examples:
$ perf-report mpirun -n 16 mmult_c
$ perf-report profile.map

Resources
Web, doc and support
• Forge user guide, release notes/installation, downloads: https://developer.arm.com/tools-and-software/server-and-hpc/debug-and-profile/arm-forge
• Forge product info, case studies, webinars: https://www.arm.com/products/development-tools/server-and-hpc/forge
• Arm Licensing Server download and installation: https://developer.arm.com/tools-and-software/server-and-hpc/help/help-and-tutorials/system-administration/licensing/arm-licence-server
For support, send an email to [email protected] or submit a case directly at https://support.developer.arm.com/.

Debugging with Arm DDT

Open-Source Debuggers’ Challenges
Debugging with GDB/LLDB can be less user-friendly because of the text-based interface.
[Figures: GDB workflow; alternative GDB GUI (DDD)]

Arm DDT: The Debugger
Workflow: Run with Arm tools, identify a problem, gather info (Who, Where, How, Why), fix.

Who had a rogue behaviour?
• Merges stacks from processes and threads
Where did it happen?
• Leaps to source
How did it happen?
• Diagnostic messages
• Some faults evident instantly from source
Why did it happen?
• Unique “Smart Highlighting”
• Sparklines comparing data across processes

DDT capabilities
• Dedicated HPC debugger
  • Fortran, C & C++, Python
• Designed for massively parallel applications
  • Designed for MPI applications
  • Support for OpenMP
• Highly scalable
  • Shown to debug at hundreds of thousands of cores
  • Fast reduction algorithms
• Memory debugging
• Variable comparison
• Distributed arrays
• GPU support
  • For NVIDIA CUDA (9, 10 and 11)

The DDT user interface (on Arm)

Breakpoints, Watchpoints and Tracepoints
Breakpoints: stop at a line of code so that the program state can be checked.
Watchpoints: allow observing variables or expressions with conditions.
DDT will stop with a notification about the value of the variable or the expression.
Tracepoints: allow tracing variables at a selected line of code without stopping the application.
* Use Tracepoints with caution, as they can be resource-intensive.

Version Control Information
To track new bugs from the latest changes:
View -> Version Control Information

Disassembler View
Tools -> Disassemble or

Check memory usage
Tools -> Overall Memory Stats
Tools -> Current Memory Usage

Memory debugging menu in Arm DDT

Launching DDT

Preparing Code for Use with DDT
As with any debugger, code must be compiled with the debug flag, typically -g.
A low optimization level, i.e. -O0, is recommended during debugging for better debug information:
• It avoids compiler code generation errors
• More aggressive optimizations lead to more errors
• Optimization removes some variables and functions, so they are no longer visible in the debugger

Express launch (from where Forge is installed)
With X11 forwarding to launch the ddt GUI (might be slow, or X11 might not be supported):
$ ddt
$ ddt srun/aprun/mpirun/mpiexec -n 4 example.exe
Without X11 forwarding (faster):
• Offline mode, without interactive debugging:
$ ddt --offline srun/aprun/mpirun/mpiexec -n 4 example.exe
• Remote connect, with interactive debugging:
$ ddt --connect
$ ddt --connect srun/aprun/mpirun/mpiexec -n 4 example.exe

Express launch GUI

Remote connect
• Avoids connecting over an X11 session
• Communicates over ssh
• Much faster
• Start a local (e.g. laptop) instance of Forge
• See remote files as normal
• Configure a remote connection
• Supports multi-hop and ssh configurations
• Specify the location of the Forge install
• Lets you open ‘remote’ files ‘locally’
• Start jobs through the scheduler
• Supports reverse connect

Remote connect with Forge Remote Client
Install the Forge client on your local laptop/workstation.
Download the package from: https://developer.arm.com/tools-and-software/server-and-hpc/downloads/arm-forge

Arm DDT cheat sheet
• Load the environment module
$ module load armforge/21.0
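
The launch modes from the Express launch slide can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: build_ddt_cmd is a hypothetical function, not part of Arm Forge, and mpirun is assumed as the launcher (srun, aprun or mpiexec would take its place on other systems). It prints the command line instead of running it, so it can be tried without a Forge installation.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the ddt command line for a given launch mode.
# build_ddt_cmd is a hypothetical helper (an assumption for illustration);
# only the ddt options shown in the slides (--offline, --connect) are used.
build_ddt_cmd() {
    local mode="$1" nprocs="$2" exe="$3"
    case "$mode" in
        gui)     echo "ddt mpirun -n ${nprocs} ${exe}" ;;            # GUI, e.g. over X11
        offline) echo "ddt --offline mpirun -n ${nprocs} ${exe}" ;;  # non-interactive
        connect) echo "ddt --connect mpirun -n ${nprocs} ${exe}" ;;  # remote client
        *)       echo "unknown mode: ${mode}" >&2; return 1 ;;
    esac
}

# prints: ddt --offline mpirun -n 4 ./example.exe
build_ddt_cmd offline 4 ./example.exe
```

In a batch script, the printed line would typically follow the module load step above; the binary is assumed to have been built with -g and -O0 as described on the Preparing Code slide.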