Best Practice Guide - Generic X86 Vegard Eide, NTNU Nikos Anastopoulos, GRNET Henrik Nagel, NTNU 02-05-2013
Total Page:16
File Type:pdf, Size:1020Kb
Best Practice Guide - Generic x86 Vegard Eide, NTNU Nikos Anastopoulos, GRNET Henrik Nagel, NTNU 02-05-2013 1 Best Practice Guide - Generic x86 Table of Contents 1. Introduction .............................................................................................................................. 3 2. x86 - Basic Properties ................................................................................................................ 3 2.1. Basic Properties .............................................................................................................. 3 2.2. Simultaneous Multithreading ............................................................................................. 4 3. Programming Environment ......................................................................................................... 5 3.1. Modules ........................................................................................................................ 5 3.2. Compiling ..................................................................................................................... 6 3.2.1. Compilers ........................................................................................................... 6 3.2.2. General Compiler Flags ......................................................................................... 6 3.2.2.1. GCC ........................................................................................................ 6 3.2.2.2. Intel ......................................................................................................... 7 3.2.2.3. Portland ................................................................................................... 8 3.2.2.4. PathScale .................................................................................................. 9 3.3. MPI .............................................................................................................................. 9 3.3.1. MVAPICH2 ....................................................................................................... 10 3.3.2. Open MPI ......................................................................................................... 10 3.3.3. Intel MPI .......................................................................................................... 12 3.3.4. Platform MPI ..................................................................................................... 13 3.4. OpenMP ...................................................................................................................... 14 3.4.1. Compiler Flags ................................................................................................... 14 3.5. Hybrid Programming ..................................................................................................... 15 3.6. Math Libraries .............................................................................................................. 16 3.6.1. GSL (GNU Scientific Library) .............................................................................. 16 3.6.2. MKL (Intel Math Kernel Library) .......................................................................... 18 3.6.3. ACML (AMD Core Math Library) ........................................................................ 19 3.6.4. FFTW ............................................................................................................... 20 4. Tuning ................................................................................................................................... 20 4.1. Optimisation ................................................................................................................. 20 4.1.1. IEEE 754 Compliant Optimisation ......................................................................... 20 4.1.2. Compiler Optimisation Flags ................................................................................ 21 4.2. Runtime Environment Settings ........................................................................................ 25 4.2.1. MPI Runtime Environment Variables ..................................................................... 25 4.2.2. OpenMP Runtime Environment Variables ............................................................... 28 5. Debugging .............................................................................................................................. 30 5.1. Compiler Debug Flags and Environment Variables .............................................................. 30 5.2. Debuggers .................................................................................................................... 32 5.2.1. GNU GDB ........................................................................................................ 32 5.2.2. Totalview .......................................................................................................... 34 5.2.3. DDT ................................................................................................................. 36 6. Performance Analysis ............................................................................................................... 40 6.1. MPI Statistics ............................................................................................................... 40 6.2. Tools .......................................................................................................................... 43 6.2.1. GNU gprof ........................................................................................................ 43 6.2.2. Oprofile ............................................................................................................ 43 6.2.3. IPM .................................................................................................................. 47 6.2.4. Scalasca ............................................................................................................ 48 6.2.5. TAU ................................................................................................................. 50 6.2.6. VTune .............................................................................................................. 52 7. Batch Systems ........................................................................................................................ 54 7.1. PBS Pro ...................................................................................................................... 54 7.2. Platform LSF ............................................................................................................... 57 7.3. LoadLeveler ................................................................................................................. 59 2 Best Practice Guide - Generic x86 1. Introduction This guide provides an overview of best practices on using x86 HPC cluster systems. Topics discussed include programming environment, e.g. choice of compiler options and numerical libraries, runtime environment settings, debuggers, performance analysis tools and running jobs in batch systems. The guide is not system-specific, i.e. it will not cover information that is targeted for one specific machine system but instead focus on topics that will be common to systems based on the x86 architecture. To get details about architectures and configurations for specific systems users are referred to system user guides, some of which can be found at http://www.prace-ri.eu/Best-Practice-Guides. The guide is organised into these parts: – Chapter 2 gives a general overview of the x86 architecture and its properties. – Chapter 3 describes the environment setup through the modules system used when porting and developing codes. A selection of compilers and a list of common and useful options are described. The chapter discusses four of the available MPI implementations and their general usage when building parallel applications, and job launcher commands with options supported. Included is also information for compiling OpenMP codes, a section describing hybrid programming, and a coverage of math libraries used when building applications. – Chapter 4 describes ways of improving application performance using compiler optimisation flags, and runtime tuning through MPI and OpenMP environment variables. – Chapter 5 discusses the process of debugging codes using compiler debug flags and environment variables, and using command line and GUI debuggers. – Chapter 6 describes how to analyse code behaviour through MPI summary statistics, and to find the source of performance problems using profilers and analysis tools. – Chapter 7 covers some of the systems used for running batch jobs describing commands, job resources, and including sample batch scripts. 2. x86 - Basic Properties 2.1. Basic Properties x86 is a instruction set architecture first introduced by Intel in 1978 in their 8086 CPU. The term x86 derives from the main CPU in the first succeeding processor generations that had names ending in "86". The instruction set length of the 8086 CPU was 16 bits. This was extended to 32 bits in the 80386 CPU by Intel in 1985, and later to 64 bits by AMD in their Opteron processor in 2003. The 64-bit extension architecture is also known as x86-64. The x86 designs are superscalar allowing for instruction-level parallelism (ILP) where several independent in- structions can be executed per clock cycle. The efficiency of superscalar CPUs can be additionally improved using simultaneous multithreading