Best Practice Guide - Knights Landing Vali Codreanu, Surfsara Jorge Rodríguez, BSC Ole Widar Saastad (Editor), University of Oslo 31-01-2017
Total Page:16
File Type:pdf, Size:1020Kb
Best Practice Guide - Knights Landing Vali Codreanu, SURFsara Jorge Rodríguez, BSC Ole Widar Saastad (Editor), University of Oslo 31-01-2017 1 Best Practice Guide - Knights Landing Table of Contents 1. Introduction .............................................................................................................................. 5 2. System Architecture / Configuration ............................................................................................. 5 2.1. Processor Architecture / MCM Architecture ......................................................................... 6 2.1.1. Instruction Set Architecture .................................................................................... 7 2.2. Memory Architecture ....................................................................................................... 9 2.2.1. Flat mode .......................................................................................................... 10 2.2.2. Cache mode ....................................................................................................... 10 2.2.3. Hybrid mode ...................................................................................................... 13 2.2.4. Optimal Settings ................................................................................................. 13 2.2.5. Example: Programming with the memkind library ..................................................... 13 2.3. Clustering modes .......................................................................................................... 13 2.3.1. All to all mode ................................................................................................... 14 2.3.2. Quadrant mode ................................................................................................... 14 2.3.3. Sub NUMA mode ............................................................................................... 14 2.3.4. Performance using the different modes ................................................................... 15 2.3.5. Example: Programming with sub-NUMA clusters ..................................................... 15 2.3.6. BIOS settings for clustering, memory etc ................................................................ 15 2.4. (Node) Interconnect ....................................................................................................... 16 2.5. I/O Subsystem Architecture ............................................................................................ 16 3. Programming Environment / Basic Porting ................................................................................... 16 3.1. Default System settings .................................................................................................. 16 3.2. Available Compilers ...................................................................................................... 16 3.2.1. Compiler Flags ................................................................................................... 17 3.3. Available (Vendor Optimized) Numerical Libraries ............................................................. 19 3.3.1. Math Kernel Library, MKL .................................................................................. 19 3.3.2. Intel Performance Primitives, IPP .......................................................................... 21 3.3.3. Math library, libimf ............................................................................................. 22 3.3.4. Short vector math library, libsvml .......................................................................... 22 3.4. Available MPI Implementations ....................................................................................... 23 3.5. OpenMP ...................................................................................................................... 24 3.5.1. Compiler Flags ................................................................................................... 24 4. Benchmark performance ........................................................................................................... 24 4.1. STREAM benchmark ..................................................................................................... 25 4.2. NAS kernel benchmark, NPB .......................................................................................... 26 4.2.1. Scaling and Speedup test ..................................................................................... 27 4.2.2. Performance ....................................................................................................... 28 4.3. HYDRO benchmark ...................................................................................................... 29 4.3.1. Scaling and Speedup test ..................................................................................... 29 4.3.2. Performance ....................................................................................................... 30 5. Application Performance ........................................................................................................... 31 5.1. ALYA ......................................................................................................................... 31 5.1.1. Building and compiling ....................................................................................... 31 5.1.2. Scaling and speedup ............................................................................................ 32 5.1.3. Performance ....................................................................................................... 32 5.2. GROMACS .................................................................................................................. 33 5.2.1. Building and compiling ....................................................................................... 33 5.2.2. Scaling and speedup ............................................................................................ 34 5.2.3. Performance ....................................................................................................... 35 5.3. Bifrost (Stellar atmosphere simulation code) ...................................................................... 35 5.3.1. Building and compiling ....................................................................................... 36 5.3.2. Vectorization and scaling ..................................................................................... 36 5.3.3. Performance ....................................................................................................... 38 5.4. Dalton and LS-Dalton (molecular electronic structure program) ............................................. 38 5.4.1. Building and compiling ....................................................................................... 38 5.4.2. Effect of vectorization ......................................................................................... 39 2 Best Practice Guide - Knights Landing 5.4.3. Scaling and speedup ............................................................................................ 40 5.4.4. Performance ....................................................................................................... 41 6. Performance Analysis ............................................................................................................... 42 6.1. Performance Monitoring in the Knights Landing Tile ........................................................... 42 6.2. Available Performance Analysis Tools .............................................................................. 43 6.2.1. Intel Application Performance Snapshot .................................................................. 43 6.2.2. Allienea Performance Reports ............................................................................... 44 6.2.3. Intel Software Development Emulator .................................................................... 46 6.3. Hints for Interpreting Results .......................................................................................... 47 7. Tuning ................................................................................................................................... 47 7.1. Guidelines for tuning ..................................................................................................... 47 7.1.1. Top down .......................................................................................................... 47 7.1.2. Bottom up ......................................................................................................... 48 7.2. Intel tuning tools ........................................................................................................... 49 7.2.1. Intel compiler .................................................................................................... 49 7.2.2. Intel MPI library ................................................................................................ 52 7.2.3. Intel XE-Advisor ................................................................................................ 54 7.2.4. Intel XE-Inspector .............................................................................................. 55 7.2.5. Intel VTune Amplifier ......................................................................................... 55 7.2.6. Intel Trace Analyzer ...........................................................................................