Best Practice Guide Modern Processors

Ole Widar Saastad, University of Oslo, Norway
Kristina Kapanova, NCSA, Bulgaria
Stoyan Markov, NCSA, Bulgaria
Cristian Morales, BSC, Spain
Anastasiia Shamakina, HLRS, Germany
Nick Johnson, EPCC, United Kingdom
Ezhilmathi Krishnasamy, University of Luxembourg, Luxembourg
Sebastien Varrette, University of Luxembourg, Luxembourg
Hayk Shoukourian (Editor), LRZ, Germany

Updated 5-5-2021

Table of Contents

1. Introduction
2. ARM Processors
  2.1. Architecture
    2.1.1. Kunpeng 920
    2.1.2. ThunderX2
    2.1.3. NUMA architecture
  2.2. Programming Environment
    2.2.1. Compilers
    2.2.2. Vendor performance libraries
    2.2.3. Scalable Vector Extension (SVE) software support
  2.3. Benchmark performance
    2.3.1. STREAM - memory bandwidth benchmark - Kunpeng 920
    2.3.2. STREAM - memory bandwidth benchmark - Thunder X2
    2.3.3. High Performance Linpack
  2.4. MPI Ping-pong performance using RoCE
  2.5. HPCG - High Performance Conjugated Gradients
  2.6. Simultaneous Multi Threading (SMT) performance impact
  2.7. IOR
  2.8. European ARM processor based systems
    2.8.1. Fulhame (EPCC)
3. Processors Intel Skylake
  3.1. Architecture
    3.1.1. Memory Architecture
    3.1.2. Power Management
  3.2. Programming Environment
    3.2.1. Compilers
    3.2.2. Available Numerical Libraries
  3.3. Benchmark performance
    3.3.1. MareNostrum system
    3.3.2. SuperMUC-NG system
  3.4. Performance Analysis
    3.4.1. Intel Application Performance Snapshot
    3.4.2. Scalasca
    3.4.3. Arm Forge Reports
    3.4.4. PAPI
  3.5. Tuning
    3.5.1. Compiler Flags
    3.5.2. Serial Code Optimisation
    3.5.3. Shared Memory Programming - OpenMP
    3.5.4. Distributed memory programming - MPI
    3.5.5. Environment Variables for Process Pinning OpenMP+MPI
  3.6. European SkyLake processor based systems
    3.6.1. MareNostrum 4 (BSC)
    3.6.2. SuperMUC-NG (LRZ)
4. AMD Rome Processors
  4.1. System Architecture
    4.1.1. Cores - «real» vs. virtual/logical
    4.1.2. Memory Architecture
    4.1.3. NUMA
    4.1.4. Balance of AMD/Rome system
  4.2. Programming Environment
    4.2.1. Available Compilers
    4.2.2. Compiler Flags
    4.2.3. AMD Optimizing CPU Libraries (AOCL)
    4.2.4. Intel Math Kernel Library
    4.2.5. Library performance
  4.3. Benchmark performance
    4.3.1. Stream - memory bandwidth benchmark
    4.3.2. High Performance Linpack
  4.4. Performance Analysis
    4.4.1. perf (Linux utility)
    4.4.2. perfcatch
    4.4.3. AMD µProf
    4.4.4. Roof line model
  4.5. Tuning
    4.5.1. Introduction
    4.5.2. Intel MKL pre 2020 version
    4.5.3. Intel MKL 2020 version
    4.5.4. Memory bandwidth per core
  4.6. European AMD processor based systems
    4.6.1. HAWK system (HLRS)
    4.6.2. Betzy system (Sigma2)
A. Acronyms and Abbreviations
  1. Units

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    109 pages
