AMD ™ 7xx2-series Processors Compiler Options Quick Reference Guide

AOCC compiler (with Flang - Fortran Front-End) Latest release: 2.3, Dec 2020 https://developer.amd.com/amd-aocc/

Architecture Other options

Generate instructions that run -march=znver2 Enable faster, less precise -ffast-math on 2nd Gen EPYC/ math operations -freciprocal-math Generate instructions for the -march=native OpenMP® threads and affinity export OMP_NUM_THREADS=N local machine (N number of cores) export GOMP_CPU_AFFINITY=“0- {N-1)” Optimization Levels Enabling Vector library -vector-library=LIBMVEC Disable all optimizations -O0 Link to Vector library -L/libm-install-dir/lib -lmvec Minimal level speed and code -O1 optimization Link to AMD library -L/libm-install-dir/lib -lamdlibm Moderate level optimization -O2/ -O (default) For Fortran workloads Aggressive optimizations -O3 Compile free form FORTRAN -ffree-form Maximize performance -Ofast Enable precise math operations -kieee in FORTRAN Enable Link Time Optimization -flto

Enable unrolling -funroll-loops AMD Optimized Libraries Enable aggressive loop optimi- -enable-loop-versioning-licm Latest release: 2.2, Jun 2020 zations -enable-partial-unswitch -unroll-aggressive https://developer.amd.com/amd-aocl/ Enable aggressive inline opti- -function-specialize mizations -finline-aggressive AMD µProf (Performance & Power Profiler) Enable aggressive vectoriza- -enable-strided-vectorization Latest release: 3.3, Jul 2020 tion -enable-epilog-vectorization https://developer.amd.com/amd-uprof/ Enable memory layout optimi- -fremap-arrays (use with -flto) zations

Profile Guided optimizations -fprofile-instr-generate (1st invocation) -fprofile-instr-use (2nd invocation) OpenMP® -fopenmp

For enabling streaming stores, -fnt-store memory bandwidth workloads

Enable removal of un-used -reduce-array-computation=3 array computation

Advanced Micro Devices © 2019 –20 , Inc. All rights reserved. AMD, the AMD Arrow logo, AMD EPYC, AMD One AMD Place, P.O. Box 3453 RYZEN and combinations thereof are trademarks of Advanced Micro Devices, Inc. is a registered trade- Sunnyvale, CA 94088-3453 mark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their AMD EPYC™ 7xx2-series Processors

GNU compiler collection (gcc, g++, gfortran) Visual Studio 2019 Latest release: 10.3, Jul 2020 Latest stable release : 16.8, Nov 2020 Recommended version : 9.3 https://www.visualstudio.com/ http://gcc.gnu.org User Guide

Architecture Architecture Generate instructions that run -march=znver2 Generate instructions that run /arch:[AVX|AVX2] on 2nd Gen EPYC/ RYZEN on 2nd Gen EPYC/ RYZEN Generate instructions for the -march=native Optimize for 64-bit AMD pro- /favor:AMD64 /d2vzeroupper local machine cessors Optimization Levels Optimization Levels Disable all optimizations -O0 Disable optimizations /Od (default) Maximum optimizations (favor /O1 Minimal level speed and code -O1/ -O space) optimizations Maximum optimizations (favor /O2 Moderate level optimizations -O2 speed) Aggressive optimizations -O3 [link.exe] Eliminate unrefer- /OPT:REF Maximize performance -Ofast enced function and/ or data Additional Optimizations [link.exe] Perform identical /OPT:ICF Link time optimization -flto COMDAT folding Enable unrolling -funroll-all-loops Output an informational mes- /Qvec-report:[1|2] sage for loops that are auto- Generate memory preload in- -fprefetch-loop-arrays --param vectorized structions prefetch-latency=300 Enable automatic parallelization /Qpar Profile-guided optimization -fprofile-generate (1st invocation) of loops, used in conjunction -fprofile-use (2nd invocation) with #pragma loop() directive OpenMP® -fopenmp Output an informational mes- /Qpar-report:[1|2] Other options sage for loops that are auto- parallelized Enable generation of code that -mieee-fp follows IEEE arithmetic Additional Optimizations Enable faster, less precise math -ffast-math Maintain the precision for /fp:precise operations floating-point operations through proper rounding Compile free form FORTRAN -ffree-form Optimize floating-point code for /fp:fast OpenMP® threads and affinity export OMP_NUM_THREADS=N speed at the expense of floating (N number of cores) export GOMP_CPU_AFFINITY=“0-{N-1)” -point accuracy and correctness Link to AMD library -L/libm-install-dir/lib -lamdlibm Whole Program Optimization /GL GlibC (link-time code generation)

Latest release: 2.32, Aug 2020 Profile-guided optimization LTCG:PGI and /LTCG:PGO Recommendation : 2.26 or later https://www.gnu.org/software/libc/ Binutils Recommendation: 2.26.1 or later https://www.gnu.org/software/binutils/

Advanced Micro Devices © 2019 –20 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD EPYC, AMD One AMD Place, P.O. Box 3453 RYZEN and combinations thereof are trademarks of Advanced Micro Devices, Inc. Linux is a registered trade- Sunnyvale, CA 94088-3453 mark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their AMD EPYC™ 7xx2-series Processors Compiler Options Quick Reference Guide

Intel compilers (icc, icpc, ifort) Latest release: 19.1 http://software.intel.com

Architecture Generate instructions that run on -march=core-avx2 (preferred) OR 2nd Gen EPYC/RYZEN -axCORE-AVX2 Optimization Levels Disable all optimizations -O0

Speed optimization without code -O1 growth Enable optimization for speed -O2 including vectorization Aggressive optimization -O3 Maximize performance -Ofast

Additional Optimizations Aggressive unrolling -unroll-aggressive Disable improved precision -no-prec-div floating divides Enable vectorization -vec Inter procedural Optimization -ipo OpenMP® -qopenmp Prefetch optimization -qopt-prefetch Profile generated optimization -prof-gen and -prof-use Use optimized header definitions -use-intel-optimized-headers Other options Floating point accuracy tuning -fp-model

Compile free form FORTRAN -free

Advanced Micro Devices © 2019 –20 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD EPYC, AMD One AMD Place, P.O. Box 3453 RYZEN and combinations thereof are trademarks of Advanced Micro Devices, Inc. Linux is a registered trade- mark of Linus Torvalds. Other names are for informational purposes only and may be trademarks of their Sunnyvale, CA 94088-3453