
Porting Guide for ICC NextGen

Contents

Compiler Porting Guide for ICC NextGen
Nomenclature
Guiding Principles for ICC NextGen
Major Changes in Compiler Defaults
Important New Options
  Options to aid Intel Analyzers and other profiling tools
  ICC NextGen OpenMP Options
Compiler Versioning
  Data Parallel C++ Version String
  ICC NextGen Version String
Important Compiler Options Mapping
Pragma Support
Predefined Macro Support
Built-In Functions
Support for Pre-Compiled Header Files
Changes in Diagnostics Options and Diagnostic Message Numbering
Linking, IPO and PGO changes
Language Features
Intrinsic Usage Model Change
  Example of LLVM Intrinsics Handling Differences
  Intrinsics Via Function Definition __attribute__((target()))
  Intel Proprietary Processor Targeting Pragmas and Functions Support
  Legacy Intrinsics Promotion with Option intrinsic-promote:
Floating Point Reproducibility Controls
Brutus or Bisectional Optimization Support
Appendix I: ICC Classic Compiler Options Status

March 6, 2020

Nomenclature

• For simplicity and clarity, we will informally refer to the Intel® C++ Compiler Next Generation framework as “ICC NextGen” and the classic or “traditional ICC” Intel® C++ Compiler as “ICC Classic” throughout this document.

Guiding Principles for ICC NextGen

• ICC Classic and ICC NextGen use the same names for many of the compiler drivers (icc, icpc, icl). In addition to providing a core C++ compiler, ICC NextGen is the base compiler for the Intel® Data Parallel C++ compiler and its new driver, ‘dpcpp’. ICC NextGen is our newer “Next Generation” compiler. Keep this in mind: ICC NextGen is not a drop-in replacement for ICC Classic. Although considerable effort is being made to make the transition from ICC Classic to ICC NextGen as smooth and effortless as possible, customers should expect that some effort may be required to port and tune existing applications from ICC Classic to ICC NextGen.

o For Data Parallel C++ applications: Our underlying compiler is ICC NextGen. The driver is named ‘dpcpp’.

o For standard C++ applications: ICC Classic (icc/icpc/icl drivers) will remain Intel’s recommended production compiler AND THE DEFAULT compiler until ICC NextGen has performance and features superior to ICC Classic. In the future, when ICC NextGen reaches functionality and performance equivalent to or better than ICC Classic, Intel may change the default for ICC to be ICC NextGen instead of ICC Classic. We have no set timeline for this transition, as it is conditional on reaching equivalence with, or improving on, current ICC Classic technology. Going forward, BOTH ICC Classic and ICC NextGen will be provided in Intel Compiler packages. Compiler options will be used to select one compiler or the other.

Major Changes in Compiler Defaults

1. C++ Users: The compiler driver name is still icc/icpc (Linux*) or icl (Windows*), but the -qnextgen (Linux*) or /Qnextgen (Windows*) option is required to invoke the ICC NextGen compiler. Without the [-q|/Q]nextgen option, the ICC Classic compiler is invoked.

2. DPC++ Users: The ‘dpcpp’ driver invokes ICC NextGen with DPC++ extensions.

3. The default floating point model is -fp-model precise -no-fma.

4. Macro __INTEL_LLVM_COMPILER is defined for ICC NextGen, instead of the __INTEL_COMPILER macro used in ICC Classic. To see all defined values for the compiler, use the -E -dM options, which will either create a .ii file with this information or write the values to stderr. For example:

   For Data Parallel C++:
   dpcpp -E -dM ./hello.cpp
   more hello.ii

   For ICC NextGen the output will be sent to stderr:
   icc -qnextgen -E -dM ./hello.cpp

5. No diagnostic numbers are listed for remarks/warnings/notes. Every diagnostic is emitted together with the compiler option that disables it.

6. Compiler intrinsics will not be automatically recognized without processor targeting options, unlike the behavior in ICC Classic. If you use intrinsics, please read more on this intrinsic behavior change later in this document.

Important New Options

Options to aid Intel Analyzers and other profiling tools. Use all of these to assist Analyzers.

• -gline-tables-only This option is helpful for profiling tools. It generates line table debug information only. This allows symbolic back traces with inlining information, but does not include any information about variables, their locations, or types.

• -fdebug-info-for-profiling Adds extra debug information for more accurate profiles.

ICC NextGen OpenMP Options

• -fiopenmp Compile and recognize OpenMP parallel and SIMD pragmas/directives and clauses, and use the Intel OpenMP runtime libraries.

• -fopenmp-targets=spir64 is needed when OpenMP 4.5/5.0 TARGET pragmas/directives are used.

• OpenMP 4.5/5.0 TARGET directives are only recognized by the ICC NextGen compiler that is included in the Intel® oneAPI HPC Toolkit. Use the two compiler options above together:

o icc -qnextgen -fiopenmp -fopenmp-targets=spir64

o icpc -qnextgen -fiopenmp -fopenmp-targets=spir64

Compiler Versioning

Data Parallel C++ Version String

The Data Parallel C++ compiler version string is new. An example:

dpcpp --version
DPC++ Compiler 2021.1-betaXX (2019.8.x.1010)

The format is:

VVVV.MINOR-[beta]XX (build string)

where
• VVVV is the product VERSION in the format of year.
• MINOR is a single-digit minor version number, starting at “1” for the initial release and incremented as needed for minor releases.
• [beta] may or may not appear. If the compiler is a Beta compiler you will see the string “beta”; it is not present for production compilers.
• XX is the beta release.
• The (build string) can be decoded:

(YYYY.<Clang version>.PULL.UPDATE.<build date>)

where
• YYYY is the year.
• <Clang version> is the version of Clang incorporated into the ICC NextGen front end.
• PULL is 1- or 2-digit and indicates the month we pulled Clang and merged it into our build.
• UPDATE is a single-digit minor version number. Typically it is “x” for beta, unless two or more compilers are released in a given month.
• <build date> is of the form yyyymmdd.

For example, (2019.8.x.1108): this compiler was built in 2019. The Clang front end is version 8. The PULL in this beta is “x”, but in a later compiler this will indicate the pull date or minor version. The exact date of the build is 20191108 in YYYYMMDD format.

Another example: If, hypothetically, Clang 9 releases and we create a new version of the Intel compiler in March 2020, then the ICC NextGen version released in March 2020 would be versioned 2020.9.3.0. If there is a June 2020 release, the version will be 2020.9.6.0.

ICC NextGen Version String

The ICC NextGen compiler version string is new. The format is:

VVVV.MINOR [Beta] <build date>

where
• VVVV is the product VERSION, in the format of year.
• MINOR is a single-digit minor version number, starting at “1” for the initial release and incremented as needed for minor releases.
• [Beta] may or may not appear. If the compiler is a Beta compiler you will see the string “Beta”; it is not present for production compilers.
• <build date> is of the form yyyymmdd.

An example:

icpc -qnextgen --version
icx (ICX) 2021.1 Beta 20191108
Copyright (C) 1985-2019 Intel Corporation. All rights reserved.

This compiler is the 2021 version compiler, initial minor version, and is a BETA compiler. The build date was 20191108.

Important Compiler Options Mapping

• With the [-q|/Q]nextgen option, ICC NextGen drivers icc, icpc or icl will accept ICC Classic compiler options OR Clang*/LLVM compiler options.

o Clang*/LLVM compiler options are interpreted directly.

o Classic ICC compiler options passed to ICC NextGen are translated to their Clang*/LLVM equivalents, where possible.

• Without the [-q|/Q]nextgen option (ICC Classic mode) ONLY ICC Classic options are accepted and the ICC Classic compiler is invoked (no ICC NextGen).

• Not all ICC Classic options are accepted and/or implemented in ICC NextGen.

• Undocumented options from ICC Classic are NOT implemented and there are no plans to do so. Remember, this is a very different compiler: the old internal, undocumented ICC Classic options have no meaning or mapping in the ICC NextGen compiler. If there is functionality in an undocumented option that you think you need, submit a bug report via Online Service Center (OSC), https://software.intel.com/en-us/support, and explain the behavior you expect and how ICC NextGen is not providing what you need. “Because ICC accepted this option and it’s in my makefile” is not justification. This is a different compiler with different optimizations and behavior. Try ICC NextGen without the option.

• ICC Classic options: Diagnostic warnings are emitted for ICC Classic options that are CURRENTLY not planned to be implemented in ICC NextGen.

command line warning #10430: Unsupported command line options encountered
These options as listed are not supported with the compiler selected.
For more information, use '-qnextgen-diag'.

o ICC NextGen option -qnextgen-diag causes the ICC NextGen compiler to emit a long list of ICC Classic options that are NOT accepted by ICC NextGen.

o If you have an unsupported option that is important to your application AND removing it from your build flags with ICC NextGen leads to an error, please REPORT AN ISSUE via OSC, https://software.intel.com/en-us/support. We would like to know which options you need implemented in ICC NextGen.

• ICC Classic options that ARE IMPLEMENTED or will be implemented soon are accepted quietly.

• All Clang*/LLVM options for the Clang version included in ICC NextGen are accepted and implemented. However, sometimes it may be necessary to pass options to Clang. If you need or want to pass Clang options directly, use the option

o -Xclang

o If the option has arguments, use multiple -Xclang options. For example, to pass:

▪ -target-feature +aes use -Xclang -target-feature -Xclang +aes

o This -Xclang option is for both Linux and Windows.

• GNU* and Microsoft* compatible options are accepted by ICC Classic and ICC NextGen.

Pragma Support

Do NOT assume ICC or GCC pragmas are supported by ICC NextGen! ICC Classic had many proprietary Intel pragmas. Excluding OpenMP pragmas, only a subset of these Intel pragmas is supported in ICC NextGen. Thus, WE RECOMMEND AS A FIRST STEP THAT YOU CHECK FOR UNSUPPORTED PRAGMAS! This can be done with the ICC NextGen supported option -Wunknown-pragmas:

icc -qnextgen -Wunknown-pragmas

Consider this example:

cat unknown-pragmas.c
int main(void)
{
  float arr[1000];
#pragma totallybogus
#pragma simd
#pragma vector
  for (int k=0; k<1000; k++) {
    arr[k] = 42.0;
  }
}

icc -c -qnextgen -Wunknown-pragmas unknown-pragmas.c
unknown-pragmas.c:4:9: warning: unknown pragma ignored [-Wunknown-pragmas]
#pragma totallybogus
        ^
unknown-pragmas.c:5:9: warning: unknown pragma ignored [-Wunknown-pragmas]
#pragma simd
        ^
2 warnings generated.

Notice three things in this example. First, “#pragma totallybogus” is a pragma that does not exist in ICC Classic, GCC, or ICC NextGen, so it makes sense that it is called out with a warning. Second, “#pragma simd” WAS a supported pragma in ICC Classic but is NOT supported in ICC NextGen. ICC NextGen will ignore this pragma and not do what the user expects from ICC Classic (“#pragma simd” should be replaced with OpenMP SIMD pragmas). Finally, “#pragma vector” is recognized and implemented by ICC NextGen and therefore there is no warning.

Predefined Macro Support

Predefined macros documentation is a work in progress. Macros are being added dynamically. For any given version of ICC NextGen, use the command:

icc -qnextgen -x c /dev/null -dM -E

to output the currently defined macros.

Built-In Functions

Clang* built-in functions are documented in the open source Clang documentation. Intel-added built-ins for ICC NextGen are a work in progress.

Support for Pre-Compiled Header Files

ICC NextGen supports creation of “relocatable” precompiled headers. These are built with a given path into your build directory, to be used later from an installed location. The option --relocatable-pch enables this feature. More information on this can be found at https://clang.llvm.org/docs/UsersManual.html#relocatable-pch-files.

This is a big improvement over ICC Classic, which had limitations with pre-compiled headers. The limitations of ICC Classic are documented at https://software.intel.com/en-us/articles/unable-to-obtain-mapped-memory.

ICC NextGen uses the Clang method of creating and using Pre-Compiled Headers (PCH). It is a 2-step process: one step to create and one to use.

To create PCH (Linux icc example, similar for Windows):
icc -qnextgen -x c-header file.h // creates file.h.gch

To use PCH:
icc -qnextgen -include-pch file.h.gch file.c // uses PCH file when compiling file.c

Changes in Diagnostics Options and Diagnostic Message Numbering

Current supported compiler diagnostic options:

Linux option         Windows option        Replacement
-diag-               /Qdiag                Not supported, details below
-diag-dump           /Qdiag-dump           Not supported
-diag-enable=power   /Qdiag-enable:power   Not supported but under consideration
-diag-error-limit    /Qdiag-error-limit    -fmax-errors=
-diag-file           /Qdiag-file           -serialize-diagnostics <filename arg>
-diag-file-append    /Qdiag-file-append    Not supported
-diag-id-numbers     /Qdiag-id-numbers     Not supported
-diag-once           /Qdiag-once           Not supported

The -diag- option is not supported, and neither are the numeric diagnostic messages. The Intel C++ NextGen compiler, based on LLVM technology, classifies diagnostic messages using descriptive phrases. The Clang manual lists the descriptive phrases which can be used to enable or disable each diagnostic. See https://clang.llvm.org/docs/DiagnosticsReference.html.

Note: As you can see from the above table, equivalent diagnostic control options exist in both the Linux compiler and the Windows compiler. This section uses the Linux spelling of the option names. The same technique can be used to migrate the Windows diagnostic control options.

For example, consider this test case, in which the file unknown-pragma.c contains this line:

#pragma unknown_pragma

Compiling with icc gives this warning message:

icc -c unknown-pragma.c
unknown-pragma.c(1): warning #161: unrecognized #pragma
#pragma unknown_pragma

The diagnostic warning #161 (the unrecognized pragma diagnostic) can be silenced with the ICC Classic option -diag-disable:161.

However, ICC NextGen does not have numbered diagnostic messages. Instead, it prints a hint about which diagnostic option can be used to control the diagnostic. You can use -Wall to enable all warning diagnostics that pertain to your program.

icc -qnextgen -Wall -c ~/unknown-pragma.c
unknown-pragma.c:1:9: warning: unknown pragma ignored [-Wunknown-pragmas]
#pragma unknown_pragma

The warning message suggests the option that you can use to enable or disable that diagnostic. In ICC NextGen, the unknown pragma diagnostic is silent by default. To enable it, use -Wunknown-pragmas. To disable it, use -Wno-unknown-pragmas. Incidentally, you can always use the -Wno- prefix to disable any diagnostic.

To migrate your application’s diagnostic control options from existing ICC Classic projects, build your source with ICC NextGen and read through the diagnostic output. Look for the suggested -Wno- options that disable the diagnostics you don’t wish to see, and modify your build procedures to use those options.

To increase the severity of the diagnostic from warning to error, use -Werror=unknown-pragmas. This corresponds to the ICC Classic option -diag-error:161. ICC NextGen provides no method to decrease the severity of error messages.

About Clang Enhanced Diagnostics

The creators of the Clang compiler have put substantial effort into creating more useful diagnostic messages. You will find that the Clang diagnostics are an improvement in several ways:

1. Colorized diagnostics to make the diagnostic more readable, clearly distinguishing between program source text and diagnostic text.
2. Precise source location information, including line and column number, as well as range highlighting for related text.
3. Fix-it hints, suggesting how to correct the issue being reported.
4. Enhanced syntax error recovery, so that the issue can be reported exactly, as well as allowing compilation to continue to find further issues.
5. And much more.

More information about Clang’s expressive diagnostics can be found at https://clang.llvm.org/diagnostics.html.

Linking, IPO and PGO changes

The ICC NextGen compiler has different methods for both Interprocedural Optimizations (IPO) and Profile Guided Optimizations (PGO). If you are using these features, be aware of the following:

• PGO: LLVM uses a completely different method for PGO. See the Article Here for more information.

• IPO: LLVM uses “Link Time Optimization” (LTO) technology for what ICC Classic termed “Interprocedural Optimization” (IPO).

o More information on LLVM LTO is found in the Article Here

o Intel linker “xi*” tools removed: Intel tools ‘xilink’, ‘xild’ and ‘xiar’ have been removed from ICC NextGen as they serve no purpose there. They are specific to the proprietary object file formats used by the Intel® Compilers.

o In your Makefiles or Project Settings, if you have used ‘xilink’ or ‘xild’, replace these with the equivalent native linkers. Similarly, replace ‘xiar’ with ‘ar’ or a similar archive tool.

Language Features

1. Intel® Cilk™ Plus will not be supported in the ICC NextGen compiler. Customers are expected to port their programs from Intel Cilk Plus to OpenMP or Intel® TBB. You can use this KB article on IDZ as a reference: https://software.intel.com/en-us/articles/migrate-your-application-to-use-openmp-or-intelr-tbb-instead-of-intelr-cilktm-plus

   a. This includes #pragma simd, which appears in many ICC Classic-tuned codes from that era. #pragma simd should be replaced with OpenMP SIMD pragmas.

Intrinsic Usage Model Change

• ICC NextGen does type checking for arguments to intrinsics when inlining, whereas ICC Classic did not. Thus you may get warnings or errors from ICC NextGen about arguments to intrinsics that did not appear with ICC Classic.
• ICC Classic does not require the immintrin.h header file, as long as the __INTEL_COMPILER_USE_INTRINSIC_PROTOTYPES macro is defined.
• ICC Classic does not require a processor/architecture-specific compiler option to be enabled in order to use the corresponding intrinsic.
• The two behaviors above ARE NOT PROVIDED BY ICC NEXTGEN!!
• How to use intrinsic-based code with ICC NextGen:
  o Use the compiler option -march or -m, -x, -ax for the compiler to recognize the processor/architecture-specific intrinsic. Getting ICC Classic compatibility with respect to intrinsics is being evaluated; keep checking the Release Notes.
  o Include the immintrin.h header file, which comes with the intrinsic declarations.

Example of LLVM Intrinsics Handling Differences

Below is a simple example of type checking that ICC NextGen does correctly, whereas ICC Classic lets an argument error pass:

cat sample_mm_prefetch.c
#include <immintrin.h>

#define CACHE_LINE_SIZE 64

__attribute__((always_inline)) inline void
Prefetch_Block(const void* addr, size_t sz, int hint)
{
    char* pref_addr = (char*)addr;
    size_t pref_iters = (sz + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE;

    for (int i = 0; i < pref_iters; i++)
    {
        _mm_prefetch(pref_addr, hint /*_MM_HINT_T1*/);
        pref_addr += CACHE_LINE_SIZE;
    }
}

$ icc -c sample_mm_prefetch.c

$ icc -qnextgen -c sample_mm_prefetch.c
sample_mm_prefetch.c:13:9: error: argument to '__builtin_prefetch' must be a constant integer
        _mm_prefetch(pref_addr, hint /*_MM_HINT_T1*/);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/nfs/pdx/disks/cts2/tools/compiler/cpro/Compiler/19.1/initial/compilers_and_libraries_2020.0.166/linux/lib/clang/10.0.0/include/xmmintrin.h:2103:31: note: expanded from macro '_mm_prefetch'
#define _mm_prefetch(a, sel) (__builtin_prefetch((void *)(a), \
                              ^
1 error generated.
compilation aborted for sample_mm_prefetch.c (code 1)

In this case the hint argument to _mm_prefetch should be a constant. Although the documentation for the _mm_prefetch intrinsic does not specify this, the intrinsic is defined for a constant argument. Notice that ICC Classic did not do the type checking whereas ICC NextGen did (correctly).

Below is a simple example demonstrating the need for processor targeting options when intrinsics are used. (Note: The error diagnostics are currently wrong in the ISA they recommend; this has been reported to the open source community for fixing.)

$ cat intrinsic.cpp
#include <iostream>
#include <immintrin.h> //ICC NextGen needs the include
using namespace std;

void add_sse(float *a, int N){
        __m128 x, y;
        y = _mm_set_ps1(1.f);
        for(int i = 0; i < N/4; i++)
        {
                x = _mm_load_ps(a);
                x = _mm_add_ps(x, y);
                _mm_store_ps(a, x);
                a+=4;
        }
}

void add_avx(float *a, int N){
        __m256 x, y;
        y = _mm256_set_ps(1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f);
        for(int i = 0; i < N/8; i++)
        {
                x = _mm256_load_ps(a);
                x = _mm256_add_ps(x, y);
                _mm256_store_ps(a, x);
                a+=8;
        }
}

void add_avx512(float *a, int N){
        __m512 x, y;
        y = _mm512_set_ps(1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f);
        for(int i = 0; i < N/16; i++)
        {
                x = _mm512_load_ps(a);
                x = _mm512_add_ps(x, y);
                _mm512_store_ps(a, x);
                a+=16;
        }
}

int main(){
        float a[32];
        for(int i = 0; i < 32; i++)
                a[i] = i;
#ifdef SSE
        add_sse(a,32);
#elif AVX
        add_avx(a,32);
#else
        add_avx512(a,32);
#endif
        std::cout<<"a[15] = "<<a[15]<<std::endl;
}

The above code compiles fine with ICC Classic, but not with the ICC NextGen compiler, as seen below:

$ icpc -qnextgen intrinsic.cpp -DSSE
intrinsic.cpp:19:13: error: always_inline function '_mm256_set_ps' requires target feature 'sse4.2', but would be inlined into function 'add_avx' that is compiled without support for 'sse4.2'
        y = _mm256_set_ps(1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f);
            ^
intrinsic.cpp:22:21: error: always_inline function '_mm256_load_ps' requires target feature 'sse4.2', but would be inlined into function 'add_avx' that is compiled without support for 'sse4.2'
                x = _mm256_load_ps(a);
                    ^
intrinsic.cpp:23:21: error: always_inline function '_mm256_add_ps' requires target feature 'sse4.2', but would be inlined into function 'add_avx' that is compiled without support for 'sse4.2'
                x = _mm256_add_ps(x, y);
                    ^
intrinsic.cpp:24:17: error: always_inline function '_mm256_store_ps' requires target feature 'sse4.2', but would be inlined into function 'add_avx' that is compiled without support for 'sse4.2'
                _mm256_store_ps(a, x);
                ^
4 errors generated.
compilation aborted for intrinsic.cpp (code 1)

Once we use -mavx to enable the Intel® AVX ISA, the errors move to the AVX-512 intrinsics.

$ icpc -qnextgen intrinsic.cpp -DSSE -mavx
intrinsic.cpp:31:13: error: always_inline function '_mm512_set_ps' requires target feature 'avx2', but would be inlined into function 'add_avx512' that is compiled without support for 'avx2'
        y = _mm512_set_ps(1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f, 1.f);
            ^
intrinsic.cpp:34:21: error: always_inline function '_mm512_load_ps' requires target feature 'avx2', but would be inlined into function 'add_avx512' that is compiled without support for 'avx2'
                x = _mm512_load_ps(a);
                    ^
intrinsic.cpp:35:21: error: always_inline function '_mm512_add_ps' requires target feature 'avx2', but would be inlined into function 'add_avx512' that is compiled without support for 'avx2'
                x = _mm512_add_ps(x, y);
                    ^
intrinsic.cpp:36:17: error: always_inline function '_mm512_store_ps' requires target feature 'avx2', but would be inlined into function 'add_avx512' that is compiled without support for 'avx2'
                _mm512_store_ps(a, x);
                ^
4 errors generated.
compilation aborted for intrinsic.cpp (code 1)

Enable the Intel® AVX-512 ISA using the -mavx512f compiler option and the error goes away.

Intrinsics Via Function Definition __attribute__((target()))

In the above example we used a compiler option to target a specific instruction set (-mavx512f). This can be used if just one instruction set exists in the source file. Often, source files will contain multiple instruction sets represented in intrinsic data declarations and intrinsic instructions. This is done to call specific functions or code sections based on the runtime processor discovery. Typically, these functions or code sections are protected by #ifdefs with specific target architectures and the user code does processor dispatch to these sections or functions.

The Clang/LLVM community highly encourages users to mark function definitions using the gcc-style attribute target:

__attribute__((target()))

to mark functions containing intrinsics that are intended to be executed on specific target architectures instead of relying on the default processor targeting. Use of this attribute will provide significantly better compile time error checking. This requires putting code for each specific target architecture into separate functions and applying the target attribute to the function definition. The attribute promotes documenting the intrinsics level for the function and the set of intrinsics that should be allowed within that function. For more information on attribute target and gcc-style function multi-versioning, please read these references:

• Attribute target: Clang attribute target is documented here
• Multi-versioning: GCC-style multiversioning is documented here and also HERE

An example of Multi-versioning:

#include <stdio.h>

__attribute__ ((target("avx2")))
void dispatch_func() {
  printf("\nCode for Intel Core processors supporting Intel AVX2 goes here\n");
}

__attribute__ ((target("sse4.2")))
void dispatch_func() {
  printf("\nCode for Intel Core processors supporting SSE4.2 goes here\n");
}

__attribute__ ((target("ssse3")))
void dispatch_func() {
  printf("\nCode for Intel Core 2 Duo processors supporting SSSE3 goes here\n");
}

__attribute__ ((target("default")))
void dispatch_func() {
  printf("\nCode for default implementation goes here\n");
}

int main() {
  dispatch_func();
  printf("Return from dispatch_func\n");
  return 0;
}

Intel Proprietary Processor Targeting Pragmas and Functions Support

• Intel proprietary pragmas “optimization_parameter *”:

#pragma [intel] optimization_parameter target_arch=
#pragma [intel] optimization_parameter inline-max-total-size=n
#pragma [intel] optimization_parameter inline-max-per-routine=n

These pragmas are not supported in ICC NextGen. Please replace them with __attribute__((target())) as previously described in this document.

• Intel proprietary intrinsic function _may_i_use_cpu_feature() is supported and may be used.

• Intel proprietary intrinsic function _allow_cpu_features(): As of 2019 we do not support _allow_cpu_features(), but it may be added sometime in 2020. Keep checking for availability of this function.

• Intel proprietary “#pragma simd” was removed from ICC Classic 19.1 and is not supported in ICC NextGen. Replace it with the OpenMP SIMD “#pragma omp simd” and its clauses, and compile with -fiopenmp or -fiopenmp-simd.

Legacy Intrinsics Promotion with Option intrinsic-promote:

Not recommended but available. Finally, for legacy applications with ICC Classic style intrinsics, the ICC NextGen compiler provides a new option. We discourage use of this option as it is error-prone. This option attempts to automatically promote functions containing intrinsics to the maximum target architecture of the intrinsics inside that function. A function containing sections with differing targeting (user processor dispatched, for example) can cause runtime faults. For this reason, we do not encourage this option. We are working on better long-term solutions for legacy ICC Classic intrinsics behavior.

Windows syntax: /Qintrinsic-promote

Linux syntax: -mintrinsic-promote

If this option is used, functions containing calls to intrinsics that require a specific CPU feature will have their target architecture automatically promoted to allow the required feature. All code within the function will be compiled with that target architecture and the resulting code for such functions will not execute correctly on processors that do not support the required feature. The user is responsible for guarding the execution path at run time so that such functions are not dynamically reachable when the program is run on processors that do not support the required feature.

This option is provided as a convenience for compiling legacy code. Users are strongly encouraged to use __attribute__((target())) to mark functions that are intended to be executed on specific target architectures instead. Use of this attribute will provide significantly better compile time error checking.

Floating Point Reproducibility Controls

The current state of the floating point (FP) model support in ICC NextGen:

• The default FP model is value-safe, equivalent to -fp-model precise -no-fma. This is a major behavioral difference between ICC Classic and ICC NextGen. We believe it to be a good, customer-friendly difference. (Note, April 9 2020: this is still the default for beta05, but in a future beta we may change this to -fp-model precise -fma. Please check the Release Notes for the latest default.)

• -fp-model fast is supported. There is no difference between fast=1 and fast=2 currently.

• FP Strictness: Nothing stricter than the default is supported. There is no support for -fp-model strict, -fp-speculation=safe, #pragma fenv_access, etc. Implementing support for these is a work-in-progress in the open source community.

• The math related features in ICC Classic are currently being ported to ICC NextGen. We have implemented the IMF (Intel® Math Library) attributes in ICC NextGen.

Brutus or Bisectional Optimization Support

If you have never heard of “Brutus” in ICC Classic or Bisectional Optimizations in Clang/LLVM, you may skip this section.

ICC NextGen supports the Clang/LLVM option -opt-bisect-limit=N for bisectional optimization debugging. This replaces the ICC Classic Brutus options. An effort is underway in the open source community to enhance optimization debugging capabilities in Clang/LLVM. More information at https://llvm.org/docs/OptBisect.html.

Appendix I: ICC Classic Compiler Options Status

Options which are not planned to be implemented, but could be considered if requested or needed by an application

Linux                        Windows
-finline-limit               /Qinline-factor
-keep-inline-functions       /Qinline-max-per-compile
-finstrument-functions       /Qinline-max-per-routine
-gcc-extern-inline           /Qinline-max-size
-inline-level                /Qinline-max-total-size
-inline-factor               /Qinline-min-size
-inline-max-per-compile      /Qinstrument-functions
-inline-max-per-routine      /Qipo
-inline-max-size             /Qprof-data-order
-inline-max-total-size       /Qprof-order
-inline-min-size             /Qprof-value-profiling
-inline-min-caller-growth
-ipo
-prof-data-order
-prof-value-profiling

Options under consideration but not implemented currently

Linux                        Windows
-auto-ilp32                  /Qauto-ilp32
-auto-p32
-Ofast
-diag-enable=power           /Qdiag-enable:power

Options which will not be implemented

Linux                        Windows
-comp-obj                    /Qcomp-obj
-from-obj                    /Q_ipo-list
-from-rtn                    /Q_prof-loop
-i_ipo_list                  /Qfrom-obj
-i_prof-loop                 /Qto-obj
-inl-heur                    /Qhprof-dir
-ip                          /Qhprof-exe
-ip-no-inlining              /Qhprof-file
-ip-no-pinlining             /Qinl-heur
-no-ipo                      /Qip
-ipo-c                       /Qip-no-inlining
-ipo-il                      /Qip-no-pinlining
-ipo-jobs                    /Qipo-c
-ipo-S                       /Qipo-il
-ipo-save                    /Qipo-jobs
-ipo-separate                /Qipo-S
-num-case                    /Qipo-save
-num-obj                     /Qipo-separate
-num-opt                     /Qnum-case
-num-rtn                     /Qnum-obj
-override-limits             /Qnum-opt
-prof-file                   /Qnum-rtn
-prof-gen                    /Qopt-overridelimits
-prof-hpi                    /Qprof-file
-prof-use                    /Qprof-gen
-prof-gen-sampling           /Qprof-gen-sampling
-prof-use-sampling           /Qprof-hpi
-profile-functions           /Qprofile-functions
-profile-loops               /Qprofile-loops
-profile-loops-report        /Qprofile-loops-report
-tcollect                    /Qprof-use
-tcollect-filter             /Qprof-use-sampling
-to-obj                      /Qtcollect
-to-rtn                      /Qtcollect-filter
                             /Qto-obj
                             /Qto-rtn
-diag-                       /Qdiag
-diag-dump                   /Qdiag-dump
-diag-file-append            /Qdiag-file-append
-diag-id-numbers             /Qdiag-id-numbers
-diag-once                   /Qdiag-once

New Options to ICC NextGen (not supported by Clang currently)

Linux                        Windows
-mintrinsic-promote          /Qintrinsic-promote
