Intrinsics Lecture 1
Manfred Liebmann
Technische Universität München
Chair of Optimal Control
Center for Mathematical Sciences, M17
[email protected]
January 12, 2016
Programming with Intrinsics
What are intrinsics?
Intrinsics are functions that the compiler replaces with the proper assembly instructions. Intrinsics are primarily used to access the vector processing capabilities of modern CPUs.
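To make the definition concrete, here is a minimal sketch, assuming an x86-64 target where SSE is always available. The helper name add4 is hypothetical and not part of the lecture. The call to _mm_add_ps looks like an ordinary function call, but the compiler typically lowers it to a single addps instruction that adds all four lanes at once.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Adds four floats at once. add4 is a hypothetical helper name.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);        // load 4 floats (no alignment needed)
    __m128 vb = _mm_loadu_ps(b);
    __m128 vsum = _mm_add_ps(va, vb);   // one SIMD add for all 4 lanes
    _mm_storeu_ps(out, vsum);           // store 4 results
}
```

Calling add4 with {1, 2, 3, 4} and {10, 20, 30, 40} fills the output array with the four lane-wise sums.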
Long history of intrinsics:
– MMX: Multi Media Extensions, 8 x 64-bit registers (1997)
– SSE/SSE2/SSE3/SSSE3/SSE4.x: Streaming SIMD Extensions, 8 x 128-bit registers (1999)
– AVX/AVX2/FMA: Advanced Vector Extensions, 16 x 256-bit registers (2008)
– AVX-512/KNC: Advanced Vector Extensions, 32 x 512-bit registers (2012)
Choose the Right Header!
Intrinsics are supported by all modern C/C++ compilers.
Every generation has its own header!
– #include
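As a hedged illustration, the x86 intrinsic headers commonly shipped with GCC, Clang, the Intel compiler, and MSVC are listed below, one per instruction-set generation. This list comes from general compiler practice, not from the lecture itself. In practice, including immintrin.h alone is usually enough, since it pulls in the older headers as well.

```cpp
#include <mmintrin.h>   // MMX
#include <xmmintrin.h>  // SSE
#include <emmintrin.h>  // SSE2
#include <pmmintrin.h>  // SSE3
#include <tmmintrin.h>  // SSSE3
#include <smmintrin.h>  // SSE4.1
#include <nmmintrin.h>  // SSE4.2
#include <immintrin.h>  // AVX, AVX2, FMA, AVX-512 (umbrella header)

// Reports whether the compiler was asked to generate AVX code for the
// whole file (for example with -mavx on GCC/Clang or /arch:AVX on MSVC).
// compiled_with_avx is a hypothetical helper name.
constexpr bool compiled_with_avx() {
#ifdef __AVX__
    return true;
#else
    return false;
#endif
}
```

The __AVX__ macro only reflects the compile-time setting; whether the CPU that runs the program supports AVX is a separate, runtime question.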
Advanced Vector Extensions (AVX)
Intel Advanced Vector Extensions (AVX) is a set of instructions for doing Single Instruction Multiple Data (SIMD) operations on Intel architecture CPUs. These instructions extend the previous SIMD offerings, MMX instructions and Intel Streaming SIMD Extensions (SSE).
Intel Intrinsics Guide
https://software.intel.com/sites/landingpage/IntrinsicsGuide/
Complete interactive reference for all intrinsic functions!
Instruction Set Architecture (ISA) Extensions
https://software.intel.com/en-us/isa-extensions
Intel AVX Suffix Markings
All modern C++ compilers support the same intrinsic operations, which simplifies using Intel AVX from C or C++ code. Most Intel AVX intrinsic names have the following format:

_mm256_op_suffix(data_type param1, data_type param2, data_type param3)

where _mm256 is the prefix for working on the new 256-bit registers; op is the operation, such as add for addition or sub for subtraction; and suffix denotes the type of data to operate on, with the first letters denoting packed (p), extended packed (ep), or scalar (s). The remaining letters are the types given in the table below.
Suffix Markings
– [s/d]: Single- or double-precision floating point
– [i/u]nnn: Signed or unsigned integer of bit size nnn, where nnn is 128, 64, 32, 16, or 8
– [ps/pd/sd]: Packed single, packed double, or scalar double
– epi32: Extended packed 32-bit signed integer
– si256: Scalar 256-bit integer
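The naming scheme can be seen in a few real AVX intrinsics. The sketch below is illustrative only: suffix_demo and cpu_has_avx are hypothetical names, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here so that the file compiles without -mavx.

```cpp
#include <immintrin.h>  // AVX intrinsics

// Assumption: GCC or Clang on x86-64. The target attribute enables AVX
// for this one function only.
__attribute__((target("avx")))
void suffix_demo(const float* f8, const double* d4,
                 float* f_out, double* d_out, int* i_out) {
    // ps: packed single precision, 8 floats per 256-bit register
    __m256 vf = _mm256_loadu_ps(f8);
    _mm256_storeu_ps(f_out, _mm256_add_ps(vf, vf));

    // pd: packed double precision, 4 doubles per 256-bit register
    __m256d vd = _mm256_loadu_pd(d4);
    _mm256_storeu_pd(d_out, _mm256_mul_pd(vd, vd));

    // epi32: eight packed 32-bit signed integers, all set to 7
    __m256i vi = _mm256_set1_epi32(7);
    // si256: the register handled as one 256-bit integer value
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(i_out), vi);
}

// Runtime check so the demo is only called on CPUs that support AVX.
bool cpu_has_avx() { return __builtin_cpu_supports("avx") != 0; }
```

A caller would test cpu_has_avx() first and only then call suffix_demo with arrays of 8 floats, 4 doubles, and room for 8 ints.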
Intel AVX Intrinsics Data Types
Data Types
– __m256: 256-bit as eight single-precision floating-point values
– __m256d: 256-bit as four double-precision floating-point values
– __m256i: 256-bit as integers (bytes, words, etc.)
– __m128: 128-bit single-precision floating point (32 bits each)
– __m128d: 128-bit double-precision floating point (64 bits each)
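The lane counts in this list follow directly from the register width divided by the element size. A small compile-time sketch, with constant names chosen here purely for illustration:

```cpp
#include <immintrin.h>
#include <cstddef>

// Lane counts derived from the type sizes (names are hypothetical).
constexpr std::size_t floats_per_m256   = sizeof(__m256)  / sizeof(float);
constexpr std::size_t doubles_per_m256d = sizeof(__m256d) / sizeof(double);
constexpr std::size_t bytes_per_m256i   = sizeof(__m256i);
constexpr std::size_t floats_per_m128   = sizeof(__m128)  / sizeof(float);
constexpr std::size_t doubles_per_m128d = sizeof(__m128d) / sizeof(double);

static_assert(floats_per_m256 == 8,   "256 bits / 32 bits = 8 floats");
static_assert(doubles_per_m256d == 4, "256 bits / 64 bits = 4 doubles");
static_assert(bytes_per_m256i == 32,  "256 bits = 32 bytes");
static_assert(floats_per_m128 == 4,   "128 bits / 32 bits = 4 floats");
static_assert(doubles_per_m128d == 2, "128 bits / 64 bits = 2 doubles");
```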
Figure 1: Intel AVX and Intel SSE data types
Mandelbrot Set Code Example
Pseudocode for calculating the Mandelbrot set.
z, p are complex numbers
for each point p on the complex plane
    z = 0
    for count = 0 to max_iterations
        if abs(z) > 2.0
            break
        z = z*z + p
    set color at p based on count reached
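The inner loop of the pseudocode can be written with plain float arithmetic, splitting z and p into real and imaginary parts. This is a minimal sketch, not the lecture's code; the name mandel_count and the choice to return a count in the range 0 to max_iterations are assumptions made here. The test abs(z) > 2 is replaced by the equivalent zr*zr + zi*zi > 4, which avoids a square root.

```cpp
// Returns how many iterations z = z*z + p ran before |z| exceeded 2,
// capped at max_iterations. mandel_count is a hypothetical name.
int mandel_count(float p_re, float p_im, int max_iterations) {
    float zr = 0.0f, zi = 0.0f;
    int count = 0;
    for (; count < max_iterations; ++count) {
        // |z| > 2 is equivalent to zr^2 + zi^2 > 4
        if (zr * zr + zi * zi > 4.0f) break;
        // (zr + i*zi)^2 + p = (zr^2 - zi^2 + p_re) + i*(2*zr*zi + p_im)
        float t = zr * zr - zi * zi + p_re;
        zi = 2.0f * zr * zi + p_im;
        zr = t;
    }
    return count;
}
```

For p = 0 the iteration never escapes, so the function returns max_iterations; for p = 3 it escapes after a single step.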
Mandelbrot Set Visualization
Figure 2: Mandelbrot set 0.29768 + 0.48354i to 0.29778 + 0.48364i with 4096 max iterations
Simple Mandelbrot C++ STL Code
#include
float dx = (x2-x1)/width, dy = (y2-y1)/height;
for (int j = 0; j < height; ++j) {
    for (int i = 0; i < width; ++i) {
        complex
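Since the listing above breaks off, the following is one plausible way to complete it with std::complex. It is a sketch under stated assumptions, not the author's code: the function name mandelbrot_stl, its signature, the max_iterations parameter, and the row-major output vector are all introduced here for illustration.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Computes iteration counts on a width x height grid over
// [x1, x2) x [y1, y2). Hypothetical signature; the result is row-major.
std::vector<int> mandelbrot_stl(float x1, float y1, float x2, float y2,
                                int width, int height, int max_iterations) {
    std::vector<int> counts(static_cast<std::size_t>(width) * height);
    float dx = (x2 - x1) / width, dy = (y2 - y1) / height;
    for (int j = 0; j < height; ++j) {
        for (int i = 0; i < width; ++i) {
            std::complex<float> p(x1 + i * dx, y1 + j * dy);
            std::complex<float> z(0.0f, 0.0f);
            int count = 0;
            // Iterate z = z*z + p until |z| > 2 or the cap is reached
            while (count < max_iterations && std::abs(z) <= 2.0f) {
                z = z * z + p;
                ++count;
            }
            counts[static_cast<std::size_t>(j) * width + i] = count;
        }
    }
    return counts;
}
```

Note that std::abs on a complex number computes a magnitude in every iteration, which is one plausible reason a std::complex version can be slower than hand-written float arithmetic.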
Mandelbrot Set Benchmark
Cores   STL       FPU       AVX
1       63.5186   11.9445   1.64415
2       50.1687   9.42479   1.26957
4       42.7716   8.02288   1.05672
8       23.2062   4.34219   0.569152
16      13.9921   2.62823   0.345063
Table 1: Total runtimes in seconds for the Mandelbrot set benchmark with a 2048 x 2048 grid on 2x Intel Xeon E5-2650 @ 2.00GHz.
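The AVX kernel behind the third column is not shown in this section. As a hedged sketch of what such a kernel might look like, the function below iterates eight points at once with AVX intrinsics. The name mandel8_avx, the float-valued counts, and the use of the GCC/Clang target attribute are assumptions made here; this is not the benchmarked code.

```cpp
#include <immintrin.h>

// Assumption: GCC or Clang on x86-64; the attribute enables AVX for this
// function only. Iterates 8 points p = (p_re[k], p_im[k]) at once and
// writes one iteration count per point (stored as float) to counts.
__attribute__((target("avx")))
void mandel8_avx(const float* p_re, const float* p_im,
                 int max_iterations, float* counts) {
    const __m256 pr = _mm256_loadu_ps(p_re);
    const __m256 pi = _mm256_loadu_ps(p_im);
    const __m256 four = _mm256_set1_ps(4.0f);
    const __m256 one = _mm256_set1_ps(1.0f);
    __m256 zr = _mm256_setzero_ps();
    __m256 zi = _mm256_setzero_ps();
    __m256 cnt = _mm256_setzero_ps();
    for (int it = 0; it < max_iterations; ++it) {
        __m256 zr2 = _mm256_mul_ps(zr, zr);
        __m256 zi2 = _mm256_mul_ps(zi, zi);
        // Lane mask: all ones where |z|^2 <= 4 (point has not escaped)
        __m256 active =
            _mm256_cmp_ps(_mm256_add_ps(zr2, zi2), four, _CMP_LE_OQ);
        if (_mm256_movemask_ps(active) == 0) break;  // all 8 lanes escaped
        // Add 1 to the count of active lanes only
        cnt = _mm256_add_ps(cnt, _mm256_and_ps(active, one));
        // z = z*z + p, computed for all lanes
        __m256 zrzi = _mm256_mul_ps(zr, zi);
        zr = _mm256_add_ps(_mm256_sub_ps(zr2, zi2), pr);
        zi = _mm256_add_ps(_mm256_add_ps(zrzi, zrzi), pi);
    }
    _mm256_storeu_ps(counts, cnt);
}
```

Escaped lanes keep iterating until every lane has escaped, but the mask stops their counts from increasing; the ordered comparison also treats any NaN produced by overflow as escaped. A caller should verify AVX support at runtime, for example with __builtin_cpu_supports("avx"), before invoking the kernel.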