
Intel instruction set extensions

x86 extensions (from WikiChip)

The x86 ISA has gone through numerous iterations that have added new instructions for specific tasks. These collections of new instructions are grouped into extensions, and different microprocessor models support different sets of them. The x86 ISA has evolved over more than forty years. During that time, various vendors have proposed and implemented extensions that improve the functionality of the base instruction set.

Backward compatibility

Generally speaking, every extension remains supported from its introduction to today. The exceptions are the extensions introduced by the AMD K6-2 (3DNow! and Extended 3DNow!) and those introduced by AMD Bulldozer (FMA4, XOP, LWP, and later TBM), which have been deprecated. Note that Zen, at least in its first stepping, still supports FMA4 even though CPUID does not report it.

Advanced Vector Extensions

Advanced Vector Extensions (AVX, also known as Sandy Bridge New Extensions) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD. They were proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor in the first quarter of 2011, followed by AMD with the Bulldozer processor in the third quarter of 2011. AVX provides new features, new instructions, and a new coding scheme. AVX2 (also known as Haswell New Instructions) expands most integer instructions to 256 bits. It was first supported by Intel with the Haswell processor, released in 2013, which also introduced fused multiply-add (FMA) operations. AVX-512 expands AVX to 512-bit support using a new EVEX prefix encoding. It was proposed by Intel in July 2013 and first supported by Intel with the Knights Landing processor, which shipped in 2016.
AVX uses sixteen YMM registers to perform a single instruction on multiple pieces of data (see SIMD). Each YMM register can hold, and operate simultaneously on, either eight 32-bit single-precision floating-point numbers or four 64-bit double-precision floating-point numbers. The width of the SIMD registers was increased from 128 bits to 256 bits. The registers were also renamed from XMM0-XMM7 to YMM0-YMM7 (in x86-64 mode, from XMM0-XMM15 to YMM0-YMM15). Legacy SSE instructions can still be used through the VEX prefix to operate on the lower 128 bits of the YMM registers.

AVX-512 register scheme as an extension of the AVX (YMM0-YMM15) and SSE (XMM0-XMM15) registers:

  ZMMn: bits 511-0 (AVX-512)
  YMMn: bits 255-0, the lower half of ZMMn (AVX)
  XMMn: bits 127-0, the lower half of YMMn (SSE)
  for n = 0-31; registers 16-31 are addressable only with AVX-512.

AVX introduces a three-operand SIMD instruction format in which the destination register is distinct from the two source operands. For example, an SSE instruction using the conventional two-operand form a ← a + b can now use a non-destructive three-operand form c ← a + b, preserving both source operands. Originally, the three-operand AVX format was limited to instructions with SIMD operands (YMM) and did not include instructions with general-purpose registers (e.g. EAX). VEX-encoded instructions on general-purpose registers appeared later, in extensions such as BMI. The alignment requirement for SIMD memory operands has been relaxed. The new VEX coding scheme introduces a new set of code prefixes that extends the opcode space. It also allows instructions to have more than two operands and allows SIMD vector registers to be longer than 128 bits.
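To make the register aliasing and the non-destructive three-operand form concrete, here is a minimal pure-Python model. It is an illustration under stated assumptions, not real hardware access: a register is modeled as a Python int holding up to 512 bits, the XMMn and YMMn views are taken as the low 128 and 256 bits, and `addps_two_operand` and `vaddps_three_operand` are hypothetical helpers that add packed single-precision values stored as Python lists.

```python
from typing import List

# Illustrative model only (assumption): a vector register is represented as a
# Python int holding up to 512 bits. No real hardware registers are accessed.
XMM_BITS, YMM_BITS, ZMM_BITS = 128, 256, 512


def low_bits(reg: int, width: int) -> int:
    """Return the low `width` bits of a register value.

    low_bits(zmm, YMM_BITS) models the YMMn view of ZMMn, and
    low_bits(zmm, XMM_BITS) models the XMMn view."""
    return reg & ((1 << width) - 1)


def element_count(reg_bits: int, elem_bits: int) -> int:
    """Number of `elem_bits`-wide elements that fit in a `reg_bits` register."""
    return reg_bits // elem_bits


def addps_two_operand(a: List[float], b: List[float]) -> None:
    """Hypothetical model of the destructive SSE form a <- a + b (a is overwritten)."""
    for i, y in enumerate(b):
        a[i] += y


def vaddps_three_operand(a: List[float], b: List[float]) -> List[float]:
    """Hypothetical model of the non-destructive AVX form c <- a + b."""
    return [x + y for x, y in zip(a, b)]


if __name__ == "__main__":
    print(element_count(YMM_BITS, 32), element_count(YMM_BITS, 64))  # 8 4
    zmm0 = (0xAAAA << 384) | (0xBBBB << 192) | 0xCCCC
    print(low_bits(zmm0, XMM_BITS) == 0xCCCC)  # True: XMM0 is the low 128 bits

    a, b = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
    c = vaddps_three_operand(a, b)
    print(c, a)  # [11.0, 22.0, 33.0, 44.0] [1.0, 2.0, 3.0, 4.0]
    addps_two_operand(a, b)
    print(a)     # [11.0, 22.0, 33.0, 44.0]
```

The model only captures data flow: both sources survive the three-operand form, while the two-operand form consumes its first source.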
The VEX prefix can also be used on legacy SSE instructions. This gives them a three-operand form and makes them interact more efficiently with AVX instructions without the need for VZEROUPPER and VZEROALL.

The AVX instructions support both 128-bit and 256-bit SIMD. The 128-bit versions can be useful for improving old code without widening the vectorization, and they avoid the penalty of transitioning from SSE to AVX. They are also faster on some early AMD implementations of AVX. This mode is sometimes known as AVX-128.

New instructions

These AVX instructions are in addition to those that are 256-bit extensions of legacy 128-bit SSE instructions; most of them can be used on both 128-bit and 256-bit operands.

Instruction: Description
VBROADCASTSS, VBROADCASTSD, VBROADCASTF128: Copy a 32-bit, 64-bit, or 128-bit memory operand to all elements of an XMM or YMM vector register.
VINSERTF128: Replace either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged.
VEXTRACTF128: Extract either the lower half or the upper half of a 256-bit YMM register and copy the value to a 128-bit destination operand.
VMASKMOVPS, VMASKMOVPD: Conditionally read any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally write any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged. On the AMD Jaguar processor architecture, this instruction with a memory source operand takes more than 300 clock cycles when the mask is zero, in which case the instruction should do nothing. This appears to be a design flaw.
VPERMILPS, VPERMILPD: Permute in-lane. Shuffle the 32-bit or 64-bit vector elements of one input operand. These are in-lane 256-bit instructions: they operate on all 256 bits as two separate 128-bit shuffles, so they cannot shuffle across the 128-bit lanes.
VPERM2F128: Shuffle the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as the selector.
VZEROALL: Set all YMM registers to zero and tag them as unused. Used when switching between 128-bit use and 256-bit use.
VZEROUPPER: Set the upper half of all YMM registers to zero. Used when switching between 128-bit use and 256-bit use.

CPUs with AVX

Intel:
Sandy Bridge processors, Q1 2011
Sandy Bridge E processors, Q4 2011
Ivy Bridge processors, 2012
Ivy Bridge E processors, Q3 2013
Haswell processors, Q2 2013
Haswell E processors, Q3 2014
Broadwell processors, Q4 2014
Skylake processors, Q3 2015
Broadwell E processors, Q2 2016
Kaby Lake processors, Q3 2016 (ULV mobile) / Q1 2017 (desktop/mobile)
Skylake-X processors, Q2 2017
Coffee Lake processors, Q4 2017
Cannon Lake processors, Q2 2018
Whiskey Lake processors, Q3 2018
Cascade Lake processors, Q4 2018
Ice Lake processors, Q3 2019
Comet Lake processors (Core-branded only), Q3 2019
Tiger Lake processors, Q3 2020
Rocket Lake processors, Q1 2021
Alder Lake processors, Q4 2021

Not all processors from the listed families support AVX. Typically, Core i3/i5/i7 processors support it, while Pentium and Celeron processors do not.
AMD:
Jaguar-based processors and newer
Puma-based processors and newer
"Heavy equipment" processors: Bulldozer-based processors, Q4 2011; Piledriver-based processors, 2012; Steamroller-based processors, 2014; Excavator-based processors and newer, 2015
Zen-based processors, 2017; Zen+-based processors, 2018; Zen 2-based processors, 2019; Zen 3 processors, 2020

Issues regarding compatibility between future Intel and AMD processors are discussed under the XOP instruction set.

VIA: Nano QuadCore, Eden X4
Zhaoxin: WuDaoKou-based processors (KX-5000 and KH-20000)

Compiler and assembler support

Absoft supports AVX with the -mavx flag.
The Free Pascal compiler supports AVX and AVX2 with the -CfAVX and -CfAVX2 switches from version 2.7.1.
The GNU Assembler (GAS) inline assembly functions support these instructions (accessible via GCC), as do Intel primitives and the Intel inline assembler (closely compatible with GAS, although more general in its handling of local references within inline code).
GCC starting with version 4.6 (although there was a 4.3 branch with some support) and the Intel Compiler Suite starting with version 11.1 support AVX.
The Open64 compiler version 4.5.1 supports AVX with the -mavx flag.
PathScale supports AVX via the -mavx flag.
The Vector Pascal compiler supports AVX via the -cpuAVX32 flag.
The Visual Studio 2010/2012 compiler supports AVX via intrinsics and the /arch:AVX switch.
Other assemblers, such as the VS2010 version of MASM and YASM, also support AVX.

Operating system support

AVX adds new register state through the 256-bit wide YMM register file, so explicit operating system support is required to properly save and restore AVX's expanded registers between context switches. The following operating system versions support AVX:
DragonFly BSD: support added in early 2013.
FreeBSD: support added in a patch submitted on January 21, 2012, which was included in the 9.1 stable release.
Linux: supported since kernel version 2.6.30, released on June 9, 2009.
macOS: support added in the 10.6.8 (Snow Leopard) update released on June 23, 2011.
OpenBSD: support added on March 21, 2015.
Solaris: supported in Solaris 10 Update 10 and Solaris 11.
Windows: supported in Windows 7 SP1, Windows Server 2008 R2 SP1, Windows 8, and Windows 10. Windows Server 2008 R2 SP1 with Hyper-V requires a hotfix (KB2568088) to support AMD AVX processors (Opteron 6200 and 4200 series).

Advanced Vector Extensions 2

Advanced Vector Extensions 2 (AVX2), also known as Haswell New Instructions, is an expansion of the AVX instruction set introduced in Intel's Haswell microarchitecture.
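AVX adds register state that the operating system must save, and AVX2 is reported as a separate CPU feature. For both reasons, software usually checks the CPU feature bits and the OS-enabled state before using either extension. The sketch below models that commonly documented check as pure bit logic. It assumes the caller has already obtained the raw values (CPUID leaf 1 ECX, CPUID leaf 7 subleaf 0 EBX, and XCR0 read via XGETBV) by other means, such as compiler intrinsics, since Python has no built-in CPUID access. The function and parameter names are hypothetical.

```python
# Bit positions from the Intel SDM's CPUID and XCR0 descriptions.
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE; XGETBV may be executed
CPUID1_ECX_AVX = 1 << 28      # processor implements AVX
CPUID7_EBX_AVX2 = 1 << 5      # processor implements AVX2 (leaf 7, subleaf 0)
XCR0_SSE_STATE = 1 << 1       # OS saves/restores XMM state
XCR0_AVX_STATE = 1 << 2       # OS saves/restores the upper halves of YMM registers


def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """Hypothetical helper: AVX is usable only if the CPU implements it AND
    the OS saves YMM state across context switches."""
    if not (cpuid1_ecx & CPUID1_ECX_AVX) or not (cpuid1_ecx & CPUID1_ECX_OSXSAVE):
        # Without OSXSAVE, XCR0 cannot be read (XGETBV would raise #UD).
        return False
    needed = XCR0_SSE_STATE | XCR0_AVX_STATE
    return (xcr0 & needed) == needed


def avx2_usable(cpuid1_ecx: int, cpuid7_ebx: int, xcr0: int) -> bool:
    """Hypothetical helper: AVX2 needs everything AVX needs plus its own bit."""
    return avx_usable(cpuid1_ecx, xcr0) and bool(cpuid7_ebx & CPUID7_EBX_AVX2)


if __name__ == "__main__":
    ecx = CPUID1_ECX_AVX | CPUID1_ECX_OSXSAVE
    print(avx_usable(ecx, xcr0=0b111))  # True: x87, SSE and AVX state enabled
    print(avx_usable(ecx, xcr0=0b011))  # False: OS does not save YMM state
    print(avx2_usable(ecx, CPUID7_EBX_AVX2, xcr0=0b111))  # True
```

The second call shows why OS support matters: a CPU may implement AVX, but if the operating system has not enabled YMM state in XCR0, AVX instructions must not be used.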