Intel(R) Architecture Instruction Set Extensions Programming Reference

Intel® Architecture Instruction Set Extensions Programming Reference
319433-012
FEBRUARY 2012

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "MISSION CRITICAL APPLICATION" IS ANY APPLICATION IN WHICH FAILURE OF THE INTEL PRODUCT COULD RESULT, DIRECTLY OR INDIRECTLY, IN PERSONAL INJURY OR DEATH. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The Intel® 64 architecture processors may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel® Hyper-Threading Technology (Intel® HT Technology) is available on select Intel® Core™ processors. Requires an Intel® HT Technology-enabled system. Consult your PC manufacturer. Performance will vary depending on the specific hardware and software used. For more information including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading.

Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization.

Intel® 64 architecture requires a system with a 64-bit enabled processor, chipset, BIOS and software. Performance will vary depending on the specific hardware and software you use. Consult your PC manufacturer for more information. For more information, visit http://www.intel.com/info/em64t.

Intel, Pentium, Intel Atom, Intel Xeon, Intel NetBurst, Intel Core, Intel Core Solo, Intel Core Duo, Intel Core 2 Duo, Intel Core 2 Extreme, Intel Pentium D, Itanium, Intel SpeedStep, MMX, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Copyright © 1997-2012 Intel Corporation

CONTENTS

CHAPTER 1
INTEL® ADVANCED VECTOR EXTENSIONS
1.1 About This Document
1.2 Overview
1.3 Intel® Advanced Vector Extensions Architecture Overview
    1.3.1 256-Bit Wide SIMD Register Support
    1.3.2 Instruction Syntax Enhancements
    1.3.3 VEX Prefix Instruction Encoding Support
1.4 Overview AVX2
1.5 Functional Overview
    1.5.1 256-bit Floating-Point Arithmetic Processing Enhancements
    1.5.2 256-bit Non-Arithmetic Instruction Enhancements
    1.5.3 Arithmetic Primitives for 128-bit Vector and Scalar processing
    1.5.4 Non-Arithmetic Primitives for 128-bit Vector and Scalar Processing
    1.5.5 AVX2 and 256-bit Vector Integer Processing
1.6 General Purpose Instruction Set Enhancements
1.7 Intel® Transactional Synchronization Extensions

CHAPTER 2
APPLICATION PROGRAMMING MODEL
2.1 Detection of PCLMULQDQ and AES Instructions
2.2 Detection of AVX and FMA Instructions
    2.2.1 Detection of FMA
    2.2.2 Detection of VEX-Encoded AES and VPCLMULQDQ
    2.2.3 Detection of AVX2
    2.2.4 Detection of VEX-encoded GPR Instructions
2.3 Fused-Multiply-ADD (FMA) Numeric Behavior
    2.3.1 FMA Instruction Operand Order and Arithmetic Behavior
2.4 Accessing YMM Registers
2.5 Memory alignment
2.6 SIMD Floating-Point Exceptions
2.7 Instruction Exception Specification
    2.7.1 Exceptions Type 1 (Aligned memory reference)
    2.7.2 Exceptions Type 2 (>=16 Byte Memory Reference, Unaligned)
    2.7.3 Exceptions Type 3 (<16 Byte memory argument)
    2.7.4 Exceptions Type 4 (>=16 Byte mem arg no alignment, no floating-point exceptions)
    2.7.5 Exceptions Type 5 (<16 Byte mem arg and no FP exceptions)
    2.7.6 Exceptions Type 6 (VEX-Encoded Instructions Without Legacy SSE Analogues)
    2.7.7 Exceptions Type 7 (No FP exceptions, no memory arg)
    2.7.8 Exceptions Type 8 (AVX and no memory argument)
    2.7.9 Exception Type 11 (VEX-only, mem arg no AC, floating-point exceptions)
    2.7.10 Exception Type 12 (VEX-only, VSIB mem arg, no AC, no floating-point exceptions)
    2.7.11 Exception Conditions for VEX-Encoded GPR Instructions
2.8 Programming Considerations with 128-bit SIMD Instructions
    2.8.1 Clearing Upper YMM State Between AVX and Legacy SSE Instructions
    2.8.2 Using AVX 128-bit Instructions Instead of Legacy SSE instructions
    2.8.3 Unaligned Memory Access and Buffer Size Management
2.9 CPUID Instruction
    CPUID—CPU Identification

CHAPTER 3
SYSTEM PROGRAMMING MODEL
3.1 YMM State, VEX Prefix and Supported Operating Modes
3.2 YMM State Management
    3.2.1 Detection of YMM State Support
    3.2.2 Enabling of YMM State
    3.2.3 Enabling of SIMD Floating-Exception Support
    3.2.4 The Layout of XSAVE Area
    3.2.5 XSAVE/XRSTOR Interaction with YMM State and MXCSR
    3.2.6 Processor Extended State Save Optimization and XSAVEOPT
        3.2.6.1 XSAVEOPT Usage Guidelines
3.3 Reset Behavior
3.4 Emulation
3.5 Writing AVX floating-point exception handlers

CHAPTER 4
INSTRUCTION FORMAT
4.1 Instruction Formats
    4.1.1 VEX and the LOCK prefix
    4.1.2 VEX and the 66H, F2H, and F3H prefixes
    4.1.3 VEX and the REX prefix
    4.1.4 The VEX Prefix
        4.1.4.1 VEX Byte 0, bits[7:0]
...
