Intel(R) Advanced Vector Extensions Programming Reference

Intel® Advanced Vector Extensions Programming Reference 319433-011 JUNE 2011 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANT- ED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. Intel may make changes to specifications and product descriptions at any time, without notice. Developers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Improper use of reserved or undefined features or instructions may cause unpre- dictable behavior or failure in developer's software code when running on an Intel processor. Intel reserves these features or instructions for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from their unauthorized use. The Intel® 64 architecture processors may contain design defects or errors known as errata. Current char- acterized errata are available on request. Hyper-Threading Technology requires a computer system with an Intel® processor supporting Hyper- Threading Technology and an HT Technology enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. For more information, see http://www.intel.com/technology/hyperthread/index.htm; including details on which processors support HT Technology. Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM) and for some uses, certain platform software enabled for it. Functionality, performance or other benefits will vary depending on hardware and software configurations. Intel® Virtualization Technology-enabled BIOS and VMM applications are currently in development. 64-bit computing on Intel architecture requires a computer system with a processor, chipset, BIOS, operating system, device drivers and applications enabled for Intel® 64 architecture. Processors will not operate (including 32-bit operation) without an Intel® 64 architecture-enabled BIOS. Performance will vary depending on your hardware and software configurations. Consult with your system vendor for more information. Intel, Pentium, Intel Atom, Intel Xeon, Intel NetBurst, Intel Core, Intel Core Solo, Intel Core Duo, Intel Core 2 Duo, Intel Core 2 Extreme, Intel Pentium D, Itanium, Intel SpeedStep, MMX, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be obtained from: Intel Corporation P.O. Box 5937 Denver, CO 80217-9808 or call 1-800-548-4725 or visit Intel’s website at http://www.intel.com Copyright © 1997-2011 Intel Corporation ii Ref. # 319433-011 CONTENTS PAGE CHAPTER 1 INTEL® ADVANCED VECTOR EXTENSIONS 1.1 About This Document . 1-1 1.2 Overview . 1-1 1.3 Intel® Advanced Vector Extensions Architecture Overview . 1-2 1.3.1 256-Bit Wide SIMD Register Support . 1-2 1.3.2 Instruction Syntax Enhancements . 1-3 1.3.3 VEX Prefix Instruction Encoding Support . 1-4 1.4 Overview AVX2. 1-4 1.5 Functional Overview . 1-5 1.5.1 256-bit Floating-Point Arithmetic Processing Enhancements . 1-5 1.5.2 256-bit Non-Arithmetic Instruction Enhancements. 1-5 1.5.3 Arithmetic Primitives for 128-bit Vector and Scalar processing . 1-6 1.5.4 Non-Arithmetic Primitives for 128-bit Vector and Scalar Processing . 1-6 1.5.5 AVX2 and 256-bit Vector Integer Processing. 1-7 1.6 General Purpose Instruction Set Enhancements. 1-8 CHAPTER 2 APPLICATION PROGRAMMING MODEL 2.1 Detection of PCLMULQDQ and AES Instructions. 2-1 2.2 Detection of AVX and FMA Instructions . 2-1 2.2.1 Detection of FMA . 2-3 2.2.2 Detection of VEX-Encoded AES and VPCLMULQDQ. 2-4 2.2.3 Detection of AVX2 . 2-6 2.2.4 Detection VEX-encoded GPR Instructions . 2-7 2.3 Fused-Multiply-ADD (FMA) Numeric Behavior . 2-7 2.3.1 FMA Instruction Operand Order and Arithmetic Behavior . 2-11 2.4 Accessing YMM Registers. 2-12 2.5 Memory alignment . 2-13 2.6 SIMD floating-point ExCeptions . 2-15 2.7 Instruction Exception Specification. 2-15 2.7.1 Exceptions Type 1 (Aligned memory reference) . 2-21 2.7.2 Exceptions Type 2 (>=16 Byte Memory Reference, Unaligned) . 2-22 2.7.3 Exceptions Type 3 (<16 Byte memory argument). 2-23 2.7.4 Exceptions Type 4 (>=16 Byte mem arg no alignment, no floating-point exceptions) . 2-24 2.7.5 Exceptions Type 5 (<16 Byte mem arg and no FP exceptions). 2-25 2.7.6 Exceptions Type 6 (VEX-Encoded Instructions Without Legacy SSE Analogues). 2-26 2.7.7 Exceptions Type 7 (No FP exceptions, no memory arg). 2-27 2.7.8 Exceptions Type 8 (AVX and no memory argument) . 2-27 2.7.9 Exception Type 11 (VEX-only, mem arg no AC, floating-point exceptions) . 2-28 2.7.10 Exception Type 12 (VEX-only, VSIB mem arg, no AC, no floating-point exceptions). 2-29 2.7.11 Exception Conditions for VEX-Encoded GPR Instructions . 2-30 2.8 Programming Considerations with 128-bit SIMD Instructions . 2-31 2.8.1 Clearing Upper YMM State Between AVX and Legacy SSE Instructions . 2-32 i Ref. # 319433-011 2.8.2 Using AVX 128-bit Instructions Instead of Legacy SSE instructions . 2-33 2.8.3 Unaligned Memory Access and Buffer Size Management . 2-33 2.9 CPUID Instruction . 2-34 CPUID—CPU Identification . 2-34 CHAPTER 3 SYSTEM PROGRAMMING MODEL 3.1 YMM State, VEX Prefix and Supported Operating Modes . 3-1 3.2 YMM State Management . 3-2 3.2.1 Detection of YMM State Support. 3-2 3.2.2 Enabling of YMM State . 3-2 3.2.3 Enabling of SIMD Floating-Exception Support . 3-3 3.2.4 The Layout of XSAVE Area . 3-4 3.2.5 XSAVE/XRSTOR Interaction with YMM State and MXCSR. 3-5 3.2.6 Processor Extended State Save Optimization and XSAVEOPT . 3-7 3.2.6.1 XSAVEOPT Usage Guidelines . 3-8 3.3 Reset Behavior . 3-9 3.4 Emulation. 3-9 3.5 Writing AVX floating-point exception handlers. 3-9 CHAPTER 4 INSTRUCTION FORMAT 4.1 Instruction Formats . 4-1 4.1.1 VEX and the LOCK prefix . 4-2 4.1.2 VEX and the 66H, F2H, and F3H prefixes. 4-2 4.1.3 VEX and the REX prefix . 4-2 4.1.4 The VEX Prefix . 4-2 4.1.4.1 VEX Byte 0, bits[7:0]. 4-4 4.1.4.2 VEX Byte 1, bit [7] - ‘R’ . 4-4 4.1.4.3 3-byte VEX byte 1, bit[6] - ‘X’. ..

Intel(R) Advanced Vector Extensions Programming Reference

07 Vectorization for Intel C++ & Fortran Compiler .Pdf

Effective Virtual CPU Configuration with QEMU and Libvirt

Analysis of SIMD Applicability to SHA Algorithms O

New Instruction Set Extensions

Elementary Functions: Towards Automatically Generated, Efficient

Operating Guide

Undocumented CPU Behavior: Analyzing Undocumented Opcodes on Intel X86-64 Catherine Easdon Why Investigate Undocumented Behavior? the “Golden Screwdriver” Approach

Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3C: System Programming Guide, Part 3

Hyper-Threading Performance with Intel Cpus for Linux SAP Deployment on Proliant Servers

Andre Heidekrueger

SIMD Extensions

Hierarchical Roofline Analysis for Gpus: Accelerating Performance