Intel(R) Architecture Instruction Set Extensions Programming Reference

Intel® Architecture Instruction Set Extensions Programming Reference
319433-012
FEBRUARY 2012

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "MISSION CRITICAL APPLICATION" IS ANY APPLICATION IN WHICH FAILURE OF THE INTEL PRODUCT COULD RESULT, DIRECTLY OR INDIRECTLY, IN PERSONAL INJURY OR DEATH. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice.
Do not finalize a design with this information. The Intel® 64 architecture processors may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel® Hyper-Threading Technology (Intel® HT Technology) is available on select Intel® Core™ processors. Requires an Intel® HT Technology-enabled system. Consult your PC manufacturer. Performance will vary depending on the specific hardware and software used. For more information including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading. Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance or other benefits will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For more information, visit http://www.intel.com/go/virtualization. Intel® 64 architecture requires a system with a 64-bit enabled processor, chipset, BIOS and software. Performance will vary depending on the specific hardware and software you use. Consult your PC manufacturer for more information. For more information, visit http://www.intel.com/info/em64t. Intel, Pentium, Intel Atom, Intel Xeon, Intel NetBurst, Intel Core, Intel Core Solo, Intel Core Duo, Intel Core 2 Duo, Intel Core 2 Extreme, Intel Pentium D, Itanium, Intel SpeedStep, MMX, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. *Other names and brands may be claimed as the property of others. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm
Copyright © 1997-2012 Intel Corporation

CONTENTS

CHAPTER 1 INTEL® ADVANCED VECTOR EXTENSIONS
1.1 About This Document
1.2 Overview
1.3 Intel® Advanced Vector Extensions Architecture Overview
1.3.1 256-Bit Wide SIMD Register Support
1.3.2 Instruction Syntax Enhancements
1.3.3 VEX Prefix Instruction Encoding Support
1.4 Overview AVX2
1.5 Functional Overview
1.5.1 256-bit Floating-Point Arithmetic Processing Enhancements
1.5.2 256-bit Non-Arithmetic Instruction Enhancements
1.5.3 Arithmetic Primitives for 128-bit Vector and Scalar Processing
1.5.4 Non-Arithmetic Primitives for 128-bit Vector and Scalar Processing
1.5.5 AVX2 and 256-bit Vector Integer Processing
1.6 General Purpose Instruction Set Enhancements
1.7 Intel® Transactional Synchronization Extensions

CHAPTER 2 APPLICATION PROGRAMMING MODEL
2.1 Detection of PCLMULQDQ and AES Instructions
2.2 Detection of AVX and FMA Instructions
2.2.1 Detection of FMA
2.2.2 Detection of VEX-Encoded AES and VPCLMULQDQ
2.2.3 Detection of AVX2
2.2.4 Detection of VEX-encoded GPR Instructions
2.3 Fused-Multiply-ADD (FMA) Numeric Behavior
2.3.1 FMA Instruction Operand Order and Arithmetic Behavior
2.4 Accessing YMM Registers
2.5 Memory Alignment
2.6 SIMD Floating-Point Exceptions
2.7 Instruction Exception Specification
2.7.1 Exceptions Type 1 (Aligned memory reference)
2.7.2 Exceptions Type 2 (>=16 Byte Memory Reference, Unaligned)
2.7.3 Exceptions Type 3 (<16 Byte memory argument)
2.7.4 Exceptions Type 4 (>=16 Byte mem arg no alignment, no floating-point exceptions)
2.7.5 Exceptions Type 5 (<16 Byte mem arg and no FP exceptions)
2.7.6 Exceptions Type 6 (VEX-Encoded Instructions Without Legacy SSE Analogues)
2.7.7 Exceptions Type 7 (No FP exceptions, no memory arg)
2.7.8 Exceptions Type 8 (AVX and no memory argument)
2.7.9 Exception Type 11 (VEX-only, mem arg no AC, floating-point exceptions)
2.7.10 Exception Type 12 (VEX-only, VSIB mem arg, no AC, no floating-point exceptions)
2.7.11 Exception Conditions for VEX-Encoded GPR Instructions
2.8 Programming Considerations with 128-bit SIMD Instructions
2.8.1 Clearing Upper YMM State Between AVX and Legacy SSE Instructions
2.8.2 Using AVX 128-bit Instructions Instead of Legacy SSE Instructions
2.8.3 Unaligned Memory Access and Buffer Size Management
2.9 CPUID Instruction
CPUID - CPU Identification

CHAPTER 3 SYSTEM PROGRAMMING MODEL
3.1 YMM State, VEX Prefix and Supported Operating Modes
3.2 YMM State Management
3.2.1 Detection of YMM State Support
3.2.2 Enabling of YMM State
3.2.3 Enabling of SIMD Floating-Exception Support
3.2.4 The Layout of XSAVE Area
3.2.5 XSAVE/XRSTOR Interaction with YMM State and MXCSR
3.2.6 Processor Extended State Save Optimization and XSAVEOPT
3.2.6.1 XSAVEOPT Usage Guidelines
3.3 Reset Behavior
3.4 Emulation
3.5 Writing AVX Floating-Point Exception Handlers

CHAPTER 4 INSTRUCTION FORMAT
4.1 Instruction Formats
4.1.1 VEX and the LOCK prefix
4.1.2 VEX and the 66H, F2H, and F3H prefixes
4.1.3 VEX and the REX prefix
4.1.4 The VEX Prefix
4.1.4.1 VEX Byte 0, bits[7:0]
...
Recommended publications
  • Elementary Functions: Towards Automatically Generated, Efficient
    Elementary functions: towards automatically generated, efficient, and vectorizable implementations. Hugues de Lassus Saint-Geniès.
    To cite this version: Hugues De Lassus Saint-Genies. Elementary functions: towards automatically generated, efficient, and vectorizable implementations. Other [cs.OH]. Université de Perpignan, 2018. English. NNT: 2018PERP0010. tel-01841424. HAL Id: tel-01841424, https://tel.archives-ouvertes.fr/tel-01841424, submitted on 17 Jul 2018.
    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
    Awarded by the Université de Perpignan Via Domitia. Prepared within doctoral school 305 (Énergie et Environnement) and the research unit DALI, LIRMM, CNRS UMR 5506. Specialty: Computer Science. Presented by Hugues de Lassus Saint-Geniès. Version submitted to the reviewers.
    Jury: Mr. Florent de Dinechin, Prof., INSA Lyon, Reviewer; Ms. Fabienne Jézéquel, MC, HDR, UParis 2, Reviewer; Mr. Marc Daumas, Prof., UPVD, Examiner; Mr. Lionel Lacassagne, Prof., UParis 6, Examiner; Mr. Daniel Menard, Prof., INSA Rennes, Examiner; Mr. Éric Petit, Ph.D., Intel, Examiner; Mr. David Defour, MC, HDR, UPVD, Supervisor; Mr. Guillaume Revy, MC, UPVD, Co-supervisor.
    In memory of my grandmother Françoise Lapergue and of Jos Perrot, Bigouden fisherman.
  • Andre Heidekrueger
    AMD Heterogeneous Computing: x86 in development. AMD new CPU and accelerator designs: building blocks for heterogeneous computing with the GPU accelerators and the latest x86 platform innovations (Hot Chips, August 2010).
    Server industry trends: China has 265m online gamers; seismic datasets typically exceed a terabyte; the performance of the fastest supercomputer grew 500x in the last decade; the top 8 systems on the Green 500 list use accelerators; accelerator-based servers on the Green 500 list are 3x as energy efficient as those without accelerators; 800 images are uploaded to Facebook every second.
    Top500.org performance projections: can the current trajectory achieve exascale (1 EFlops)? It might get there on the current trajectory, but too late for major government programs leading to 2018, and system power in the traditional x86 architecture would be unmanageable (source for chart: Top500.org; annotations by AMD).
    Three eras of processor performance: the Single-Core Era (enabled by Moore's Law, voltage scaling and micro-architecture; constrained by power and complexity), the Multi-Core Era (enabled by Moore's Law, the desire for throughput and 20 years of SMP architecture; constrained by power, parallel software availability and scalability), and the Heterogeneous Systems Era (enabled by Moore's Law, abundant data parallelism and power-efficient GPUs; temporarily constrained by programming models and communication overheads).
    Driving HPC performance efficiency: the Fusion architecture. The benefits of heterogeneous computing: the x86 CPU owns the software world; the GPU is optimized for modern workloads.
  • Hierarchical Roofline Analysis for GPUs: Accelerating Performance
    Hierarchical Roofline Analysis for GPUs: Accelerating Performance Optimization for the NERSC-9 Perlmutter System. Charlene Yang, Thorsten Kurth (National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA); Samuel Williams (Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA).
    Abstract: The Roofline performance model provides an intuitive and insightful approach to identifying performance bottlenecks and guiding performance optimization. In preparation for the next-generation supercomputer Perlmutter at NERSC, this paper presents a methodology to construct a hierarchical Roofline on NVIDIA GPUs and extend it to support reduced precision and Tensor Cores. The hierarchical Roofline incorporates L1, L2, device memory and system memory bandwidths into one single figure, and it offers more profound insights into performance analysis than the traditional DRAM-only Roofline. We use our Roofline methodology to analyze three proxy applications: GPP from BerkeleyGW, HPGMG from AMReX, and conv2d from TensorFlow. In so doing, we demonstrate the ability of our methodology to readily understand various aspects of performance and performance bottlenecks on NVIDIA GPUs and motivate code optimizations.
    Performance (GFLOP/s) is bound by:
    $$\mathrm{GFLOP/s} \le \min\left(\text{Peak GFLOP/s},\; \text{Peak GB/s} \times \text{Arithmetic Intensity}\right) \qquad (1)$$
    which produces the traditional Roofline formulation when plotted on a log-log plot. Previously, the Roofline model was expanded to support the full memory hierarchy [2], [3] by adding additional bandwidth "ceilings". Similarly, additional ceilings beneath the Roofline can be added to represent performance bottlenecks arising from lack of vectorization or the failure to exploit fused multiply-add (FMA) instructions.
  • NMOS 6510 Unintended Opcodes No More Secrets (V0.95 - 24/12/20)
    NMOS 6510 Unintended Opcodes: no more secrets (v0.95 - 24/12/20). (w) 2013-2020 groepaz/solution, all rights reversed. Contents: Preface (Scope of this Document; Intended Audience; License; What you get; Naming Conventions: Address-Mode Abbreviations, Mnemonics, Processor Flags); Opcode Matrix; Unintended ...
  • Malware Detection Advances in Information Security
    Malware Detection. Advances in Information Security. Sushil Jajodia, Consulting Editor, Center for Secure Information Systems, George Mason University, Fairfax, VA 22030-4444, email: jajodia@smu.edu. The goals of the Springer International Series on ADVANCES IN INFORMATION SECURITY are, one, to establish the state of the art of, and set the course for future research in information security and, two, to serve as a central reference source for advanced and timely topics in information security research and development. The scope of this series includes all aspects of computer and network security and related areas such as fault tolerance and software assurance. ADVANCES IN INFORMATION SECURITY aims to publish thorough and cohesive overviews of specific topics in information security, as well as works that are larger in scope or that contain more detailed background information than can be accommodated in shorter survey articles. The series also serves as a forum for topics that may not have reached a level of maturity to warrant a comprehensive textbook treatment. Researchers, as well as developers, are encouraged to contact Professor Sushil Jajodia with ideas for books under this series. Additional titles in the series: ELECTRONIC POSTAGE SYSTEMS: Technology, Security, Economics by Gerrit Bleumer (ISBN: 978-0-387-29313-2); MULTIVARIATE PUBLIC KEY CRYPTOSYSTEMS by Jintai Ding, Jason E. Gower and Dieter Schmidt (ISBN-13: 978-0-378-32229-2); UNDERSTANDING INTRUSION DETECTION THROUGH VISUALIZATION by Stefan Axelsson (ISBN-10: 0-387-27634-3); QUALITY OF PROTECTION: Security Measurements and Metrics by Dieter Gollmann, Fabio Massacci and Artsiom Yautsiukhin (ISBN-10: 0-387-29016-8); COMPUTER VIRUSES AND MALWARE by John Aycock (ISBN-10: 0-387-30236-0); HOP INTEGRITY IN THE INTERNET by Chin-Tser Huang and Mohamed G.
  • Theoretical Peak FLOPS Per Instruction Set on Modern Intel CPUs
    Theoretical Peak FLOPS per instruction set on modern Intel CPUs. Romain Dolbeau, Bull, Center for Excellence in Parallel Programming. Revision 1.41, 2016/10/04 08:49:16.
    Abstract: It used to be that evaluating the theoretical peak performance of a CPU in FLOPS (floating-point operations per second) was merely a matter of multiplying the frequency by the number of floating-point instructions per cycle. Today, however, CPUs have features such as vectorization, fused multiply-add, hyper-threading or "turbo" mode. In this paper, we look into this theoretical peak for recent full-featured Intel CPUs, taking into account not only the simple absolute peak, but also the relevant instruction sets and encoding and the frequency scaling behavior of current Intel CPUs. Index Terms: FLOPS.
    I. INTRODUCTION. High performance computing thrives on fast computations and high memory bandwidth. But before any code or even benchmark is run, the very first number to evaluate a system is the theoretical peak: how many floating-point operations the system can theoretically execute in a given time. ... and potentially multiple threads per core. Vectors of varying sizes. And more sophisticated instructions. Equation 2 describes a more realistic view, which we will explain in detail in the rest of the paper, first in general in Section II and then for the specific cases of Intel CPUs: first a simple one from the Nehalem/Westmere era in Section III, and then the full complexity of the Haswell family in Section IV. A complement to this paper, titled "Theoretical Peak FLOPS per instruction set on less conventional hardware" [1], covers other computing devices.
  • Introduction to Intel Scalable Architectures
    Introduction to Intel scalable architectures. Fabio Affinito (SCAI - Cineca).
    Available options: right here, right now, two kinds of solutions are available on the market: IBM + NVIDIA (CORAL-like) and Intel-based (Xeon/Xeon Phi). IBM + NVIDIA: each node will be based on a Power CPU plus 4/6/8 NVIDIA Tesla GPUs connected using an NVIDIA NVLink interconnect. Intel Xeon and Xeon Phi: Intel will keep on with the production of server processors on the Xeon line, together with the introduction of the Xeon Phi many-core chips; Intel Xeon Phi will not be a co-processor anymore, but a self-standing CPU with a very high number of cores. Such systems are integrated by several vendors in many different configurations (Cray, HP, Lenovo, E4, ...).
    MARCONI: FERMI, the IBM BlueGene/Q deployed in Cineca, ended its lifecycle in 2016. We needed a new HPC machine that could increase the computational power, respect the agreements with PRACE, and satisfy the needs of the Italian computing community.
    MARCONI NeXtScale architecture: nx360M5 nodes supporting Intel HSW & BDW, able to host both Mellanox EDR InfiniBand and Intel Omni-Path networks; twelve nodes are grouped into a chassis (6 chassis per rack). The compute node is made of: 2 × Intel Broadwell (Xeon processor E5-2697 v4, 18 cores, 2.3 GHz); 8 × 16 GB DIMM memory (DDR4 2400 MHz), 128 GB total; 1 × 129 GB SATA MLC S3500 Enterprise Value SSD; 1 × Omni-Path (OPA) link at 100 Gb/s. Peak: 2 × 18 × 2.3 × 16 = 1,325 GFLOP/s per node. 24 racks in total: 21 compute racks, 1 rack of service nodes, 2 core-switch racks.
    MARCONI - Network. MARCONI - Storage.
  • The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Version 1.10, Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, May 2017
    The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Document Version 1.10. Warning! This draft specification may change before being accepted as standard by the RISC-V Foundation. While the editors intend future changes to this specification to be forward compatible, it remains possible that implementations made to this draft specification will not conform to the future standard. Editors: Andrew Waterman (1), Krste Asanović (1, 2); (1) SiFive Inc., (2) CS Division, EECS Department, University of California, Berkeley. May 7, 2017. Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections): Krste Asanović, Rimas Avižienis, Jacob Bachmeyer, Allen J. Baum, Paolo Bonzini, Ruslan Bukin, Christopher Celio, David Chisnall, Anthony Coulter, Palmer Dabbelt, Monte Dalrymple, Dennis Ferguson, Mike Frysinger, John Hauser, David Horner, Olof Johansson, Yunsup Lee, Andrew Lutomirski, Jonathan Neuschäfer, Rishiyur Nikhil, Stefan O'Rear, Albert Ou, John Ousterhout, David Patterson, Colin Schmidt, Wesley Terpstra, Matt Thomas, Tommy Thorn, Ray VanDeWalker, Megan Wachs, Andrew Waterman, and Reinoud Zandijk. This document is released under a Creative Commons Attribution 4.0 International License. This document is a derivative of the RISC-V privileged specification version 1.9.1 released under the following license: © 2010-2017 Andrew Waterman, Yunsup Lee, Rimas Avižienis, David Patterson, Krste Asanović. Creative Commons Attribution 4.0 International License. Please cite as: "The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Version 1.10", Editors Andrew Waterman and Krste Asanović, RISC-V Foundation, May 2017. Preface: This is version 1.10 of the RISC-V privileged architecture proposal.
  • Intel® Architecture Instruction Set Extensions and Future Features
    Intel® Architecture Instruction Set Extensions and Future Features Programming Reference May 2021 319433-044 Intel technologies may require enabled hardware, software or service activation. No product or component can be absolutely secure. Your costs and results may vary. You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. All product plans and roadmaps are subject to change without notice. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. Code names are used by Intel to identify products, technologies, or services that are in development and not publicly available. These are not “commercial” names and not intended to function as trademarks. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting http://www.intel.com/design/literature.htm. Copyright © 2021, Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.
  • Intel(R) Advanced Vector Extensions Programming Reference
    Intel® Advanced Vector Extensions Programming Reference 319433-011 JUNE 2011 INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS. Intel may make changes to specifications and product descriptions at any time, without notice. Developers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Improper use of reserved or undefined features or instructions may cause unpredictable behavior or failure in developer's software code when running on an Intel processor. Intel reserves these features or instructions for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from their unauthorized use. The Intel® 64 architecture processors may contain design defects or errors known as errata. Current characterized errata are available on request. Hyper-Threading Technology requires a computer system with an Intel® processor supporting Hyper-Threading Technology and an HT Technology enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. For more information, including details on which processors support HT Technology, see http://www.intel.com/technology/hyperthread/index.htm.
  • Effective Vectorization with OpenMP 4.5
    ORNL/TM-2016/391. Effective Vectorization with OpenMP 4.5. Joseph Huber, Oscar Hernandez, Graham Lopez. Approved for public release. Distribution is unlimited. March 2017. DOCUMENT AVAILABILITY: Reports produced after January 1, 1996, are generally available free via US Department of Energy (DOE) SciTech Connect. Website: http://www.osti.gov/scitech/ Reports produced before January 1, 1996, may be purchased by members of the public from the following source: National Technical Information Service, 5285 Port Royal Road, Springfield, VA 22161; Telephone: 703-605-6000 (1-800-553-6847); TDD: 703-487-4639; Fax: 703-605-6900; Website: http://classic.ntis.gov/ Reports are available to DOE employees, DOE contractors, Energy Technology Data Exchange representatives, and International Nuclear Information System representatives from the following source: Office of Scientific and Technical Information, PO Box 62, Oak Ridge, TN 37831; Telephone: 865-576-8401; Fax: 865-576-5728; Website: http://www.osti.gov/contact.html This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof.
  • Vasm Assembler System
    vasm assembler system. Volker Barthelmann, Frank Wille, June 2021. Table of Contents: 1 General: 1.1 Introduction, 1.2 Legal, 1.3 Installation. 2 The Assembler: 2.1 General Assembler Options, 2.2 Expressions, 2.3 Symbols, 2.4 Predefined Symbols, 2.5 Include Files, 2.6 Macros, 2.7 Structures, 2.8 Conditional Assembly, 2.9 Known Problems, 2.10 Credits, 2.11 Error Messages. 3 Standard Syntax Module: 3.1 Legal, 3.2 Additional options for this module, 3.3 General Syntax, 3.4 Directives, 3.5 Known Problems ...
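The Roofline bound quoted in the Hierarchical Roofline excerpt, GFLOP/s ≤ min(Peak GFLOP/s, Peak GB/s × Arithmetic Intensity), can be sketched in a few lines of Python. As a worked input, the peak term is computed the way the MARCONI slide does it (2 sockets × 18 cores × 2.3 GHz × 16 FLOP/cycle); we read the 16 as two 256-bit FMA units per Broadwell core, each handling 4 double-precision lanes at 2 FLOPs per FMA. The helper names and the 150 GB/s memory bandwidth are hypothetical, chosen only for illustration; this is a sketch of the model under those assumptions, not the tooling used in either source.

```python
"""Sketch of the Roofline bound: GFLOP/s <= min(peak GFLOP/s, peak GB/s * AI).

All function names are hypothetical helpers for illustration only.
"""


def peak_gflops(sockets: int, cores_per_socket: int, ghz: float,
                flops_per_cycle: int) -> float:
    """Theoretical peak, assuming every core retires flops_per_cycle FLOPs per cycle."""
    return sockets * cores_per_socket * ghz * flops_per_cycle


def roofline_gflops(peak: float, bandwidth_gbs: float, intensity: float) -> float:
    """Attainable GFLOP/s for a kernel doing `intensity` FLOPs per byte moved."""
    return min(peak, bandwidth_gbs * intensity)


def ridge_point(peak: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOP/byte) where the memory roof meets the compute roof."""
    return peak / bandwidth_gbs


if __name__ == "__main__":
    # MARCONI node figures from the slide: 2 sockets x 18 cores x 2.3 GHz x 16 FLOP/cycle.
    peak = peak_gflops(2, 18, 2.3, 16)
    # Hypothetical node memory bandwidth in GB/s, NOT taken from either source.
    bandwidth = 150.0
    for ai in (0.25, 1.0, 4.0, 16.0):
        bound = roofline_gflops(peak, bandwidth, ai)
        print(f"AI = {ai:5.2f} FLOP/B -> at most {bound:7.1f} GFLOP/s")
    print(f"peak = {peak:.1f} GFLOP/s, ridge point = {ridge_point(peak, bandwidth):.2f} FLOP/B")
```

Below the ridge point a kernel is bandwidth-bound and its attainable rate grows linearly with arithmetic intensity; above it, the compute peak caps performance. The additional "ceilings" the excerpt mentions (cache-level bandwidths, missing vectorization or FMA) would simply be extra terms inside the `min`.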