Execution Unit ISA (Ivy Bridge)

Total Page:16

File Type:pdf, Size:1020Kb

Execution Unit ISA (Ivy Bridge) Intel® OpenSource HD Graphics Programmer’s Reference Manual (PRM) Volume 4 Part 3: Execution Unit ISA (Ivy Bridge) For the 2012 Intel® Core™ Processor Family May 2012 Revision 1.0 NOTICE: This document contains information on products in the design phase of development, and Intel reserves the right to add or remove product features at any time, with or without changes to this open source documentation. Doc Ref #: IHD-OS-V4 Pt 3 – 05 12 Creative Commons License You are free to Share — to copy, distribute, display, and perform the work Under the following conditions: Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). No Derivative Works. You may not alter, transform, or build upon this work. INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Implementations of the I2C bus/protocol may require licenses from various entities, including Philips Electronics N.V. and North American Philips Corporation. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2012, Intel Corporation. All rights reserved. 2 5/29/2012 Doc Ref #: IHD-OS-V4 Pt 3 – 05 12 Contents 1. Introduction............................................................................................................................................ 7 1.1 Introducing the Execution Unit ............................................................................................................. 7 1.2 EU Terms and Acronyms ..................................................................................................................... 9 1.3 EU Changes by Processor Generation .............................................................................................. 13 1.4 EU Notation ........................................................................................................................................ 13 2. EU Data Types ..................................................................................................................................... 15 2.1 Fundamental Data Types ................................................................................................................... 15 2.2 Numeric Data Types .......................................................................................................................... 15 2.2.1 Integer Numeric Data Types ......................................................................................................... 16 2.2.2 Floating-Point Numeric Data Types.............................................................................................. 17 2.2.3 Packed Signed Half-Byte Integer Vector ...................................................................................... 18 2.2.4 Packed UnSigned Half-Byte Integer Vector ................................................................................. 18 2.2.5 Packed Restricted Float Vector .................................................................................................... 19 2.3 Floating Point Modes ......................................................................................................................... 21 2.3.1 IEEE Floating Point Mode ............................................................................................................ 21 2.3.2 Alternative Floating Point Mode.................................................................................................... 25 2.4 Type Conversion ................................................................................................................................ 26 2.4.1 Float to Integer ............................................................................................................................. 26 2.4.2 Integer to Integer with Same or Higher Precision......................................................................... 27 2.4.3 Integer to Integer with Lower Precision ........................................................................................ 27 2.4.4 Integer to Float ............................................................................................................................. 27 2.4.5 Double Precision Float to Single Precision Float ......................................................................... 27 2.4.6 Single Precision Float to Double Precision Float ......................................................................... 28 3. Execution Environment ...................................................................................................................... 29 3.1 EU Overview ...................................................................................................................................... 29 3.2 Primary Usage Models ....................................................................................................................... 30 3.2.1 AOS and SOA Data Structures .................................................................................................... 30 3.2.2 SIMD4 Mode of Operation ............................................................................................................ 32 3.2.3 SIMD4x2 Mode of Operation ........................................................................................................ 33 3.2.4 SIMD16 Mode of Operation .......................................................................................................... 34 3.2.5 SIMD8 Mode of Operation ............................................................................................................ 36 3.3 Registers and Register Regions ........................................................................................................ 36 3.3.1 Register Files ................................................................................................................................ 36 3.3.2 GRF Registers .............................................................................................................................. 36 3.3.3 ARF Registers .............................................................................................................................. 37 3.3.4 Immediate ..................................................................................................................................... 57 3.3.5 Region Parameters ....................................................................................................................... 57 3.3.6 Region Addressing Modes ........................................................................................................... 62 3.3.7 Access Modes .............................................................................................................................. 67 3.3.8 Execution Data Type .................................................................................................................... 68 3.3.9 Register Region Restrictions ........................................................................................................ 68 3.3.10 Destination Operand Description................................................................................................ 72 3.4 SIMD Execution Control ..................................................................................................................... 73 3.4.1 Predication ...................................................................................................................................
Recommended publications
  • NMOS 6510 Unintended Opcodes No More Secrets (V0.95 - 24/12/20)
    NMOS 6510 Unintended Opcodes no more secrets (v0.95 - 24/12/20) (w) 2013-2020 groepaz/solution, all rights reversed Contents Preface...................................................................................................................................................I Scope of this Document....................................................................................................................I Intended Audience............................................................................................................................I License..............................................................................................................................................I What you get...................................................................................................................................II Naming Conventions.....................................................................................................................III Address-Mode Abbreviations...................................................................................................III Mnemonics................................................................................................................................III Processor Flags.........................................................................................................................IV Opcode Matrix......................................................................................................................................1 Unintended
    [Show full text]
  • Malware Detection Advances in Information Security
    Malware Detection Advances in Information Security Sushil Jajodia Consulting Editor Center for Secure Information Systems George Mason University Fairfax, VA 22030-4444 email: ja jodia @ smu.edu The goals of the Springer International Series on ADVANCES IN INFORMATION SECURITY are, one, to establish the state of the art of, and set the course for future research in information security and, two, to serve as a central reference source for advanced and timely topics in information security research and development. The scope of this series includes all aspects of computer and network security and related areas such as fault tolerance and software assurance. ADVANCES IN INFORMATION SECURITY aims to publish thorough and cohesive overviews of specific topics in information security, as well as works that are larger in scope or that contain more detailed background information than can be accommodated in shorter survey articles. The series also serves as a forum for topics that may not have reached a level of maturity to warrant a comprehensive textbook treatment. Researchers, as well as developers, are encouraged to contact Professor Sushil Jajodia with ideas for books under this series. Additional titles in the series: ELECTRONIC POSTAGE SYSTEMS: Technology, Security, Economics by Gerrit Bleumer; ISBN: 978-0-387-29313-2 MULTIVARIATE PUBLIC KEY CRYPTOSYSTEMS by Jintai Ding, Jason E. Gower and Dieter Schmidt; ISBN-13: 978-0-378-32229-2 UNDERSTANDING INTRUSION DETECTION THROUGH VISUALIZATION by Stefan Axelsson; ISBN-10: 0-387-27634-3 QUALITY OF PROTECTION: Security Measurements and Metrics by Dieter Gollmann, Fabio Massacci and Artsiom Yautsiukhin; ISBN-10; 0-387-29016-8 COMPUTER VIRUSES AND MALWARE by John Aycock; ISBN-10: 0-387-30236-0 HOP INTEGRITY IN THE INTERNET by Chin-Tser Huang and Mohamed G.
    [Show full text]
  • Efficient Parallel Algorithms for 3D Laplacian Smoothing on The
    applied sciences Article Efficient Parallel Algorithms for 3D Laplacian Smoothing on the GPU Lei Xiao 1, Guoxiang Yang 1,*, Kunyang Zhao 2,3 and Gang Mei 1,* 1 School of Engineering and Technology, China University of Geosciences (Beijing), Beijing 100083, China; [email protected] 2 Airport Engineering Civil Aviation R&D Base, China Airport Construction Group Co., Ltd., Beijing 100621, China; [email protected] 3 China Super-Creative Airport Technical Co., Ltd., Beijing 100621, China * Correspondence: [email protected] (G.Y.); [email protected] (G.M.); Tel.: +86-10-8232-2627 (G.Y.) Received: 13 November 2019; Accepted: 6 December 2019; Published: 11 December 2019 Abstract: In numerical modeling, mesh quality is one of the decisive factors that strongly affects the accuracy of calculations and the convergence of iterations. To improve mesh quality, the Laplacian mesh smoothing method, which repositions nodes to the barycenter of adjacent nodes without changing the mesh topology, has been widely used. However, smoothing a large-scale three dimensional mesh is quite computationally expensive, and few studies have focused on accelerating the Laplacian mesh smoothing method by utilizing the graphics processing unit (GPU). This paper presents a GPU-accelerated parallel algorithm for Laplacian smoothing in three dimensions by considering the influence of different data layouts and iteration forms. To evaluate the efficiency of the GPU implementation, the parallel solution is compared with the original serial solution. Experimental results show that our parallel implementation is up to 46 times faster than the serial version. Keywords: computational mesh processing; 3D laplacian smoothing; parallel algorithm; graphics processing unit 1.
    [Show full text]
  • Understanding the Performance of HPC Applications
    Understanding the Performance of HPC Applications Brian Gravelle March 2019 Abstract High performance computing is an important asset to scientific research, enabling the study of phe- nomena such as nuclear physics or climate change, that are difficult or impossible to be studied in traditional experiments or allowing researchers to utilize large amounts of data from experiments such as the Large Hadron Collider. No matter the use of HPC, the need for performance is always present; however, the fast-changing nature of computer systems means that software must be continually updated to run efficiently on the newest machines. In this paper, we discuss methods and tools used to understand the performance of an application running on HPC systems and how this understanding can translate into improved performance. We primarily focus on node-level issues, but also mention some of the basic issues involved with multi-node analysis as well. 1 Introduction In the modern world, supercomputing is used to approach many of the most important questions in science and technology. Computer simulations, machine learning programs, and other applications can all take ad- vantage of the massively parallel machines known as supercomputers; however, care must be taken to use these machines effectively. With the advent of multi- and many- core processors, GPGPUs, deep cache struc- tures, and FPGA accelerators, getting the optimal performance out of such machines becomes increasingly difficult. Although many modern supercomputers are theoretically capable of achieving tens or hundreds of petaflops, applications often only reach 15-20% of that peak performance. In this paper, we review recent research surrounding how users can understand and improve the performance of HPC applications.
    [Show full text]
  • Hardware Architectures for Software Security
    Hardware Architectures for Software Security Joshua N. Edmison Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Engineering Dr. Mark T. Jones, Chair Dr. Lynn Abbott Dr. Peter M. Athanas Dr. Ezra Brown Dr. Thomas L. Martin Dr. Cameron D. Patterson June 29, 2006 Bradley Department of Electrical and Computer Engineering Blacksburg, Virginia Keywords: Security, architecture, FPGA, configurable, graph theory, information flow Copyright 2005 c , Joshua N. Edmison Hardware Architectures for Software Security Joshua N. Edmison (ABSTRACT) The need for hardware-based software protection stems primarily from the increasing value of software coupled with the inability to trust software that utilizes or manages shared resources. By correctly utilizing security functions in hardware, trust can be removed from software. Existing hardware-based software protection solutions generally suffer from utiliza- tion of trusted software, lack of implementation, and/or extreme measures such as processor redesign. In contrast, the research outlined in this document proposes that substantial, hardware-based software protection can be achieved, without trusting software or redesign- ing the processor, by augmenting existing processors with security management hardware placed outside of the processor boundary. Benefits of this approach include the ability to add security features to nearly any processor, update security features without redesigning the processor, and provide maximum transparency to the software development and distribution processes. The major contributions of this research include the the augmentation method- ology, design principles, and a graph-based method for analyzing hardware-based security systems.
    [Show full text]
  • SAS/C Library Reference, Third Edition, Release 6.00 Volume 2
    SAS/C Library Reference, Third Edition, Release 6.00 Volume 2 SAS Institute Inc. SAS Campus Drive Cary, NC 27513 The correct bibliographic citation for this manual is as follows: SAS Institute Inc., SAS/C Library Reference, Third Edition, Volume 2, Release 6.00, Cary, NC: SAS Institute Inc., 1995. 623 pp. SAS/C Library Reference, Third Edition, Volume 2, Release 6.00 Copyright 1995 by SAS Institute Inc., Cary, NC, USA. ISBN 1-55544-667-1 All rights reserved. Printed in the United States of America. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc. Restricted Rights Legend. Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013. SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513. 1st printing, October 1995 The SAS System is an integrated system of software providing complete control over data access, management, analysis, and presentation. Base SAS software is the foundation of the SAS System. Products within the SAS System include SAS/ACCESS, SAS/AF, SAS/ASSIST, SAS/CALC, SAS/CONNECT, SAS/CPE, SAS/DMI, SAS/EIS, SAS/ENGLISH, SAS/ETS, SAS/FSP, SAS/GRAPH, SAS/IMAGE, SAS/IML, SAS/IMS-DL/I, SAS/INSIGHT, SAS/LAB, SAS/NVISION, SAS/OR, SAS/PH-Clinical, SAS/QC, SAS/REPLAY-CICS, SAS/SESSION, SAS/SHARE, SAS/SPECTRAVIEW, SAS/STAT, SAS/TOOLKIT, SAS/TRADER, SAS/TUTOR, SAS/DB2, SAS/GEO, SAS/GIS, SAS/PH-Kinetics, SAS/SHARE*NET, and SAS/SQL-DS software.
    [Show full text]
  • The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Version 1.10”, Editors Andrew Waterman and Krste Asanovi´C,RISC-V Foundation, May 2017
    The RISC-V Instruction Set Manual Volume II: Privileged Architecture Privileged Architecture Version 1.10 Document Version 1.10 Warning! This draft specification may change before being accepted as standard by the RISC-V Foundation. While the editors intend future changes to this specification to be forward compatible, it remains possible that implementations made to this draft specification will not conform to the future standard. Editors: Andrew Waterman1, Krste Asanovi´c1;2 1SiFive Inc., 2CS Division, EECS Department, University of California, Berkeley [email protected], [email protected] May 7, 2017 Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections): Krste Asanovi´c,Rimas Aviˇzienis,Jacob Bachmeyer, Allen J. Baum, Paolo Bonzini, Ruslan Bukin, Christopher Celio, David Chisnall, Anthony Coulter, Palmer Dabbelt, Monte Dal- rymple, Dennis Ferguson, Mike Frysinger, John Hauser, David Horner, Olof Johansson, Yunsup Lee, Andrew Lutomirski, Jonathan Neusch¨afer,Rishiyur Nikhil, Stefan O'Rear, Albert Ou, John Ousterhout, David Patterson, Colin Schmidt, Wesley Terpstra, Matt Thomas, Tommy Thorn, Ray VanDeWalker, Megan Wachs, Andrew Waterman, and Reinoud Zandijk. This document is released under a Creative Commons Attribution 4.0 International License. This document is a derivative of the RISC-V privileged specification version 1.9.1 released under following license: c 2010{2017 Andrew Waterman, Yunsup Lee, Rimas Aviˇzienis,David Patterson, Krste Asanovi´c.Creative Commons Attribution 4.0 International License. Please cite as: \The RISC-V Instruction Set Manual, Volume II: Privileged Architecture, Version 1.10", Editors Andrew Waterman and Krste Asanovi´c,RISC-V Foundation, May 2017. Preface This is version 1.10 of the RISC-V privileged architecture proposal.
    [Show full text]
  • Best Practice Guide Modern Processors
    Best Practice Guide Modern Processors Ole Widar Saastad, University of Oslo, Norway Kristina Kapanova, NCSA, Bulgaria Stoyan Markov, NCSA, Bulgaria Cristian Morales, BSC, Spain Anastasiia Shamakina, HLRS, Germany Nick Johnson, EPCC, United Kingdom Ezhilmathi Krishnasamy, University of Luxembourg, Luxembourg Sebastien Varrette, University of Luxembourg, Luxembourg Hayk Shoukourian (Editor), LRZ, Germany Updated 5-5-2021 1 Best Practice Guide Modern Processors Table of Contents 1. Introduction .............................................................................................................................. 4 2. ARM Processors ....................................................................................................................... 6 2.1. Architecture ................................................................................................................... 6 2.1.1. Kunpeng 920 ....................................................................................................... 6 2.1.2. ThunderX2 .......................................................................................................... 7 2.1.3. NUMA architecture .............................................................................................. 9 2.2. Programming Environment ............................................................................................... 9 2.2.1. Compilers ........................................................................................................... 9 2.2.2. Vendor performance libraries
    [Show full text]
  • Kguard: Lightweight Kernel Protection Against Return-To-User Attacks
    kGuard: Lightweight Kernel Protection against Return-to-user Attacks VasileiosP.Kemerlis GeorgiosPortokalidis AngelosD.Keromytis Network Security Lab Department of Computer Science Columbia University, New York, NY, USA {vpk, porto, angelos}@cs.columbia.edu Abstract Such return-to-user (ret2usr) attacks have affected all major OSs, including Windows [60], Linux [16, 18], Return-to-user (ret2usr) attacks exploit the operating sys- and FreeBSD [19, 59, 61], while they are not limited to tem kernel, enabling local users to hijack privileged ex- x86 systems [23], but have also targeted the ARM [30], ecution paths and execute arbitrary code with elevated DEC [31], and PowerPC [25] architectures. privileges. Current defenses have proven to be inade- There are numerous reasons to why attacks against the quate, as they have been repeatedly circumvented, in- kernel are becoming more common. First and foremost, cur considerable overhead, or rely on extended hypervi- processes running with administrative privileges have be- sors and special hardware features. We present kGuard, come harder to exploit due to the various defense mech- a compiler plugin that augments the kernel with com- anisms adopted by modern OSs [52, 34]. Second, NULL pact inline guards, which prevent ret2usr with low per- pointer dereference errors had not received significant at- formance and space overhead. kGuard can be used with tention, until recently, exactly because they were thought any operating system that features a weak separation be- impractical and too difficult to exploit. In fact, 2009 has tween kernel and user space, requires no modifications been proclaimed, by some security researchers, as “the to the OS, and is applicable to both 32- and 64-bit ar- year of the kernel NULL pointer dereference flaw”[15].
    [Show full text]
  • Vasm Assembler System
    vasm assembler system Volker Barthelmann, Frank Wille June 2021 i Table of Contents 1 General :::::::::::::::::::::::::::::::::::::::::: 1 1.1 Introduction ::::::::::::::::::::::::::::::::::::::::::::::::::: 1 1.2 Legal :::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 1 1.3 Installation :::::::::::::::::::::::::::::::::::::::::::::::::::: 1 2 The Assembler :::::::::::::::::::::::::::::::::: 3 2.1 General Assembler Options ::::::::::::::::::::::::::::::::::::: 3 2.2 Expressions :::::::::::::::::::::::::::::::::::::::::::::::::::: 5 2.3 Symbols ::::::::::::::::::::::::::::::::::::::::::::::::::::::: 7 2.4 Predefined Symbols :::::::::::::::::::::::::::::::::::::::::::: 7 2.5 Include Files ::::::::::::::::::::::::::::::::::::::::::::::::::: 8 2.6 Macros::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 8 2.7 Structures:::::::::::::::::::::::::::::::::::::::::::::::::::::: 8 2.8 Conditional Assembly :::::::::::::::::::::::::::::::::::::::::: 8 2.9 Known Problems ::::::::::::::::::::::::::::::::::::::::::::::: 9 2.10 Credits ::::::::::::::::::::::::::::::::::::::::::::::::::::::: 9 2.11 Error Messages :::::::::::::::::::::::::::::::::::::::::::::: 10 3 Standard Syntax Module ::::::::::::::::::::: 13 3.1 Legal ::::::::::::::::::::::::::::::::::::::::::::::::::::::::: 13 3.2 Additional options for this module :::::::::::::::::::::::::::: 13 3.3 General Syntax ::::::::::::::::::::::::::::::::::::::::::::::: 13 3.4 Directives ::::::::::::::::::::::::::::::::::::::::::::::::::::: 14 3.5 Known Problems::::::::::::::::::::::::::::::::::::::::::::::
    [Show full text]
  • Triplicated Instruction Set Randomization in Parallel
    TRIPLICATED INSTRUCTION SET RANDOMIZATION IN PARALLEL HETEROGENOUS SOFT-CORE PROCESSORS by Trevor James Gahl A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering MONTANA STATE UNIVERSITY Bozeman, Montana April 2019 ©COPYRIGHT by Trevor James Gahl 2019 All Rights Reserved ii ACKNOWLEDGEMENTS I would like to acknowledge my friends and family for standing by me and helping me throughout my time in grad school. My friend and lab-mate Skylar Tamke for always letting me bounce ideas off of him and helping me blow off steam. My advisor Dr. Brock LaMeres for helping me refocus in a field I love. More than anybody though, I'd like to acknowledge my wife Kaitlyn for helping me with anything I needed, whether that was bringing me food late at night or listening to rants about compiler errors and code bugs, I couldn't have asked for a better support system. iii TABLE OF CONTENTS 1. INTRODUCTION ........................................................................................1 Introduction.................................................................................................1 2. BACKGROUND AND MOTIVATION...........................................................5 Monoculture Environment.............................................................................5 Opcodes and Instruction Sets ........................................................................7 Cybersecurity Threats................................................................................
    [Show full text]
  • SAMS Multi-Layout Memory: Providing Multiple Views of Data to Boost SIMD Performance
    SAMS Multi-Layout Memory: Providing Multiple Views of Data to Boost SIMD Performance Chunyang Gou, Georgi Kuzmanov, Georgi N. Gaydadjiev Computer Engineering Lab Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology, The Netherlands {C.Gou, G.K.Kuzmanov, G.N.Gaydadjiev}@tudelft.nl ABSTRACT layout and its continuity in the linear memory address space. We propose to bridge the discrepancy between data represen- It has been found that most SIMDized applications are in fa- tations in memory and those favored by the SIMD proces- vor of operating on SoA format for better performance[18, sor by customizing the low-level address mapping.Toachieve 20]. However, data representation in the system memory is this, we employ the extended Single-Affiliation Multiple-Stride mostly in the form of AoS because of two reasons. First, AoS (SAMS) parallel memory scheme at an appropriate level in the is the natural data representation in many scientific and en- memory hierarchy. This level of memory provides both Ar- gineering applications. Secondly, indirections to structured ray of Structures (AoS) and Structure of Arrays (SoA) views data, such as pointer or indexed array accesses, are in favor for the structured data to the processor, appearing to have of the AoS format. Therefore, a pragmatic problem in the maintained multiple layouts for the same data.Withsuch SIMDization arises: the need for dynamic data format trans- multi-layout memory, optimal SIMDization can be achieved. form between AoS and SoA, which results in significant perfor- Our synthesis results using TSMC 90nm CMOS technology mance degradation. To our best knowledge, no trivial solution indicate that the SAMS Multi-Layout Memory system has ef- for this problem has been previously proposed.
    [Show full text]