Optimal Software Pipelining: Integer Linear Programming Approach
Recommended publications
-
MIPSpro C++ Programmer's Guide
MIPSpro™ C++ Programmer’s Guide 007–0704–150 CONTRIBUTORS Rewritten in 2002 by Jean Wilson with engineering support from John Wilkinson and editing support from Susan Wilkening. COPYRIGHT Copyright © 1995, 1999, 2002 - 2003 Silicon Graphics, Inc. All rights reserved; provided portions may be copyright in third parties, as indicated elsewhere herein. No permission is granted to copy, distribute, or create derivative works from the contents of this electronic documentation in any manner, in whole or in part, without the prior written permission of Silicon Graphics, Inc. LIMITED RIGHTS LEGEND The electronic (software) version of this document was developed at private expense; if acquired under an agreement with the USA government or any contractor thereto, it is acquired as "commercial computer software" subject to the provisions of its applicable license agreement, as specified in (a) 48 CFR 12.212 of the FAR; or, if acquired for Department of Defense units, (b) 48 CFR 227-7202 of the DoD FAR Supplement; or sections succeeding thereto. Contractor/manufacturer is Silicon Graphics, Inc., 1600 Amphitheatre Pkwy 2E, Mountain View, CA 94043-1351. TRADEMARKS AND ATTRIBUTIONS Silicon Graphics, SGI, the SGI logo, IRIX, O2, Octane, and Origin are registered trademarks and OpenMP and ProDev are trademarks of Silicon Graphics, Inc. in the United States and/or other countries worldwide. MIPS, MIPS I, MIPS II, MIPS III, MIPS IV, R2000, R3000, R4000, R4400, R4600, R5000, and R8000 are registered or unregistered trademarks and MIPSpro, R10000, R12000, R14000 are trademarks of MIPS Technologies, Inc., used under license by Silicon Graphics, Inc. Portions of this publication may have been derived from the OpenMP Language Application Program Interface Specification. -
MIPS IV Instruction Set
MIPS IV Instruction Set Revision 3.2 September, 1995 Charles Price MIPS Technologies, Inc. All Rights Reserved RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure of the technical data contained in this document by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and / or in similar or successor clauses in the FAR, or in the DOD or NASA FAR Supplement. Unpublished rights reserved under the Copyright Laws of the United States. Contractor / manufacturer is MIPS Technologies, Inc., 2011 N. Shoreline Blvd., Mountain View, CA 94039-7311. R2000, R3000, R6000, R4000, R4400, R4200, R8000, R4300 and R10000 are trademarks of MIPS Technologies, Inc. MIPS and R3000 are registered trademarks of MIPS Technologies, Inc. The information in this document is preliminary and subject to change without notice. MIPS Technologies, Inc. (MTI) reserves the right to change any portion of the product described herein to improve function or design. MTI does not assume liability arising out of the application or use of any product or circuit described herein. Information on MIPS products is available electronically: (a) Through the World Wide Web. Point your WWW client to: http://www.mips.com (b) Through ftp from the internet site “sgigate.sgi.com”. Login as “ftp” or “anonymous” and then cd to the directory “pub/doc”. (c) Through an automated FAX service: Inside the USA toll free: (800) 446-6477 (800-IGO-MIPS) Outside the USA: (415) 688-4321 (call from a FAX machine) MIPS Technologies, Inc. -
Pluggable Interface Relays CR-M Miniature Relays
Data sheet: Pluggable interface relays CR-M, Miniature relays
Pluggable interface relays are used for electrical isolation, amplification and signal matching between the electronic control level (e.g. PLC (programmable logic controller), PC or field bus systems) and the sensor/actuator level. They do not use additional internal protective circuits and thus are overload-proof against short-time variations like current or voltage peaks. 2CDC 291 002 S0015
Characteristics
– Standard miniature relays with mechanical status indication
– 13 different rated control supply voltages: DC versions: 12 V, 24 V, 48 V, 60 V, 110 V, 125 V, 220 V; AC versions: 24 V, 48 V, 60 V, 110 V, 120 V, 230 V
– Output: 2 c/o (SPDT) contacts (12 A), 3 c/o (SPDT) contacts (10 A) or 4 c/o (SPDT) contacts (6 A)
– Available with or without LED
– 4 c/o (SPDT) contact version optionally equipped with gold contacts, LED and free wheeling diode
– Integrated test button for manual actuation and locking of output contacts (DC coil = blue, AC coil = orange) that can be removed if necessary
– Cadmium-free contact material
– Suited for logical and standard sockets
– Width on socket: 27 mm (1.063 in)
– Pluggable function modules: reverse polarity protection/free wheeling diode, LED indication, RC elements, overvoltage protection
Approvals
– ANSI/UL 508, CAN/CSA C22.2 No.14
– CAN/CSA C22.2 No.14
– VDE (except 125 V DC devices)
– EAC
– Lloyds Register (only devices with 4 c/o (SPDT) contacts)
– CCC
– RMRS (except 60 V and 125 V devices)
Marks
– CE
Order data: Packing unit = 10 pieces -
MIPSpro 64-Bit Porting and Transition Guide
MIPSpro™ 64-Bit Porting and Transition Guide Document Number 007-2391-003 CONTRIBUTORS Written by George Pirocanac Edited by Larry Huffman, Cindy Kleinfeld Production by Cindy Stief Engineering contributions by Dave Anderson, Bean Anderson, Dave Babcock, Jack Carter, Ann Chang, Wei-Chau Chang, Steve Cobb, Rune Dahl, Jim Dehnert, David Frederick, Jay Gischer, Bob Green, W. Wilson Ho, Peter Hsu, Bill Johnson, Dror Maydan, Ash Munshi, Michael Murphy, Bron Nelson, Paul Rodman, John Ruttenberg, Ross Towle, Chris Wagner © Copyright 1994-1996 Silicon Graphics, Inc.— All Rights Reserved The contents of this document may not be copied or duplicated in any form, in whole or in part, without the prior written permission of Silicon Graphics, Inc. RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure of the technical data contained in this document by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and/or in similar or successor clauses in the FAR, or in the DOD or NASA FAR Supplement. Unpublished rights reserved under the Copyright Laws of the United States. Contractor/manufacturer is Silicon Graphics, Inc., 2011 N. Shoreline Blvd., Mountain View, CA 94043-1389. Silicon Graphics and IRIS are registered trademarks and IRIX, CASEVision, IRIS IM, IRIS Showcase, Impressario, Indigo Magic, Inventor, IRIS-4D, POWER Series, RealityEngine, CHALLENGE, Onyx, and WorkShop are trademarks of Silicon Graphics, Inc. UNIX is a registered trademark of UNIX System Laboratories. OSF/Motif is a trademark of Open Software Foundation, Inc. The X Window System is a trademark of the Massachusetts Institute of Technology. -
Decoupled Software Pipelining with the Synchronization Array
Decoupled Software Pipelining with the Synchronization Array. Ram Rangan, Neil Vachharajani, Manish Vachharajani, David I. August. Department of Computer Science, Princeton University. {ram, nvachhar, manishv, august}@cs.princeton.edu
Abstract: Despite the success of instruction-level parallelism (ILP) optimizations in increasing the performance of microprocessors, certain codes remain elusive. In particular, codes containing recursive data structure (RDS) traversal loops have been largely immune to ILP optimizations, due to the fundamental serialization and variable latency of the loop-carried dependence through a pointer-chasing load. To address these and other situations, we introduce decoupled software pipelining (DSWP), a technique that statically splits a single-threaded sequential loop into multiple non-speculative threads, each of which performs useful computation essential for overall program correctness. The resulting threads execute on thread-parallel architectures such as simultaneous multithreaded (SMT) cores or chip multiprocessors (CMP), expose additional instruction …
From the introduction: … mentally affect run-time performance of the code. Out-of-order (OOO) execution mitigates this problem to an extent. Rather than stalling when a consumer whose dependences are not satisfied is encountered, the OOO processor will execute instructions from after the stalled consumer. Thus, the compiler can now safely assume average latency, since actual latencies longer than the average will not necessarily lead to execution stalls. Unfortunately, the predominant type of variable-latency instruction, memory loads, have worst-case latencies (i.e., cache-miss latencies) so large that it is often difficult to find sufficiently many independent instructions after a stalled consumer. As microprocessors become wider and the disparity between processor speeds and memory access latencies grows [8], this problem is exacerbated, since it is unlikely that dynamic scheduling windows will grow as fast as memory access latencies. -
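To make the DSWP idea concrete, here is a minimal sketch in Python, assuming a linked-list traversal loop that sums the squares of node values. One thread runs only the pointer-chasing recurrence (`node = node.next`); the other does the per-node work, and the two communicate through a bounded queue. The bounded `queue.Queue` is only a software stand-in for the paper's synchronization array, and every name here (`Node`, `build_list`, `traversal_thread`, `work_thread`, `dswp_sum_of_squares`, `capacity`) is hypothetical. Because of CPython's global interpreter lock, this illustrates the thread structure, not a real speedup.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


_DONE = object()  # sentinel: the traversal has finished


def build_list(values):
    """Build a singly linked list (a recursive data structure) from a sequence."""
    head = None
    for v in reversed(values):
        head = Node(v, head)
    return head


def traversal_thread(head, q):
    """Critical-path thread: only the pointer-chasing recurrence node = node.next."""
    node = head
    while node is not None:
        q.put(node)          # blocks while the bounded queue is full
        node = node.next
    q.put(_DONE)


def work_thread(q, out):
    """Off-critical-path thread: the per-node computation (sum of squares here)."""
    total = 0
    while True:
        item = q.get()       # blocks until the traversal thread produces a node
        if item is _DONE:
            break
        total += item.value * item.value
    out.append(total)


def dswp_sum_of_squares(head, capacity=32):
    q = queue.Queue(maxsize=capacity)  # hypothetical stand-in for the synchronization array
    out = []
    producer = threading.Thread(target=traversal_thread, args=(head, q))
    consumer = threading.Thread(target=work_thread, args=(q, out))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return out[0]


if __name__ == "__main__":
    print(dswp_sum_of_squares(build_list([1, 2, 3, 4])))  # 30
```

The split keeps a slow, variable-latency load in the traversal thread from directly stalling the work thread: the queue absorbs the latency variation as long as it is neither empty nor full.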
VLIW Architectures Lisa Wu, Krste Asanovic
CS252 Spring 2017 Graduate Computer Architecture, Lecture 10: VLIW Architectures. Lisa Wu, Krste Asanovic. http://inst.eecs.berkeley.edu/~cs252/sp17
Last Time in Lecture 9: Vector supercomputers
§ Vector register versus vector memory
§ Scaling performance with lanes
§ Stripmining
§ Chaining
§ Masking
§ Scatter/Gather
CS252, Fall 2015, Lecture 10 © Krste Asanovic, 2015
Sequential ISA Bottleneck: sequential source code (a = foo(b); for (i=0, i< …) → superscalar compiler (find independent operations, schedule operations) → sequential machine code → superscalar processor (check instruction dependencies, schedule execution)
VLIW: Very Long Instruction Word
Instruction format: Int Op 1 | Int Op 2 | Mem Op 1 | Mem Op 2 | FP Op 1 | FP Op 2
Two Integer Units, Single Cycle Latency; Two Load/Store Units, Three Cycle Latency; Two Floating-Point Units, Four Cycle Latency
§ Multiple operations packed into one instruction
§ Each operation slot is for a fixed function
§ Constant operation latencies are specified
§ Architecture requires guarantee of:
 - Parallelism within an instruction => no cross-operation RAW check
 - No data use before data ready => no data interlocks
Early VLIW Machines
§ FPS AP120B (1976)
 - scientific attached array processor
 - first commercial wide instruction machine
 - hand-coded vector math libraries using software pipelining and loop unrolling
§ Multiflow Trace (1987)
 - commercialization of ideas from Fisher's Yale group including "trace scheduling"
 - available -
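Because a VLIW machine has no interlocks, the compiler must place each operation in a bundle only once its inputs are ready and only in a free slot of the right unit type. The sketch below is a simple greedy list scheduler for straight-line code, assuming the slide's example machine (two slots each for integer, memory and FP operations, with latencies 1, 3 and 4). The function name `schedule`, the tuple format, and the `EXAMPLE_OPS` kernel are hypothetical; operations use unique (SSA-style) names, so only true read-after-write dependences are modeled.

```python
from collections import defaultdict

# Machine model from the slide: two slots per unit type, fixed latencies.
SLOTS = {"int": 2, "mem": 2, "fp": 2}
LATENCY = {"int": 1, "mem": 3, "fp": 4}


def schedule(ops):
    """Greedy list scheduling of straight-line code into VLIW bundles.

    ops: list of (name, kind, sources) in program order; every source must be
    the name of an earlier op. Returns {cycle: [op names in that bundle]};
    cycles that are absent would be issued as all-NOP bundles.
    """
    issue = {}                                    # op name -> issue cycle
    kind_of = {}                                  # op name -> unit kind
    used = defaultdict(lambda: defaultdict(int))  # cycle -> kind -> slots taken
    for name, kind, srcs in ops:
        # No interlocks: never issue a use before its producer's latency elapses.
        ready = max((issue[s] + LATENCY[kind_of[s]] for s in srcs), default=0)
        cycle = ready
        while used[cycle][kind] >= SLOTS[kind]:   # fixed-function slots are full
            cycle += 1
        used[cycle][kind] += 1
        issue[name] = cycle
        kind_of[name] = kind
    bundles = defaultdict(list)
    for name, cycle in issue.items():
        bundles[cycle].append(name)
    return dict(bundles)


# Two iterations of y[i] = a * x[i] + y[i], with `a` assumed to be in a register.
EXAMPLE_OPS = [
    ("ld_x0", "mem", []), ("ld_x1", "mem", []),
    ("ld_y0", "mem", []), ("ld_y1", "mem", []),
    ("mul0", "fp", ["ld_x0"]), ("mul1", "fp", ["ld_x1"]),
    ("add0", "fp", ["mul0", "ld_y0"]), ("add1", "fp", ["mul1", "ld_y1"]),
    ("st0", "mem", ["add0"]), ("st1", "mem", ["add1"]),
]

if __name__ == "__main__":
    for cycle, names in sorted(schedule(EXAMPLE_OPS).items()):
        print(cycle, names)
```

For the example, operations land in cycles 0, 1, 3, 7 and 11, so cycles 2, 4 to 6 and 8 to 10 would be all-NOP bundles. Those empty slots are what loop unrolling and software pipelining, mentioned for the FPS AP120B above, try to fill with work from other iterations.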
C++ Programmer's Guide
C++ Programmer’s Guide Document Number 007–0704–130 St. Peter’s Basilica image courtesy of ENEL SpA and InfoByte SpA. Disk Thrower image courtesy of Xavier Berenguer, Animatica. Copyright © 1995, 1999 Silicon Graphics, Inc. All Rights Reserved. This document or parts thereof may not be reproduced in any form unless permitted by contract or by written permission of Silicon Graphics, Inc. LIMITED AND RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure by the Government is subject to restrictions as set forth in the Rights in Data clause at FAR 52.227-14 and/or in similar or successor clauses in the FAR, or in the DOD, DOE or NASA FAR Supplements. Unpublished rights reserved under the Copyright Laws of the United States. Contractor/manufacturer is Silicon Graphics, Inc., 1600 Amphitheatre Pkwy., Mountain View, CA 94043-1351. Autotasking, CF77, CRAY, Cray Ada, CraySoft, CRAY Y-MP, CRAY-1, CRInform, CRI/TurboKiva, HSX, LibSci, MPP Apprentice, SSD, SUPERCLUSTER, UNICOS, X-MP EA, and UNICOS/mk are federally registered trademarks and Because no workstation is an island, CCI, CCMT, CF90, CFT, CFT2, CFT77, ConCurrent Maintenance Tools, COS, Cray Animation Theater, CRAY APP, CRAY C90, CRAY C90D, Cray C++ Compiling System, CrayDoc, CRAY EL, CRAY J90, CRAY J90se, CrayLink, Cray NQS, Cray/REELlibrarian, CRAY S-MP, CRAY SSD-T90, CRAY SV1, CRAY T90, CRAY T3D, CRAY T3E, CrayTutor, CRAY X-MP, CRAY XMS, CRAY-2, CSIM, CVT, Delivering the power . ., DGauss, Docview, EMDS, GigaRing, HEXAR, IOS, ND Series Network Disk Array, Network Queuing Environment, Network Queuing Tools, OLNET, RQS, SEGLDR, SMARTE, SUPERLINK, System Maintenance and Remote Testing Environment, Trusted UNICOS, and UNICOS MAX are trademarks of Cray Research, Inc., a wholly owned subsidiary of Silicon Graphics, Inc. -
Static Instruction Scheduling for High Performance on Limited Hardware
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TC.2017.2769641, IEEE Transactions on Computers. Static Instruction Scheduling for High Performance on Limited Hardware. Kim-Anh Tran, Trevor E. Carlson, Konstantinos Koukos, Magnus Själander, Vasileios Spiliopoulos, Stefanos Kaxiras, Alexandra Jimborean. Abstract: Complex out-of-order (OoO) processors have been designed to overcome the restrictions of outstanding long-latency misses at the cost of increased energy consumption. Simple, limited OoO processors are a compromise in terms of energy consumption and performance, as they have fewer hardware resources to tolerate the penalties of long-latency loads. In the worst case, these loads may stall the processor entirely. We present Clairvoyance, a compiler-based technique that generates code able to hide memory latency and better utilize simple OoO processors. By clustering loads found across basic block boundaries, Clairvoyance overlaps the outstanding latencies to increase memory-level parallelism. We show that these simple OoO processors, equipped with the appropriate compiler support, can effectively hide long-latency loads and achieve performance improvements for memory-bound applications. To this end, Clairvoyance tackles (i) statically unknown dependencies, (ii) insufficient independent instructions, and (iii) register pressure. Clairvoyance achieves a geomean execution time improvement of 14% for memory-bound applications, on top of standard O3 optimizations, while maintaining compute-bound applications' high performance. Index Terms: Compilers, code generation, memory management, optimization. 1 INTRODUCTION Computer architects of the past have steadily improved … result in a sub-optimal utilization of the limited OoO engine that may stall the core for an extended period of time. -
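The benefit of clustering loads can be seen in a toy timing model. The sketch below is not Clairvoyance's algorithm; it assumes an in-order core that issues one instruction per cycle, non-blocking loads with unlimited outstanding misses, a hypothetical 100-cycle miss latency, SSA registers and no stores (so none of the statically unknown dependencies that Clairvoyance must handle). The names `run`, `cluster_loads`, `LOAD_LATENCY` and `INTERLEAVED` are hypothetical.

```python
# Toy in-order core: one instruction issues per cycle; loads are non-blocking
# (unlimited outstanding misses) and a consumer stalls until its source is ready.
LOAD_LATENCY = 100  # hypothetical miss latency in cycles


def run(program, load_latency=LOAD_LATENCY):
    """program: list of (op, dest, sources) with op in {"load", "alu"}.
    Returns the cycle at which the last result becomes available."""
    ready = {}   # register -> cycle its value is available (unlisted registers: 0)
    t = 0        # earliest cycle the next instruction may issue (in order)
    finish = 0
    for op, dest, srcs in program:
        issue = max([t] + [ready.get(s, 0) for s in srcs])
        ready[dest] = issue + (load_latency if op == "load" else 1)
        finish = max(finish, ready[dest])
        t = issue + 1
    return finish


def cluster_loads(program):
    """Hoist every load whose sources are not produced by an instruction that
    stays in place. Assumes SSA registers and no stores, so hoisting a load
    cannot violate a register or memory dependence."""
    hoisted, rest = [], []
    blocked = set()  # registers defined by instructions left in place
    for instr in program:
        op, dest, srcs = instr
        if op == "load" and not blocked.intersection(srcs):
            hoisted.append(instr)
        else:
            rest.append(instr)
            blocked.add(dest)
    return hoisted + rest


# Each load is immediately followed by its use, so the two misses serialize.
INTERLEAVED = [
    ("load", "a", ["pa"]), ("alu", "x", ["a"]),
    ("load", "b", ["pb"]), ("alu", "y", ["b", "x"]),
]

if __name__ == "__main__":
    print(run(INTERLEAVED), run(cluster_loads(INTERLEAVED)))
```

In this model the interleaved order finishes at cycle 202, while the clustered order overlaps both misses and finishes at cycle 102. A pointer-chasing load whose address depends on an earlier result stays in place, which is the case DSWP targets instead.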
Computer Architectures an Overview
Computer Architectures: An Overview. PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC.
Contents. Articles: Microarchitecture; x86; PowerPC; IBM POWER; MIPS architecture; SPARC; ARM architecture; DEC Alpha; AlphaStation; AlphaServer; Very long instruction word; Instruction-level parallelism; Explicitly parallel instruction computing. References: Article Sources and Contributors; Image Sources, Licenses and Contributors. Article Licenses: License.
Microarchitecture
In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design.
Relation to instruction set architecture
The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, and address and data formats, among other things. The microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. (Figure: The Intel Core microarchitecture.) The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALUs) and even larger elements. -
Electronic Products and Relays Selection Table Interface Relays CR-Range and R600 / R500 Range Pluggable Interface Relays
Electronic Products and Relays Selection Table. Interface Relays: CR-Range and R600 / R500 Range. Pluggable Interface Relays: CR-M Range.
Order numbers (prefix, followed by each suffix listed):
1SVR 405 611: R4000, R1000, R6000, R4200, R8000, R8200, R9000, R0000, R5000, R7000, R2000, R3000
1SVR 405 612: R4000, R1000, R6000, R4200, R8000, R8200, R9000, R0000, R5000, R5200, R7000, R2000, R3000
1SVR 405 613: R4000, R1000, R6000, R4200, R8000, R8200, R9000, R0000, R5000, R7000, R2000, R3000
1SVR 405 611: R4100, R1100, R6100, R4300, R8100, R8300, R9100, R0100, R5100, R7100, R2100, R3100
1SVR 405 612: R4100, R1100, R6100, R4300, R8100, R8300, R9100, R0100, R5100, R7100, R2100, R3100
1SVR 405 613: R4100, R1100, R6100, R4300, R8100, R8300, R9100, R0100, R5100, R7100, R2100, R3100
1SVR 405 614: R1100 -
Parallel Processing Techniques: History and Usage of MIPS Approach for Implementation of Fast CPU
Vol-3 Issue-2 2017 IJARIIE-ISSN(O)-2395-4396. Parallel Processing Techniques: History and usage of MIPS approach for implementation of Fast CPU. Drashti Joshi (1), Pooja Thakar (2). (1) M.E. Student, Communication System Engineering, SAL Institute of Technology and Engineering Research, Gujarat, India. (2) Assistant Professor, Communication System Engineering, SAL Institute of Technology and Engineering Research, Gujarat, India. ABSTRACT Parallel processing is a term used to denote a class of techniques that provide simultaneous data-processing tasks for the purpose of increasing the computational speed of a computer system. Instead of processing each instruction sequentially as in a conventional computer, a parallel processing system can perform concurrent data processing to achieve faster execution. The purpose of parallel processing techniques such as pipeline processing, vector processing and array processing is to speed up the computer's processing capability and increase its throughput, that is, the amount of processing that can be accomplished during a given interval of time. The amount of hardware increases with parallel processing, and with it the cost of the system. However, technological developments have reduced hardware cost to the point where parallel processing techniques are economically feasible. Parallel processing will be implemented using the MIPS architecture, for efficiency of implementation and, of course, cost. Keywords: Parallel processing, Pipelining, MIPS, Interlock. 1. INTRODUCTION Basic computer architectures have been designed using two philosophies: 1) CISC (Complex Instruction Set Computer) and 2) RISC (Reduced Instruction Set Computer). CISC supports a variety of addressing modes, a variable number of operands in various locations in the instruction set, and a wide variety of instructions with varying lengths and execution times, thus demanding a complex control unit, which occupies a large area on chip. -
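The throughput gain of pipeline processing can be quantified with the standard textbook relation (not taken from this paper): with k equal-length stages and n tasks, a pipeline needs k cycles to fill and then completes one task per cycle, so it takes k + n - 1 cycles instead of n·k on a non-pipelined unit. The sketch below assumes equal stage delays and no hazards or stalls; the function names are hypothetical.

```python
def pipelined_cycles(k, n):
    """Cycles for n tasks on a k-stage pipeline: k cycles for the first task to
    fill the pipeline, then one more cycle per remaining task."""
    return k + n - 1


def sequential_cycles(k, n):
    """Cycles on a non-pipelined unit that spends k cycles on each task."""
    return n * k


def speedup(k, n):
    return sequential_cycles(k, n) / pipelined_cycles(k, n)


def throughput(k, n):
    """Tasks completed per cycle; approaches 1 as n grows."""
    return n / pipelined_cycles(k, n)


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(n, round(speedup(4, n), 3))
```

A single task gains nothing (speedup 1), while for long task streams the speedup approaches k, which is why pipelining pays off chiefly in throughput rather than in the latency of any one task.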
Introduction to Software Pipelining in the IA-64 Architecture
Introduction to Software Pipelining in the IA-64 Architecture
® IA-64 Software Programs
This presentation is an introduction to software pipelining in the IA-64 architecture. It is recommended that the reader have some exposure to the IA-64 architecture prior to viewing this presentation.
Agenda: Objectives; What is software pipelining?; IA-64 architectural support; Assembly code example; Compiler support; Summary
Objectives: Introduce the concept of software pipelining (SWP); Understand the IA-64 architectural features that support SWP; See an assembly code example; Know how to enable SWP in the compiler
What is Software Pipelining?
Software Pipelining: a software performance technique that overlaps the execution of consecutive loop iterations; it exploits instruction-level parallelism across iterations.
Software Pipelining (SWP) is the term for overlapping the execution of consecutive loop iterations. SWP is a performance technique that can be applied on just about every computer architecture. SWP is closely related to loop unrolling.
Software Pipelining (conceptual diagram; time increases downward, each row lists the iterations in flight):
i = 1
i = 1  i = 2
i = 1  i = 2  i = 3
i = 1  i = 2  i = 3  i = 4
i = 2  i = 3  i = 4  i = 5
i = 3  i = 4  i = 5  i = 6
i = 4  i = 5  i = 6
i = 5  i = 6
i = 6
Here is a conceptual block diagram of a software pipeline. The loop code is separated into four pipeline stages. Six iterations of the loop are shown (i = 1 to 6).
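The conceptual software-pipeline diagram can be regenerated from one rule: iteration i (numbered from 1) starts at time step i - 1 and advances one stage per step. The sketch below, assuming that rule and the slide's four stages and six iterations, lists the (iteration, stage) pairs in flight at each step and labels the prologue (pipeline filling), kernel (all stages busy) and epilogue (draining). The function names are hypothetical, and this models only the schedule shape, not IA-64 rotating registers or predication.

```python
def pipeline_rows(stages, iterations):
    """Return one row per time step; each row lists the (iteration, stage)
    pairs in flight. Iteration i (1-based) starts at step i - 1."""
    rows = []
    for t in range(iterations + stages - 1):
        row = []
        for i in range(1, iterations + 1):
            stage = t - (i - 1)
            if 0 <= stage < stages:
                row.append((i, stage))
        rows.append(row)
    return rows


def phase(t, stages, iterations):
    """Label a time step as prologue, kernel or epilogue.
    The labels are meaningful when iterations >= stages."""
    if t < stages - 1:
        return "prologue"
    if t < iterations:
        return "kernel"
    return "epilogue"


if __name__ == "__main__":
    for t, row in enumerate(pipeline_rows(stages=4, iterations=6)):
        cells = "  ".join(f"i = {i}" for i, _ in row)
        print(f"{phase(t, 4, 6):9} {cells}")
```

With four stages and six iterations this gives three prologue steps, three kernel steps (iterations - stages + 1) and three epilogue steps: nine steps in total, versus 24 if each iteration ran all four stages before the next one started.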