Compiler Writer's Guide for the Alpha 21264

Total Page:16

File Type:pdf, Size:1020Kb

Compiler Writer's Guide for the Alpha 21264 Compiler Writer’s Guide for the Alpha 21264 Order Number: EC–RJ66A–TE This document provides guidance for writing compilers for the Alpha 21264 micro- processor. You can access this document from the following website: ftp.digital.com/pub/Digital/info/semiconductor/literature/dsc-library.html Revision/Update Information: This is a new document. Compaq Computer Corporation June 1999 The information in this publication is subject to change without notice. COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAM- AGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL. THIS INFORMATION IS PROVIDED “AS IS” AND COMPAQ COMPUTER CORPORATION DISCLAIMS ANY WARRANTIES, EXPRESS, IMPLIED OR STATUTORY AND EXPRESSLY DISCLAIMS THE IMPLIED WAR- RANTIES OF MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, GOOD TITLE AND AGAINST INFRINGEMENT. This publication contains information protected by copyright. No part of this publication may be photocopied or reproduced in any form without prior written consent from Compaq Computer Corporation. © 1999 Digital Equipment Corporation. All rights reserved. Printed in the U.S.A. COMPAQ, the Compaq logo, the Digital logo, and VAX Registered in United States Patent and Trademark Office. Pentium is a registered trademark of Intel Corporation. Other product names mentioned herein may be trademarks and/or registered trademarks of their respective compa- nies. 21264 Compiler Writer’s Guide Table of Contents Preface 1 Introduction 1.1 The Architecture . 1–1 1.1.1 Addressing . 1–2 1.1.2 Integer Data Types. 1–2 1.1.3 Floating-Point Data Types . 1–2 1.2 Microprocessor Features. 1–3 2 Internal Architecture 2.1 Microarchitecture. 2–1 2.1.1 Instruction Fetch, Issue, and Retire Unit . 2–2 2.1.1.1 Virtual Program Counter Logic . 2–2 2.1.1.2 Branch Predictor . 2–3 2.1.1.3 Instruction-Stream Translation Buffer . 2–5 2.1.1.4 Instruction Fetch Logic . 2–5 2.1.1.5 Register Rename Maps . 2–6 2.1.1.6 Integer Issue Queue . 2–6 2.1.1.7 Floating-Point Issue Queue . 2–7 2.1.1.8 Exception and Interrupt Logic . 2–8 2.1.1.9 Retire Logic. 2–8 2.1.2 Integer Execution Unit . 2–8 2.1.3 Floating-Point Execution Unit. 2–10 2.1.4 External Cache and System Interface Unit . 2–11 2.1.4.1 Victim Address File and Victim Data File . 2–11 2.1.4.2 I/O Write Buffer . 2–11 2.1.4.3 Probe Queue. 2–11 2.1.4.4 Duplicate Dcache Tag Array . 2–11 2.1.5 Onchip Caches. 2–11 2.1.5.1 Instruction Cache . 2–11 2.1.5.2 Data Cache . 2–12 2.1.6 Memory Reference Unit . 2–12 2.1.6.1 Load Queue . 2–13 2.1.6.2 Store Queue . 2–13 2.1.6.3 Miss Address File . 2–13 2.1.6.4 Dstream Translation Buffer . 2–13 2.1.7 SROM Interface . 2–13 2.2 Pipeline Organization . 2–13 2.2.1 Pipeline Aborts . 2–16 2.3 Instruction Issue Rules . 2–16 21264 Compiler Writer’s Guide iii 2.3.1 Instruction Group Definitions . 2–17 2.3.2 Ebox Slotting . 2–18 2.3.3 Instruction Latencies . 2–19 2.4 Instruction Retire Rules . 2–21 2.4.1 Floating-Point Divide/Square Root Early Retire . 2–21 2.5 Retire of Operate Instructions into R31/F31 . 2–22 2.6 Load Instructions to R31 and F31 . 2–22 2.6.1 Normal Prefetch: LDBU, LDF, LDG, LDL, LDT, LDWU Instructions . 2–23 2.6.2 Prefetch with Modify Intent: LDS Instruction . 2–23 2.6.3 Prefetch, Evict Next: LDQ Instruction. 2–23 2.7 Special Cases of Alpha Instruction Execution. 2–23 2.7.1 Load Hit Speculation . 2–23 2.7.2 Floating-Point Store Instructions . 2–25 2.7.3 CMOV Instruction. 2–25 2.8 Memory and I/O Address Space Instructions . 2–26 2.8.1 Memory Address Space Load Instructions . 2–26 2.8.2 I/O Address Space Load Instructions. 2–27 2.8.3 Memory Address Space Store Instructions . 2–28 2.8.4 I/O Address Space Store Instructions . 2–28 2.9 MAF Memory Address Space Merging Rules . 2–29 2.10 Instruction Ordering. 2–29 2.11 Replay Traps . 2–30 2.11.1 Mbox Order Traps . 2–30 2.11.1.1 Load-Load Order Trap . 2–31 2.11.1.2 Store-Load Order Trap . 2–31 2.11.2 Other Mbox Replay Traps . 2–31 2.12 I/O Write Buffer and the WMB Instruction . 2–31 2.12.1 Memory Barrier (MB/WMB/TB Fill Flow) . 2–31 2.12.1.1 MB Instruction Processing . 2–32 2.12.1.2 WMB Instruction Processing. 2–33 2.12.1.3 TB Fill Flow . ..
Recommended publications
  • Using the GNU Compiler Collection (GCC)
    Using the GNU Compiler Collection (GCC) Using the GNU Compiler Collection by Richard M. Stallman and the GCC Developer Community Last updated 23 May 2004 for GCC 3.4.6 For GCC Version 3.4.6 Published by: GNU Press Website: www.gnupress.org a division of the General: [email protected] Free Software Foundation Orders: [email protected] 59 Temple Place Suite 330 Tel 617-542-5942 Boston, MA 02111-1307 USA Fax 617-542-2652 Last printed October 2003 for GCC 3.3.1. Printed copies are available for $45 each. Copyright c 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being \GNU General Public License" and \Funding Free Software", the Front-Cover texts being (a) (see below), and with the Back-Cover Texts being (b) (see below). A copy of the license is included in the section entitled \GNU Free Documentation License". (a) The FSF's Front-Cover Text is: A GNU Manual (b) The FSF's Back-Cover Text is: You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development. i Short Contents Introduction ...................................... 1 1 Programming Languages Supported by GCC ............ 3 2 Language Standards Supported by GCC ............... 5 3 GCC Command Options .........................
    [Show full text]
  • The Alpha 21264 Microprocessor: Out-Of-Order Execution at 600 Mhz
    The Alpha 21264 Microprocessor: Out-of-Order Execution at 600 Mhz R. E. Kessler COMPAQ Computer Corporation Shrewsbury, MA REK August 1998 1 Some Highlights z Continued Alpha performance leadership y 600 Mhz operation in 0.35u CMOS6, 6 metal layers, 2.2V y 15 Million transistors, 3.1 cm2, 587 pin PGA y Specint95 of 30+ and Specfp95 of 50+ y Out-of-order and speculative execution y 4-way integer issue y 2-way floating-point issue y Sophisticated tournament branch prediction y High-bandwidth memory system (1+ GB/sec) REK August 1998 2 Alpha 21264: Block Diagram FETCH MAP QUEUE REG EXEC DCACHE Stage: 0 1 2 3 4 5 6 Int Branch Int Reg Exec Predictors Reg Issue File Queue Addr Sys Bus Map (80) Exec (20) L1 Bus 64-bit Data Reg Exec Inter- Cache Bus 80 in-flight instructions File Cache plus 32 loads and 32 stores Addr face 64KB 128-bit (80) Exec Unit Next-Line 2-Set Address Phys Addr 4 Instructions / cycle L1 Ins. 44-bit Cache FP ADD FP Reg 64KB FP Div/Sqrt Issue File Victim 2-Set Reg Queue (72) FP MUL Buffer Map (15) Miss Address REK August 1998 3 Alpha 21264: Block Diagram FETCH MAP QUEUE REG EXEC DCACHE Stage: 0 1 2 3 4 5 6 Int Branch Int Reg Exec Predictors Reg Issue File Queue Addr Sys Bus Map (80) Exec (20) L1 Bus 64-bit Data Reg Exec Inter- Cache Bus 80 in-flight instructions File Cache plus 32 loads and 32 stores Addr face 64KB 128-bit (80) Exec Unit Next-Line 2-Set Address Phys Addr 4 Instructions / cycle L1 Ins.
    [Show full text]
  • Common Object File Format (COFF)
    Application Report SPRAAO8–April 2009 Common Object File Format ..................................................................................................................................................... ABSTRACT The assembler and link step create object files in common object file format (COFF). COFF is an implementation of an object file format of the same name that was developed by AT&T for use on UNIX-based systems. This format encourages modular programming and provides powerful and flexible methods for managing code segments and target system memory. This appendix contains technical details about the Texas Instruments COFF object file structure. Much of this information pertains to the symbolic debugging information that is produced by the C compiler. The purpose of this application note is to provide supplementary information on the internal format of COFF object files. Topic .................................................................................................. Page 1 COFF File Structure .................................................................... 2 2 File Header Structure .................................................................. 4 3 Optional File Header Format ........................................................ 5 4 Section Header Structure............................................................. 5 5 Structuring Relocation Information ............................................... 7 6 Symbol Table Structure and Content........................................... 11 SPRAAO8–April 2009
    [Show full text]
  • 6 Preparing to Port Applications 6.1 Ada
    Porting Applications from HP OpenVMS Alpha to HP OpenVMS Industry Standard 64 for Integrity Servers Order Number: BA442–90001 January 2005 This manual provides a framework for application developers who are migrating applications from HP OpenVMS Alpha to HP OpenVMS Industry Standard 64 for Integrity Servers. Revision/Update Information: This is a new manual. Software Version: OpenVMS I64 Version 8.2 OpenVMS Alpha Version 8.2 Hewlett-Packard Company Palo Alto, California © Copyright 2005 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. Intel, Itanium, and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Java is a U.S. trademark of Sun Microsystems, Inc. UNIX is a registered trademark of The Open Group. Printed in the US ZK6673 This document was prepared using DECdocument, Version 3.3-1b. Contents Preface ............................................................ ix 1 Introduction 1.1 OpenVMS Industry Standard 64 for Integrity Servers . ............... 1–1 1.2 Overview of the Porting Process .
    [Show full text]
  • Virtual Memory - Paging
    Virtual memory - Paging Johan Montelius KTH 2020 1 / 32 The process code heap (.text) data stack kernel 0x00000000 0xC0000000 0xffffffff Memory layout for a 32-bit Linux process 2 / 32 Segments - a could be solution Processes in virtual space Address translation by MMU (base and bounds) Physical memory 3 / 32 one problem Physical memory External fragmentation: free areas of free space that is hard to utilize. Solution: allocate larger segments ... internal fragmentation. 4 / 32 another problem virtual space used code We’re reserving physical memory that is not used. physical memory not used? 5 / 32 Let’s try again It’s easier to handle fixed size memory blocks. Can we map a process virtual space to a set of equal size blocks? An address is interpreted as a virtual page number (VPN) and an offset. 6 / 32 Remember the segmented MMU MMU exception no virtual addr. offset yes // < within bounds index + physical address segment table 7 / 32 The paging MMU MMU exception virtual addr. offset // VPN available ? + physical address page table 8 / 32 the MMU exception exception virtual address within bounds page available Segmentation Paging linear address physical address 9 / 32 a note on the x86 architecture The x86-32 architecture supports both segmentation and paging. A virtual address is translated to a linear address using a segmentation table. The linear address is then translated to a physical address by paging. Linux and Windows do not use use segmentation to separate code, data nor stack. The x86-64 (the 64-bit version of the x86 architecture) has dropped many features for segmentation.
    [Show full text]
  • Computer Organization EECC 550 • Introduction: Modern Computer Design Levels, Components, Technology Trends, Register Transfer Week 1 Notation (RTN)
    Computer Organization EECC 550 • Introduction: Modern Computer Design Levels, Components, Technology Trends, Register Transfer Week 1 Notation (RTN). [Chapters 1, 2] • Instruction Set Architecture (ISA) Characteristics and Classifications: CISC Vs. RISC. [Chapter 2] Week 2 • MIPS: An Example RISC ISA. Syntax, Instruction Formats, Addressing Modes, Encoding & Examples. [Chapter 2] • Central Processor Unit (CPU) & Computer System Performance Measures. [Chapter 4] Week 3 • CPU Organization: Datapath & Control Unit Design. [Chapter 5] Week 4 – MIPS Single Cycle Datapath & Control Unit Design. – MIPS Multicycle Datapath and Finite State Machine Control Unit Design. Week 5 • Microprogrammed Control Unit Design. [Chapter 5] – Microprogramming Project Week 6 • Midterm Review and Midterm Exam Week 7 • CPU Pipelining. [Chapter 6] • The Memory Hierarchy: Cache Design & Performance. [Chapter 7] Week 8 • The Memory Hierarchy: Main & Virtual Memory. [Chapter 7] Week 9 • Input/Output Organization & System Performance Evaluation. [Chapter 8] Week 10 • Computer Arithmetic & ALU Design. [Chapter 3] If time permits. Week 11 • Final Exam. EECC550 - Shaaban #1 Lec # 1 Winter 2005 11-29-2005 Computing System History/Trends + Instruction Set Architecture (ISA) Fundamentals • Computing Element Choices: – Computing Element Programmability – Spatial vs. Temporal Computing – Main Processor Types/Applications • General Purpose Processor Generations • The Von Neumann Computer Model • CPU Organization (Design) • Recent Trends in Computer Design/performance • Hierarchy
    [Show full text]
  • Address Translation
    CS 152 Computer Architecture and Engineering Lecture 8 - Address Translation John Wawrzynek Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~johnw http://inst.eecs.berkeley.edu/~cs152 9/27/2016 CS152, Fall 2016 CS152 Administrivia § Lab 2 due Friday § PS 2 due Tuesday § Quiz 2 next Thursday! 9/27/2016 CS152, Fall 2016 2 Last time in Lecture 7 § 3 C’s of cache misses – Compulsory, Capacity, Conflict § Write policies – Write back, write-through, write-allocate, no write allocate § Multi-level cache hierarchies reduce miss penalty – 3 levels common in modern systems (some have 4!) – Can change design tradeoffs of L1 cache if known to have L2 § Prefetching: retrieve memory data before CPU request – Prefetching can waste bandwidth and cause cache pollution – Software vs hardware prefetching § Software memory hierarchy optimizations – Loop interchange, loop fusion, cache tiling 9/27/2016 CS152, Fall 2016 3 Bare Machine Physical Physical Address Inst. Address Data Decode PC Cache D E + M Cache W Physical Memory Controller Physical Address Address Physical Address Main Memory (DRAM) § In a bare machine, the only kind of address is a physical address 9/27/2016 CS152, Fall 2016 4 Absolute Addresses EDSAC, early 50’s § Only one program ran at a time, with unrestricted access to entire machine (RAM + I/O devices) § Addresses in a program depended upon where the program was to be loaded in memory § But it was more convenient for programmers to write location-independent subroutines How
    [Show full text]
  • Lecture 12: Virtual Memory Finale and Interrupts/Exceptions 1 Review And
    EE360N: Computer Architecture Lecture #12 Department of Electical and Computer Engineering The University of Texas at Austin October 9, 2008 Disclaimer: The contents of this document are scribe notes for The University of Texas at Austin EE360N Fall 2008, Computer Architecture.∗ The notes capture the class discussion and may contain erroneous and unverified information and comments. Lecture 12: Virtual Memory Finale and Interrupts/Exceptions Lecture #12: October 8, 2008 Lecturer: Derek Chiou Scribe: Michael Sullivan and Peter Tran Lecture 12 of EE360N: Computer Architecture summarizes the significance of virtual memory and introduces the concept of interrupts and exceptions. As such, this document is divided into two sections. Section 1 details the key points from the discussion in class on virtual memory. Section 2 goes on to introduce interrupts and exceptions, and shows why they are necessary. 1 Review and Summation of Virtual Memory The beginning of this lecture wraps up the review of virtual memory. A broad summary of virtual memory systems is given first, in Subsection 1.1. Next, the individual components of the hardware and OS that are required for virtual memory support are covered in 1.2. Finally, the addressing of caches and the interface between the cache and virtual memory are described in 1.3. 1.1 Summary of Virtual Memory Systems Virtual memory automatically provides the illusion of a large, private, uniform storage space for each process, even on a limited amount of physically addressed memory. The virtual memory space available to every process looks the same every time the process is run, irrespective of the amount of memory available, or the program’s actual placement in main memory.
    [Show full text]
  • Digital Semiconductor Alpha 21064 and Alpha 21064A Microprocessors Hardware Reference Manual
    Digital Semiconductor Alpha 21064 and Alpha 21064A Microprocessors Hardware Reference Manual Order Number: EC–Q9ZUC–TE Abstract: This document contains information about the following Alpha microprocessors: 21064-150, 21064-166, 21064-200, 21064A-200, 21064A-233, 21064A-275, 21064A-275-PC, and 21064A-300. Revision/Update Information: This manual supersedes the Alpha 21064 and Alpha 21064A Microprocessors Hard- ware Reference Manual (EC–Q9ZUB–TE). Digital Equipment Corporation Maynard, Massachusetts June 1996 While Digital believes the information included in this publication is correct as of the date of publication, it is subject to change without notice. Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description. © Digital Equipment Corporation 1996. All rights reserved. Printed in U.S.A. AlphaGeneration, Digital, Digital Semiconductor, OpenVMS, VAX, VAX DOCUMENT, the AlphaGeneration design mark, and the DIGITAL logo are trademarks of Digital Equipment Corporation. Digital Semiconductor is a Digital Equipment Corporation business. GRAFOIL is a registered trademark of Union Carbide Corporation. Windows NT is a trademark of Microsoft Corporation. All other trademarks and registered trademarks are the property of their respective owners. This document was prepared using VAX DOCUMENT Version 2.1. Contents Preface ..................................................... xix 1 Introduction to the 21064/21064A 1.1 Introduction ......................................... 1–1 1.2 The Architecture . ................................... 1–1 1.3 Chip Features ....................................... 1–2 1.4 Backward Compatibility ...............................
    [Show full text]
  • Alpha 21064A Microprocessors Data Sheet
    Alpha 21064A Microprocessors Data Sheet Order Number: EC–QFGKC–TE This document contains information about the following Alpha micro- processors: 21064A–200, 21064A–233, 21064A–275, 21064A–275–PC, and 21064A–300. Revision/Update Information: This document supersedes the Alpha 21064A–233, –275 Microprocessor Data Sheet, EC–QFGKB–TE. Digital Equipment Corporation Maynard, Massachusetts January 1996 While Digital believes the information included in this publication is correct as of the date of publication, it is subject to change without notice. Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description. © Digital Equipment Corporation 1995, 1996. All rights reserved. Printed in U.S.A. AlphaGeneration, Digital, Digital Semiconductor, OpenVMS, VAX, VAX DOCUMENT, the AlphaGeneration design mark, and the DIGITAL logo are trademarks of Digital Equipment Corporation. Digital Semiconductor is a Digital Equipment Corporation business. GRAFOIL is a registered trademark of Union Carbide Corporation. Windows NT is a trademark of Microsoft Corporation. All other trademarks and registered trademarks are the property of their respective owners. This document was prepared using VAX DOCUMENT Version 2.1. Contents 1 Overview ........................................... 1 2 Signal Names and Functions . ........................... 6 3 Instruction Set ....................................... 17 3.1 Instruction Summary ............................... 17 3.2 IEEE Floating-Point Instructions . ................... 23 3.3 21064A IEEE Floating-Point Conformance .............. 25 3.4 VAX Floating-Point Instructions . ................... 28 3.5 Required PALcode Function Codes .
    [Show full text]
  • Design of the RISC-V Instruction Set Architecture
    Design of the RISC-V Instruction Set Architecture Andrew Waterman Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2016-1 http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-1.html January 3, 2016 Copyright © 2016, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor David Patterson, Chair Professor Krste Asanovi´c Associate Professor Per-Olof Persson Spring 2016 Design of the RISC-V Instruction Set Architecture Copyright 2016 by Andrew Shell Waterman 1 Abstract Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman Doctor of Philosophy in Computer Science University of California, Berkeley Professor David Patterson, Chair The hardware-software interface, embodied in the instruction set architecture (ISA), is arguably the most important interface in a computer system. Yet, in contrast to nearly all other interfaces in a modern computer system, all commercially popular ISAs are proprietary.
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]