How to Fake 1000 Registers

Total Page:16

File Type:pdf, Size:1020Kb

How to Fake 1000 Registers Appears in MICRO-38, November 2005 How to Fake 1000 Registers David W. Oehmke†, Nathan L. Binkert, Trevor Mudge, Steven K. Reinhardt Advanced Computer Architecture Lab University of Michigan, Ann Arbor {doehmke, binkertn, tnm, stever}@eecs.umich.edu Abstract those original ideas have merit. This paper shows how architects can achieve the benefits of large logical regis- Large numbers of logical registers can improve ter files using a standard physical register file. performance by allowing fast access to multiple Registers are a central component of both instruc- subroutine contexts (register windows) and multiple tion-set architectures (ISAs) and processor microarchi- thread contexts (multithreading). Support for both of tectures. From the ISA’s perspective, a small register these together requires a multiplicative number of namespace allows the encoding of multiple operands in registers that quickly becomes prohibitive. We overcome an instruction of reasonable size. Registers also provide this limitation with the virtual context architecture a simple, unambiguous specification of data depen- (VCA), a new register-file architecture that virtualizes dences, because—unlike memory locations—they are logical register contexts. VCA works by treating the specified directly in the instruction and cannot be physical registers as a cache of a much larger memory- aliased. From the microarchitecture’s point of view, reg- mapped logical register space. Complete contexts, isters comprise a set of high-speed, high-bandwidth stor- whether activation records or threads, are no longer age locations that are integrated into the datapath more required to reside in their entirety in the physical register tightly than a data cache, and are thus far more capable file. A VCA implementation of register windows on a of keeping up with a modern superscalar execution core. single-threaded machine reduces data cache accesses by As with many architectural features, these two pur- 20%, providing the same performance as a conventional poses of registers can conflict. For example, the depen- machine while requiring one fewer cache port. Using dence specification encoded by the register assignment VCA to support multithreading enables a four-thread is adequate for the ISA’s sequential execution semantics. machine to use half as many physical registers without a However, out-of-order execution requires that these logi- significant performance loss. VCA naturally extends to cal registers be renamed into an alternate, larger physical support both multithreading and register windows, register space to eliminate false dependencies. providing higher performance with significantly fewer This paper addresses a different conflict between the registers than a conventional machine. abstraction of registers and its implementation: that of context. A logical register identifier is meaningful only 1. Introduction in the context of a particular procedure instance (activa- tion record) in a particular thread of execution. From the Twenty-five years ago architects expected VLSI ISA’s perspective, the processor supports exactly one trends to enable processors with thousands of registers, context at any point in time. However, a processor and they devised schemes such as register windows and designer may wish for an implementation to support multithreading to take advantage of this resource [25]. multiple contexts for several reasons: to support multi- Unfortunately, the realities of wire delay and power con- threading, to reduce context switch overhead, or to sumption, along with the large number of ports needed reduce procedure call/return overhead (e.g., using regis- to support wide-issue superscalar pipelines, made these ter windows) [15, 27, 29, 5, 20, 25]. Conventional extremely large register files impractical. Nevertheless, designs require that each active context be present in its entirety; thus each additional context adds directly to the size of the register file. Unfortunately, larger register † Currently at Cray Inc. files are inherently slower to access, leading to a slower cycle time or additional cycles of register access The virtual context architecture enables a near latency, either of which reduces overall performance. optimal implementation of register windows, improv- This problem is further compounded by the additional ing performance and greatly reducing traffic to the data rename registers necessary to support out-of-order exe- cache (by up to 10% and 20%, in our simulations). cution. VCA naturally supports simultaneous multithreading We seek to bypass this trade-off between multiple (SMT) as well, allowing large numbers of threads to be context support and register file size by decoupling the multiplexed on relatively small physical register files. logical register requirements of active contexts from Our results show that a VCA SMT machine can the contents of the physical register file. Just as caches achieve speedups comparable to a conventional SMT and virtual memory allow a processor to give the illu- implementation with twice as many registers. sion of numerous multi-gigabyte address spaces with Implementing both register windows and multi- an average access time approaching that of several threading in the same processor requires a multiplica- kilobytes of SRAM, we propose an architecture that tive number of contexts. The one such machine of gives the illusion of numerous active contexts with an which we are aware (Sun’s Niagara [12]) has 640 reg- average access time approaching that of a convention- isters per core—even though it is an in-order pipeline, ally sized register file. Our design treats the physical it does not require renaming registers. VCA provides register file as a cache of a practically unlimited mem- unified support for both techniques in an out-of-order ory-backed logical register space. We call this scheme pipeline without the multiplicative increase in register the virtual context architecture (VCA). count normally required. We show that VCA can In VCA, an individual instruction needs only its achieve near-peak performance on an out-of-order source operands and its destination register to be pipeline using register windows and four threads (as in present in the register file to execute. Inactive register Niagara) with only 192 physical registers. values are automatically saved to memory as needed, We describe VCA in more detail in Section 2. and restored to the register file on demand. The archi- Section 3 discusses our simulation-based evaluation tecture modifies the rename stage of the pipeline to methodology. Section 4 evaluates a VCA-based imple- trigger the movement of register values between the mentations of register windows, SMT, and a combina- physical register file and the data cache. Furthermore, a tion of both techniques. Section 5 presents previous thread can change its register context simply by chang- work. Section 6 concludes and discusses future work. ing a base pointer—either to another register window on a call or return, or to an entirely different software 2. The Virtual Context Architecture thread context. Compared to prior proposals (see Section 5), VCA: This section presents the virtual context architec- • unifies support for both multiple independent ture in two phases. First, we outline the basic changes threads and register windowing within each thread; to a conventional out-of-order processor required to • completely decouples the physical register file size support VCA. Second, we describe a handful of imple- from the number of logical registers by using mem- mentation optimizations that make VCA more effi- ory as a backing store; cient. • enables the physical register file to hold just the most active subset of logical register values, instead 2.1. Basic VCA implementation of the complete register contexts, by allowing the hardware to spill and fill registers on demand; The fundamental operation of VCA lies in map- • is backward compatible with existing ISAs at the ping logical registers to physical registers. VCA builds application level for multithreaded contexts, and on the register renaming logic already found in a mod- requires only minimal ISA changes for register ern dynamically scheduled processor. A key feature of windowing; VCA is that it has minimal impact outside of the • requires no changes to the physical register file rename stage. Figure 1 illustrates the blocks that are design and the performance-critical schedule/exe- changed. We describe the basic operation of VCA by cute/writeback loop; first discussing the VCA renaming process, then detail- • is orthogonal to the other common techniques for ing the states and state transitions of physical registers. reducing the latency of the register file—register We then discuss the implications for branch recovery, banking and register caching; and the requirements for supporting multithreading and • does not involve speculation or prediction, avoid- register windows. ing the need for recovery mechanisms. physical register allocation are discussed in the follow- F D R1 R2 IQ ALU ing section.) RF In the simplest incarnation, fills and spills can be LSU D$ ASTQ accomplished by injecting load and store operations, I$ respectively, into the processor’s instruction queue. Figure 1: Pipeline diagram. These operations can use the same register ports, data- Shaded regions depict pipeline changes for VCA. paths, and cache ports as regular load and store instruc- R2 represents the extra rename stage necessary. tions, but are simpler in some
Recommended publications
  • Risc I: a Reduced Instruction Set Vlsi Computer
    RISC I: A REDUCED INSTRUCTION SET VLSI COMPUTER DAVID A. PATTERSON and CARLO H. SEQUIN Computer Science Division University of California Berkeley, California ABSTRACT to implement CISC is the best way to use this “scarce” resource. The Reduced Instruction Set Computer (RISC) Project investigates an alternatrve to the general trend toward computers wrth increasingly complex instruction sets: With a The above findings led to the Reduced Instruction Set proper set of instructions and a corresponding architectural Computer (RISC) Project. The purpose of the project is design, a machine wrth a high effective throughput can be to explore alternatives to the general trend toward achieved. The simplicity of the instruction set and addressing architectural complexity. The hypothesis is that by modes allows most Instructions to execute in a single machine cycle, and the srmplicity of each instruction guarantees a short reducing the instruction set, VLSI architecture can be cycle time. In addition, such a machine should have a much designed that uses the scarce resources more effectively shorter design trme. than CISC. We also expect this approach to reduce design time, the number of design errors, and the This paper presents the architecture of RISC I and its novel execution time of individual instructions. hardware support scheme for procedure call/return. Overlapprng sets of regrster banks that can pass parameters directly to subrouttnes are largely responsible for the excellent Our initial version of such a computer is called RISC I. performance of RISC I. Static and dynamtc comparisons To meet our goals of simplicity and effective single-chip between this new architecture and more traditional machines implementation, we placed the following “constraints” are given.
    [Show full text]
  • Computer Architecture and Assembly Language
    Computer Architecture and Assembly Language Gabriel Laskar EPITA 2015 License I Copyright c 2004-2005, ACU, Benoit Perrot I Copyright c 2004-2008, Alexandre Becoulet I Copyright c 2009-2013, Nicolas Pouillon I Copyright c 2014, Joël Porquet I Copyright c 2015, Gabriel Laskar Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being just ‘‘Copying this document’’, no Front-Cover Texts, and no Back-Cover Texts. Introduction Part I Introduction Gabriel Laskar (EPITA) CAAL 2015 3 / 378 Introduction Problem definition 1: Introduction Problem definition Outline Gabriel Laskar (EPITA) CAAL 2015 4 / 378 Introduction Problem definition What are we trying to learn? Computer Architecture What is in the hardware? I A bit of history of computers, current machines I Concepts and conventions: processing, memory, communication, optimization How does a machine run code? I Program execution model I Memory mapping, OS support Gabriel Laskar (EPITA) CAAL 2015 5 / 378 Introduction Problem definition What are we trying to learn? Assembly Language How to “talk” with the machine directly? I Mechanisms involved I Assembly language structure and usage I Low-level assembly language features I C inline assembly Gabriel Laskar (EPITA) CAAL 2015 6 / 378 I Programmers I Wise managers Introduction Problem definition Who do I talk to? I System gurus I Low-level enthusiasts Gabriel Laskar (EPITA) CAAL
    [Show full text]
  • VME for Experiments Chairman: Junsei Chiba (KEK)
    KEK Report 89-26 March 1990 D PROCEEDINGS of SYMPOSIUM on Data Acquisition and Processing for Next Generation Experiments 9 -10 March 1989 KEK, Tsukuba Edited by H. FUJII, J. CHIBA and Y. WATASE NATIONAL LABORATORY FOR HIGH ENERGY PHYSICS PROCEEDINGS of SYMPOSIUM on Data Acquisition and Processing for Next Generation Experiments 9 - 10 March 1989 KEK, Tsukuba Edited H. Fiflii, J. Chiba andY. Watase i National Laboratory for High Energy Physics, 1990 KEK Reports are available from: Technical Infonnation&Libraiy National Laboratory for High Energy Physics 1-1 Oho, Tsukuba-shi Ibaraki-ken, 305 JAPAN Phone: 0298-64-1171 Telex: 3652-534 (Domestic) (0)3652-534 (International) Fax: 0298-64-4604 Cable: KEKOHO Foreword This symposium has been organized to foresee the next generation of data acquisition and processing system in high energy physics and nuclear physics experiments. The recent revolutionary progress in the semiconductor and computer technologies is giving us an oppotunity to extend our idea on the experiments. The high density electronics of LSI technology provides an ideal front-end electronics such as readout circuits for silicon strip detector and multi-anode phototubes as well as wire chambers. The VLSI technology has advantages over the obsolite discrete one in the various aspects ; reduction of noise, small propagation delay, lower power dissipation, small space for the installation, improvement of the system reliability and maintenability. The small sized front-end electronics will be mounted just on the detector and the digital data might be transfered off the detector to the computer room with optical fiber data transmission lines. Then, a monster of bandies of signal cables might disappear from the experimental area.
    [Show full text]
  • MIPS Assembly Language Programming Using Qtspim
    MIPS Assembly Language Programming using QtSpim Ed Jorgensen, Ph.D. Version 1.1.50 July 2019 Cover image: MIPS R3000 Custom Chip http://commons.wikimedia.org/wiki/File:RCP-NUS_01.jpg Spim is copyrighted by James Larus and distributed under a BSD license. Copyright (c) 1990-2011, James R. Larus. All rights reserved. Copyright © 2013, 2014, 2015, 2016, 2017 by Ed Jorgensen You are free: To Share — to copy, distribute and transmit the work To Remix — to adapt the work Under the following conditions: Attribution — you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Noncommercial — you may not use this work for commercial purposes. Share Alike — if you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one. Table of Contents 1.0 Introduction...........................................................................................................1 1.1 Additional References.........................................................................................1 2.0 MIPS Architecture Overview..............................................................................3 2.1 Architecture Overview........................................................................................3 2.2 Data Types/Sizes.................................................................................................4 2.3 Memory...............................................................................................................4
    [Show full text]
  • Design of the RISC-V Instruction Set Architecture
    Design of the RISC-V Instruction Set Architecture Andrew Waterman Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2016-1 http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-1.html January 3, 2016 Copyright © 2016, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor David Patterson, Chair Professor Krste Asanovi´c Associate Professor Per-Olof Persson Spring 2016 Design of the RISC-V Instruction Set Architecture Copyright 2016 by Andrew Shell Waterman 1 Abstract Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman Doctor of Philosophy in Computer Science University of California, Berkeley Professor David Patterson, Chair The hardware-software interface, embodied in the instruction set architecture (ISA), is arguably the most important interface in a computer system. Yet, in contrast to nearly all other interfaces in a modern computer system, all commercially popular ISAs are proprietary.
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Ultrasparc Architecture 2007
    UltraSPARC Architecture 2007 One Architecture ... Multiple Innovative Implementations Draft D0.9.4, 27 Sep 2010 Privilege Levels: Hyperprivileged, Privileged, and Nonprivileged Distribution: Public Part No.No: 950-5554-15 ReleaseRevision: 1.0, Draft 2002 D0.9.4, 27 Sep 2010 Oracle Corporation 4150 Network Circle Santa Clara, CA 95054 U.S.A. 650-960-1300 ii UltraSPARC Architecture 2007 • Draft D0.9.4, 27 Sep 2010 Copyright © 2007, 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd.. Comments and "bug reports” regarding this document are welcome; they should be submitted to email address: [email protected] iv UltraSPARC Architecture 2007 • Draft D0.9.4, 27 Sep 2010 Contents Preface. i 1 Document Overview . 1 1.1 Navigating UltraSPARC Architecture 2007 . 1 1.2 Fonts and Notational Conventions . 2 1.2.1 Implementation Dependencies . 3 1.2.2 Notation for Numbers. 3 1.2.3 Informational Notes . 3 1.3 Reporting Errors in this Specification . 4 2 Definitions . 5 3 Architecture Overview. 15 3.1 The UltraSPARC Architecture 2007. 15 3.1.1 Features. 15 3.1.2 Attributes . 16 3.1.2.1 Design Goals .
    [Show full text]
  • Processor Architectures
    CS143 Handout 18 Summer 2008 30 July, 2008 Processor Architectures Handout written by Maggie Johnson and revised by Julie Zelenski. Architecture Vocabulary Let’s review a few relevant hardware definitions: register: a storage location directly on the CPU, used for temporary storage of small amounts of data during processing. memory: an array of randomly accessible memory bytes each identified by a unique address. Flat memory models, segmented memory models, and hybrid models exist which are distinguished by the way locations are referenced and potentially divided into sections. instruction set: the set of instructions that are interpreted directly in the hardware by the CPU. These instructions are encoded as bit strings in memory and are fetched and executed one by one by the processor. They perform primitive operations such as "add 2 to register i1", "store contents of o6 into memory location 0xFF32A228", etc. Instructions consist of an operation code (opcode) e.g., load, store, add, etc., and one or more operand addresses. CISC: Complex instruction set computer. Older processors fit into the CISC family, which means they have a large and fancy instruction set. In addition to a set of common operations, the instruction set has special purpose instructions that are designed for limited situations. CISC processors tend to have a slower clock cycle, but accomplish more in each cycle because of the sophisticated instructions. In writing an effective compiler back-end for a CISC processor, many issues revolve around recognizing how to make effective use of the specialized instructions. RISC: Reduced instruction set computer. Many modern processors are in the RISC family, which means they have a relatively lean instruction set, containing mostly simple, general-purpose instructions.
    [Show full text]
  • Computer Architecture
    Computer Architecture Adrian Crăciun January 9, 2018 1 Contents 1 Computer Architecture - Overview and Motivation 6 1.1 The Structured Organization of Computers . 6 1.2 Milestones in Computer Architecture . 14 1.3 The Computer Zoo . 21 1.4 Computer Families . 26 2 Computer Systems Organization 30 2.1 Processors . 31 2.2 Primary Memory / Secondary Memory / Input/Output (Old Slides) 40 3 The Digital Logic Level 73 3.1 Gates and Boolean Algebra . 73 3.2 Basic Digital Logic Circuits . 80 3.3 Memory . 88 3.4 CPU Chips and Buses . 96 3.5 Example CPUs . 102 4 The Microarchitecture Level 109 4.1 An Example Microarchitecture . 109 4.2 An Example ISA: IJVM . 116 4.3 Implementation of the Instruction Set . 122 4.4 Designing the Microarchitecture Level . 127 4.5 Improving Performance . 135 4.6 Example Microarchitectures . 141 5 The Instruction Set Architecture Level 144 5.1 Overview of the Instruction Set Architecture Level . 144 5.2 Memory models . 146 5.3 Registers . 147 5.4 Data Types . 150 5.5 Instruction Formats . 152 5.6 Addressing . 155 5.7 Instruction types . 158 5.8 Flow of control . 162 5.9 Example ISAs . 166 5.10 Comparison of the Instruction Sets . 167 6 The Operating System Machine Level 170 6.1 Virtual Memory . 170 6.2 Virtual I/O Instructions . 172 6.3 Virtual Instructions for Parallel Processes . 173 6.4 Example Operating Systems . 175 7 The Assembly Language Level 177 2 List of Figures 1 Moving between language levels. 7 2 A multilevel machine. 9 3 A multilevel machine with 6 levels.
    [Show full text]
  • OVERLAPPED REGISTER WINDOWS CISC Versus RISC
    What is RISC? OVERLAPPED REGISTER WINDOWS CISC versus RISC What is RISC? • RISC? RISC, or Reduced Instruction Set Computer. is a type of microprocessor architecture that utilizes a small, highly-optimized set of instructions, rather than a more specialized set of instructions often found in other types of architectures. • History The first RISC projects came from IBM, Stanford, and UC-Berkeley in the late 70s and early 80s. The IBM 801, Stanford MIPS, and Berkeley RISC 1 and 2 were all designed with a similar philosophy which has become known as RISC. Certain design features have been characteristic of most RISC processors: – one cycle execution time: RISC processors have a CPI (clock per instruction) of one cycle. This is due to the optimization of each instruction on the CPU and a technique called PIPELINING – pipelining: a techique that allows for simultaneous execution of parts, or stages, of instructions to more efficiently process instructions; – large number of registers: the RISC design philosophy generally incorporates a larger number of registers to prevent in large amounts of interactions with memory Reduced Instruction Set Computer (RISC) lMajor characteristics of a RISC architecture »1) Relatively few instructions »2) Relatively few addressing modes »3) Memory access limited to load and store instruction »4) All operations done within the registers of the CPU »5) Fixed-length, easily decoded instruction format »6) Single-cycle instruction execution »7) Hardwired rather than microprogrammed control – RISC Instruction • Only use LOAD and STORE instruction when communicating between memory and CPU • All other instructions are executed within the registers of the CPU without referring to memory Program to evaluate X = ( A + B ) * ( C + D ) R1 M[A] LOAD R1, A R2 M[B] LOAD R2, B R3 M[C] LOAD R3, C R4 M[D] LOAD R4, D R1 R1 R2 ADD R1, R1, R2 R3 R3 R4 ADD R3, R3, R4 R1 R1 R3 MUL R1, R1, R3 M[X ] R1 STORE X, R1 •Load instruction transfers the operand from memory to CPU Register.
    [Show full text]
  • The Design of a Reduced Instruction Set Computer Using a Silicon Compiler
    AN ABSTRACT OF THE THESIS OF Licheng Zhang for the degree of Master of Science in Electrical and Computer Engineering presented on June 7, 1989. Title: The Design of A Reduced Instruction Set Computer UsingA Silicon Compiler. Redacted for Privacy Abstract appro ed- John Muff The objective of this thesisisto describe the design and implementation of a VSLI reduced instruction set computer(RISC). The RISC machine constitutes a new style of computerarchitecture. Itdifferssignificantly from the complex instructionset computer architectures(CISC)ofthepast.RISCarchitecturesare characterized by their high performance, simple instructionsets, minimal hardware requirements, and their abilityto support block structured programming languages adequately. In this thesis a 16-bit single chip RISC was designed using the Genesil Silicon Compiler. It has 14 instructions, anoverlapped register window structure,and on chip memory.Itcan execute most instructions in a single clock cycle,including procedure calls and returns. The peak performance of this chip is approximately6 MIPS. The chip was implemented in 2 micron CMOS technology. The chip sizeis 516.54 X 514.27 mils. This chip has not been fabricated. THE DESIGN OF A REDUCED INSTRUCTION SET COMPUTER USING A SILICON COMPLIER By Licheng Zhang A THESIS submittedto Oregon State University in partial fulfillment of the requirement for the degree of Master of Science Completed June 7, 1989 Commencement June,1990. APPROVED: Redacted for Privacy ssor of Electrical and Comp ter gineering in charge of major Redacted for Privacy Head of Department of Electrical and Computer Engineering Redacted for Privacy Dean of Gradua eSchool Date thesis is presented: June 7, 1989. TABLE OF CONTENTS 1.
    [Show full text]
  • Using As the Gnu Assembler
    Using as The gnu Assembler Version 2.14.90.0.7 The Free Software Foundation Inc. thanks The Nice Computer Company of Australia for loaning Dean Elsner to write the first (Vax) version of as for Project gnu. The proprietors, management and staff of TNCCA thank FSF for distracting the boss while they got some work done. Dean Elsner, Jay Fenlason & friends Using as Edited by Cygnus Support Copyright c 1991, 92, 93, 94, 95, 96, 97, 98, 99, 2000, 2001, 2002 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled \GNU Free Documentation License". Chapter 1: Overview 1 1 Overview This manual is a user guide to the gnu assembler as. Here is a brief summary of how to invoke as. For details, see Chapter 2 [Command-Line Options], page 15. as [-a[cdhlns][=file]] [-D][{defsym sym=val] [-f][{gstabs][{gstabs+] [{gdwarf2][{help] [-I dir][-J][-K][-L] [{listing-lhs-width=NUM][{listing-lhs-width2=NUM] [{listing-rhs-width=NUM][{listing-cont-lines=NUM] [{keep-locals][-o objfile][-R][{statistics][-v] [-version][{version][-W][{warn][{fatal-warnings] [-w][-x][-Z][{target-help][target-options] [{|files ...] Target Alpha options: [-mcpu] [-mdebug | -no-mdebug] [-relax][-g][-Gsize] [-F][-32addr] Target ARC options: [-marc[5|6|7|8]] [-EB|-EL] Target ARM
    [Show full text]