Ultrasparc Architecture 2007

Total Page:16

File Type:pdf, Size:1020Kb

Ultrasparc Architecture 2007 UltraSPARC Architecture 2007 One Architecture ... Multiple Innovative Implementations Draft D0.9.4, 27 Sep 2010 Privilege Levels: Hyperprivileged, Privileged, and Nonprivileged Distribution: Public Part No.No: 950-5554-15 ReleaseRevision: 1.0, Draft 2002 D0.9.4, 27 Sep 2010 Oracle Corporation 4150 Network Circle Santa Clara, CA 95054 U.S.A. 650-960-1300 ii UltraSPARC Architecture 2007 • Draft D0.9.4, 27 Sep 2010 Copyright © 2007, 2011, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd.. Comments and "bug reports” regarding this document are welcome; they should be submitted to email address: [email protected] iv UltraSPARC Architecture 2007 • Draft D0.9.4, 27 Sep 2010 Contents Preface. i 1 Document Overview . 1 1.1 Navigating UltraSPARC Architecture 2007 . 1 1.2 Fonts and Notational Conventions . 2 1.2.1 Implementation Dependencies . 3 1.2.2 Notation for Numbers. 3 1.2.3 Informational Notes . 3 1.3 Reporting Errors in this Specification . 4 2 Definitions . 5 3 Architecture Overview. 15 3.1 The UltraSPARC Architecture 2007. 15 3.1.1 Features. 15 3.1.2 Attributes . 16 3.1.2.1 Design Goals . 17 3.1.2.2 Register Windows . 17 3.1.3 System Components . 17 3.1.3.1 Binary Compatibility. 17 3.1.3.2 UltraSPARC Architecture 2007 MMU . 17 3.1.3.3 Privileged Software . 17 3.1.4 Architectural Definition . 18 3.1.5 UltraSPARC Architecture 2007 Compliance with SPARC V9 Architecture 18 3.1.6 Implementation Compliance with UltraSPARC Architecture 2007 18 3.2 Processor Architecture . 18 3.2.1 Integer Unit (IU) . 18 3.2.2 Floating-Point Unit (FPU). 19 3.3 Instructions. 19 3.3.1 Memory Access . 19 3.3.1.1 Memory Alignment Restrictions . 20 3.3.1.2 Addressing Conventions . 20 3.3.1.3 Addressing Range . 20 3.3.1.4 Load/Store Alternate . 20 3.3.1.5 Separate Instruction and Data Memories . 21 3.3.1.6 Input/Output (I/O) . 21 3.3.1.7 Memory Synchronization . 21 3.3.2 Integer Arithmetic / Logical / Shift Instructions . 21 3.3.3 Control Transfer . 22 3.3.4 State Register Access . 22 3.3.4.1 Ancillary State Registers. 22 3.3.4.2 PR State Registers . 22 i 3.3.4.3 HPR State Registers . 23 3.3.5 Floating-Point Operate. 23 3.3.6 Conditional Move . 23 3.3.7 Register Window Management. 23 3.3.8 SIMD. 23 3.4 Traps. .23 3.5 Chip-Level Multithreading (CMT) . 24 4 Data Formats. 25 4.1 Integer Data Formats . 26 4.1.1 Signed Integer Data Types . 26 4.1.1.1 Signed Integer Byte, Halfword, and Word. 27 4.1.1.2 Signed Integer Doubleword (64 bits) . 27 4.1.1.3 Signed Integer Extended-Word (64 bits) . 27 4.1.2 Unsigned Integer Data Types . 27 4.1.2.1 Unsigned Integer Byte, Halfword, and Word . 28 4.1.2.2 Unsigned Integer Doubleword (64 bits). 28 4.1.2.3 Unsigned Extended Integer (64 bits) . 28 4.1.3 Tagged Word (32 bits). 28 4.2 Floating-Point Data Formats . 29 4.2.1 Floating Point, Single Precision (32 bits) . 29 4.2.2 Floating Point, Double Precision (64 bits) . 29 4.2.3 Floating Point, Quad Precision (128 bits). 30 4.2.4 Floating-Point Data Alignment in Memory and Registers . 31 4.3 SIMD Data Formats . 31 4.3.1 Uint8 SIMD Data Format . 32 4.3.2 Int16 SIMD Data Formats . 32 4.3.3 Int32 SIMD Data Format . 32 5 Registers . 33 5.1 Reserved Register Fields . 34 5.2 General-Purpose R Registers. 35 5.2.1 Global R Registers. 36 5.2.2 Windowed R Registers. 36 5.2.3 Special R Registers . 39 5.3 Floating-Point Registers . 40 5.3.1 Floating-Point Register Number Encoding . 42 5.3.2 Double and Quad Floating-Point Operands . 43 5.4 Floating-Point State Register (FSR) . 44 5.4.1 Floating-Point Condition Codes (fcc0, fcc1, fcc2, fcc3). 44 5.4.2 Rounding Direction (rd) . 45 5.4.3 Trap Enable Mask (tem) . 45 5.4.4 Nonstandard Floating-Point (ns) . 45 5.4.5 FPU Version (ver) . 45 5.4.6 Floating-Point Trap Type (ftt). 46 5.4.7 Accrued Exceptions (aexc) . 48 5.4.8 Current Exception (cexc) . 48 5.4.9 Floating-Point Exception Fields . 49 5.4.10 FSR Conformance . 50 5.5 Ancillary State Registers . 50 5.5.1 32-bit Multiply/Divide Register (Y) (ASR 0) . 52 5.5.2 Integer Condition Codes Register (CCR) (ASR 2) . 52 5.5.2.1 Condition Codes (CCR.xcc and CCR.icc) . 52 5.5.3 Address Space Identifier (ASI) Register (ASR 3). 53 5.5.4 Tick (TICK) Register (ASR 4) . 54 5.5.5 Program Counters (PC, NPC) (ASR 5) . 55 ii UltraSPARC Architecture 2007 • Draft D0.9.4, 27 Sep 2010 5.5.6 Floating-Point Registers State (FPRS) Register (ASR 6). 55 5.5.7 General Status Register (GSR) (ASR 19). 56 5.5.8 SOFTINTP Register (ASRs 20, 21, 22) . ..
Recommended publications
  • Risc I: a Reduced Instruction Set Vlsi Computer
    RISC I: A REDUCED INSTRUCTION SET VLSI COMPUTER DAVID A. PATTERSON and CARLO H. SEQUIN Computer Science Division University of California Berkeley, California ABSTRACT to implement CISC is the best way to use this “scarce” resource. The Reduced Instruction Set Computer (RISC) Project investigates an alternatrve to the general trend toward computers wrth increasingly complex instruction sets: With a The above findings led to the Reduced Instruction Set proper set of instructions and a corresponding architectural Computer (RISC) Project. The purpose of the project is design, a machine wrth a high effective throughput can be to explore alternatives to the general trend toward achieved. The simplicity of the instruction set and addressing architectural complexity. The hypothesis is that by modes allows most Instructions to execute in a single machine cycle, and the srmplicity of each instruction guarantees a short reducing the instruction set, VLSI architecture can be cycle time. In addition, such a machine should have a much designed that uses the scarce resources more effectively shorter design trme. than CISC. We also expect this approach to reduce design time, the number of design errors, and the This paper presents the architecture of RISC I and its novel execution time of individual instructions. hardware support scheme for procedure call/return. Overlapprng sets of regrster banks that can pass parameters directly to subrouttnes are largely responsible for the excellent Our initial version of such a computer is called RISC I. performance of RISC I. Static and dynamtc comparisons To meet our goals of simplicity and effective single-chip between this new architecture and more traditional machines implementation, we placed the following “constraints” are given.
    [Show full text]
  • Computer Architecture and Assembly Language
    Computer Architecture and Assembly Language Gabriel Laskar EPITA 2015 License I Copyright c 2004-2005, ACU, Benoit Perrot I Copyright c 2004-2008, Alexandre Becoulet I Copyright c 2009-2013, Nicolas Pouillon I Copyright c 2014, Joël Porquet I Copyright c 2015, Gabriel Laskar Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being just ‘‘Copying this document’’, no Front-Cover Texts, and no Back-Cover Texts. Introduction Part I Introduction Gabriel Laskar (EPITA) CAAL 2015 3 / 378 Introduction Problem definition 1: Introduction Problem definition Outline Gabriel Laskar (EPITA) CAAL 2015 4 / 378 Introduction Problem definition What are we trying to learn? Computer Architecture What is in the hardware? I A bit of history of computers, current machines I Concepts and conventions: processing, memory, communication, optimization How does a machine run code? I Program execution model I Memory mapping, OS support Gabriel Laskar (EPITA) CAAL 2015 5 / 378 Introduction Problem definition What are we trying to learn? Assembly Language How to “talk” with the machine directly? I Mechanisms involved I Assembly language structure and usage I Low-level assembly language features I C inline assembly Gabriel Laskar (EPITA) CAAL 2015 6 / 378 I Programmers I Wise managers Introduction Problem definition Who do I talk to? I System gurus I Low-level enthusiasts Gabriel Laskar (EPITA) CAAL
    [Show full text]
  • VME for Experiments Chairman: Junsei Chiba (KEK)
    KEK Report 89-26 March 1990 D PROCEEDINGS of SYMPOSIUM on Data Acquisition and Processing for Next Generation Experiments 9 -10 March 1989 KEK, Tsukuba Edited by H. FUJII, J. CHIBA and Y. WATASE NATIONAL LABORATORY FOR HIGH ENERGY PHYSICS PROCEEDINGS of SYMPOSIUM on Data Acquisition and Processing for Next Generation Experiments 9 - 10 March 1989 KEK, Tsukuba Edited H. Fiflii, J. Chiba andY. Watase i National Laboratory for High Energy Physics, 1990 KEK Reports are available from: Technical Infonnation&Libraiy National Laboratory for High Energy Physics 1-1 Oho, Tsukuba-shi Ibaraki-ken, 305 JAPAN Phone: 0298-64-1171 Telex: 3652-534 (Domestic) (0)3652-534 (International) Fax: 0298-64-4604 Cable: KEKOHO Foreword This symposium has been organized to foresee the next generation of data acquisition and processing system in high energy physics and nuclear physics experiments. The recent revolutionary progress in the semiconductor and computer technologies is giving us an oppotunity to extend our idea on the experiments. The high density electronics of LSI technology provides an ideal front-end electronics such as readout circuits for silicon strip detector and multi-anode phototubes as well as wire chambers. The VLSI technology has advantages over the obsolite discrete one in the various aspects ; reduction of noise, small propagation delay, lower power dissipation, small space for the installation, improvement of the system reliability and maintenability. The small sized front-end electronics will be mounted just on the detector and the digital data might be transfered off the detector to the computer room with optical fiber data transmission lines. Then, a monster of bandies of signal cables might disappear from the experimental area.
    [Show full text]
  • MIPS Assembly Language Programming Using Qtspim
    MIPS Assembly Language Programming using QtSpim Ed Jorgensen, Ph.D. Version 1.1.50 July 2019 Cover image: MIPS R3000 Custom Chip http://commons.wikimedia.org/wiki/File:RCP-NUS_01.jpg Spim is copyrighted by James Larus and distributed under a BSD license. Copyright (c) 1990-2011, James R. Larus. All rights reserved. Copyright © 2013, 2014, 2015, 2016, 2017 by Ed Jorgensen You are free: To Share — to copy, distribute and transmit the work To Remix — to adapt the work Under the following conditions: Attribution — you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). Noncommercial — you may not use this work for commercial purposes. Share Alike — if you alter, transform, or build upon this work, you may distribute the resulting work only under the same or similar license to this one. Table of Contents 1.0 Introduction...........................................................................................................1 1.1 Additional References.........................................................................................1 2.0 MIPS Architecture Overview..............................................................................3 2.1 Architecture Overview........................................................................................3 2.2 Data Types/Sizes.................................................................................................4 2.3 Memory...............................................................................................................4
    [Show full text]
  • Design of the RISC-V Instruction Set Architecture
    Design of the RISC-V Instruction Set Architecture Andrew Waterman Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2016-1 http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-1.html January 3, 2016 Copyright © 2016, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley Committee in charge: Professor David Patterson, Chair Professor Krste Asanovi´c Associate Professor Per-Olof Persson Spring 2016 Design of the RISC-V Instruction Set Architecture Copyright 2016 by Andrew Shell Waterman 1 Abstract Design of the RISC-V Instruction Set Architecture by Andrew Shell Waterman Doctor of Philosophy in Computer Science University of California, Berkeley Professor David Patterson, Chair The hardware-software interface, embodied in the instruction set architecture (ISA), is arguably the most important interface in a computer system. Yet, in contrast to nearly all other interfaces in a modern computer system, all commercially popular ISAs are proprietary.
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Processor Architectures
    CS143 Handout 18 Summer 2008 30 July, 2008 Processor Architectures Handout written by Maggie Johnson and revised by Julie Zelenski. Architecture Vocabulary Let’s review a few relevant hardware definitions: register: a storage location directly on the CPU, used for temporary storage of small amounts of data during processing. memory: an array of randomly accessible memory bytes each identified by a unique address. Flat memory models, segmented memory models, and hybrid models exist which are distinguished by the way locations are referenced and potentially divided into sections. instruction set: the set of instructions that are interpreted directly in the hardware by the CPU. These instructions are encoded as bit strings in memory and are fetched and executed one by one by the processor. They perform primitive operations such as "add 2 to register i1", "store contents of o6 into memory location 0xFF32A228", etc. Instructions consist of an operation code (opcode) e.g., load, store, add, etc., and one or more operand addresses. CISC: Complex instruction set computer. Older processors fit into the CISC family, which means they have a large and fancy instruction set. In addition to a set of common operations, the instruction set has special purpose instructions that are designed for limited situations. CISC processors tend to have a slower clock cycle, but accomplish more in each cycle because of the sophisticated instructions. In writing an effective compiler back-end for a CISC processor, many issues revolve around recognizing how to make effective use of the specialized instructions. RISC: Reduced instruction set computer. Many modern processors are in the RISC family, which means they have a relatively lean instruction set, containing mostly simple, general-purpose instructions.
    [Show full text]
  • Computer Architecture
    Computer Architecture Adrian Crăciun January 9, 2018 1 Contents 1 Computer Architecture - Overview and Motivation 6 1.1 The Structured Organization of Computers . 6 1.2 Milestones in Computer Architecture . 14 1.3 The Computer Zoo . 21 1.4 Computer Families . 26 2 Computer Systems Organization 30 2.1 Processors . 31 2.2 Primary Memory / Secondary Memory / Input/Output (Old Slides) 40 3 The Digital Logic Level 73 3.1 Gates and Boolean Algebra . 73 3.2 Basic Digital Logic Circuits . 80 3.3 Memory . 88 3.4 CPU Chips and Buses . 96 3.5 Example CPUs . 102 4 The Microarchitecture Level 109 4.1 An Example Microarchitecture . 109 4.2 An Example ISA: IJVM . 116 4.3 Implementation of the Instruction Set . 122 4.4 Designing the Microarchitecture Level . 127 4.5 Improving Performance . 135 4.6 Example Microarchitectures . 141 5 The Instruction Set Architecture Level 144 5.1 Overview of the Instruction Set Architecture Level . 144 5.2 Memory models . 146 5.3 Registers . 147 5.4 Data Types . 150 5.5 Instruction Formats . 152 5.6 Addressing . 155 5.7 Instruction types . 158 5.8 Flow of control . 162 5.9 Example ISAs . 166 5.10 Comparison of the Instruction Sets . 167 6 The Operating System Machine Level 170 6.1 Virtual Memory . 170 6.2 Virtual I/O Instructions . 172 6.3 Virtual Instructions for Parallel Processes . 173 6.4 Example Operating Systems . 175 7 The Assembly Language Level 177 2 List of Figures 1 Moving between language levels. 7 2 A multilevel machine. 9 3 A multilevel machine with 6 levels.
    [Show full text]
  • OVERLAPPED REGISTER WINDOWS CISC Versus RISC
    What is RISC? OVERLAPPED REGISTER WINDOWS CISC versus RISC What is RISC? • RISC? RISC, or Reduced Instruction Set Computer. is a type of microprocessor architecture that utilizes a small, highly-optimized set of instructions, rather than a more specialized set of instructions often found in other types of architectures. • History The first RISC projects came from IBM, Stanford, and UC-Berkeley in the late 70s and early 80s. The IBM 801, Stanford MIPS, and Berkeley RISC 1 and 2 were all designed with a similar philosophy which has become known as RISC. Certain design features have been characteristic of most RISC processors: – one cycle execution time: RISC processors have a CPI (clock per instruction) of one cycle. This is due to the optimization of each instruction on the CPU and a technique called PIPELINING – pipelining: a techique that allows for simultaneous execution of parts, or stages, of instructions to more efficiently process instructions; – large number of registers: the RISC design philosophy generally incorporates a larger number of registers to prevent in large amounts of interactions with memory Reduced Instruction Set Computer (RISC) lMajor characteristics of a RISC architecture »1) Relatively few instructions »2) Relatively few addressing modes »3) Memory access limited to load and store instruction »4) All operations done within the registers of the CPU »5) Fixed-length, easily decoded instruction format »6) Single-cycle instruction execution »7) Hardwired rather than microprogrammed control – RISC Instruction • Only use LOAD and STORE instruction when communicating between memory and CPU • All other instructions are executed within the registers of the CPU without referring to memory Program to evaluate X = ( A + B ) * ( C + D ) R1 M[A] LOAD R1, A R2 M[B] LOAD R2, B R3 M[C] LOAD R3, C R4 M[D] LOAD R4, D R1 R1 R2 ADD R1, R1, R2 R3 R3 R4 ADD R3, R3, R4 R1 R1 R3 MUL R1, R1, R3 M[X ] R1 STORE X, R1 •Load instruction transfers the operand from memory to CPU Register.
    [Show full text]
  • The Design of a Reduced Instruction Set Computer Using a Silicon Compiler
    AN ABSTRACT OF THE THESIS OF Licheng Zhang for the degree of Master of Science in Electrical and Computer Engineering presented on June 7, 1989. Title: The Design of A Reduced Instruction Set Computer UsingA Silicon Compiler. Redacted for Privacy Abstract appro ed- John Muff The objective of this thesisisto describe the design and implementation of a VSLI reduced instruction set computer(RISC). The RISC machine constitutes a new style of computerarchitecture. Itdifferssignificantly from the complex instructionset computer architectures(CISC)ofthepast.RISCarchitecturesare characterized by their high performance, simple instructionsets, minimal hardware requirements, and their abilityto support block structured programming languages adequately. In this thesis a 16-bit single chip RISC was designed using the Genesil Silicon Compiler. It has 14 instructions, anoverlapped register window structure,and on chip memory.Itcan execute most instructions in a single clock cycle,including procedure calls and returns. The peak performance of this chip is approximately6 MIPS. The chip was implemented in 2 micron CMOS technology. The chip sizeis 516.54 X 514.27 mils. This chip has not been fabricated. THE DESIGN OF A REDUCED INSTRUCTION SET COMPUTER USING A SILICON COMPLIER By Licheng Zhang A THESIS submittedto Oregon State University in partial fulfillment of the requirement for the degree of Master of Science Completed June 7, 1989 Commencement June,1990. APPROVED: Redacted for Privacy ssor of Electrical and Comp ter gineering in charge of major Redacted for Privacy Head of Department of Electrical and Computer Engineering Redacted for Privacy Dean of Gradua eSchool Date thesis is presented: June 7, 1989. TABLE OF CONTENTS 1.
    [Show full text]
  • Using As the Gnu Assembler
    Using as The gnu Assembler Version 2.14.90.0.7 The Free Software Foundation Inc. thanks The Nice Computer Company of Australia for loaning Dean Elsner to write the first (Vax) version of as for Project gnu. The proprietors, management and staff of TNCCA thank FSF for distracting the boss while they got some work done. Dean Elsner, Jay Fenlason & friends Using as Edited by Cygnus Support Copyright c 1991, 92, 93, 94, 95, 96, 97, 98, 99, 2000, 2001, 2002 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the section entitled \GNU Free Documentation License". Chapter 1: Overview 1 1 Overview This manual is a user guide to the gnu assembler as. Here is a brief summary of how to invoke as. For details, see Chapter 2 [Command-Line Options], page 15. as [-a[cdhlns][=file]] [-D][{defsym sym=val] [-f][{gstabs][{gstabs+] [{gdwarf2][{help] [-I dir][-J][-K][-L] [{listing-lhs-width=NUM][{listing-lhs-width2=NUM] [{listing-rhs-width=NUM][{listing-cont-lines=NUM] [{keep-locals][-o objfile][-R][{statistics][-v] [-version][{version][-W][{warn][{fatal-warnings] [-w][-x][-Z][{target-help][target-options] [{|files ...] Target Alpha options: [-mcpu] [-mdebug | -no-mdebug] [-relax][-g][-Gsize] [-F][-32addr] Target ARC options: [-marc[5|6|7|8]] [-EB|-EL] Target ARM
    [Show full text]
  • Performance Portable Short Vector Transforms
    Dissertation Performance Portable Short Vector Transforms ausgefuhrt¨ zum Zwecke der Erlangung des akademischen Grades eines Doktors der technischen Wissenschaften unter der Leitung von Ao. Univ.-Prof. Dipl.-Ing. Dr. techn. Christoph W. Uberhuber¨ E115 – Institut fur¨ Angewandte und Numerische Mathematik eingereicht an der Technischen Universit¨at Wien Fakult¨at fur¨ Technische Naturwissenschaften und Informatik von Dipl.-Ing. Franz Franchetti Matrikelnummer 9525993 Hartiggasse 3/602 2700 Wiener Neustadt Wien, am 7. J¨anner 2003 Kurzfassung In dieser Dissertation wird eine mathematische Methode entwickelt, die automati- sche Leistungsoptimierung von Programmen zur Berechnung von diskreten linearen Transformationen fur¨ Prozessoren mit Multimedia-Vektorerweiterungen (short vector SIMD extensions) erm¨oglicht, wobei besonderes Gewicht auf die diskrete Fourier- Transformation (DFT) gelegt wird. Die neuentwickelte Methode basiert auf dem Kronecker-Produkt-Formalismus, der erweitert wurde, um die spezifischen Eigenschaf- ten von Multimedia-Vektorerweiterungen abzubilden. Es wurde auch eine speziell ange- paßte Cooley-Tukey-FFT-Variante1 entwickelt, die sowohl fur¨ Vektorl¨angen der Form N =2k als auch fur¨ allgemeinere Problemgr¨oßen anwendbar ist. Die neuentwickelte Methode wurde als Erweiterung fur¨ Spiral2 und Fftw3,die derzeitigen Top-Systeme im Bereich der automatischen Leistungsoptimierung fur¨ dis- krete lineare Transformationen, getestet. Sie erlaubt es, extrem schnelle Programme zur Berechnung der DFT zu erzeugen, welche die derzeit schnellsten Programme zur Berechnung der DFT auf Intel-Prozessoren mit den Multimedia-Vektorerweiterungen “Streaming SIMD Extensions” (SSE und SSE 2) sind. Sie sind schneller als die ent- sprechenden Programme aus der manuell optimierten Intel-Softwarebibliothek MKL (Math Kernel Library). Zus¨atzlich wurden die bisher ersten und einzigen automatisch leistungsoptimierten Programme zur Berechnung der Walsh-Hadamard-Transformation und fur¨ zwei-dimensionale Kosinus-Transformationen erzeugt.
    [Show full text]