6X86 PROCESSOR

6X86 PROCESSOR
Abbreviated Data Book, Version 1.1

Contains selected pages from:
6x86 PROCESSOR
Superscalar, Superpipelined, Sixth-generation, x86 Compatible CPU

PRELIMINARY, March 1996
Order Number: 94175-01

Introduction

• Sixth-Generation Superscalar Superpipelined Architecture
  - Dual 7-stage integer pipelines
  - High performance 80-bit FPU with 64-bit interface
  - Operating frequencies of 100, 110, 120, 133 MHz
  - 16-KByte unified write-back L1 cache

• X86 Instruction Set Compatible
  - Runs Windows 95, Windows 3.x, Windows NT, DOS, UNIX, OS/2, Solaris, and others
  - Optimized to run both 16-bit and 32-bit software applications

• Best-in-Class Performance Through Superior Architecture
  - Intelligent instruction dispatch
  - Register renaming
  - Out-of-order completion
  - Data dependency removal
  - Multi-branch prediction
  - Speculative execution

• 64-Bit Data Bus
  - P54C socket compatible
  - Supports “one-plus-four” and linear burst modes

The Cyrix 6x86™ processor is a superscalar, superpipelined sixth generation CPU that offers the highest level of performance available for desktop personal computers. Optimized to run both 16-bit and 32-bit software applications, the 6x86 processor is fully compatible with the x86 instruction set and delivers industry-leading performance running Windows® 95, Windows, Windows NT, OS/2®, DOS, Solaris, UNIX® and other operating systems.

The 6x86 processor achieves top performance through the use of two optimized superpipelined integer units and an on-chip floating point unit. The superpipelined architecture reduces timing constraints and increases frequency scalability to 150 MHz and beyond. Additionally, the integer and floating point units are optimized for maximum instruction throughput by using advanced architectural techniques including register renaming, out-of-order completion, data dependency removal, branch prediction and speculative execution. These design innovations eliminate many data dependencies and resource conflicts to achieve high performance when executing existing 16-bit and future 32-bit software applications.

[Block diagram: Integer Unit (X and Y pipes with stages IF, ID1, ID2, AC1, AC2, EX, WB), Cache Unit (16-KByte unified cache and 256-byte instruction line cache), Memory Management Unit (X and Y linear addresses in, X and Y physical addresses out), Floating Point Unit (with FPU queue), and Bus Interface Unit (address A31-A3, BE7#-BE0#, data D63-D0, CLK, control).]

Product Overview

1. ARCHITECTURE OVERVIEW

The Cyrix 6x86 CPU is a leader in the sixth generation of high performance, x86-compatible processors. Increased performance is accomplished by the use of superscalar and superpipelined design techniques.

The 6x86 CPU is superscalar in that it contains two separate pipelines that allow multiple instructions to be processed at the same time. The use of advanced processing technology and the increased number of pipeline stages (superpipelining) allow the 6x86 CPU to achieve clock rates of 100 MHz and above.

Through the use of unique architectural features, the 6x86 processor eliminates many data dependencies and resource conflicts, resulting in optimal performance for both 16-bit and 32-bit x86 software.

The 6x86 CPU contains two caches: a 16-KByte dual-ported unified cache and a 256-byte instruction line cache. Since the unified cache can store instructions and data in any ratio, the unified cache offers a higher hit rate than separate data and instruction caches of equal size. An increase in overall cache-to-integer unit bandwidth is achieved by supplementing the unified cache with a small, high-speed, fully associative instruction line cache. The inclusion of the instruction line cache avoids excessive conflicts between code and data accesses in the unified cache.

The on-chip FPU allows floating point instructions to execute in parallel with integer instructions and features a 64-bit data interface. The FPU incorporates a four-deep instruction queue and a four-deep store queue to facilitate parallel execution.

The 6x86 CPU operates from a 3.3 volt power supply resulting in reasonable power consumption at all frequencies. In addition, the 6x86 CPU incorporates a low power suspend mode, stop clock capability, and system management mode (SMM) for power sensitive applications.

1.1 Major Functional Blocks

The 6x86 processor consists of five major functional blocks, as shown in the overall block diagram on the first page of this manual:

• Integer Unit
• Cache Unit
• Memory Management Unit
• Floating Point Unit
• Bus Interface Unit

Instructions are executed in the X and Y pipelines within the Integer Unit and also in the Floating Point Unit (FPU). The Cache Unit stores the most recently used data and instructions to allow fast access to the information by the Integer Unit and FPU.

Physical addresses are calculated by the Memory Management Unit and passed to the Cache Unit and the Bus Interface Unit (BIU). The BIU provides the interface between the external system board and the processor’s internal execution units.

1.2 Integer Unit

The Integer Unit (Figure 1-1) provides parallel instruction execution using two seven-stage integer pipelines. Each of the two pipelines, X and Y, can process several instructions simultaneously.
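To give a feel for what two seven-stage pipelines can mean for throughput, the following is a minimal sketch of an idealized timing model. It is not taken from the data book: it assumes no stalls, no branch mispredictions, and that every pair of instructions can issue together, which real code will not achieve. The function names are hypothetical and chosen only for this illustration.

```python
import math

PIPELINE_STAGES = 7  # from the text: seven-stage integer pipelines
ISSUE_WIDTH = 2      # from the text: two pipelines, X and Y


def ideal_cycles(num_instructions, stages=PIPELINE_STAGES, width=ISSUE_WIDTH):
    """Clock cycles to complete a run of instructions in a stall-free pipeline.

    Assumption (not from the data book): 'width' instructions enter the
    pipeline every clock, and each one leaves 'stages' clocks later.
    """
    if num_instructions <= 0:
        return 0
    issue_groups = math.ceil(num_instructions / width)
    # The first group needs 'stages' clocks; each later group adds one clock.
    return stages + issue_groups - 1


def ideal_ipc(num_instructions, stages=PIPELINE_STAGES, width=ISSUE_WIDTH):
    """Average instructions per clock under the same idealized assumptions."""
    cycles = ideal_cycles(num_instructions, stages, width)
    return num_instructions / cycles if cycles else 0.0


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, ideal_cycles(n), round(ideal_ipc(n), 2))
```

Under these assumptions the pipeline fill time dominates short runs, while long runs approach two instructions per clock. Any dependency or resource conflict of the kind the data book discusses later would lower the figure.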
[Figure 1-1. Integer Unit. Instruction Fetch and Instruction Decode 1 are shared; the X Pipeline and Y Pipeline each then have Instruction Decode 2, Address Calculation 1, Address Calculation 2, Execution, and Write-Back stages. The earlier stages are labeled "In-Order Processing"; the Execution and Write-Back stages are labeled "Out-of-Order Completion".]

The Integer Unit consists of the following pipeline stages:

• Instruction Fetch (IF)
• Instruction Decode 1 (ID1)
• Instruction Decode 2 (ID2)
• Address Calculation 1 (AC1)
• Address Calculation 2 (AC2)
• Execute (EX)
• Write-Back (WB)

The instruction decode and address calculation functions are both divided into superpipelined stages.

1.2.1 Pipeline Stages

The Instruction Fetch (IF) stage, shared by both the X and Y pipelines, fetches 16 bytes of code from the cache unit in a single clock cycle. Within this section, the code stream is checked for any branch instructions that could affect normal program sequencing.

If an unconditional or conditional branch is detected, branch prediction logic within the IF stage generates a predicted target address for the instruction. The IF stage then begins fetching instructions at the predicted address.

The superpipelined Instruction Decode function contains the ID1 and ID2 stages. ID1, shared by both pipelines, evaluates the code stream provided by the IF stage and determines the number of bytes in each instruction. Up to two instructions per clock are delivered to the ID2 stages, one in each pipeline.

The ID2 stages decode instructions and send the decoded instructions to either the X or Y pipeline for execution. The particular pipeline is chosen based on which instructions are already in each pipeline and how fast they are expected to flow through the remaining pipeline stages.

The Address Calculation function contains two stages, AC1 and AC2. If the instruction refers to a memory operand, AC1 calculates a linear memory address for the instruction.

The AC2 stage performs any required memory management functions, cache accesses, and register file accesses. If a floating point instruction is detected by AC2, the instruction is sent to the FPU for processing.

The Execute (EX) stage executes instructions using the operands provided by the address calculation stage.

The Write-Back (WB) stage is the last IU stage. The WB stage stores execution results either to a register file within the IU or to a write buffer in the cache control unit.

1.2.2 Out-of-Order Processing

If an instruction executes faster than the previous instruction in the other pipeline, the instructions may complete out of order. All instructions are processed in order, up to the EX stage. While in the EX and WB stages, instructions may be completed out of order.

If there is a data dependency between two instructions, the necessary hardware interlocks are enforced to ensure correct program execution. Even though instructions may complete out of order, exceptions and writes resulting from the instructions are always issued in program order.

1.2.3 Pipeline Selection

In most cases, instructions are processed in either pipeline and without pairing constraints on the instructions. However, certain instructions are processed only in the X pipeline:

• Branch instructions
• Floating point instructions
• Exclusive instructions

Branch and floating point instructions may be paired with a second instruction in the Y pipeline.

Exclusive Instructions cannot be paired with instructions in the Y pipeline.

1.2.4 Data Dependency Solutions

When two instructions that are executing in parallel require access to the same data or register, one of the following types of data dependencies may occur:

• Read-After-Write (RAW)
• Write-After-Read (WAR)
• Write-After-Write (WAW)

Data dependencies typically force serialized execution of instructions. However, the 6x86 CPU implements three mechanisms that allow parallel execution of instructions containing
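The three dependency types named in section 1.2.4 can be illustrated with a small sketch that compares the registers read and written by two instructions in program order. This is an illustrative model only, not a description of the 6x86 hardware: the `Instr` type, the `classify_dependencies` function, and the register read/write sets in the examples are all assumptions made for this sketch, and flag registers are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical record of one instruction's register usage."""
    text: str
    reads: frozenset
    writes: frozenset


def instr(text, reads, writes):
    """Convenience constructor that accepts any iterable of register names."""
    return Instr(text, frozenset(reads), frozenset(writes))


def classify_dependencies(first, second):
    """Return the dependency types 'second' has on 'first' (program order)."""
    found = set()
    if first.writes & second.reads:
        found.add("RAW")  # second reads a value that first writes
    if first.reads & second.writes:
        found.add("WAR")  # second overwrites a value that first still reads
    if first.writes & second.writes:
        found.add("WAW")  # both write the same destination
    return found


if __name__ == "__main__":
    # Register sets below are simplified assumptions for illustration.
    add_ax_bx = instr("ADD AX, BX", reads={"AX", "BX"}, writes={"AX"})
    mov_cx_ax = instr("MOV CX, AX", reads={"AX"}, writes={"CX"})
    mov_bx_ax = instr("MOV BX, AX", reads={"AX"}, writes={"BX"})
    add_ax_cx = instr("ADD AX, CX", reads={"AX", "CX"}, writes={"AX"})
    mov_ax_bx = instr("MOV AX, BX", reads={"BX"}, writes={"AX"})
    mov_ax_cx = instr("MOV AX, CX", reads={"CX"}, writes={"AX"})

    print(classify_dependencies(add_ax_bx, mov_cx_ax))  # RAW
    print(classify_dependencies(mov_bx_ax, add_ax_cx))  # WAR
    print(classify_dependencies(mov_ax_bx, mov_ax_cx))  # WAW
```

A pair that returns an empty set has no register dependency in this model and could, in principle, execute in parallel without any special handling.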