A Bibliography of Publications on Visual Instruction Sets

Total Page:16

File Type:pdf, Size:1020Kb

A Bibliography of Publications on Visual Instruction Sets A Bibliography of Publications on Visual Instruction Sets Nelson H. F. Beebe University of Utah Department of Mathematics, 110 LCB 155 S 1400 E RM 233 Salt Lake City, UT 84112-0090 USA Tel: +1 801 581 5254 FAX: +1 801 581 4148 E-mail: [email protected], [email protected], [email protected], [email protected] (Internet) WWW URL: http://www.math.utah.edu/~beebe/ 12 December 2019 Version 2.16 Title word cross-reference Mot98, Mot99b, Mot99a, Mot99c, Phi98]. Alto [IEE98]. AMD [Adv00b, Hal97, IM99, OWJF98, OFW99, Pou98]. AMD-K62-E 2 [IM99, Lee96, OWJF98]. 2.0 [LHxx]. [Adv00b]. Analysis [IM99, SB01]. Apple [Fit01]. Application [Hun00, PW96]. 3 [Hal97]. 3-D [Hal97]. 3DNow [OWJF98, Architecture [Hsu01, LHxx, MPW97, OFW99, Adv00a, BN99, IM99, Fom00]. OFW99, PW96, Phi98, TO96]. architectures [Ano96a]. Aspects [Hsu01]. 64-bit [LHxx]. Assembly [Sun00]. Auditorium [IEE98]. August [IEE96, IEE97, IEE98]. 7100LC Automatically [BGGT01]. [Ano95, BBJ+95, BKQW95, LBLS95, Smi94]. 712/60 [Smi94]. 7300LC [BT97]. based [BM00, Hun00]. Better [Ano97a]. Beyond [Hal97]. bit [LHxx]. Black [Ben95]. accelerate [TONH96]. Accelerates Blazing [Tho98]. Blinn [Bli97]. [DDHS00]. Accelerating breakthroughs [Smi94]. Briefs [GL97]. [DDC+98, Lee96, Sun96a]. achieve [Smi94]. Bugs [Ben95]. builds [Pou98]. Built Across [DDC+98]. algorithms [TONH96]. [Pea95]. Built-In [Pea95]. Business Alpha [RRM96]. Alternatives [Ano98]. [BEM95, GL97]. BYTE [Ano96a, Ano96b]. AltiVec [DDC+98, DDHS00, Fit01, Ful98, 1 2 California [IEE96, IEE97, IEE98]. free [Ano96a]. Fugue [Bli97]. Functions Capabilities [Pea95]. Case [Sun96a]. [BKQW95, SB01]. Centaur [Hal97]. challenge [Hal97]. Chip G4 [Fit01, Hun00]. Gaining [GL97]. gating [Sun96b, BT97, Pou98, TO96]. Chips [BM00]. Generation [Mah96, Phi98]. [Ano97b, IEE97, Hal97, IEE96, IEE98]. Graphics [Fom00, Pea95]. clock [BM00]. Code [Fom99]. Coding [Hsu01]. Combining [TO96]. commerce Hardware [Hsu01, Smi94]. High [Ano96b]. communications [PW96]. [AJ97, BEM95, Mod97, Pea95, Smi94, TO96]. Competition [GL97]. Competitive High-Performance [BEM95, Pea95, TO96]. [BKQW95]. Computational [Hun00]. highly [BT97]. hopped [Ano97a]. Hot Computer [Hsu01]. Computing [IEE96, IEE97, IEE98]. HP [Smi94]. [Hun00, JC98]. conference [IEE98]. Considerations [Wei96]. Contrast [Ben95]. IC [BKQW95, Ben95]. II [Mod97, KGOL97]. conventional [TONH96]. Converging Imaging [Ben95]. Implementation [GL97]. Core [Sun96a, Ano96b]. Corner [AJ97, Ano95, IM99, Kag96]. [Bli97]. Cost [BEM95, Pea95, Smi94]. CPU Implementations [OFW99]. Implicit [Mod97, Smi94]. CPUs [Tho98]. Cutting [BGGT01]. Improving [PW96, BM00]. [BN99]. Cyrix [Hal97]. INBOX [Ano96b]. Information [GL97]. innovative [Smi94]. Instruction D [Hal97]. Data [Adv00b]. Decisions [Mah96, RRM96, Sun96a, Sun96b, Sun02, [BKQW95]. Decoder [AJ97, LBLS95]. Lee96, TONH96]. Instructions Design [BBJ+95, BKQW95]. Desktop [GV06, IM99, AB01, TO96]. integrated [Ano97c]. Development [BEM95]. Digital [BT97]. Intel [AJ97, GL97]. Dilemma [GL97]. Does [AB01, Ano96b, Ano97a, GL97, Hal97, KB96, [Tho98]. Dolby [AJ97]. Dynamic PW96, PWW97, Ram99, SB01, Wei96]. [Mod97, BM00]. Dynamics [Hun00]. Interface [Mot99a]. Internet [Ano96b, TH99]. issue [Ano97d, TO96]. Edge [BN99]. Effectively [BN99]. efficient issues [Ano96a]. IX [IEE97]. [Lee96]. electric [Ano96b]. Embedded [Adv00b]. Emmerald [AB01]. Enhanced Jerry [Pou98]. Jim [Bli97]. Just [Ano97b]. [Pou98, LBLS95]. Enhancements [Ano95]. Enhancing [TONH96]. Environment K6 [Pou98, IM99, OWJF98]. K6-2 [BKQW95]. Environments [Mot98]. [IM99, OWJF98]. K62 [Adv00b]. Key Evaluation [Hun00]. Examples [Mot99b]. [GV06]. Execution [Mod97]. Exploiting [BGGT01]. Extension [DDHS00, PW96]. Extensions Language [Sun00]. Levels [Ben95]. Linux [LHxx, Mah96, RRM96, JC98, Lee96, TH99]. [BN99]. Logic [Ben95]. Look [Fit01, Ano97c]. Low Fast [Ano97b, AB01]. First [Fit01, Kag96]. [BEM95, Pea95, Smi94]. Low-Cost Fits [Ano97a]. Floating [SGJ+99]. [BEM95, Pea95]. Floating-Point [SGJ+99]. Fluid [Hun00]. FORTRAN [Hun00]. FORTRAN-based Manual [Adv00a, Mot98, Mot99a, Sun00]. [Hun00]. four [TO96]. four-issue [TO96]. matrix [AB01]. matrix-matrix [AB01]. 3 MAX [Lee96]. MAX-2 [Lee96]. Media [Hun00]. PowerPC [DDHS00, Tho98]. [DDC+98, DDHS00, Sun96b, TONH96, praise [Ano96b]. Presents [GL97]. Price Lee96]. media-processing [TONH96]. [Smi94]. privacy [Ano96b]. Processing Memorial [IEE98]. methodologies [DDC+98, DDHS00, Sun96b, TONH96, [BBJ+95]. Microarchitecture JC98, Lee96]. Processor [Adv00b, Fit01, [Kag96, KGOL97]. Microprocessor Mah96, Mod97, BM00, TO96]. Processors [BKQW95, OWJF98, Phi98, BBJ+95, BT97]. [KGOL97, LBLS95]. Products [Ano94]. Microprocessors [KGOL97]. minimal Programming [KB96, Mot98, Mot99b, [Lee96]. MMX [AJ97, Ano96a, Ano96b, Mot99a, Ano96a, Ano96b]. Ano97a, Ano97b, Ano97c, Ano97d, Ano98, Bli97, Dro00, Fom99, GL97, Hal97, Kag96, Quality [AJ97]. KGOL97, KB96, MPW97, Mod97, PW96, PWW97, Pou98, Ram99, SB01, Wei96]. readers [Ano96a, Ano96b]. Real Mobile [JC98]. Model [Smi94]. money [Fom00, LBLS95]. Real-Time [Ano96b]. Motion [RRM96]. Motorola [Fom00, LBLS95]. Reciprocal [IM99]. [Ful98]. MPEG [LBLS95]. Multimedia record [IEE96, IEE98]. Reference [Sun00]. [Ano95, LHxx, Mah96, Pea95, PWW97, RISC [LHxx, Pea95, TONH96]. Root SGJ+99, LBLS95, PW96, TO96]. [IM99]. Routines [GV06]. multimedia-enhanced [LBLS95]. multiply [AB01]. Multiuser [BEM95]. Scientific [Hun00]. Second [Phi98]. Series [Smi94]. Server [BEM95, Ano96a]. Set Networking [Pea95, Sun96a]. New-Media [Mah96, Sun96a, Sun02, Lee96, Sun96b]. sets [Sun96b]. News [GL97]. Notes [GL97]. NT [TONH96]. Sheet [Adv00b]. significantly [Ano96a]. [TONH96]. SIMD [TH99, Phi98]. Simulation [Hun00]. Sixth [Mah96]. off [Ano96b, Wei96]. on-chip [TO96]. Sixth-Generation [Mah96]. Smart [GL97]. operation [BM00]. Optimization [Fom99]. Software [AS96, Hsu01, LBLS95]. sound Optimizing [Fom00]. Order [SGJ+99]. [Ano96b]. souped [Ano97c]. souped-up Out-of-Order [SGJ+99]. Overview [Ano97c]. SPARC [Sun00]. Special [Ano95, MPW97, Ram99]. [Ano97d]. Spectrum [DDC+98]. Speed [GV06]. Speeds [TONH96]. Square [IM99]. P55C [Kag96]. PA SSE [AB01, SB01]. Stanford [Ano95, BBJ+95, BKQW95, BT97, LBLS95, [IEE96, IEE97, IEE98]. Starts [Ano97a]. LHxx, Pea95, Smi94]. PA-7100LC [Smi94]. strategies [BM00, KB96]. Streaming PA-RISC [LHxx, Pea95]. packaging [TH99]. Studio [GV06]. Study [Smi94]. packing [BM00]. Palo [IEE98]. [BKQW95, SB01]. Subword [JC98, Lee96]. Parallelism [BGGT01, Lee96]. PC [AS96]. Sun [GV06]. Superscalar [SGJ+99, Smi94]. PCs [Ano96a, Ano97c, PWW97]. Pentium Support [Sun96b]. supporting [Ano97a, Ano97c, KGOL97, Smi94]. [Lee96, TO96]. symposium [IEE96]. PentiumAE [Mod97]. Performance System [BEM95, BT97, Pou98]. Systems [AS96, BEM95, Mod97, Pea95, SB01, Wei96, [Hun00, Ano97c, JC98]. BM00, PW96, Smi94, TO96]. Phoning [GL97]. Point [SGJ+99]. Power Technology [Adv00a, AJ97, Ano97d, [Ano97c, Smi94, BM00, Hal97]. PowerMac DDC+98, Fom99, Ful98, GL97, Kag96, REFERENCES 4 KGOL97, MPW97, Mod97, Mot98, Mot99a, ISSN 1532-0626 (print), 1532- Mot99c, OWJF98, OFW99, PW96, Phi98, 0634 (electronic). URL http:/ Ram99, Wei96, Ano97c]. test [Ano97a]. /www3.interscience.wiley.com/ Those [Ano97b]. Three [SGJ+99]. cgi-bin/abstract/77004416/START; Three-Way [SGJ+99]. Time http://www3.interscience.wiley. [Fom00, LBLS95]. times [PW96]. Tour com/cgi-bin/fulltext?ID=77004416& [Mot99c]. Trade [Wei96]. Trade-off [Wei96]. PLACEBO=IE.pdf. Transitioning [GL97]. Translating [Dro00]. Tuning [AS96]. TV [GL97]. AMD:2000:TM UltraSPARC [Sun96a, Sun96b, TO96]. [Adv00a] Advanced Micro Devices, Inc., Unit [SGJ+99]. University One AMD Place, P.O. Box [IEE96, IEE97, IEE98]. Unix [Smi94]. 3453, Sunnyvale, California, USA. Upgrade [GL97]. uses [Pou98, Smi94]. 3DNow! Technology Manual, March Using [GV06, Sun96a, AJ97, AB01]. 2000. x + 62 pp. URL Utilizing [BN99]. http://www.amd.com/products/ epd/processors/6.32bitproc/8. Value [BM00]. Value-based [BM00]. amdk6fami/26.amdk62e/21928/21928. Vector [Tho98]. Version [Sun02]. Video pdf. Order number 21928G/0. [RRM96, JC98, LBLS95]. VIII [IEE96]. AMD:2000:AKE VIS [GV06, Sun96a, Sun96b, Sun02, TONH96]. [Adv00b] Advanced Micro Devices, Inc., Visual [Sun96b]. Voices [GL97]. Voltage One AMD Place, P.O. Box [Ben95]. Vulnerable [GL97]. 3453, Sunnyvale, California, USA. AMD-K62-E+ Embedded Pro- + Want [GL97]. Way [SGJ 99]. Web [GL97]. cessor Data Sheet, September White [Ben95]. will [Hal97]. Windows 2000. xxii + 346 pp. URL [Ano96a]. Workbench [Pou98]. http://www.amd.com/products/ Workstation [Pea95, Smi94]. write epd/processors/6.32bitproc/8. [Ano96a]. amdk6fami/28.amdk62e/23542/23542a. pdf. Order number 23542A/0. x86 [Mah96]. Abel:1997:IHQ References [AJ97] James C. Abel and Michael A. Aberdeen:2001:EFM Julier. Implementation of a high quality Dolby digital decoder us- [AB01] Douglas Aberdeen and Jonathan ing MMX technology. Intel Tech- Baxter. Emmerald: a fast matrix- nology Journal, (Q3):11, 1997. matrix multiply using Intel's SSE URL http://developer.intel. instructions. Concurrency and com/technology/itj/q31997/articles/ Computation: Practice and Ex- art_3.htm; http://developer. perience, 13(2):103{119, Febru- intel.com/technology/itj/q31997/ ary 2001. CODEN CCPEBO. pdf/decoder.pdf. REFERENCES 5 Anonymous:1994:NPd up with Intel's MMX. Byte Mag- azine, 22(2):26{??, February 1997. [Ano94] Anonymous. New Products. Data- CODEN BYTEDJ. ISSN 0360-5280 mation, 40(4):85{??, February 15, (print), 1082-7838 (electronic). 1994. CODEN DTMNAT. ISSN 0011-6963. Anonymous:1997:JHF Anonymous:1995:OIP [Ano97b] Anonymous. Just how fast are those new MMX chips? Byte Magazine, [Ano95] Anonymous. Overview of the imple- 22(2):26{??, February
Recommended publications
  • REPORT Compaq Chooses SMT for Alpha Simultaneous Multithreading
    VOLUME 13, NUMBER 16 DECEMBER 6, 1999 MICROPROCESSOR REPORT THE INSIDERS’ GUIDE TO MICROPROCESSOR HARDWARE Compaq Chooses SMT for Alpha Simultaneous Multithreading Exploits Instruction- and Thread-Level Parallelism by Keith Diefendorff Given a full complement of on-chip memory, increas- ing the clock frequency will increase the performance of the As it climbs rapidly past the 100-million- core. One way to increase frequency is to deepen the pipeline. transistor-per-chip mark, the micro- But with pipelines already reaching upwards of 12–14 stages, processor industry is struggling with the mounting inefficiencies may close this avenue, limiting future question of how to get proportionally more performance out frequency improvements to those that can be attained from of these new transistors. Speaking at the recent Microproces- semiconductor-circuit speedup. Unfortunately this speedup, sor Forum, Joel Emer, a Principal Member of the Technical roughly 20% per year, is well below that required to attain the Staff in Compaq’s Alpha Development Group, described his historical 60% per year performance increase. To prevent company’s approach: simultaneous multithreading, or SMT. bursting this bubble, the only real alternative left is to exploit Emer’s interest in SMT was inspired by the work of more and more parallelism. Dean Tullsen, who described the technique in 1995 while at Indeed, the pursuit of parallelism occupies the energy the University of Washington. Since that time, Emer has of many processor architects today. There are basically two been studying SMT along with other researchers at Washing- theories: one is that instruction-level parallelism (ILP) is ton. Once convinced of its value, he began evangelizing SMT abundant and remains a viable resource waiting to be tapped; within Compaq.
    [Show full text]
  • Prozessorarchitektur Am Beispiel Des Amdathlon
    PROZESSORARCHITEKTUR AM BEISPIEL DES AMD ATHLON AUSGEARBEITET VON ALEXANDER TABAKOFF Betreuender Lehrer: Prof. Wolfgang Schinwald VERÖFFENTLICHT AM 26.2.2001 PROZESSORARCHITEKTUR INHALTSVERZEICHNIS: 1 Historische / allgemeine Einführung 1.1Die Anwendungsbereiche von Prozessoren 1.2Der erste Prozessor 1.3Die Entwicklung bis zum 586 1.4Der AMD Athlon und der Pentium III - Entwicklungsgeschichte 2 Grundlegende Dinge zur Prozessorarchitektur und dem Bau von Prozessoren 2.1Physikalisch 2.1.1Der Aufbau eines Transistors 2.1.2Die Auswirkungen in die Praxis 2.2Logisch 2.3Die Herstellung von Prozessoren und ihre Grenzen 2.4Der Von-Neumann-Rechner 3 Die Prozessorarchitektur des AMD Athlon im Vergleich zu seinen Konkurrenten 3.1Das Design des AMD Athlon 3.2Das Bussytem des AMD Athlon 3.3Die Cachearchitektur des AMD Athlon 3.4Vor- und Nachteile gegenüber anderen Designs 3.5Interview mit Jan Gütter, Public Relations Sprecher von AMD 4 Anhang 4.1Der Grund dieser Arbeit 4.2Glossar 4.3Literaturverzeichnis 4.4Begleitprotokoll 4.5Bildnachweis Inhaltsverzeichnis: - Seite 2 PROZESSORARCHITEKTUR 1 HISTORISCHE / ALLGEMEINE EINFÜHRUNG 1.1Die Anwendungsbereiche von Prozessoren Prozessoren haben heute verschiedenste Anwendungsbereiche. Sie werden in Autos, Set Top Boxen, Spielekonsolen, Handys, Taschenrechnern, PCs usw. verwendet. Dabei macht der Marktanteil der PC Prozessoren nur rund 2%1 aus. Trotz dieser vergleichsweise geringen Produktion genießen PC Prozessoren einen bedeutend höheren Bekanntheitsgrad. Fast jeder kennt PC Prozessoren wie den Intel Pentium
    [Show full text]
  • Implementing Elliptic Curve Cryptography (A Narrow Survey)
    Implementing Elliptic Curve Cryptography (a narrow survey) Institute of Computing – UNICAMP Campinas, Brazil April 2005 Darrel Hankerson Auburn University Implementing ECC – 1/110 Overview Objective: sample selected topics of practical interest. Talk will favor: I Software solutions on general-purpose processors rather than dedicated hardware. I Techniques with broad applicability. I Methods targeted to standardized curves. Goals: I Present proposed methods in context. I Limit coverage of technical details (but “implementing” necessarily involves platform considerations). Implementing ECC – 2/110 Focus: higher-performance processors “Higher-performance” includes processors commonly associated with workstations, but also found in surprisingly small portable devices. Sun and IBM workstations RIM pager circa 1999 SPARC or Intel x86 (Pentium) Intel x86 (custom 386) 256 MB – 8 GB 2 MB “disk”, 304 KB RAM 0.5 GHz – 3 GHz 10 MHz, single AA battery heats entire building fits in shirt pocket Implementing ECC – 3/110 Optimizing ECC Elliptic Curve Digital Signature Algorithm (ECDSA) Random number Big number and Curve generation modular arithmetic arithmetic Fq field arithmetic General categories of optimization efforts: 1. Field-level optimizations. 2. Curve-level optimizations. 3. Protocol-level optimizations. Constraints: security requirements, hardware limitations, bandwidth, interoperability, and patents. Implementing ECC – 4/110 Optimizing ECC... 1. Field-level optimizations. I Choose fields with fast multiplication and inversion. I Use special-purpose hardware (cryptographic coprocessors, DSP, floating-point, SIMD). 2. Curve-level optimizations. I Reduce the number of point additions (windowing). I Reduce the number of field inversions (projective coords). I Replace point doubles (endomorphism methods). 3. Protocol-level optimizations. I Develop efficient protocols. I Choose methods and protocols that favor your computations or hardware.
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • A Bibliography of Publications in IEEE Micro
    A Bibliography of Publications in IEEE Micro Nelson H. F. Beebe University of Utah Department of Mathematics, 110 LCB 155 S 1400 E RM 233 Salt Lake City, UT 84112-0090 USA Tel: +1 801 581 5254 FAX: +1 801 581 4148 E-mail: [email protected], [email protected], [email protected] (Internet) WWW URL: http://www.math.utah.edu/~beebe/ 16 September 2021 Version 2.108 Title word cross-reference -Core [MAT+18]. -Cubes [YW94]. -D [ASX19, BWMS19, DDG+19, Joh19c, PZB+19, ZSS+19]. -nm [ABG+16, KBN16, TKI+14]. #1 [Kah93i]. 0.18-Micron [HBd+99]. 0.9-micron + [Ano02d]. 000-fps [KII09]. 000-Processor $1 [Ano17-58, Ano17-59]. 12 [MAT 18]. 16 + + [ABG+16]. 2 [DTH+95]. 21=2 [Ste00a]. 28 [BSP 17]. 024-Core [JJK 11]. [KBN16]. 3 [ASX19, Alt14e, Ano96o, + AOYS95, BWMS19, CMAS11, DDG+19, 1 [Ano98s, BH15, Bre10, PFC 02a, Ste02a, + + Ste14a]. 1-GHz [Ano98s]. 1-terabits DFG 13, Joh19c, LXB07, LX10, MKT 13, + MAS+07, PMM15, PZB+19, SYW+14, [MIM 97]. 10 [Loc03]. 10-Gigabit SCSR93, VPV12, WLF+08, ZSS+19]. 60 [Gad07, HcF04]. 100 [TKI+14]. < [BMM15]. > [BMM15]. 2 [Kir84a, Pat84, PSW91, YSMH91, ZACM14]. [WHCK18]. 3 [KBW95]. II [BAH+05]. ∆ 100-Mops [PSW91]. 1000 [ES84]. 11- + [Lyl04]. 11/780 [Abr83]. 115 [JBF94]. [MKG 20]. k [Eng00j]. µ + [AT93, Dia95c, TS95]. N [YW94]. x 11FO4 [ASD 05]. 12 [And82a]. [DTB01, Dur96, SS05]. 12-DSP [Dur96]. 1284 [Dia94b]. 1284-1994 [Dia94b]. 13 * [CCD+82]. [KW02]. 1394 [SB00]. 1394-1955 [Dia96d]. 1 2 14 [WD03]. 15 [FD04]. 15-Billion-Dollar [KR19a].
    [Show full text]
  • 2 the VIS Instruction Set Pdist Instruction
    The VISTM Instruction Set Version 1.0 June 2002 A White Paper This document provides an overview of the Visual Instruction Set. ® 4150 Network Circle Santa Clara, CA 95054 USA www.sun.com Copyright © 2002 Sun Microsystems, Inc. All Rights reserved. THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED"AS IS" WITHOUT ANY EXPRESS REPRESENTATIONS OR WARRANTIES. IN ADDITION, SUN MICROSYSTEMS, INC. DISCLAIMS ALL IMPLIED REPRESENTATIONS AND WARRANTIES, INCLUDING ANY WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT OF THIRD PARTY INTELLECTUAL PROPERTY RIGHTS. This document contains proprietary information of Sun Microsystems, Inc. or under license from third parties. No part of this document may be reproduced in any form or by any means or transferred to any third party without the prior written consent of Sun Microsystems, Inc. Sun, Sun Microsystems, the Sun Logo, VIS, Java, and mediaLib are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the United States and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. UNIX is a registered trademark in the United States and other countries, exclusively licensed through x/Open Company Ltd. The information contained in this document is not designed or intended for use in on-line control of aircraft, air traffic, aircraft navigation or aircraft communications; or in the design, construction, operation or maintenance of any nuclear facility. Sun disclaims any express or implied warranty of fitness for such uses.
    [Show full text]
  • Optimizing SIMD Execution in HW/SW Co-Designed Processors
    Optimizing SIMD Execution in HW/SW Co-designed Processors Rakesh Kumar Department of Computer Architecture Universitat Politècnica de Catalunya Advisors: Alejandro Martínez Intel Barcelona Research Center Antonio González Intel Barcelona Research Center Universitat Politècnica de Catalunya A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy / Doctor per la UPC ABSTRACT SIMD accelerators are ubiquitous in microprocessors from different computing domains. Their high compute power and hardware simplicity improve overall performance in an energy efficient manner. Moreover, their replicated functional units and simple control mechanism make them amenable to scaling to higher vector lengths. However, code generation for these accelerators has been a challenge from the days of their inception. Compilers generate vector code conservatively to ensure correctness. As a result they lose significant vectorization opportunities and fail to extract maximum benefits out of SIMD accelerators. This thesis proposes to vectorize the program binary at runtime in a speculative manner, in addition to the compile time static vectorization. There are different environments that support runtime profiling and optimization support required for dynamic vectorization, one of most prominent ones being: 1) Dynamic Binary Translators and Optimizers (DBTO) and 2) Hardware/Software (HW/SW) Co-designed Processors. HW/SW co-designed environment provides several advantages over DBTOs like transparent incorporations of new hardware features, binary compatibility, etc. Therefore, we use HW/SW co-designed environment to assess the potential of speculative dynamic vectorization. Furthermore, we analyze vector code generation for wider vector units and find out that even though SIMD accelerators are amenable to scaling from hardware point of view, vector code generation at higher vector length is even more challenging.
    [Show full text]
  • SIMD-Swift: Improving Performance of Swift Fault Detection
    Master thesis SIMD-Swift: Improving Performance of Swift Fault Detection Oleksii Oleksenko 15. November 2015 Technische Universität Dresden Department of Computer Science Systems Engineering Group Supervisor: Prof. Christof Fetzer Adviser: MSc. Dmitrii Kuvaiskii Declaration Herewith I declare that this submission is my own work and that, to the best of my knowledge, it contains no material previously published or written by another person nor material which to a substantial extent has been accepted for the award of any other degree or diploma of the university or other institute of higher education, except where due acknowledgment has been made in the text. Dresden, 9. November 2015 Oleksii Oleksenko Abstract The general tendency in modern hardware is an increase in fault rates, which is caused by the decreased operation voltages and feature sizes. Previously, the issue of hardware faults was mainly approached only in high-availability enterprise servers and in safety-critical applications, such as transport or aerospace domains. These fields generally have very tight requirements, but also higher budgets. However, as fault rates are increasing, fault tolerance solutions are starting to be also required in applications that have much smaller profit margins. This brings to the front the idea of software-implemented hardware fault tolerance, that is, the ability to detect and tolerate hardware faults using software-based techniques in commodity CPUs, which allows to get resilience almost for free. Current solutions, however, are lacking in performance, even though they show quite good fault tolerance results. This thesis explores the idea of using the Single Instruction Multiple Data (SIMD) technology for executing all program’s operations on two copies of the same data.
    [Show full text]
  • Ultrasparc T1™ Supplement to the Ultrasparc Architecture 2005
    UltraSPARC T1™ Supplement to the UltraSPARC Architecture 2005 Draft D2.1, 14 May 2007 Privilege Levels: Hyperprivileged, Privileged, and Nonprivileged Distribution: Public Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. 650-960-1300 Part No.No: 8xx-xxxx-xx819-3404-05 ReleaseRevision: 1.0, Draft 2002 D2.1, 14 May 2007 ii UltraSPARC T1 Supplement • Draft D2.1, 14 May 2007 Copyright 2002-2006 Sun Microsystems, Inc., 4150 Network Circle • Santa Clara, CA 950540 USA. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. For Netscape Communicator™, the following notice applies: Copyright 1995 Netscape Communications Corporation. All rights reserved. Sun, Sun Microsystems, the Sun logo, Solaris, and VIS are trademarks, registered trademarks, or service marks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc.
    [Show full text]
  • Idisa+: a Portable Model for High Performance Simd Programming
    IDISA+: A PORTABLE MODEL FOR HIGH PERFORMANCE SIMD PROGRAMMING by Hua Huang B.Eng., Beijing University of Posts and Telecommunications, 2009 a Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in the School of Computing Science Faculty of Applied Science c Hua Huang 2011 SIMON FRASER UNIVERSITY Fall 2011 All rights reserved. However, in accordance with the Copyright Act of Canada, this work may be reproduced without authorization under the conditions for Fair Dealing. Therefore, limited reproduction of this work for the purposes of private study, research, criticism, review and news reporting is likely to be in accordance with the law, particularly if cited appropriately. APPROVAL Name: Hua Huang Degree: Master of Science Title of Thesis: IDISA+: A Portable Model for High Performance SIMD Pro- gramming Examining Committee: Dr. Kay C. Wiese Associate Professor, Computing Science Simon Fraser University Chair Dr. Robert D. Cameron Professor, Computing Science Simon Fraser University Senior Supervisor Dr. Thomas C. Shermer Professor, Computing Science Simon Fraser University Supervisor Dr. Arrvindh Shriraman Assistant Professor, Computing Science Simon Fraser University SFU Examiner Date Approved: ii Declaration of Partial Copyright Licence The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users.
    [Show full text]
  • UNIVERSITY of CALIFORNIA, SAN DIEGO Holistic Design for Multi-Core Architectures a Dissertation Submitted in Partial Satisfactio
    UNIVERSITY OF CALIFORNIA, SAN DIEGO Holistic Design for Multi-core Architectures A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science (Computer Engineering) by Rakesh Kumar Committee in charge: Professor Dean Tullsen, Chair Professor Brad Calder Professor Fred Chong Professor Rajesh Gupta Dr. Norman P. Jouppi Professor Andrew Kahng 2006 Copyright Rakesh Kumar, 2006 All rights reserved. The dissertation of Rakesh Kumar is approved, and it is acceptable in quality and form for publication on micro- film: Chair University of California, San Diego 2006 iii DEDICATIONS This dissertation is dedicated to friends, family, labmates, and mentors { the ones who taught me, indulged me, loved me, challenged me, and laughed with me, while I was also busy working on my thesis. To Professor Dean Tullsen for teaching me the values of humility, kind- ness,and caring while trying to teach me football and computer architecture. For always encouraging me to do the right thing. For always letting me be myself. For always believing in me. For always challenging me to dream big. For all his wisdom. And for being an adviser in the truest sense, and more. To Professor Brad Calder. For always caring about me. For being an inspiration. For his trust. For making me believe in myself. For his lies about me getting better at system administration and foosball even though I never did. To Dr Partha Ranganathan. For always being there for me when I would get down on myself. And that happened often. For the long discussions on life, work, and happiness.
    [Show full text]
  • HPC-Event-Return-Of-Vector-20160801
    The OrionX Constellation: Data Center High Performance Computing The Return of Vector Processing Shahin Khan Digital transformation, cloud computing and application elasticity, open source software, and new app areas like AI and IoT are driving a renaissance in system architecture. This is an area of interest and research for us at OrionX. Vector processing was an interesting topic to re-emerge recently. First during the International Supercomputing Conference (ISC), and again in various announcements in implicit and explicit ways. On the Monday of the ISC conference, a new leader on the TOP500 list was announced. The Sunway TaihuLight system uses a new processor architecture that is Single-Instruction-Multiple-Data (SIMD) with a pipeline that can do eight 64-bit floating-point calculations per cycle. 10,649,600 computing cores comprising 40,960 nodes produce 93 petaflop/s running the LINPACK benchmark. Later that day, at the “ISC 2016 Vendor Showdown”, NEC had a presentation about its project “Aurora”. This project aims to combine x86 clusters and NEC’s vector processors in the same high bandwidth system. NEC has a long history of advanced vector processors with its SX architecture. Among many achievements, it built the Earth Simulator, a vector-parallel system that was #1 on the TOP500 list from 2002 to 2004. At its debut, it had a substantial (nearly 5x) lead over the previous #1. Vector Processing and Parallelism Vector processing is a time-honored system architecture that started the supercomputing market, starting with the CDC STAR-100 system (STrings and ARrays, performing at 100 million floating point operations per second), and then led by the legendary Seymour Cray and his superb team.
    [Show full text]