Vincent M. Weaver Sally A. Mckee

Total Page:16

File Type:pdf, Size:1020Kb

Vincent M. Weaver Sally A. Mckee Optimizing for Size: Exploring the Limits of Code Density Sally A. McKee Vincent M. Weaver ASPLOS XIV Poster Session, 8 March 2009 Cornell University Chalmers University of Technology Abstract Hand-Optimized Assembly Results Architectural Size Correlations 0.9020 Minimum possible instruction size Reductions in instruction count can improve cache and bandwidth utiliza- 0.8652 Number of integer registers tion, lower power consumption, and increase overall performance. Nonethe- Total Size of Executable VLIW 2560 RISC less, code density is often overlooked when studying processor architectures. 2048 CISC 0.6623 Architecture has a zero register embedded 1536 8/16-bit 0.5845 Bit-width bytes 1024 We hand-optimize an embedded benchmark for size in assembly language -0.4366 Hardware divide in ALU on 20 different instruction set architectures and investigate the architectural 512 features that contribute most heavily to code density. 0 0.4385 Number of operands in each instruction arm ppc vax sh3 z80 ia64 6502 mips s390 i386 alphaparisc sparc m88k m68k thumb avr32 -0.3356 Unaligned load/store available x86_64 crisv32pdp-11 0.2773 Year the architecture was introduced 256 Size of LZSS Decompression Code VLIW -0.2597 Auto-incrementing addressing scheme Background RISC CISC 192 embedded -0.2597 Hardware status flags (zero/overflow/etc.) 128 8/16-bit bytes -0.1252 Little or Big endian 64 The 20 architectures investigated can be broadly broken into 5 categories: -0.0487 Branch delay slot 0 -0.0079 Maximum possible instruction size Instr Length Opcode arm ppc vax z80 sh3 ia64 mips s390 6502 i386 Type Represented Architectures alpha parisc sparc m88k m68kthumb avr32 (bytes) Args pdp-11 x86_64crisv32 VLIW ia64 16/3 3 Size of String-Search Code VLIW Other Considerations alpha, arm, m88k, mips 256 RISC RISC 4 3 CISC pa-risc, ppc, sparc 192 embedded 128 8/16-bit System libraries and compiler overhead can overshadow the effects of size bytes CISC m68k, s390, vax, x86, x86 64 1-54 2 64 optimization, increasing memory footprint by several orders of magnitude. N/A Embedded avr32, crisv32, sh3, THUMB 2 2 0 These effects cannot be ignored when analyzing system-wide code-density. ppc arm sh3 vax z80 8/16-bit 6502, pdp-11, z80 1-6 1-2 ia64 mips i386 s390 6502 alpha sparc m88k parisc m68kthumb avr32 473kB 473kB 473kB 473kB pdp-11 x86_64 crisv32 Below is the output from the benchmark tool. It uses LZSS decompression, 8192 Size of Integer/ASCII Conversion Code VLIW file I/O, and string manipulation. 256 RISC 6144 CISC 192 embedded 4096 8/16-bit ּ############################################################################### 128 Size (bytes) bytes No HW divide 2048 ּ############################################################################### O#O##########ּ 64 1024################################################################## N/A ּ############################################################################### O3 Os O3,Os O3,Os O3 Os O3 Os O3,Os O3,Os O3,Os O3 Os O3,Os O3,Os hand gcc42 gcc42 intel9 sun12 gcc41 gcc41 gcc42 gcc42 intel9 sun12 gcc41 gcc42 gcc42 intel9 sun12 optimized 0 ּ############################################################################### GLIBC uCLIBC ּ############################################################################### arm z80 sh3 ppc vax ia64 6502 mips s390 i386 parisc alpha sparc m88k m68kthumb avr32 STATICALLY LINKED DYNAMICALLY LINKED SYSCALL ONLY ASM ּ############################################################################### crisv32pdp-11 x86_64 ּ############################################################################### ּ############################################################################### ּ############################################################################### Size of String Concatenation Code VLIW Future Work 64 ּ############################################################################### RISC ּ############################################################################### CISC embedded 48 ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ LinuxּVersionּ2.6.11.421.17smp,ּCompiledּ#1ּSMPּFriּAprּ6ּ08:42:34ּUTCּ2007ּּּ 32 8/16-bit .Twoּ2791MHzּIntel(R)ּXeon(TM)ּProcessors,ּ2027MּRAM,ּ5521.40ּBogomipsּTotalּּּ bytes • Investigate more architecturesּּ sampakaּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ 16ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ • ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ .Evaluate performance impact of dense architectures 0 ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ • arm ppc vax sh3 z80 ia64 mips 6502 s390 i386 Propose compact architectures for FPGA and GPU-based systems. alpha sparc parisc m88k thumb m68k avr32 crisv32x86_64pdp-11.
Recommended publications
  • AVR32 EVK1105 Evaluation Kit
    Your Electronic Engineering Resource ATMEL - ATEVK1105 - AVR32 EVK1105 Evaluation Kit Product Overview: The AVR32 EVK1105 is an evaluation kit for the AT32UC3A3256 which combines Atmel’s state of art AVR32 microcontroller with an unrivalled selection of communication interface like USB device including On-The-Go functionality, SDcard, NAND flash with ECC and stereo 16-bit DAC. The AVR32 EVK1105 is an evaluation kit for the AT32UC3A0512 which demonstrates Atmel’s state-of-the-art AVR32 microcontroller in Hi-Fi audio decoding and streaming applications. Kit Contents: The kit contains reference hardware and software for generic MP3 player docking stations. Key Features: High Performance, Low Power AVR®32 UC 32-Bit Microcontroller Multi-Layer Bus System Internal High-Speed Flash Internal High-Speed SRAM Interrupt Controller Power and Clock Manager Including Internal RC Clock and One 32KHz Oscillator Two Multipurpose Oscillators and Two Phase-Lock-Loop (PLL), Watchdog Timer, Real-Time Clock Timer External Memories MultiMediaCard (MMC), Secure-Digital (SD), SDIO V1.1 CE-ATA, FastSD, SmartMedia, Compact Flash Memory Stick: Standard Format V1.40, PRO Format V1.00, Micro IDE Interface One Advanced Encryption System (AES) for AT32UC3A3256S, AT32UC3A3128S and AT32UC3A364S Universal Serial Bus (USB) One 8-channel 10-bit Analog-To-Digital Converter, multiplexed with Digital IOs. Legal Disclaimer: The content of the pages of this website is for your general information and use only. It is subject to change without notice. From time to time, this website may also include links to other websites. These links are provided for your convenience to provide further information. They do not signify that we endorse the website(s).
    [Show full text]
  • Schedule 14A Employee Slides Supertex Sunnyvale
    UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549 SCHEDULE 14A Proxy Statement Pursuant to Section 14(a) of the Securities Exchange Act of 1934 Filed by the Registrant Filed by a Party other than the Registrant Check the appropriate box: Preliminary Proxy Statement Confidential, for Use of the Commission Only (as permitted by Rule 14a-6(e)(2)) Definitive Proxy Statement Definitive Additional Materials Soliciting Material Pursuant to §240.14a-12 Supertex, Inc. (Name of Registrant as Specified In Its Charter) Microchip Technology Incorporated (Name of Person(s) Filing Proxy Statement, if other than the Registrant) Payment of Filing Fee (Check the appropriate box): No fee required. Fee computed on table below per Exchange Act Rules 14a-6(i)(1) and 0-11. (1) Title of each class of securities to which transaction applies: (2) Aggregate number of securities to which transaction applies: (3) Per unit price or other underlying value of transaction computed pursuant to Exchange Act Rule 0-11 (set forth the amount on which the filing fee is calculated and state how it was determined): (4) Proposed maximum aggregate value of transaction: (5) Total fee paid: Fee paid previously with preliminary materials. Check box if any part of the fee is offset as provided by Exchange Act Rule 0-11(a)(2) and identify the filing for which the offsetting fee was paid previously. Identify the previous filing by registration statement number, or the Form or Schedule and the date of its filing. (1) Amount Previously Paid: (2) Form, Schedule or Registration Statement No.: (3) Filing Party: (4) Date Filed: Filed by Microchip Technology Incorporated Pursuant to Rule 14a-12 of the Securities Exchange Act of 1934 Subject Company: Supertex, Inc.
    [Show full text]
  • Mi!!Lxlosalamos SCIENTIFIC LABORATORY
    LA=8902-MS C3b ClC-l 4 REPORT COLLECTION REPRODUCTION COPY VAXNMS Benchmarking 1-’ > .— u) 9 g .— mi!!lxLOS ALAMOS SCIENTIFIC LABORATORY Post Office Box 1663 Los Alamos. New Mexico 87545 — wAifiimative Action/Equal Opportunity Employer b . l)lS(”L,\l\ll K “Thisreport wm prcpmd J, an xcttunt ,,1”wurk ,pmwrd by an dgmcy d the tlnitwl SIdtcs (kvcm. mm:. Ncit her t hc llniml SIJIL.. ( Lwcrnmcm nor any .gcncy tlhmd. nor my 08”Ihcif cmployccs. makci my wur,nly. mprcss w mphd. or JwImL.s m> lcg.d Iululity ur rcspmuhdily ltw Ilw w.cur- acy. .vmplctcncs. w uscftthtc>. ttt”any ml’ormdt ml. dpprdl us. prudu.i. w proccw didowd. or rep. resent%Ihd IIS us wuukl not mfrm$e priwtcly mvnd rqdtts. Itcl”crmcti herein 10 my sp.xi!l tom. mrcial ptotlucr. prtxcm. or S.rvskc hy tdc mmw. Irdcnmrl.. nmu(a.lurm. or dwrwi~.. does nut mmwsuily mnstitutc or reply its mdursmwnt. rccummcnddton. or favorin: by the llniwd States (“mvcmment ormy qxncy thctcd. rhc V!C$VSmd opinmm d .mthor% qmxd herein do nut net’. UMrily r;~lt or died lhow. ol”the llnttcd SIJIL.S( ;ovwnnwnt or my ugcncy lhure of. UNITED STATES .. DEPARTMENT OF ENERGY CONTRACT W-7405 -ENG. 36 . ... LA-8902-MS UC-32 Issued: July 1981 G- . VAX/VMS Benchmarking Larry Creel —. I . .._- -- ----- ,. .- .-. .: .- ,.. .. ., ..,..: , .. .., . ... ..... - .-, ..:. .. *._–: - . VAX/VMS BENCHMARKING by Larry Creel ABSTRACT Primary emphasis in this report is on the perform- ance of three Digital Equipment Corporation VAX-11/780 computers at the Los Alamos National Laboratory. Programs used in the study are part of the Laboratory’s set of benchmark programs.
    [Show full text]
  • Cross-Compiling Linux Kernels on X86 64: a Tutorial on How to Get Started
    Cross-compiling Linux Kernels on x86_64: A tutorial on How to Get Started Shuah Khan Senior Linux Kernel Developer – Open Source Group Samsung Research America (Silicon Valley) [email protected] Agenda ● Cross-compile value proposition ● Preparing the system for cross-compiler installation ● Cross-compiler installation steps ● Demo – install arm and arm64 ● Compiling on architectures ● Demo – compile arm and arm64 ● Automating cross-compile testing ● Upstream cross-compile testing activity ● References and Package repositories ● Q&A Cross-compile value proposition ● 30+ architectures supported (several sub-archs) ● Native compile testing requires wide range of test systems – not practical ● Ability to cross-compile non-natively on an widely available architecture helps detect compile errors ● Coupled with emulation environments (e.g: qemu) testing on non-native architectures becomes easier ● Setting up cross-compile environment is the first and necessary step arch/ alpha frv arc microblaze h8300 s390 um arm mips hexagon score x86_64 arm64 mn10300 unicore32 ia64 sh xtensa avr32 openrisc x86 m32r sparc blackfin parisc m68k tile c6x powerpc metag cris Cross-compiler packages ● Ubuntu arm packages (12.10 or later) – gcc-arm-linux-gnueabi – gcc-arm-linux-gnueabihf ● Ubuntu arm64 packages (13.04 or later) – use arm64 repo for older Ubuntu releases. – gcc-4.7-aarch64-linux-gnu ● Ubuntu keeps adding support for compilers. Search Ubuntu repository for packages. Cross-compiler packages ● Embedded Debian Project is a good resource for alpha, mips,
    [Show full text]
  • Introduction to AVR®
    Introduction to AVR® By BiPOM Electronics, Inc. Revision 1.02 © 2011 BiPOM Electronics, Inc. All rights reserved. All trademark names in this document are the property of their respective owners. AVR® History · The AVR® architecture was conceived by two students at the Norwegian Institute of Technology. · The original AVR® was known as μRISC (Micro RISC). · Among the first of the AVR® line was the AT90S8515, which in a 40-pin DIP package has the same pinout as an 8051 microcontroller. · The creators of the AVR® give no definitive answer as to what the term "AVR" stands for. AVR® Features · Some 8-bit, some 32-bit · TinyAVR, megaAVR, XMEGA, FPSLIC · Harvard Architecture for 8-bit devices: Separate code and data space · Flash, EEPROM and SRAM are all on a single chip, eliminating the need for external memory. · All code executed by the AVR® core must reside in the on-chip flash. · Most instructions take just one or two clock cycles. · The AVR family of processors were designed with the efficient execution of compiled C code in mind. Why AVR® ? · The AVR® instruction set is more powerful than PIC or 8051. · The AVR® runs instructions very fast (can execute 1 instruction in 1 machine clock cycle) · AVR® is a good choice for industrial projects. Frequently Asked Questions Can AVR® run an OS? - Yes, AVR32 can run Linux core 2.6.XX with BusyBox What programming languages are for programming AVR® microcontroller? - There are many different languages but most commonly used are C and BASCOM BASIC. Does BiPOM offer AVR® design services ? – Yes,
    [Show full text]
  • A Characterization of Processor Performance in the VAX-1 L/780
    A Characterization of Processor Performance in the VAX-1 l/780 Joel S. Emer Douglas W. Clark Digital Equipment Corp. Digital Equipment Corp. 77 Reed Road 295 Foster Street Hudson, MA 01749 Littleton, MA 01460 ABSTRACT effect of many architectural and implementation features. This paper reports the results of a study of VAX- llR80 processor performance using a novel hardware Prior related work includes studies of opcode monitoring technique. A micro-PC histogram frequency and other features of instruction- monitor was buiit for these measurements. It kee s a processing [lo. 11,15,161; some studies report timing count of the number of microcode cycles execute z( at Information as well [l, 4,121. each microcode location. Measurement ex eriments were performed on live timesharing wor i loads as After describing our methods and workloads in well as on synthetic workloads of several types. The Section 2, we will re ort the frequencies of various histogram counts allow the calculation of the processor events in 5 ections 3 and 4. Section 5 frequency of various architectural events, such as the resents the complete, detailed timing results, and frequency of different types of opcodes and operand !!Iection 6 concludes the paper. specifiers, as well as the frequency of some im lementation-s ecific events, such as translation bu h er misses. ?phe measurement technique also yields the amount of processing time spent, in various 2. DEFINITIONS AND METHODS activities, such as ordinary microcode computation, memory management, and processor stalls of 2.1 VAX-l l/780 Structure different kinds. This paper reports in detail the amount of time the “average’ VAX instruction The llf780 processor is composed of two major spends in these activities.
    [Show full text]
  • GNU Toolchain for Atmel AVR 32-Bit Embedded Processors
    RELEASE NOTES GNU Toolchain for Atmel AVR 32-bit Embedded Processors Introduction The Atmel AVR® 32-bit GNU Toolchain (3.4.3.820) supports all AVR 32-bit devices. The AVR 32-bit Toolchain is based on the free and open-source GCC compiler. The toolchain includes compiler, assembler, linker, and binutils (GCC and Binutils), Standard C library (Newlib). 32215A-MCU-08/2015 Table of Contents Introduction .................................................................................... 1 1. Installation Instructions .......................................................... 3 1.1. System Requirements ............................................................ 3 1.1.1. Hardware Requirements ............................................. 3 1.1.2. Software Requirements .............................................. 3 1.2. Downloading, Installing, and Upgrading ..................................... 3 1.2.1. Downloading/Installing on Windows .............................. 3 1.2.2. Downloading/Installing on Linux ................................... 3 1.2.3. Upgrading from previous versions ................................ 3 1.2.4. Manifest .................................................................. 3 1.3. Layout ................................................................................. 4 2. Toolset Background ................................................................ 5 2.1. Compiler .............................................................................. 5 2.2. Assembler, Linker, and Librarian .............................................
    [Show full text]
  • Atmel AVR Microcontrollers for Automotive
    Atmel AVR Microcontrollers for Automotive +150°C Qualified Innovative Atmel AVR Several AVR microcontrollers are qualified for operation up to +150°C ambient Microcontroller Solutions for temperature (AEC-Q100 Grade0). These devices enable designers to distribute Automotive intelligence and control functions directly into or near, transfer cases, engine sensors Increasing consumer demands for comfort, safety and reduced actuators, turbochargers and exhaust fuel consumption are driving rapid growth in the market for systems. automotive electronics. All the new functions designed to meet Automotive AVR microcontrollers available these demands require local intelligence and control, which can as Grade0 versions are the Atmel ATtiny45, be optimized by the use of small, powerful microcontrollers. ATtiny87/167, ATtiny261/461/861, ATmega88/168, ATmega16M1, Atmel® leverages its unsurpassed experience in embedded ATmega32M1, ATmega32C1, Flash memory microcontrollers to bring innovative solutions ATmega64M1, and ATmega64C1. for automotive applications. These include a wide range of Atmel AVR® 8- and 32-bit microcontrollers for everything from Automotive AVR microcontrollers are sensor or actuator control to more sophisticated networking available in two different temperature ranges applications. Atmel microcontrollers are fully engineered to to serve various applications: fulfill the quality requirements of OEMs in their drive towards AEC-Q100 Grade1 Z: -40°C to +125°C zero defects. AEC-Q100 Grade0 D, T2: -40°C to +150°C Typical Applications Atmel
    [Show full text]
  • The Implementation of Prolog Via VAX 8600 Microcode ABSTRACT
    The Implementation of Prolog via VAX 8600 Microcode Jeff Gee,Stephen W. Melvin, Yale N. Patt Computer Science Division University of California Berkeley, CA 94720 ABSTRACT VAX 8600 is a 32 bit computer designed with ECL macrocell arrays. Figure 1 shows a simplified block diagram of the 8600. We have implemented a high performance Prolog engine by The cycle time of the 8600 is 80 nanoseconds. directly executing in microcode the constructs of Warren’s Abstract Machine. The imulemention vehicle is the VAX 8600 computer. The VAX 8600 is a general purpose processor Vimal Address containing 8K words of writable control store. In our system, I each of the Warren Abstract Machine instructions is implemented as a VAX 8600 machine level instruction. Other Prolog built-ins are either implemented directly in microcode or executed by the general VAX instruction set. Initial results indicate that. our system is the fastest implementation of Prolog on a commercrally available general purpose processor. 1. Introduction Various models of execution have been investigated to attain the high performance execution of Prolog programs. Usually, Figure 1. Simplified Block Diagram of the VAX 8600 this involves compiling the Prolog program first into an intermediate form referred to as the Warren Abstract Machine (WAM) instruction set [l]. Execution of WAM instructions often follow one of two methods: they are executed directly by a The 8600 consists of six subprocessors: the EBOX. IBOX, special purpose processor, or they are software emulated via the FBOX. MBOX, Console. and UO adamer. Each of the seuarate machine language of a general purpose computer.
    [Show full text]
  • VAX VMS at 20
    1977–1997... and beyond Nothing Stops It! Of all the winning attributes of the OpenVMS operating system, perhaps its key success factor is its evolutionary spirit. Some would say OpenVMS was revolutionary. But I would prefer to call it evolutionary because its transition has been peaceful and constructive. Over a 20-year period, OpenVMS has experienced evolution in five arenas. First, it evolved from a system running on some 20 printed circuit boards to a single chip. Second, it evolved from being proprietary to open. Third, it evolved from running on CISC-based VAX to RISC-based Alpha systems. Fourth, VMS evolved from being primarily a technical oper- ating system, to a commercial operat- ing system, to a high availability mission-critical commercial operating system. And fifth, VMS evolved from time-sharing to a workstation environment, to a client/server computing style environment. The hardware has experienced a similar evolution. Just as the 16-bit PDP systems laid the groundwork for the VAX platform, VAX laid the groundwork for Alpha—the industry’s leading 64-bit systems. While the platforms have grown and changed, the success continues. Today, OpenVMS is the most flexible and adaptable operating system on the planet. What start- ed out as the concept of ‘Starlet’ in 1975 is moving into ‘Galaxy’ for the 21st century. And like the universe, there is no end in sight. —Jesse Lipcon Vice President of UNIX and OpenVMS Systems Business Unit TABLE OF CONTENTS CHAPTER I Changing the Face of Computing 4 CHAPTER II Setting the Stage 6 CHAPTER
    [Show full text]
  • AVR 32-Bit GNU Toolchain: Release 3.4.2.435
    AVR 32-bit GNU Toolchain: Release 3.4.2.435 The AVR 32-bit GNU Toolchain supports all AVR 32-bit devices. The AVR 32- bit Toolchain is based on the free and open-source GCC compiler. The toolchain 8/32-bits Atmel includes compiler, assembler, linker and binutils (GCC and Binutils), Standard C library (Newlib). Microcontrollers About this release Release 3.4.2.435 This is an update release that fixes some defects and upgrades GCC and Binutils to higher versions. Installation Instructions System Requirements AVR 32-bit GNU Toolchain is supported under the following configurations Hardware requirements • Minimum processor Pentium 4, 1GHz • Minimum 512 MB RAM • Minimum 500 MB free disk space AVR 32-bit GNU Toolchain has not been tested on computers with less resources, but may run satisfactorily depending on the number and size of projects and the user's patience. Software requirements • Windows 2000, Windows XP, Windows Vista or Windows 7 (x86 or x86-64). • Fedora 13 or 12 (x86 or x86-64), RedHat Enterprise Linux 4/5/6, Ubuntu Linux 10.04 or 8.04 (x86 or x86-64), or SUSE Linux 11.2 or 11.1 (x86 or x86-64). AVR 32-bit GNU Toolchain may very well work on other distributions. However those would be untested and unsupported. AVR 32-bit GNU Toolchain is not supported on Windows 98, NT or ME. Downloading and Installing The package comes in two forms. • As part of a standalone installer • As Atmel Studio 6.x Toolchain Extension It may be downloaded from Atmel's website at http://www.atmel.com or from the Atmel Studio Extension Gallery.
    [Show full text]
  • Atmel Studio
    Software Atmel Studio USER GUIDE Preface Atmel® Studio is an Integrated Development Environment (IDE) for writing and debugging AVR®/ARM® applications in Windows® XP/Windows Vista®/ Windows 7/8 environments. Atmel Studio provides a project management tool, source file editor, simulator, assembler, and front-end for C/C++, programming, and on-chip debugging. Atmel Studio supports the complete range of Atmel AVR tools. Each new release contains the latest updates for the tools as well as support for new AVR/ARM devices. Atmel Studio has a modular architecture, which allows interaction with 3rd party software vendors. GUI plugins and other modules can be written and hooked to the system. Contact Atmel for more information. Atmel-42167B-Atmel-Studio_User Guide-09/2016 Table of Contents Preface............................................................................................................................ 1 1. Introduction................................................................................................................8 1.1. Features....................................................................................................................................... 8 1.2. New and Noteworthy.................................................................................................................... 8 1.2.1. Atmel Studio 7.0............................................................................................................ 8 1.2.2. Atmel Studio 6.2 Service Pack 2..................................................................................11
    [Show full text]