Vincent M. Weaver Sally A. Mckee

Vincent M. Weaver Sally A. Mckee

Optimizing for Size: Exploring the Limits of Code Density Sally A. McKee Vincent M. Weaver ASPLOS XIV Poster Session, 8 March 2009 Cornell University Chalmers University of Technology Abstract Hand-Optimized Assembly Results Architectural Size Correlations 0.9020 Minimum possible instruction size Reductions in instruction count can improve cache and bandwidth utiliza- 0.8652 Number of integer registers tion, lower power consumption, and increase overall performance. Nonethe- Total Size of Executable VLIW 2560 RISC less, code density is often overlooked when studying processor architectures. 2048 CISC 0.6623 Architecture has a zero register embedded 1536 8/16-bit 0.5845 Bit-width bytes 1024 We hand-optimize an embedded benchmark for size in assembly language -0.4366 Hardware divide in ALU on 20 different instruction set architectures and investigate the architectural 512 features that contribute most heavily to code density. 0 0.4385 Number of operands in each instruction arm ppc vax sh3 z80 ia64 6502 mips s390 i386 alphaparisc sparc m88k m68k thumb avr32 -0.3356 Unaligned load/store available x86_64 crisv32pdp-11 0.2773 Year the architecture was introduced 256 Size of LZSS Decompression Code VLIW -0.2597 Auto-incrementing addressing scheme Background RISC CISC 192 embedded -0.2597 Hardware status flags (zero/overflow/etc.) 128 8/16-bit bytes -0.1252 Little or Big endian 64 The 20 architectures investigated can be broadly broken into 5 categories: -0.0487 Branch delay slot 0 -0.0079 Maximum possible instruction size Instr Length Opcode arm ppc vax z80 sh3 ia64 mips s390 6502 i386 Type Represented Architectures alpha parisc sparc m88k m68kthumb avr32 (bytes) Args pdp-11 x86_64crisv32 VLIW ia64 16/3 3 Size of String-Search Code VLIW Other Considerations alpha, arm, m88k, mips 256 RISC RISC 4 3 CISC pa-risc, ppc, sparc 192 embedded 128 8/16-bit System libraries and compiler overhead can overshadow the effects of size bytes CISC m68k, s390, vax, x86, x86 64 1-54 2 64 optimization, increasing memory footprint by several orders of magnitude. N/A Embedded avr32, crisv32, sh3, THUMB 2 2 0 These effects cannot be ignored when analyzing system-wide code-density. ppc arm sh3 vax z80 8/16-bit 6502, pdp-11, z80 1-6 1-2 ia64 mips i386 s390 6502 alpha sparc m88k parisc m68kthumb avr32 473kB 473kB 473kB 473kB pdp-11 x86_64 crisv32 Below is the output from the benchmark tool. It uses LZSS decompression, 8192 Size of Integer/ASCII Conversion Code VLIW file I/O, and string manipulation. 256 RISC 6144 CISC 192 embedded 4096 8/16-bit ּ############################################################################### 128 Size (bytes) bytes No HW divide 2048 ּ############################################################################### O#O##########ּ 64 1024################################################################## N/A ּ############################################################################### O3 Os O3,Os O3,Os O3 Os O3 Os O3,Os O3,Os O3,Os O3 Os O3,Os O3,Os hand gcc42 gcc42 intel9 sun12 gcc41 gcc41 gcc42 gcc42 intel9 sun12 gcc41 gcc42 gcc42 intel9 sun12 optimized 0 ּ############################################################################### GLIBC uCLIBC ּ############################################################################### arm z80 sh3 ppc vax ia64 6502 mips s390 i386 parisc alpha sparc m88k m68kthumb avr32 STATICALLY LINKED DYNAMICALLY LINKED SYSCALL ONLY ASM ּ############################################################################### crisv32pdp-11 x86_64 ּ############################################################################### ּ############################################################################### ּ############################################################################### Size of String Concatenation Code VLIW Future Work 64 ּ############################################################################### RISC ּ############################################################################### CISC embedded 48 ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ LinuxּVersionּ2.6.11.421.17smp,ּCompiledּ#1ּSMPּFriּAprּ6ּ08:42:34ּUTCּ2007ּּּ 32 8/16-bit .Twoּ2791MHzּIntel(R)ּXeon(TM)ּProcessors,ּ2027MּRAM,ּ5521.40ּBogomipsּTotalּּּ bytes • Investigate more architecturesּּ sampakaּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ 16ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ • ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ .Evaluate performance impact of dense architectures 0 ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ • arm ppc vax sh3 z80 ia64 mips 6502 s390 i386 Propose compact architectures for FPGA and GPU-based systems. alpha sparc parisc m88k thumb m68k avr32 crisv32x86_64pdp-11.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    1 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us