Code Density Concerns for New Architectures

Code Density Concerns for New Architectures

Code Density Concerns for New Architectures Vincent M. Weaver Sally A. McKee Cornell University Chalmers University of Technology 7 October 2009 Introduction • Benchmark ported to 21 different assembly languages • Hand-optimized for minimum size • Code tested and works on all architectures What ISA features lead to high code density? Can this help designers of new ISAs? 1 New ISAs? Really? • ISA design still a concern • FPGAs make it easy • Embedded architectures want dense code • Linux has 12 embedded architectures and counting 2 Benefits of Code Density • L1 iCache holds more instructions • More data fits in unified L2 cache • Less bandwidth required to memory and disk • Fewer TLB misses • Compact loops can be executed from instruction buffer • Smaller cache footprint can lead to energy savings 3 What about Performance? • Hard to optimize performance • Varies across implementations • Dense code often performs well 4 The Benchmark ּ############################################################################### ּ############################################################################### ּ##########O#O################################################################## ּ############################################################################### ּ############################################################################### ּ############################################################################### ּ############################################################################### ּ############################################################################### ּ############################################################################### ּ############################################################################### ּ############################################################################### ּ############################################################################### ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ LinuxּVersionּ2.6.11.421.17smp,ּCompiledּ#1ּSMPּFriּAprּ6ּ08:42:34ּUTCּ2007ּּּ Twoּ2791MHzּIntel(R)ּXeon(TM)ּProcessors,ּ2027MּRAM,ּ5521.40ּBogomipsּTotalּּּּּ sampakaּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ ּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּּ LZSS decompression System calls, including disk I/O String manipulation and search Integer to ASCII conversion 5 ISA Categories • VLIW • CISC • RISC • Embedded • 8/16-bit 6 VLIW Processors ia64 • 16-byte bundle holds 3 instructions • Instruction has 3 arguments • Hundreds of integer registers • Predication 7 CISC Processors m68k, s390, VAX, x86, x86 64 • Variable instruction length, 1-54 bytes • Instruction has 2 arguments • 16 integer registers (x86 has 8) • Status flags • Unaligned loads • Complex addressing modes 8 RISC Processors Alpha, ARM, m88k, microblaze, MIPS, PA-RISC, PPC, SPARC • 4 byte instruction length • Instruction has 3 arguments • 32 integer registers (except ARM, SPARC) • Most have a zero register • Many have branch delay slot 9 Embedded Processors avr32, crisv32, sh3, ARM Thumb • 2 byte instruction length • Instruction has 2 arguments • Most have 16 integer registers • Auto-incrementing loads • Status flags 10 8/16-bit Processors 6502, PDP-11, z80 • Variable instruction length (1-6 bytes) • Instruction has 1-2 arguments • Status flags 11 VLIW 256 RISC CISC 192 embedded 8/16-bit bytes 128 64 Results { LZSS Decompression 0 arm ppc vax z80 sh3 ia64 mips 6502 s390 i386 alpha parisc sparc m88k m68kthumbavr32 mblaze pdp-11 x86_64crisv32 12 VLIW 64 RISC CISC 48 embedded 8/16-bit bytes 32 16 Results { String Concatenation 0 arm ppc vax sh3 z80 ia64 mips 6502 s390 i386 alpha sparc parisc m88k thumb avr32 m68k mblaze crisv32x86_64pdp-11 13 VLIW 256 RISC CISC 192 embedded 8/16-bit bytes 128 64 0 arm ppc sh3 vax z80 ia64 mips 6502 i386 s390 alpha sparc m88kparisc m68k thumb Results { Stringavr32 Search mblaze pdp-11 x86_64 crisv32 14 VLIW 256 RISC CISC 192 embedded 8/16-bit bytes 128 No HW divide 64 0 Results { Integer arm z80 sh3 ppc vax ia64 6502 mips s390 i386 parisc alpha m88k sparc thumb m68k avr32 mblaze crisv32pdp-11 x86_64 ! ASCII 15 VLIW 2560 RISC 2048 CISC embedded 1536 8/16-bit bytes 1024 512 0 arm ppc vax sh3 z80 ia64 mips 6502 s390 i386 alphaparisc sparc m88k m68k thumb avr32 mblaze x86_64 crisv32pdp-11 Results { Overall 16 Correlations Correlation Architectural Parameter Coefficient 0.9381 Smallest possible instruction length 0.9116 Low number of integer registers 0.7823 Low Virtual address of first instruction 0.6607 Architecture lacks a zero register 0.6159 Low Bit-width 0.4982 Few operands in each instruction 0.3854 Hardware divide in ALU 17 More Correlations Correlation Architectural Parameter Coefficient 0.3653 Unaligned load/store available 0.3129 Year the architecture was introduced 0.2521 Hardware status flags (zero/overflow/etc.) 0.2121 Auto-incrementing addressing scheme 0.0809 Machine is big-endian 0.0021 Branch delay slot 18 Results { C Comparison (x86/Linux) ~500kB 8192 6144 4096 Size (bytes) 2048 1024 -O3 -Os -Os -Os -Os -O3 -Os -Os -O3 -Os -Os -Os -O3 -Os hand gcc gcc intel sun intel gcc gcc sun gcc gcc gcc intel gcc gcc opt. GLIBC / STATIC GLIBC / DYNAMIC uCLIBC SYSCALL ONLY ASM 19 What is holding back the C version? • Stack frame (Calling convention) • Pointer aliasing • Full program register allocation • Constant loading optimizations • String instructions 20 Related Work • RISC Code Compression • Kozuch and Wolfe { investigate VAX, MIPS, SPARC, m68k, RS6000, PPC • Hasegawa et al. { gcc generated code on m68k, x86, i960, Sparclite, SPARC, MIPS, AMD29k, m88k, Alpha, RS6000 • Flynn et al. { synthetic architectures 21 Conclusions / Future Work • New ISAs are continually being developed; code density is still a concern • Short instruction codings are key • High code density requires co-operation of ISA, operating system, system libraries, and compiler • More architectures should be investigated, as well as more and larger benchmarks 22 Questions? All code is available: http://www.deater.net/weave/vmwprod/asm/ll 23.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    24 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us