Intel Itanium Architecture

Alex Crawford, Matt Ofalt

Brief History

● Merced - 2001
  ○ Slower than competing RISC and CISC processors
● McKinley (Itanium 2) - 2002
  ○ Fixed many of the performance problems on Merced
● Montecito (Itanium 2 9000) - 2006
  ○ Dual-core, roughly doubled performance
● Tukwila (Itanium 9300) - 2010
  ○ Quad-core, memory error correction
  ○ Shares its chipset with Nehalem

Itanium Overview

● 64-bit (path, data, address space)
● Explicit instruction-level parallelism (VLIW)
  ○ Static "superscaling"
● Compiler is responsible for
  ○ Predication
  ○ Speculation
  ○ Branch Prediction
● 128 integer registers, 128 FP registers
● 30 functional execution units

Compilers

● Very difficult to write
  ○ Predication
  ○ Speculation
  ○ Branch Prediction
● This is the reason the architecture is failing, but...
● Allows for huge improvements
● We like assembly better anyway, right?

IA-64 Instructions

● Issued in 128-bit "bundles"
● Three 41-bit instructions per bundle
● Template tells CPU which instructions execute in parallel
  ○ Not constrained to just one bundle (8 inst. in parallel)
● Six instruction types
  ○ A: Integer ALU (I/M unit)
  ○ I: Non-ALU integer (I unit)
  ○ M: Memory (M unit)
  ○ F: Floating-point (F unit)
  ○ B: Branch (B unit)
  ○ L+X: Extended (I/B unit)

Execution Units

● I-Unit
  ○ Integer arithmetic
  ○ Shift and add
  ○ Logical
● M-Unit
  ○ Load and Store
  ○ Basic integer ALU operations
● B-Unit
  ○ Branches
● F-Unit
  ○ Floating point

IA-64 Assembly

    [(qp)] mnemonic[.comp] dest = src [;;] [// comment]

    (p0) cmp.eq p1,p2 = 5,r7    // conditional: 5 == r7

    qp       - qualifying predicate (a 1-bit predicate register)
    mnemonic - name of instruction
    comp     - instruction completer
    dest     - one or more destination operands
    src      - one or more source operands
    ;;       - instruction group stop
    //       - comment

Assembly Example

    ld8 r2 = [r3]
    sub r4 = r10, r11 ;;
    add r5 = r2, r6
    st8 [r4] = r7 ;;
    add r2 = 1, r2 ;;
    st8 [r2] = r5

IA-64 Instruction Format

128-Bit Bundle

    Instruction 1   Instruction 2   Instruction 3   Template
    (41 bits)       (41 bits)       (41 bits)       (5 bits)

41-Bit Instruction

    Major Opcode   Modifying Bits   GR3        GR2        GR1        PR
    (4 bits)       (10 bits)        (7 bits)   (7 bits)   (7 bits)   (6 bits)

Template Field

    Template  Slot 1  Slot 2  Slot 3      Template  Slot 1  Slot 2  Slot 3
    00000     M       I       I           01110     M       M       F
    00001     M       I       I           01111     M       M       F
    00010     M       I       I           10000     M       I       B
    00011     M       I       I           10001     M       I       B
    00100     M       L       X           10010     M       B       B
    00101     M       L       X           10011     M       B       B
    01000     M       M       I           10110     B       B       B
    01001     M       M       I           10111     B       B       B
    01010     M       M       I           11000     M       M       B
    01011     M       M       I           11001     M       M       B
    01100     M       F       I           11100     M       F       B
    01101     M       F       I           11101     M       F       B

Branching on x86

    if (G_LIKELY(random() != 1))
        printf("not one");

    call   8048440 <random@plt>
    cmp    $0x1,%eax
    je     8048524 <main+0x20>
    mov    $0x80485f0,%eax
    mov    %eax,(%esp)
    call   8048410 <printf@plt>

    if (G_UNLIKELY(random() != 1))
        printf("not one");

    call   8048440 <random@plt>
    cmp    $0x1,%eax
    jne    8048524 <main+0x1B>
    mov    $0x0,%eax
    leave
    ret

Branching on IA-64

    if (random() != 1)
        not_ones++;
    else
        ones++;

    // random() -> r14
    // not_ones -> r31
    // ones     -> r32
    cmp.eq p2,p1 = 1,r14 ;;     // p2 = (r14 == 1), p1 = (r14 != 1)
    (p1) adds r31 = 1,r31       // not_ones++
    (p2) adds r32 = 1,r32       // ones++

Data Speculation on IA-64

    ld8.a r6 = [r8] ;;
    // other stuff
    ld8.c r6 = [r8]
    add r5 = r6, r7 ;;
    st8 [r18] = r5

Data Speculation on IA-64 (cont.)

            ld8.a r6 = [r8]
            // other stuff
            ;;
            add r5 = r6, r7
            // more stuff
            chk.a r6, dirty
    origin: st8 [r18] = r5

    dirty:  ld8.a r6 = [r8] ;;
            add r5 = r6, r7 ;;
            br origin

Data Speculation on x86

    ???

Rotating Register Stack

● r32-r127 can rotate ("register renaming")
● loop unrolling
● parameter passing
● overflows to memory

Performance

● Two bundles per cycle
  ○ Up to six instructions per cycle
  ○ Multiply-accumulate allows for 4 FLOPs per cycle
● Quad core
  ○ QPI (96 GiB/s)
  ○ Four memory controllers (34 GiB/s)
● Split L1 cache (16 KiB instruction, 16 KiB data)
● Unified L2 cache (256 KiB)
● Unified L3 cache (24 MiB)

Where do I buy one?

● $3,838 for the Tukwila 9350
● Servers in excess of $200,000
● Newegg doesn't have them

Emulation

● ski
  ○ ski - ncurses-based IA-64 simulator
  ○ xski - ski with a GUI
  ○ http://ski.sourceforge.net/
● cross compile
  ○ ia64-gcc
  ○ ia64-as (live on the edge)

Questions?
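
As a rough illustration of the bundle layout from the IA-64 Instruction Format slide, the following Python sketch packs three 41-bit slots and a 5-bit template into one 128-bit integer and splits them back out. The helper names (pack_bundle, unpack_bundle, decode_instruction) are hypothetical. The slide gives only field widths, so the bit positions used here (template in the low 5 bits, instruction fields read most-significant first) are an assumption about how the diagram maps onto the word.

```python
# Sketch: pack/unpack a 128-bit IA-64 bundle using the field widths
# from the slides (three 41-bit slots plus a 5-bit template).
# Assumption: the template sits in the low 5 bits, followed by slot 0.

SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1

# Unit types per slot, copied from the Template Field table.
# Each even code and the odd code after it share the same unit types.
_EVEN_CODES = {
    0b00000: "MII", 0b00010: "MII", 0b00100: "MLX",
    0b01000: "MMI", 0b01010: "MMI", 0b01100: "MFI", 0b01110: "MMF",
    0b10000: "MIB", 0b10010: "MBB", 0b10110: "BBB",
    0b11000: "MMB", 0b11100: "MFB",
}
TEMPLATES = {}
for code, units in _EVEN_CODES.items():
    TEMPLATES[code] = units
    TEMPLATES[code | 1] = units


def pack_bundle(template, slots):
    """Combine a 5-bit template and three 41-bit slots into one integer."""
    if template not in TEMPLATES:
        raise ValueError("template code not in the table")
    if len(slots) != 3 or any(s < 0 or s > SLOT_MASK for s in slots):
        raise ValueError("need exactly three 41-bit slot values")
    bundle = template
    for index, slot in enumerate(slots):
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle into (template, [slot0, slot1, slot2])."""
    template = bundle & TEMPLATE_MASK
    slots = [
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
        for index in range(3)
    ]
    return template, slots


def decode_instruction(slot):
    """Split a 41-bit slot into the six fields named on the slide."""
    return {
        "major_opcode": (slot >> 37) & 0xF,    # 4 bits
        "modifying": (slot >> 27) & 0x3FF,     # 10 bits
        "gr3": (slot >> 20) & 0x7F,            # 7 bits
        "gr2": (slot >> 13) & 0x7F,            # 7 bits
        "gr1": (slot >> 6) & 0x7F,             # 7 bits
        "pr": slot & 0x3F,                     # 6 bits
    }
```

For example, unpack_bundle(pack_bundle(0b01000, [1, 2, 3])) gives back the template and the three slot values, and TEMPLATES[0b01000] names the unit types (M, M, I) those slots are routed to.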
