Comparison of Hardware Performance of Selected Phase II eSTREAM Candidates
Kris Gaj Gabriel Southern Ramakrishna Bachimanchi & Fall 2006 GMU ECE 545: Introduction to VHDL class
George Mason University Goal
Comparison of Profile II (hardware) Phase 2 Focus candidates: • Grain • Mickey-128 • Phelix • Trivium
Two additional reference points: • A5/1 (old & insecure GSM standard) • AES (compact architecture)
Two hardware technologies:
• Xilinx Spartan 3 FPGAs • TSMC 90 nm standard-cell library ASICs Genesis & approach
• Part of GMU Fall 2006 graduate course ECE 545 Introduction to VHDL
• Individual 6-week project
• 4 students working independently on each eSTREAM cipher
• best code for each algorithm selected at the end of the semester
• selected designs verified and revised in order to assure • correct functionality • standard interface & control • uniform design & coding style Fixed interface
clk reset k key_IV key_IV_ready d data_out key_IV_write write d eSTREAM data_in full data_in_ready cipher data_in_write
enc_dec Two independent parameters d – number of bits processed per clock cycle (radix) k – number of bits of key/IV loaded per clock cycle + area d encryption/decryption throughput # of pins
setup time # of pins k (key & IV loading + initialization) area
All results generated with k = d Methodology
Specification
Execution Unit Control Unit
Block Algorithmic diagram State Machine
VHDL code VHDL code Methodology & tools
Technology FPGA ASIC
VHDL simulation Aldec Active HDL & debugging ModelSim Xilinx Edition
Logic synthesis Synplicity Synopsys Synplify Pro Design Analyzer v. 8.5 X-2005.9 Implementation Xilinx ISE v. 8.1i (mapping, placing No physical & routing) implementation
All results after All results after placing & routing logic synthesis Assumptions • Only encryption/decryption, no MAC • Maximum allowed key and IV sizes
Cipher22 Key size IV size Internal state size Grain 80 64 160 Mickey-128 128 128 320 Phelix 256 128 288 Trivium 80 80 288 A5/1 64 22 64
• Key and IV need to be reloaded each time either of them changes • No precomputations of internal state outside of the circuit. Choice of hardware architecture Three categories of stream ciphers represented among those implemented Based on Based on LFSRs / NFSRs LFSRs / NFSRs with serial inputs with parallel inputs
NFSR LFSR NFSR LFSR
Grain, Trivium, A5/1 Mickey-128
Based on basic iterative architecture and component operations of block ciphers and hash functions
Phelix, AES in OFB or CTR mode Optimizations for the first group of ciphers Grain
si+81 s i+80 si+80
si si+79 sisi+1 si+79
d=1 d=2
Si+80=si+62 + si+52 + si+38 + si+23 + si+13 + si
Si+81=si+63 + si+53 + si+39 + si+24 + si+14 + si+1 Optimizations for the third group of ciphers Phelix Key mixing Encryption Key mixing Encryption
HB HB HB Block function, HB HB Sharing HB Block function, No sharing
Key mixing Encryption Key mixing Encryption
Half-Block HB HB function, HB Sharing Half-Block function, 2 x number of clock cycles No sharing Results Ease of design as perceived by students based on the specification of each cipher
Average score Number of students ( 5 – very easy, who selected the 1 – very difficult) cipher as their first choice
Trivium 3.36 5
Mickey-128 3.32 3
Grain 3.00 4
Phelix 2.00 0 Throughput vs. area FPGA: Xilinx Spartan 3 family
Throughput [Mbit/s] Best 12000 T64
10000
Trivium 8000
T32 6000
4000 T16 G16 Phelix 2000 Grain AES G1 Worst 0 Mickey-128 0 200 400 600 800 1000 1200 1400 Area [CLB slices] Throughput vs. area: Phelix FPGA: Xilinx Spartan 3 family Throughput [Mbit/s] 1200 Block function, Block function, Sharing No sharing 1000 Half-Block function, No sharing 800 Half-Block function, 600 Sharing
400
200
0 1200 1250 1300 1350 1400 Area [CLB slices] Throughput vs. area: Grain, Mickey-128, Trivium, A5/1 FPGA: Xilinx Spartan 3 family Throughput [Mbit/s] Legend: T64 12000 A – A5/1 G – Grain 10000 M – Mickey-128 T – Trivium
8000
T32 6000
4000 T16 G16 2000 G8 T8 G4 G1 G2 T1-4 M 0 Area 0 50 100 150 200 250 300 350 400 [CLB slices] Throughput vs. area: Throughput up to 3 Gbit/s FPGA: Xilinx Spartan 3 family Throughput [Mbit/s] Legend: T16 3000 A – A5/1 G – Grain G16 2500 M – Mickey-128 T – Trivium
2000 T8 1500 G8
1000 G4 T4
500 G2 T2 A1 G1 A3 T1 M A4 0 Area 0 50 100 150 200 250 300 350 [CLB slices] Optimizations for minimum area FPGA: Xilinx Spartan 3 family Throughput [Mbit/s] Phelix (Half-block, No sharing) 800
700
600
500
400
300 Trivium-1 200 Mickey-128 Grain-1 100 A5/1-1
0 Area 0 200 400 600 800 1000 1200 [CLB slices] Optimizations for minimum area FPGA: Xilinx Spartan 3 family
Area [CLB slices] x cipher area/Grain area 1216 1200 x 9.97
1000
800
600
400 230 188 x 1.89 200 122 x 1.54 57 x 1.00 x 0.47 0 A5/1 Grain Trivium Mickey-128 Phelix (d=1) (d=1) (d=1) (d=1) (d=32, HB-NS) Optimizations for maximum throughput to area ratio FPGA: Xilinx Spartan 3 family Throughput [Mbit/s] Trivium-64 12000
10000
8000
6000
4000
Grain-16 2000 Phelix (Full block, Sharing) A5/1-1 Mickey-128 0 Area 0 200 400 600 800 1000 1200 [CLB slices] Optimizations for maximum throughput to area ratio FPGA: Xilinx Spartan 3 family Throughput/Area [Mbit/sCLB slices] x Trivium ratio / cipher ratio 31.34 30.00 x 1.0 25.00
20.00
15.00
10.00 6.97
5.00 x 4.5 3.05 x 10.3 0.96 0.83 x 32.5 x 37.7 0.00 Trivium Grain A5/1 Phelix Mickey-128 (d=64) (d=16) (d=1) (d=32, FB-SH) (d=1) Setup Time = Key & IV Loading + Initialization Time FPGA: Xilinx Spartan 3 family Setup time [ns] 6000
5000
4000
3000
2000
1000
0 k= 1 2 4 8 16 32 64 1 2 4 8 16 1 3 4 32 Trivium Mickey-128 Grain A5/1 Phelix Setup Time = Key & IV Loading + Initialization Time FPGA: Xilinx Spartan 3 family Clock cycles Nanoseconds
1200 12000
1000 10000
800 8000
600 6000
400 4000
200 2000
0 0 Trivium Mickey-128 Grain A5/1 Phelix (k=1) (k=1) (k=1) (k=1) (k=32) Setup Time = Key & IV Loading + Initialization Time FPGA: Xilinx Spartan 3 family Clock cycles Nanoseconds
400 4000
350 3500
300 3000
250 2500
200 2000
150 1500
100 1000
50 500
0 0 Trivium Mickey-128 Grain A5/1 Phelix (k=64) (k=1) (k=16) (k=1) (k=32) Results for ASICs: TSMC 90 nm library
Relative results comparable to results for FPGAs
Absolute speed increase by a factor from 3 to 10 before ASIC layout synthesis Conclusions • Very large differences among candidate ciphers (much larger than for five final candidates in the AES contest)
Possible reasons: • variety of ciphers based on different design principles • different internal state, key, and IV sizes • early stage of the contest
Trivium and Grain outperform other eSTREAM ciphers in terms of • flexibility • minimum area • maximum throughput to area ratio.
Once again ciphers based on LFSR and NFSRs show their superiority in hardware implementations
Security analysis should focus first on the most efficient ciphers Thank you!
Questions?? ?