<<

Comparison of Hardware Performance of Selected Phase II eSTREAM Candidates

Kris Gaj Gabriel Southern Ramakrishna Bachimanchi & Fall 2006 GMU ECE 545: Introduction to VHDL class

George Mason University Goal

Comparison of Profile II (hardware) Phase 2 Focus candidates: • • Mickey-128 •

Two additional reference points: • A5/1 (old & insecure GSM standard) • AES (compact architecture)

Two hardware technologies:

• Xilinx Spartan 3 FPGAs • TSMC 90 nm standard-cell library ASICs Genesis & approach

• Part of GMU Fall 2006 graduate course ECE 545 Introduction to VHDL

• Individual 6-week project

• 4 students working independently on each eSTREAM cipher

• best code for each algorithm selected at the end of the semester

• selected designs verified and revised in order to assure • correct functionality • standard interface & control • uniform design & coding style Fixed interface

clk reset k key_IV key_IV_ready d data_out key_IV_write write d eSTREAM data_in full data_in_ready cipher data_in_write

enc_dec Two independent parameters d – number of bits processed per clock cycle (radix) k – number of bits of /IV loaded per clock cycle +  area d /decryption throughput # of pins

setup time # of pins k (key & IV loading + initialization) area

All results generated with k = d Methodology

Specification

Execution Unit Control Unit

Block Algorithmic diagram State Machine

VHDL code VHDL code Methodology & tools

Technology FPGA ASIC

VHDL simulation Aldec Active HDL & debugging ModelSim Xilinx Edition

Logic synthesis Synplicity Synopsys Synplify Pro Design Analyzer v. 8.5 X-2005.9 Implementation Xilinx ISE v. 8.1i (mapping, placing No physical & routing) implementation

All results after All results after placing & routing logic synthesis Assumptions • Only encryption/decryption, no MAC • Maximum allowed key and IV sizes

Cipher22 IV size Internal state size Grain 80 64 160 Mickey-128 128 128 320 Phelix 256 128 288 Trivium 80 80 288 A5/1 64 22 64

• Key and IV need to be reloaded each time either of them changes • No precomputations of internal state outside of the circuit. Choice of hardware architecture Three categories of stream ciphers represented among those implemented Based on Based on LFSRs / NFSRs LFSRs / NFSRs with serial inputs with parallel inputs

NFSR LFSR NFSR LFSR

Grain, Trivium, A5/1 Mickey-128

Based on basic iterative architecture and component operations of block ciphers and hash functions

Phelix, AES in OFB or CTR mode Optimizations for the first group of ciphers Grain

si+81 s i+80 si+80

si si+79 sisi+1 si+79

d=1 d=2

Si+80=si+62 + si+52 + si+38 + si+23 + si+13 + si

Si+81=si+63 + si+53 + si+39 + si+24 + si+14 + si+1 Optimizations for the third group of ciphers Phelix Key mixing Encryption Key mixing Encryption

HB HB HB Block function, HB HB Sharing HB Block function, No sharing

Key mixing Encryption Key mixing Encryption

Half-Block HB HB function, HB Sharing Half-Block function, 2 x number of clock cycles No sharing Results Ease of design as perceived by students based on the specification of each cipher

Average score Number of students ( 5 – very easy, who selected the 1 – very difficult) cipher as their first choice

Trivium 3.36 5

Mickey-128 3.32 3

Grain 3.00 4

Phelix 2.00 0 Throughput vs. area FPGA: Xilinx Spartan 3 family

Throughput [Mbit/s] Best 12000 T64

10000

Trivium 8000

T32 6000

4000 T16 G16 Phelix 2000 Grain AES G1 Worst 0 Mickey-128 0 200 400 600 800 1000 1200 1400 Area [CLB slices] Throughput vs. area: Phelix FPGA: Xilinx Spartan 3 family Throughput [Mbit/s] 1200 Block function, Block function, Sharing No sharing 1000 Half-Block function, No sharing 800 Half-Block function, 600 Sharing

400

200

0 1200 1250 1300 1350 1400 Area [CLB slices] Throughput vs. area: Grain, Mickey-128, Trivium, A5/1 FPGA: Xilinx Spartan 3 family Throughput [Mbit/s] Legend: T64 12000 A – A5/1 G – Grain 10000 M – Mickey-128 T – Trivium

8000

T32 6000

4000 T16 G16 2000 G8 T8 G4 G1 G2 T1-4 M 0 Area 0 50 100 150 200 250 300 350 400 [CLB slices] Throughput vs. area: Throughput up to 3 Gbit/s FPGA: Xilinx Spartan 3 family Throughput [Mbit/s] Legend: T16 3000 A – A5/1 G – Grain G16 2500 M – Mickey-128 T – Trivium

2000 T8 1500 G8

1000 G4 T4

500 G2 T2 A1 G1 A3 T1 M A4 0 Area 0 50 100 150 200 250 300 350 [CLB slices] Optimizations for minimum area FPGA: Xilinx Spartan 3 family Throughput [Mbit/s] Phelix (Half-block, No sharing) 800

700

600

500

400

300 Trivium-1 200 Mickey-128 Grain-1 100 A5/1-1

0 Area 0 200 400 600 800 1000 1200 [CLB slices] Optimizations for minimum area FPGA: Xilinx Spartan 3 family

Area [CLB slices] x cipher area/Grain area 1216 1200 x 9.97

1000

800

600

400 230 188 x 1.89 200 122 x 1.54 57 x 1.00 x 0.47 0 A5/1 Grain Trivium Mickey-128 Phelix (d=1) (d=1) (d=1) (d=1) (d=32, HB-NS) Optimizations for maximum throughput to area ratio FPGA: Xilinx Spartan 3 family Throughput [Mbit/s] Trivium-64 12000

10000

8000

6000

4000

Grain-16 2000 Phelix (Full block, Sharing) A5/1-1 Mickey-128 0 Area 0 200 400 600 800 1000 1200 [CLB slices] Optimizations for maximum throughput to area ratio FPGA: Xilinx Spartan 3 family Throughput/Area [Mbit/sCLB slices] x Trivium ratio / cipher ratio 31.34 30.00 x 1.0 25.00

20.00

15.00

10.00 6.97

5.00 x 4.5 3.05 x 10.3 0.96 0.83 x 32.5 x 37.7 0.00 Trivium Grain A5/1 Phelix Mickey-128 (d=64) (d=16) (d=1) (d=32, FB-SH) (d=1) Setup Time = Key & IV Loading + Initialization Time FPGA: Xilinx Spartan 3 family Setup time [ns] 6000

5000

4000

3000

2000

1000

0 k= 1 2 4 8 16 32 64 1 2 4 8 16 1 3 4 32 Trivium Mickey-128 Grain A5/1 Phelix Setup Time = Key & IV Loading + Initialization Time FPGA: Xilinx Spartan 3 family Clock cycles Nanoseconds

1200 12000

1000 10000

800 8000

600 6000

400 4000

200 2000

0 0 Trivium Mickey-128 Grain A5/1 Phelix (k=1) (k=1) (k=1) (k=1) (k=32) Setup Time = Key & IV Loading + Initialization Time FPGA: Xilinx Spartan 3 family Clock cycles Nanoseconds

400 4000

350 3500

300 3000

250 2500

200 2000

150 1500

100 1000

50 500

0 0 Trivium Mickey-128 Grain A5/1 Phelix (k=64) (k=1) (k=16) (k=1) (k=32) Results for ASICs: TSMC 90 nm library

Relative results comparable to results for FPGAs

Absolute speed increase by a factor from 3 to 10 before ASIC layout synthesis Conclusions • Very large differences among candidate ciphers (much larger than for five final candidates in the AES contest)

Possible reasons: • variety of ciphers based on different design principles • different internal state, key, and IV sizes • early stage of the contest

Trivium and Grain outperform other eSTREAM ciphers in terms of • flexibility • minimum area • maximum throughput to area ratio.

Once again ciphers based on LFSR and NFSRs show their superiority in hardware implementations

Security analysis should focus first on the most efficient ciphers Thank you!

Questions?? ?