<<

10/21/20

Lecture 8B

RTL Design Methodology Class Exercise 3 CIPHER Transition from the Pseudocode & Interface to a Corresponding Block Diagram

1 2

RC6 NIST Report: Security Security Margin RC6 (Rivest Cipher 6) - symmetric key

Designers: , Matt Robshaw, Ray Sidney, and Yiqun Lisa Yin MARS High Serpent One of five finalists of the contest for the Advanced Encryption Standard (AES) 1997-2000 Rijndael Adequate RC6 Candidate algorithm in the following other contests: NESSIE – for European standards (2000-2003) CRYPTREC – for Japanese standards (2000-2003) Simple Complex 3 Complexity 4 3 4

Security: Theoretical attacks better NIST Report: Software Efficiency than exhaustive key search Encryption and Decryption Speed

Serpent 9 23 32 32-bit 64-bit DSPs processors processors

Twofish 6 10 16 RC6 Rijndael Rijndael high Twofish Twofish Mars 11 5 16 without 16 mixing rounds Rijndael Mars Mars Mars RC6 RC6 Rijndael 7 3 10 medium Twofish

RC6 15 5 20 low Serpent Serpent Serpent

0 5 10 15 20 25 30 35 # of rounds in the attack/total # of rounds 6 5 6

1 10/21/20

Efficiency in hardware: FPGA Virtex 1000: Speed Pseudocode Throughput [Mbit/s] Split input I into four words, I3, I2, I1, I0, of the size of w bits each 500 431 444 George Mason University A = I3; B = I2; C = I1; D=I0 450 414 University of Southern California B = B + S[0] 400 353 Worcester Polytechnic Institute D = D + S[1] 350 for i = 1 to r do 294 { 300 T = (B*(2B + 1)) <<< k 250 U = (D*(2D + 1)) <<< k A = ((A ⊕ T) <<< U) + S[2i] 177 200 173 C = ((C ⊕ U) <<< T) + S[2i + 1] 149 143 (A, B, C, D) = (B, C, D, A) 150 104 112 102 88 } 100 62 61 A = A + S[2r + 2] 50 C = C + S[2r + 3] O = (A, B, C, D) 0 Serpent Rijndael Twofish Serpent RC6 Mars 8 I8 I1 7

7 8

Notation Operations

w: word size, e.g., w=32 (constant)

k: log2(w) (constant) ⊕ : XOR A, B, C, D, U, T: w-bit variables + : addition modulo 2w I3, I2, I1, I0: Four w-bit words of the input I – : subtraction modulo 2w r: number of rounds (constant) * : multiplication modulo 2w O: output of the size of 4w bits X <<< Y : rotation of X to the left by the number of positions S[j] : 2r+4 round keys stored in two RAMs. given in Y Each key is a w-bit word. X >>> Y : rotation of X to the right by the number of positions The first RAM stores values of S[j=2i], i.e., only round keys given in Y with even indices. The second memory stores values of (A, B, C, D) : Concatenation of A, B, C, and D. S[j=2i+1], i.e., only round keys with odd indices.

9 10

9 10

Modular Arithmetic Circuit Interface

clk reset

4w 4w I O

write_I CIPHER DONE

w Sj

Write_Sj

j m 11 + 4 mod 12 = 3 11 12

11 12

2 10/21/20

Interface Table Protocol (1) An external circuit first loads all round keys S[0], S[1], S[2], …, S[2r+2], [2r+3] to the two internal memories of the CIPHER unit.

The first memory stores values of S[j=2i], i.e., only round keys with even indices. The second memory stores values of S[j=2i+1], i.e. only round keys with odd indices.

Loading round keys is performed using inputs: Sj, j, write_Sj, clk.

Then, the external circuits, loads an input block I to the CIPHER unit, using inputs: I, write_I, clk. Note: m is a size of index j. It is a minimum integer, such that 2m -1 ³ 2r+3. After the input block I is loaded to the CIPHER unit, the encryption starts automatically.

13 14

13 14

Protocol (2) Assumptions

When the encryption is completed, signal DONE becomes active, and the output O changes to the new value of the ciphertext. • 2r+4 clock cycles are used to load round keys to internal The output O keeps the last value of the ciphertext at the output, RAMs until the next encryption is completed. Before the first encryption • one round of the main for loop of the pseudocode executes is completed, this output should be equal to zero. in one clock cycle • you can access only one position of each internal memory of round keys per clock cycle

As a result, the encryption of a single input block I should last r+2 clock cycles.

15 16

15 16

Addition mod 2w XOR

CIPHER: Building Blocks

18

17 18

3 10/21/20

Multiplication mod 2w Variable Rotation X<<

20 19 19 20

Memories of Round Keys – Contents

CIPHER Solutions

22 21 22

Datapath – Memories of Round Keys Datapath – Loop Counter

==r

23 24

23 24

4 10/21/20

Pseudocode Pseudocode

Split input I into four words, I3, I2, I1, I0, of the size of w bits each Split input I into four words, I3, I2, I1, I0, of the size of w bits each

A = I3; B = I2; C = I1; D=I0 A = I3; B = I2; C = I1; D=I0 B = B + S[0] B = B + S[0] D = D + S[1] D = D + S[1] for i = 1 to r do for i = 1 to r do { { T = (B*(2B + 1)) <<< k T = (B*(2B + 1)) <<< k U = (D*(2D + 1)) <<< k U = (D*(2D + 1)) <<< k A = ((A ⊕ T) <<< U) + S[2i] A’ = ((A ⊕ T) <<< U) + S[2i] C = ((C ⊕ U) <<< T) + S[2i + 1] C’ = ((C ⊕ U) <<< T) + S[2i + 1] (A, B, C, D) = (B, C, D, A) (A, B, C, D) = (B, C’, D, A’) } } A = A + S[2r + 2] A = A + S[2r + 2] C = C + S[2r + 3] C = C + S[2r + 3] O = (A, B, C, D) O = (A, B, C, D)

25 26

25 26

Datapath – Initialization & Main Loop Datapath – Finalization

27 28

27 28

Interface with the Division into Timing Analysis – Key Setup & Encryption the Datapath and Controller Notation: TCLK – clock period in ns NM – number of encrypted message blocks I

TKeySetup[cycles] = 2r+4

TKeySetup[ns] = (2r+4)∙TCLK

TEncryption(NM)[cycles] = (r+2) ∙NM

TEncryption(NM)[ns] = (r+2) ∙NM∙TCLK

29 30

29 30

5 10/21/20

Timing Analysis - Throughput Timing Analysis – Critical Path Notation: TCLK-min = dR + dlogic-max + tsetup TCLK – clock period in ns NM – number of encrypted message blocks I Critical Path – a path from an output of a register w – word size to an input of a register with the maximum value of the delay

4 ∙ w ∙NM 4 ∙ w (denoted as dlogic-max) ThrEncryption(NM)[Gbit/s] = = (r+2) ∙NM∙TCLK (r+2) ∙TCLK dR – delay of a register t – setup time of a register r & w constants, determined by the security analysis setup of the cipher For the security equivalent to AES-128, r=20 & w=32

TCLK dependent on w, returned by the FPGA tools 32 based on the static timing analysis of the critical path 31 31 32

Setup & Hold Time of a Register Datapath – Critical Path

tsetup

dR

dlogic-max Setup time - the amount of time the data at the synchronous input must be stable before the next active edge of the clock Hold time - the amount of time the data at the synchronous input must be stable after the last active edge of the clock 33 Critical Path marked in red 34 33 34

6