
10/21/20

Lecture 8B
RTL Design Methodology
Class Exercise 3: CIPHER
Transition from the Pseudocode & Interface to a Corresponding Block Diagram

RC6

RC6 (Rivest Cipher 6) - symmetric key block cipher
Designers: Ron Rivest, Matt Robshaw, Ray Sidney, and Yiqun Lisa Yin
• One of five finalists of the contest for the Advanced Encryption Standard (AES), 1997-2000
• Candidate algorithm in the following other contests:
    NESSIE – for European standards (2000-2003)
    CRYPTREC – for Japanese standards (2000-2003)

NIST Report: Security

[Figure: security margin versus complexity (simple to complex). High security margin: MARS, Serpent, Twofish. Adequate security margin: Rijndael, RC6.]

Security: Theoretical attacks better than exhaustive key search

# of rounds in the attack / total # of rounds:

    Cipher                              Rounds in the attack   Total rounds
    Serpent                             9                      32
    Twofish                             6                      16
    Mars (without 16 mixing rounds)     11                     16
    Rijndael                            7                      10
    RC6                                 15                     20

NIST Report: Software Efficiency
Encryption and Decryption Speed

    Speed     32-bit processors          64-bit processors     DSPs
    high      RC6                        Rijndael, Twofish     Rijndael, Twofish
    medium    Rijndael, Mars, Twofish    Mars, RC6             Mars, RC6
    low       Serpent                    Serpent               Serpent

Efficiency in hardware: FPGA Virtex 1000: Speed

[Figure: bar chart of throughput in Mbit/s for Serpent I8, Rijndael, Twofish, Serpent I1, RC6, and Mars, with results from George Mason University, University of Southern California, and Worcester Polytechnic Institute. The reported values range from 61 to 444 Mbit/s.]

Pseudocode

    Split input I into four words, I3, I2, I1, I0, of the size of w bits each
    A = I3; B = I2; C = I1; D = I0
    B = B + S[0]
    D = D + S[1]
    for i = 1 to r do
    {
        T = (B*(2B + 1)) <<< k
        U = (D*(2D + 1)) <<< k
        A = ((A ⊕ T) <<< U) + S[2i]
        C = ((C ⊕ U) <<< T) + S[2i + 1]
        (A, B, C, D) = (B, C, D, A)
    }
    A = A + S[2r + 2]
    C = C + S[2r + 3]
    O = (A, B, C, D)

Notation

w: word size, e.g., w=32 (constant)
k: log2(w) (constant)
A, B, C, D, U, T: w-bit variables
I3, I2, I1, I0: four w-bit words of the input I
r: number of rounds (constant)
O: output of the size of 4w bits
S[j]: 2r+4 round keys stored in two RAMs. Each key is a w-bit word.
    The first RAM stores values of S[j=2i], i.e., only round keys with even indices.
    The second RAM stores values of S[j=2i+1], i.e., only round keys with odd indices.

Operations

⊕ : XOR
+ : addition modulo $2^w$
– : subtraction modulo $2^w$
* : multiplication modulo $2^w$
X <<< Y : rotation of X to the left by the number of positions given in Y
X >>> Y : rotation of X to the right by the number of positions given in Y
(A, B, C, D) : concatenation of A, B, C, and D

Modular Arithmetic

[Figure: clock-face illustration of modular addition.]
Example: 11 + 4 mod 12 = 3

Circuit Interface

[Figure: block symbol of the CIPHER unit.]
Inputs: clk, reset, I (4w bits), write_I, Sj (w bits), write_Sj, j (m bits)
Outputs: O (4w bits), DONE

Interface Table

[Table: ports of the CIPHER unit.]

Protocol (1)

An external circuit first loads all round keys S[0], S[1], S[2], …, S[2r+2], S[2r+3] to the two internal memories of the CIPHER unit.
The first memory stores values of S[j=2i], i.e., only round keys with even indices.
The second memory stores values of S[j=2i+1], i.e., only round keys with odd indices.
Loading round keys is performed using inputs: Sj, j, write_Sj, clk.
Then, the external circuit loads an input block I to the CIPHER unit, using inputs: I, write_I, clk.
Note: m is the size of index j in bits. It is the minimum integer such that $2^m - 1 \geq 2r+3$.
After the input block I is loaded to the CIPHER unit, the encryption starts automatically.

Protocol (2)

When the encryption is completed, signal DONE becomes active, and the output O changes to the new value of the ciphertext.
The output O keeps the last value of the ciphertext until the next encryption is completed. Before the first encryption is completed, this output should be equal to zero.

Assumptions

• 2r+4 clock cycles are used to load round keys to internal RAMs
• one round of the main for loop of the pseudocode executes in one clock cycle
• you can access only one position of each internal memory of round keys per clock cycle

As a result, the encryption of a single input block I should last r+2 clock cycles.
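The pseudocode and notation above can be sketched as a small software reference model, which may be useful for checking a block diagram or a testbench against expected values. The sketch is in Python, which the lecture itself does not use. The function names `rotl`, `index_width`, and `encrypt_block` are hypothetical, the round keys are passed as a plain list `S` rather than being split across two RAMs, and rotation amounts are reduced modulo w (rotating a w-bit word by w positions leaves it unchanged). It assumes that w is a power of two, and it models only the arithmetic, not the clock-cycle timing or the key-loading protocol.

```python
def rotl(x, y, w=32):
    """Rotate the w-bit word x to the left by y positions (y taken mod w)."""
    mask = (1 << w) - 1
    y %= w
    return ((x << y) | (x >> (w - y))) & mask


def index_width(r):
    """Smallest m such that 2**m - 1 >= 2r + 3 (width of the index j)."""
    return (2 * r + 3).bit_length()


def encrypt_block(I, S, r=20, w=32):
    """Software model of the CIPHER pseudocode.

    I is a 4w-bit integer, S is a list of 2r+4 round keys (w-bit integers).
    Returns the 4w-bit output O = (A, B, C, D).
    """
    if len(S) != 2 * r + 4:
        raise ValueError("expected 2r+4 round keys")
    mask = (1 << w) - 1
    k = w.bit_length() - 1  # log2(w), assuming w is a power of two

    # Split I into four w-bit words: A = I3 (most significant) ... D = I0.
    A = (I >> (3 * w)) & mask
    B = (I >> (2 * w)) & mask
    C = (I >> w) & mask
    D = I & mask

    B = (B + S[0]) & mask
    D = (D + S[1]) & mask
    for i in range(1, r + 1):
        T = rotl((B * (2 * B + 1)) & mask, k, w)
        U = rotl((D * (2 * D + 1)) & mask, k, w)
        A_new = (rotl(A ^ T, U, w) + S[2 * i]) & mask
        C_new = (rotl(C ^ U, T, w) + S[2 * i + 1]) & mask
        # All four words are updated together, as in a register transfer.
        A, B, C, D = B, C_new, D, A_new
    A = (A + S[2 * r + 2]) & mask
    C = (C + S[2 * r + 3]) & mask
    return (A << (3 * w)) | (B << (2 * w)) | (C << w) | D
```

As a quick sanity check, with r = 0 the loop is skipped, so the input words (1, 2, 3, 4) with round keys [10, 20, 30, 40] give the output words (31, 12, 43, 24). For r = 20, `index_width(20)` returns 6, since 2r+3 = 43 fits in 6 bits but not in 5.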
CIPHER: Building Blocks

[Figures: block symbols for addition mod $2^w$, XOR, multiplication mod $2^w$, and variable rotation X <<< Y.]

Memories of Round Keys – Contents

[Figure: contents of the two round key memories.]

CIPHER: Solutions

Datapath – Memories of Round Keys

[Figure: datapath of the round key memories.]

Datapath – Loop Counter

[Figure: loop counter circuit; the diagram includes the comparison ==r.]

Pseudocode

Here A' and C' denote the new values of A and C computed within a round, so that all four variables can be updated at once.

    Split input I into four words, I3, I2, I1, I0, of the size of w bits each
    A = I3; B = I2; C = I1; D = I0
    B = B + S[0]
    D = D + S[1]
    for i = 1 to r do
    {
        T = (B*(2B + 1)) <<< k
        U = (D*(2D + 1)) <<< k
        A' = ((A ⊕ T) <<< U) + S[2i]
        C' = ((C ⊕ U) <<< T) + S[2i + 1]
        (A, B, C, D) = (B, C', D, A')
    }
    A = A + S[2r + 2]
    C = C + S[2r + 3]
    O = (A, B, C, D)

Datapath – Initialization & Main Loop

[Figure: datapath for initialization and the main loop.]

Datapath – Finalization

[Figure: datapath for finalization.]

Interface with the Division into the Datapath and Controller

[Figure: CIPHER unit divided into the datapath and the controller.]

Timing Analysis – Key Setup & Encryption

Notation:
$T_{CLK}$ – clock period in ns
$N_M$ – number of encrypted message blocks I

$T_{KeySetup}[\text{cycles}] = 2r+4$
$T_{KeySetup}[\text{ns}] = (2r+4) \cdot T_{CLK}$
$T_{Encryption}(N_M)[\text{cycles}] = (r+2) \cdot N_M$
$T_{Encryption}(N_M)[\text{ns}] = (r+2) \cdot N_M \cdot T_{CLK}$

Timing Analysis - Throughput

Notation:
$T_{CLK}$ – clock period in ns
$N_M$ – number of encrypted message blocks I
w – word size

$Thr_{Encryption}(N_M)[\text{Gbit/s}] = \frac{4 \cdot w \cdot N_M}{(r+2) \cdot N_M \cdot T_{CLK}} = \frac{4 \cdot w}{(r+2) \cdot T_{CLK}}$

r & w are constants, determined by the security analysis of the cipher.
For the security equivalent to AES-128, r=20 & w=32.
$T_{CLK}$ is dependent on w and is returned by the FPGA tools based on the static timing analysis of the critical path.

Timing Analysis – Critical Path

$T_{CLK\text{-}min} = d_R + d_{logic\text{-}max} + t_{setup}$

Critical Path – a path from an output of a register to an input of a register with the maximum value of the delay (denoted as $d_{logic\text{-}max}$)
$d_R$ – delay of a register
$t_{setup}$ – setup time of a register

Setup & Hold Time of a Register

[Figure: timing diagram showing $t_{setup}$, $d_R$, and $d_{logic\text{-}max}$.]

Setup time - the amount of time the data at the synchronous input must be stable before the next active edge of the clock
Hold time - the amount of time the data at the synchronous input must be stable after the last active edge of the clock

Datapath – Critical Path

[Figure: datapath with the critical path marked in red.]
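The timing formulas above can be evaluated numerically with a few lines of code. The following Python sketch does so under stated assumptions: the function names are hypothetical, and the clock period of 10 ns used in the example is an illustrative value only, since the real clock period comes from the static timing analysis performed by the FPGA tools.

```python
def key_setup_cycles(r):
    """Clock cycles needed to load all 2r+4 round keys."""
    return 2 * r + 4


def encryption_cycles(r, n_m=1):
    """Clock cycles needed to encrypt n_m message blocks."""
    return (r + 2) * n_m


def throughput_gbit_per_s(w, r, t_clk_ns):
    """Encryption throughput: 4w bits every (r+2) clock periods.

    Bits per nanosecond is numerically equal to Gbit/s.
    """
    return (4 * w) / ((r + 2) * t_clk_ns)


def min_clock_period(d_r, d_logic_max, t_setup):
    """Minimum clock period: register delay + longest logic path + setup time."""
    return d_r + d_logic_max + t_setup
```

For r = 20 and w = 32, key setup takes 44 cycles and one block takes 22 cycles. With the assumed clock period of 10 ns, `throughput_gbit_per_s(32, 20, 10.0)` evaluates to 128/220, or about 0.58 Gbit/s.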