
Self Evaluation : Hierocrypt–3

Toshiba Corporation August 31, 2000

Contents

1 Introduction

2 Security
  2.1 Security against differential and linear cryptanalysis
    2.1.1 Definition of differential and linear probabilities
    2.1.2 S-box property
    2.1.3 Active S-box number
    2.1.4 Evaluation based on the provable security theorem
  2.2 SQUARE attack
    2.2.1 Definitions
    2.2.2 SQUARE attack to Hierocrypt–3
  2.3 Truncated differential attack
    2.3.1 Preparation
    2.3.2 Properties of the components
    2.3.3 Evaluation for multiple rounds
  2.4 Higher-order differential attack
  2.5 Interpolation attack
  2.6 Impossible differential attack
  2.7 Non-surjective attack
  2.8 Mod n attack
  2.9 χ² attack

3 Software Implementation Evaluation
  3.1 Evaluation platform and implementation environment
  3.2 Speed evaluation method
  3.3 Speed evaluation
  3.4 Memory evaluation

4 Hardware Performance Simulation
  4.1 Implementation using Standard Cell Technology
    4.1.1 High Speed Implementation (SC-1)
    4.1.2 Small Area Implementation (SC-2)
  4.2 Implementation using FPGA
    4.2.1 High Speed Implementation (FPGA-1)
    4.2.2 Small Area Implementation (FPGA-2)

Toshiba Corporation 2000, All Rights Reserved

1 Introduction

Hierocrypt is a family of block ciphers whose data randomizing part has a nested SPN structure, that is, a hierarchical SPN structure in which each higher-level S-box itself consists of a lower-level SP network [11, 10, 9, 8]. The nested SPN structure makes it easy to achieve a sufficient security level against differential/linear cryptanalysis, as the number of active S-boxes at each level can be assured hierarchically [10]. The most recent versions of Hierocrypt are Hierocrypt–3 (128-bit block) and Hierocrypt–L1 (64-bit block) [9, 8]. This paper reports the results of our evaluation of the security and performance of Hierocrypt–3.

2 Security

2.1 Security against differential and linear cryptanalysis

Differential and linear cryptanalysis are effective against general symmetric block ciphers; the former was proposed by Biham and Shamir [1] and the latter by Matsui [6]. The most important security measure is the number of plaintext-ciphertext pairs required. This number is known to be about the inverse of the maximum differential/linear probability of the data randomizing part with 2 or 3 rounds removed from the ends. As it is difficult to calculate the exact values, approximate values are often used, which are based on characteristic probabilities where the summation over intermediate differences or mask patterns is not taken. For Hierocrypt–3, the maximum differential and linear characteristic probabilities can be easily evaluated (or bounded) by the minimum number of active S-boxes, as the cipher has the nested SPN structure. Furthermore, we found that the provable security of a two-round SPN structure proven by Hong et al. [3, 12] is applicable to two consecutive rounds of Hierocrypt–3. The provable security leads to a rigorous upper bound on the maximum differential and linear probabilities. We show the results of the evaluation and the appropriate numbers of rounds for the respective key sizes.

2.1.1 Definition of differential and linear probabilities

The maximum differential probability for the function f is given as follows.

\[ dp^f \equiv \max_{\Delta x \neq 0,\ \Delta y} \frac{\#\{x \mid f(x) \oplus f(x \oplus \Delta x) = \Delta y\}}{2^n} . \tag{1} \]

Similarly, the maximum linear probability for the function f is given as follows.

\[ lp^f \equiv \max_{\Gamma x,\ \Gamma y \neq 0} \left( 2 \cdot \frac{\#\{x \mid x \cdot \Gamma x = f(x) \cdot \Gamma y\}}{2^n} - 1 \right)^2 . \tag{2} \]

The maximum linear probability is defined so that its optimal value is the same as that of the maximum differential probability.

2.1.2 S-box property

The S-box map of Hierocrypt–3 is equivalent to the combination of the following three transformations.

(a) bit permutation
(b) power operation $x^{247}$ over GF($2^8$)
(c) affine transformation ax + b over GF($2^8$)

As the distributions of differential and linear probabilities are invariant under the transformations (a) and (c),

\[ dp^S = lp^S = 2^{-6} . \tag{3} \]
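Definitions (1) and (2) can be checked numerically for the power map in (b). The following is a minimal sketch, not the reference implementation: it assumes the AES reduction polynomial 0x11B as a stand-in for the field representation (the polynomial actually used by Hierocrypt–3 is not given in this section), and it omits the bit permutation (a) and the affine map (c), which do not change the two maxima. The names `gf_mul`, `gf_pow`, `max_dp` and `max_lp` are illustrative.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8); the reduction polynomial is an assumption."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r


def gf_pow(x, e, poly=0x11B):
    """Square-and-multiply exponentiation in GF(2^8)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, x, poly)
        x = gf_mul(x, x, poly)
        e >>= 1
    return r


def max_dp(f, n=8):
    """Maximum differential probability, eq. (1)."""
    size = 1 << n
    best = 0
    for dx in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[f[x] ^ f[x ^ dx]] += 1
        best = max(best, max(counts))
    return best / size


def max_lp(f, n=8):
    """Maximum linear probability, eq. (2), via a fast Walsh-Hadamard transform."""
    size = 1 << n
    best = 0
    for gy in range(1, size):
        # w[x] = (-1)^(f(x) . gy); after the transform, w[gx] = 2*#{...} - 2^n
        w = [1 - 2 * (bin(f[x] & gy).count("1") & 1) for x in range(size)]
        h = 1
        while h < size:
            for i in range(0, size, 2 * h):
                for j in range(i, i + h):
                    a, b = w[j], w[j + h]
                    w[j], w[j + h] = a + b, a - b
            h *= 2
        best = max(best, max(abs(v) for v in w))
    return (best / size) ** 2


sbox = [gf_pow(x, 247) for x in range(256)]
print(max_dp(sbox), max_lp(sbox))
```

Since $x^{247} = (x^{-1})^{8}$ and the map $x \mapsto x^{8}$ is linear over GF(2), the power map is linearly equivalent to field inversion, so both printed values should equal $2^{-6}$ for any irreducible polynomial of degree 8.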

2.1.3 Active S-box number

First we consider the number of active S-boxes for differentials. When a higher-level S-box XS is active, that is, at least one bit of its input difference is nonzero, no fewer than 5 (lower-level) S-boxes are active. Two consecutive rounds have no fewer than 5 active higher-level S-boxes (XS). Therefore, two consecutive rounds contain no fewer than 25 active S-boxes, as shown in Proposition 2 of [11, 10]. Consequently, the maximum differential characteristic probability of two consecutive rounds of Hierocrypt–3, $DP^{2R}$, is bounded as follows.

\[ DP^{2R} \le (dp^S)^{25} = (2^{-6})^{25} = 2^{-150} . \tag{4} \]

A similar result is obtained for the maximum linear probability by the substitution: input differential bit → output mask bit. That is, the minimum number of active S-boxes in two consecutive rounds is 25, and the maximum linear characteristic probability is bounded by $2^{-150}$.

\[ LP^{2R} \le (lp^S)^{25} = (2^{-6})^{25} = 2^{-150} . \tag{5} \]

When differential cryptanalysis is applied, a 2-round or 3-round attack is used. Therefore, an appropriate round number is the sum of the round number which has a sufficiently small characteristic probability and 2 or 3. As one round of Hierocrypt–3 corresponds to two rounds of a conventional cipher, two additional Hierocrypt–3 rounds (four conventional rounds) are regarded as sufficient. Therefore, four (= 2 + 2) rounds are sufficient for a 128-bit key. However, the round number should be larger for a longer key, as more computation can be spent on exhaustive search. We should therefore set appropriate round numbers for 192-bit and 256-bit keys, respectively. We assume that the number of cryptanalysable rounds increases by one each time the key search is extended by one higher-level S-box. As the number of key bits for one higher-level S-box is 64 (= 32 × 2), we consider that one round should be added for a 192-bit key and two rounds for a 256-bit key, relative to the standard round number for a 128-bit key. Table 1 shows the appropriate round numbers for the three key lengths.

Table 1: Appropriate minimum round numbers determined based on the minimum active S-box number

key length    round number    prob.
128-bit       4               $2^{-150}$
192-bit       5               $2^{-180}$
256-bit       6               $2^{-300}$

2.1.4 Evaluation based on the provable security theorem

Hong et al. prove the following theorem on the security of a 2-round SPN structure with an MDS diffusion layer [3].

Theorem 1 Consider a 2-round SPN structure (SPS) with n parallel S-boxes in one layer which satisfies the following conditions.

• The extended keys are independent and do not have any biases.
• The branch number of the diffusion layer is n + 1 (MDS).
• The maximum differential (linear) probability of the S-box is dp (lp).

Then the maximum differential (linear) probability of the SPS structure does not exceed $dp^n$ ($lp^n$).

Next, consider two rounds of Hierocrypt–3 (SPSP). As the second diffusion layer does not change the distribution of differential (linear) probability, the maximum differential (linear) probability for two rounds (SPSP) is the same as that for SPS at the higher level. As the branch number of the higher-level SPS is 5, its maximum differential (linear) probability is bounded by $(dp^{XS})^4$ ($(lp^{XS})^4$), where $dp^{XS}$ ($lp^{XS}$) is the maximum differential (linear) probability of the higher-level S-box (XS). Similarly, as XS consists of a lower-level SPS with branch number 5, its maximum differential (linear) probability is bounded as follows.

\[ dp^{XS} \le (dp^S)^4 = (2^{-6})^4 = 2^{-24} , \qquad lp^{XS} \le 2^{-24} . \]

Therefore, the maximum differential (linear) probability for two rounds of Hierocrypt–3 does not exceed $(2^{-24})^4 = 2^{-96}$.

In summary, if there is no deficiency in the key scheduling, the minimum number of plaintext-ciphertext pairs for differential (linear) cryptanalysis against four-round Hierocrypt–3 is about $2^{96}$ when a two-round attack is used. However, a probability of $2^{-96}$ does not mean that differential (linear) cryptanalysis is inapplicable. We give an approximate evaluation of the maximum probability for 3 and 4 rounds in the following.

We already know that the maximum probability for two rounds is bounded by $2^{-96}$. When the round number is three, at least one higher-level S-box XS of the additional round is active, and its maximum probability is bounded by $2^{-24}$. Thus, an approximate evaluation of the maximum probability is given as $2^{-96} \times 2^{-24} = 2^{-120}$. This probability is not small enough to rule out differential (linear) cryptanalysis. For four-round Hierocrypt–3, the additional rounds at both ends have at least one active XS each. Therefore, the maximum probability for four rounds is approximately bounded by $2^{-96} \times 2^{-24} \times 2^{-24} = 2^{-144}$. Thus, six-round Hierocrypt–3 is considered to be sufficiently secure against the two-round attack of differential (linear) cryptanalysis. Following the same consideration of increased round numbers for increased key lengths, we consider that the appropriate round number for a 192-bit (256-bit) key is 7 (8).
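The arithmetic behind these bounds can be reproduced in a few lines. The sketch below works with base-2 logarithms and only restates the numbers given in the text; the function names are illustrative and not part of the original evaluation.

```python
S_BOX_LOG2 = -6  # log2 of dp^S = lp^S, eq. (3)


def characteristic_bound_log2(active_sboxes=25):
    """Eq. (4)/(5): every active S-box contributes a factor of at most 2^-6."""
    return active_sboxes * S_BOX_LOG2


def xs_bound_log2(lower_sboxes=4):
    """Theorem 1 applied to the lower-level SPS inside XS (n = 4, branch number 5)."""
    return lower_sboxes * S_BOX_LOG2


def two_round_bound_log2(higher_sboxes=4):
    """Theorem 1 applied to the higher-level SPS (n = 4, branch number 5)."""
    return higher_sboxes * xs_bound_log2()


def approx_bound_log2(rounds):
    """Approximate bound: each round beyond two adds at least one active XS."""
    if rounds < 2:
        raise ValueError("the estimate starts from two rounds")
    return two_round_bound_log2() + (rounds - 2) * xs_bound_log2()


for r in (2, 3, 4):
    print(r, "rounds: 2^%d" % approx_bound_log2(r))
```

The loop prints the exponents -96, -120 and -144 used in the paragraph above, and `characteristic_bound_log2()` gives the exponent -150 of Section 2.1.3.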

Table 2: Appropriate minimum round number based on provable security

key length    minimum round number    probability bound
−             4                       $2^{-96}$
−             5                       $2^{-120}$
128-bit       6                       $2^{-192}$
192-bit       7                       $2^{-216}$
256-bit       8                       $2^{-288}$

The evaluation based on the differential (linear) probability is considered to be more precise than the evaluation based on characteristic probabilities.

2.2 SQUARE attack

The SQUARE attack is a chosen-plaintext attack which is applied to the SQUARE cipher and other SQUARE-like ciphers. The basic attack and its extensions to more rounds by key estimation have been proposed [2]. Rounds are counted differently in Hierocrypt–3 and in the SQUARE cipher. To avoid confusion, we use the number of layers instead of the number of rounds, where the number of layers is defined as the number of S-box layers. That is, in Hierocrypt–3 two layers correspond to one round, whereas in the SQUARE cipher one layer corresponds to one round. The SQUARE attack is effective up to 6 layers for the 128-bit key SQUARE cipher and Rijndael [2], and up to 7 layers for 192-bit and 256-bit key Rijndael [5]. However, we confirmed that it is effective only up to 4 layers for 128-bit key Hierocrypt–3 and 5 layers for 192-bit and 256-bit key Hierocrypt–3. Because Hierocrypt–3 is designed with 12, 14 and 16 layers for 128-bit, 192-bit and 256-bit keys, respectively, it has sufficient security against the SQUARE attack.

2.2.1 Definitions

State byte SQUARE-like ciphers have a non-linear layer composed of sixteen S-boxes with 8-bit input and output. Each group of 8 bits corresponding to an S-box is called a state byte. Each state byte takes 256 possible values.

Λ set Following the definition of Daemen et al., we define a Λ set as follows.

1. Elements of a Λ set are states of a system composed of 16 state bytes.
2. A Λ set is a set of 256 states.
3. When the elements of a Λ set are restricted to a single state byte, that state byte takes either all 256 possible values or one fixed value.

Active byte and passive byte When the elements of a Λ set are restricted to a state byte, if the state byte takes two or more values, we call it an active byte. If the state byte takes only one value, we call it a passive byte.

Balancedness over a Λ set If the exclusive or of the values of some state byte in some layer, taken over all 256 states belonging to a Λ set, vanishes, we call the state byte balanced over the Λ set.

Proposition 1 A transformation layer composed of bijective mappings on every state byte maps a Λ set to a Λ set.

For example, a non-linear transformation layer composed of bijective S-boxes maps a Λ set to a Λ set. A key addition layer by bitwise exclusive or also maps a Λ set to a Λ set.

Proposition 2 For a Λ set in some layer, the active bytes in the layer are balanced over the Λ set.

Proposition 3 A transformation layer composed of bijective mappings on every state byte maps state bytes balanced over a Λ set to state bytes balanced over the Λ set.

For example, a non-linear transformation layer composed of bijective S-boxes maps state bytes balanced over a Λ set to state bytes balanced over the Λ set. A key addition layer by bitwise exclusive or maps state bytes balanced over a Λ set to state bytes balanced over the Λ set.

Proposition 4 For a linear transformation composing a linear transformation layer, if all the active bytes input to the layer are balanced over a Λ set, the state bytes output from the layer are balanced over the Λ set.

Proposition 5 If the input to some linear transformation layer is a Λ set, all the active bytes of the output from the layer are balanced over the Λ set.
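The definitions above can be made concrete with a small experiment. The sketch below is a toy model, not Hierocrypt–3: the S-box is a random permutation, the round key is random, and the linear map is a single byte-wise XOR; all three are assumptions chosen only to illustrate Propositions 1, 2 and 4.

```python
import random
from functools import reduce
from operator import xor

NUM_BYTES = 16


def make_lambda_set(active, fixed):
    """256 states: bytes listed in `active` run over all values, the rest stay fixed."""
    return [[v if i in active else fixed[i] for i in range(NUM_BYTES)]
            for v in range(256)]


def is_lambda_set(states):
    """Each byte position must take either all 256 values or a single value."""
    if len(states) != 256:
        return False
    return all(len({s[i] for s in states}) in (1, 256) for i in range(NUM_BYTES))


def is_balanced(states, i):
    """True if the XOR of byte i over all states vanishes."""
    return reduce(xor, (s[i] for s in states)) == 0


rng = random.Random(0)
SBOX = list(range(256))          # toy bijective S-box (assumption)
rng.shuffle(SBOX)
KEY = [rng.randrange(256) for _ in range(NUM_BYTES)]  # toy round key (assumption)


def sbox_layer(states):
    return [[SBOX[b] for b in s] for s in states]


def key_addition(states):
    return [[b ^ k for b, k in zip(s, KEY)] for s in states]


def xor_mix(states, i, j):
    """Toy linear map: byte i is replaced by the XOR of bytes i and j."""
    out = [list(s) for s in states]
    for s in out:
        s[i] ^= s[j]
    return out


lam = make_lambda_set({0, 5}, [rng.randrange(256) for _ in range(NUM_BYTES)])
after = key_addition(sbox_layer(lam))   # Proposition 1: still a Lambda set
mixed = xor_mix(after, 0, 5)            # Proposition 4: byte 0 stays balanced
print(is_lambda_set(after), is_balanced(after, 0), is_balanced(mixed, 0))
```

After the linear map, byte 0 is still balanced but in general no longer takes all 256 values, which is the situation in which a Λ set is "broken" while balancedness survives.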

Basic attack Consider a Λ set composed of a single active state byte and 15 passive state bytes at the input of the first layer. Assume that the Λ set propagates as a Λ set up to some layer. By Proposition 2, each state byte input to a linear transformation of the linear transformation layer is balanced over the Λ set. By Proposition 3, the state bytes output from the linear transformation layer are also balanced. If ciphertexts are given at the layer just after the following linear transformation layer and key addition layer, a candidate for the round key of that key addition layer can be tested by estimating the round key, decrypting backward to the output of the linear transformation layer, and checking whether the state byte is balanced over the Λ set.

Type 1 Extension We call an extension of type 1 an extension which covers one extra layer added at the end, by estimating the round key of the additional final layer.

Type 2 Extension We call an extension of type 2 an extension which covers one extra layer added at the beginning, by estimating the round key of the additional first layer.

In the following, generalizing these definitions, we call an extension that adds an extra final layer and estimates its round key a type 1 extension, and an extension that adds an extra initial layer and estimates its round key a type 2 extension. Table 3 shows the efficiency of the SQUARE attack against the 128-bit key SQUARE cipher. No SQUARE attack is applicable to more layers than the attacks in the table [2]. In the table, the number of layers means the number of S-box layers.

Table 3: SQUARE attack to the SQUARE cipher

Type of Attack                                 Layers   Plaintexts   Operations   Memory Size
Basic                                          4        $2^{9}$      $2^{9}$      small
Basic + Type 1 extension                       5        $2^{11}$     $2^{40}$     small
Basic + Type 2 extension                       5        $2^{32}$     $2^{40}$     $2^{32}$
Basic + Type 1 extension + Type 2 extension    6        $2^{32}$     $2^{72}$     $2^{32}$

2.2.2 SQUARE attack to Hierocrypt–3

Basic Attack Because Hierocrypt–3 has a nested SPN structure, two layers of the SQUARE cipher correspond to one round of this cipher. Therefore, we can define two types of basic attacks, corresponding to the case where the encryption function starts from the first layer of key addition and the case where it starts from the second layer of key addition. We call these basic attack 1 and basic attack 2, respectively. Unlike the SQUARE cipher, Hierocrypt–3 has two kinds of linear transformation layers. Because of this, a Λ set at the initial layer is broken at an earlier layer. Concretely, in both basic attack 1 and basic attack 2, the state bytes at the input of the third S-box layer are balanced over the Λ set, but the Λ set itself is broken. Furthermore, at the output of the third S-box layer the state bytes are no longer balanced over the Λ set, because of the non-linearity of the S-boxes. Therefore, both basic attack 1 and basic attack 2 are applied to reduced versions with three S-box layers. The required number of known plaintexts is $2^{9}$. This is because the required number of Λ sets of known plaintexts, denoted by n, must satisfy $(1/256)^n \times 2^8 < 1$ in order to determine the estimated one-byte key uniquely.

Type 1 Extension A type 1 extension of basic attack 1 is applied to a reduced version with 4 S-box layers. The total amount of estimated key material is (1 + 4) × 8 = 40 bits. The required number of Λ sets of known plaintexts, n, must satisfy $(1/256)^n \times 2^{40} < 1$. Therefore n > 5, which becomes n = 8 when rounded up to a power of two.

A double type 1 extension of basic attack 1 is applied to a reduced version with 5 S-box layers. The total amount of estimated key material is (1 + 4 + 16) × 8 = 168 bits. The required number of Λ sets of known plaintexts, n, must satisfy $(1/256)^n \times 2^{168} < 1$. Therefore n > 21.

A triple type 1 extension of basic attack 1 is applied to a reduced version with 6 S-box layers. Because the total amount of estimated key material is (1 + 4 + 16 + 16) × 8 = 296 bits, it is not more efficient than the brute-force attack.

A type 1 extension of basic attack 2 is applied to a reduced version with 4 S-box layers. In the final round, at least 9 bytes of extended key correspond to state bytes which are influenced by the estimated key in the preceding layer. Because these keys must also be estimated, the total amount of estimated key material is not less than (1 + 9) × 8 = 80 bits. Therefore, the required number of Λ sets of known plaintexts, n, must satisfy n > 9.

A double type 1 extension of basic attack 2 is applied to a reduced version with 5 S-box layers. The total amount of estimated key material is (1 + 9 + 16) × 8 = 208 bits. The required number of Λ sets of known plaintexts, n, must satisfy $(1/256)^n \times 2^{208} < 1$. Therefore n > 26. This attack is effective only for 256-bit key Hierocrypt–3.

A triple type 1 extension of basic attack 2 is applied to a reduced version with 6 S-box layers. Because the total amount of estimated key material is not less than (1 + 9 + 16 + 16) × 8 = 336 bits, it is less effective than the brute-force attack even for 256-bit key Hierocrypt–3.
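The plaintext counts in this subsection all follow from the same inequality, $(1/256)^n \times 2^{k} < 1$, where k is the number of estimated key bits. The sketch below solves it for n and converts the result to a number of plaintexts; rounding n up to a power of two follows the "n = 8" remark in the text, and the helper names are illustrative.

```python
def min_lambda_sets(key_bits):
    """Smallest integer n with (1/256)**n * 2**key_bits < 1, i.e. 8*n > key_bits."""
    return key_bits // 8 + 1


def plaintexts_log2(key_bits):
    """log2 of the plaintext count: n rounded up to a power of two, times 256."""
    n = min_lambda_sets(key_bits)
    n_pow2 = 1 << (n - 1).bit_length()      # round n up to a power of two
    return 8 + n_pow2.bit_length() - 1


for bits in (8, 40, 168, 208):
    print(bits, "key bits -> n =", min_lambda_sets(bits),
          "-> 2^%d plaintexts" % plaintexts_log2(bits))
```

For 8, 40 and 168 estimated key bits this gives $2^{9}$, $2^{11}$ and $2^{13}$ plaintexts, in agreement with the basic attack and the type 1 extensions of basic attack 1.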

Type 2 Extension A type 2 extension of basic attack 1 is applied to a reduced version with 4 S-box layers. Because additional 8 bytes of keys in the first key addition must be estimated, the total amount of estimated key material is (1 + 16) × 8 = 136 bits. This attack is not more effective than the brute-force attack for 128-bit key Hierocrypt–3. The required Λ sets are constructed by choosing from $2^{128}$ known plaintexts.

A double type 2 extension of basic attack 1 is applied to a reduced version with 5 S-box layers. Because the total amount of estimated key material is not less than (1 + 16 + 16) × 8 = 264 bits, it is not more efficient than the brute-force attack even for 256-bit key Hierocrypt–3.

A type 2 extension of basic attack 2 is applied to a reduced version with 4 S-box layers. The total amount of estimated key material is (1 + 4) × 8 = 40 bits. The required Λ sets are constructed by choosing from $2^{32}$ known plaintexts.

A double type 2 extension of basic attack 2 is applied to a reduced version with 5 S-box layers. The total amount of estimated key material is (1 + 4 + 16) × 8 = 168 bits. The required Λ sets are constructed by choosing from $2^{128}$ known plaintexts. This attack is effective only for 192-bit and 256-bit key Hierocrypt–3.

A triple type 2 extension of basic attack 2 is applied to a reduced version with 6 S-box layers. Because the total amount of estimated key material is (1 + 4 + 16 + 16) × 8 = 296 bits, it is not effective even for 256-bit key Hierocrypt–3.

Combination of Both Extensions A combination of a type 1 extension and a type 2 extension of basic attack 1 is applied to a reduced version with 5 S-box layers. The total amount of estimated key material is (1 + 4 + 16) × 8 = 168 bits. The required Λ sets are constructed by choosing from $2^{128}$ known plaintexts.

A combination of a double type 1 extension and a type 2 extension of basic attack 1 is applied to a reduced version with 6 S-box layers. Because the total amount of estimated key material is (1 + 4 + 16 + 16) × 8 = 296 bits, it is not more efficient than the brute-force attack.

A combination of a type 1 extension and a double type 2 extension of basic attack 1 is applied to a reduced version with 6 S-box layers. Because the total amount of estimated key material is (1 + 4 + 16 + 16) × 8 = 296 bits, it is not more efficient than the brute-force attack.

A combination of a type 1 extension and a type 2 extension of basic attack 2 is applied to a reduced version with 5 S-box layers. The total amount of estimated key material is not less than (1 + 4 + 16) × 8 = 168 bits. The required Λ sets are constructed by choosing from $2^{32}$ known plaintexts.

A combination of a double type 1 extension and a type 2 extension of basic attack 2 is applied to a reduced version with 6 S-box layers. Because the total amount of estimated key material is not less than (1 + 4 + 16 + 16) × 8 = 296 bits, it is not more efficient than the brute-force attack even for 256-bit key Hierocrypt–3.

A combination of a type 1 extension and a double type 2 extension of basic attack 2 is applied to a reduced version with 6 S-box layers. Because the total amount of estimated key material is not less than (1 + 4 + 16 + 16) × 8 = 296 bits, it is not more efficient than the brute-force attack even for 256-bit key Hierocrypt–3.

Table 4 summarizes the above results. The SQUARE attack on Hierocrypt–3 is applicable to reduced versions of up to 4 layers for 128-bit key Hierocrypt–3 and 5 layers for 192-bit and 256-bit key Hierocrypt–3. There are no effective attacks other than those in this table.

Table 4: SQUARE attack to Hierocrypt–3

Type of Attack                                     Layers   Plaintexts     Operations      Memory Size
basic 1                                            3        $2^{9}$        $2^{9}$         small
basic 1 + type 1 extension                         4        $2^{11}$       $2^{40}$        small
basic 1 + type 1 extension × 2                     5        $2^{13}$       $2^{168}$       $2^{13}$
basic 1 + type 2 extension                         4        $2^{128}$      $2^{136}$       $2^{128}$
basic 1 + type 1 extension + type 2 extension      5        $2^{128}$      $2^{168}$       $2^{128}$
basic 2                                            3        $2^{9}$        $2^{9}$         small
basic 2 + type 1 extension                         4        ≥ $2^{12}$     ≥ $2^{72}$      ≥ $2^{12}$
basic 2 + type 1 extension × 2                     5        ≥ $2^{13}$     ≥ $2^{208}$     ≥ $2^{13}$
basic 2 + type 2 extension                         4        $2^{32}$       $2^{40}$        $2^{32}$
basic 2 + type 2 extension × 2                     5        $2^{128}$      $2^{168}$       $2^{128}$
basic 2 + type 1 extension + type 2 extension      5        $2^{32}$       $2^{168}$       $2^{32}$

2.3 Truncated differential attack

In differential cryptanalysis, differentials that differ in only one bit are regarded as distinct. That is, there are $2^N$ distinct differentials when the block length is N bits. In contrast, in the truncated differential attack differentials are distinguished only by their word-wise zero/nonzero pattern, where the word size is frequently that of the S-box. Therefore, when the S-box size is 8 bits, differentials are classified into $2^{N/8}$ patterns in the truncated differential. As the truncated differential is invariant under the (bijective) S-box map and the bit-wise key addition, the encryption is characterized only by the truncated differential transition probabilities of the diffusion layers. The transition probability for multiple rounds is given as the sum of products of those of the diffusion layers. Therefore, the truncated differential attack is considered to be effective if all the functions in the encryption are carried out by word-wise operations. As Hierocrypt–3 consists only of byte-wise operations, evaluating its security against the truncated differential attack is indispensable.

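2.3.1 Preparation

The byte-wise truncated differential is defined, as all functions of Hierocrypt–3 encryption are carried out only by byte-wise operations. The truncated differential for an 8m-bit data differential $\Delta X_{(8m)}$ is defined by $\chi(\Delta X_{(8m)})$ as follows.

\[ \chi(\Delta X_{(8m)}) = \delta(\Delta x_{1(8)})\,\delta(\Delta x_{2(8)}) \cdots \delta(\Delta x_{m(8)}) , \qquad \delta(\Delta x_{(8)}) = \begin{cases} 1, & \text{for } \Delta x_{(8)} \neq 0 , \\ 0, & \text{for } \Delta x_{(8)} = 0 . \end{cases} \]

Let $\Pr(\chi(\Delta X) \to \chi(\Delta Y))$ be the truncated differential probability for the transition $\chi(\Delta X) \to \chi(\Delta Y)$. Let $\eta(\Delta X_{(32)})$ be the truncated Hamming differential for 32-bit data $X_{(32)}$, which is the Hamming weight of the truncated differential $\chi(\Delta X_{(32)})$.

\[ \eta(\Delta X_{(32)}) = \sum_{i=1}^{4} \delta(\Delta x_{i(8)}) . \]

The truncated Hamming differential probability is defined as follows.

\[ \Pr\bigl(\eta(\Delta X_{(32)}) \to \eta(\Delta Y_{(32)})\bigr) = \max_{\substack{\chi(\Delta X'_{(32)}),\ \eta(\Delta X'_{(32)}) = \eta(\Delta X_{(32)}) \\ \chi(\Delta Y'_{(32)}),\ \eta(\Delta Y'_{(32)}) = \eta(\Delta Y_{(32)})}} \Pr\bigl(\chi(\Delta X'_{(32)}) \to \chi(\Delta Y'_{(32)})\bigr) . \]

Table 5: Truncated Hamming differential probabilities for the mdsL-function (power of 2 approximation)

                       η(ΔY(32))
η(ΔX(32))    0    1            2            3           4
0            1    0            0            0           0
1            0    0            0            0           1
2            0    0            0            $2^{-8}$    1
3            0    0            $2^{-16}$    $2^{-8}$    1
4            0    $2^{-24}$    $2^{-16}$    $2^{-8}$    1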

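The truncated Hamming differential and the truncated Hamming differential probability can be naturally generalized to 32m-bit data as follows.

\[ \eta(\Delta X_{(32m)}) = \sum_{i=1}^{m} \eta(\Delta X_{i(32)})\, 5^{\,m-1-i} , \]

\[ \Pr\bigl(\eta(\Delta X_{(32m)}) \to \eta(\Delta Y_{(32m)})\bigr) = \prod_{i=1}^{m} \Pr\bigl(\eta(\Delta X_{i(32)}) \to \eta(\Delta Y_{i(32)})\bigr) . \]

The truncated Hamming differential probability is equal to the truncated differential probability for the mdsL-function and the MDSL-function.

\[ \Pr\bigl(\chi(\Delta X_{(32)}) \to \chi(\Delta Y_{(32)})\bigr) = \Pr\bigl(\eta(\Delta X_{(32)}) \to \eta(\Delta Y_{(32)})\bigr) , \]

\[ \Pr\bigl(\chi(\Delta X_{(128)}) \to \chi(\Delta Y_{(128)})\bigr) = \Pr\bigl(\eta(\Delta X_{(128)}) \to \eta(\Delta Y_{(128)})\bigr) . \]

These equations make the security evaluation of Hierocrypt–3 against truncated differentials much simpler.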
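As a concrete reading of the definitions of δ, χ and η, the following sketch computes them for byte-wise differences given as lists of integers. The function names mirror the symbols in the text; `eta_words`, which returns one Hamming weight per 32-bit word, is an illustrative helper and not a notation used by the authors.

```python
def delta(byte_diff):
    """1 if the byte difference is nonzero, 0 otherwise."""
    return 1 if byte_diff != 0 else 0


def chi(diff_bytes):
    """Truncated differential: the zero/nonzero pattern of a byte-wise difference."""
    return tuple(delta(b) for b in diff_bytes)


def eta(word_diff):
    """Truncated Hamming differential of a 32-bit word (four bytes)."""
    if len(word_diff) != 4:
        raise ValueError("a 32-bit word has exactly four bytes")
    return sum(chi(word_diff))


def eta_words(diff_bytes):
    """Per-word Hamming weights of a 32m-bit difference (illustrative helper)."""
    return tuple(eta(diff_bytes[i:i + 4]) for i in range(0, len(diff_bytes), 4))


dx = [0x00, 0x3A, 0x00, 0x01]
print(chi(dx), eta(dx))
print(eta_words(dx + [0] * 4 + [1, 2, 3, 4] + [0, 0, 0, 9]))
```

Each per-word weight lies in {0, ..., 4}, which is why a 128-bit difference can be summarized by four base-5 digits rather than a 16-bit pattern.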

2.3.2 Properties of the components

S-box The S-box is required to be a random bijective function. The S-box of Hierocrypt–3 has the theoretically minimum differential and linear probabilities, its algebraic order is seven, and the number of terms in its polynomial expression is sufficiently large. Therefore, the S-box can be regarded as a random bijective function.

mdsL-function The mdsL-function is an MDS map on four parallel 8-bit words. This property uniquely determines the truncated differential probability of the mdsL-function, and leads to the fact that the truncated Hamming differential probability is equal to the truncated differential probability [13]. Table 5 shows approximate values (powers of 2) of the truncated differential probabilities.

MDSH-function The MDSH-function consists only of byte-wise exclusive ors. Approximate values (powers of 2) of the truncated differential probabilities are obtained by Matsui's algorithm [7].
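The way the MDS property fixes the truncated differential probabilities can be observed by exhaustive counting. The sketch below does this for input differences with two active bytes. Because the mdsL matrix is not reproduced in this section, it uses the AES MixColumns matrix over GF($2^8$) with polynomial 0x11B as a stand-in MDS map; this substitution is an assumption, and only properties shared by all 4×4 MDS maps are checked.

```python
from collections import Counter
from itertools import product


def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8); field polynomial and matrix are stand-ins."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r


# AES MixColumns matrix, used here only as an example of a 4x4 MDS map.
MDS = [[2, 3, 1, 1],
       [1, 2, 3, 1],
       [1, 1, 2, 3],
       [3, 1, 1, 2]]


def mds_apply(v):
    out = []
    for row in MDS:
        acc = 0
        for coeff, byte in zip(row, v):
            acc ^= gf_mul(byte, coeff)
        out.append(acc)
    return out


def pattern(v):
    return tuple(1 if b else 0 for b in v)


def pattern_counts(active_positions):
    """Count output patterns over all differences nonzero exactly at the given bytes."""
    counts = Counter()
    for values in product(range(1, 256), repeat=len(active_positions)):
        v = [0, 0, 0, 0]
        for pos, val in zip(active_positions, values):
            v[pos] = val
        counts[pattern(mds_apply(v))] += 1
    return counts


counts = pattern_counts((0, 1))
total = sum(counts.values())
for pat, c in sorted(counts.items()):
    print(pat, c, c / total)
```

With two active input bytes, no output pattern of weight below 3 occurs (branch number 5), and each weight-3 pattern is reached by 255 of the $255^2$ differences, a probability of 1/255, close to the entry $2^{-8}$ in Table 5.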

2.3.3 Evaluation for multiple rounds

Apart from the S-boxes and the key additions, MDSH-functions and MDSL-functions are placed in alternate layers in Hierocrypt–3. As previously stated, the truncated differential probability is equal to the truncated Hamming differential probability for the MDSL-function. Thus, when both ends of the sequence are MDSL-functions (LHL···HL), the maximum truncated differential characteristic probability can be derived from the truncated Hamming differential probabilities of the MDSL-function and the MDSH-function alone. The use of truncated Hamming differential probabilities reduces the size of the MDSH transition table. The size of the transition probability table is about $2^{32}$ for truncated differential probabilities. In contrast, the table size for truncated Hamming differential probabilities is much smaller, about $5^{8} \simeq 2^{18.58}$. The following is the process of analysis.

1. Make the truncated Hamming differential probability table for the mdsL-function.

2. Make the truncated Hamming differential probability table for the MDSH-function.

3. Make the truncated Hamming differential probability table for LH (MDSH-function after MDSL-function).

4. Make the truncated Hamming differential probability table for t repetitions of (LH).

5. Make the (t + 1)-round truncated Hamming differential probability table by multiplying the preceding result by L.

6. Fix the round number at which the truncated Hamming differential probability table is the same as that for a random function.

We confirmed that the truncated differential characteristic probability table for three consecutive rounds (LHLHL) is the same as that for a random function. Therefore, 5-round Hierocrypt–3 is considered to be sufficiently secure against the two-round attack of truncated differential cryptanalysis.
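The table multiplication in steps 3 to 5 can be read as a max-product composition of transition tables, since a characteristic probability is the best product over intermediate patterns. The sketch below illustrates only that operation, in the log2 domain, on the single-word table of Table 5. It composes the table with itself as a toy example; the MDSH table, which the actual analysis requires, is not given in this section and is not modelled here.

```python
import math

NEG_INF = float("-inf")  # log2 of probability 0

# Table 5 in log2 form: rows are eta(dX), columns are eta(dY), both 0..4.
MDSL_LOG2 = [
    [0, NEG_INF, NEG_INF, NEG_INF, NEG_INF],
    [NEG_INF, NEG_INF, NEG_INF, NEG_INF, 0],
    [NEG_INF, NEG_INF, NEG_INF, -8, 0],
    [NEG_INF, NEG_INF, -16, -8, 0],
    [NEG_INF, -24, -16, -8, 0],
]


def compose(a, b):
    """Max-product composition: best characteristic through any middle state."""
    n = len(a)
    return [[max(a[i][j] + b[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]


two_steps = compose(MDSL_LOG2, MDSL_LOG2)   # toy: L followed by L, no MDSH
for row in two_steps:
    print(row)

# Table sizes quoted in the text: 2^16 x 2^16 patterns versus 5^4 x 5^4 weights.
print(round(math.log2(5 ** 8), 2))
```

The last line evaluates $\log_2 5^{8}$, the size of the truncated Hamming differential table for a 128-bit block, to compare with the $2^{32}$ entries of the full pattern table.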

2.4 Higher-order differential attack

The higher-order differential attack is an algebraic attack, in which some extended key bits are obtained by solving equations derived from the following properties of a Boolean function whose algebraic order is d [4].

• All (d + 1)-th order differentials are 0.
• All d-th order differentials are constants.
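Both properties can be observed on a small example. The sketch below uses a hypothetical 4-bit Boolean function of algebraic order 2 (chosen for illustration only, unrelated to Hierocrypt–3) and evaluates its higher-order differentials by XOR-ing the function over the affine subspace spanned by the chosen directions.

```python
from itertools import product


def derivative(f, directions, x):
    """Higher-order differential: XOR of f over x + span(directions)."""
    acc = 0
    for coeffs in product((0, 1), repeat=len(directions)):
        offset = 0
        for c, d in zip(coeffs, directions):
            if c:
                offset ^= d
        acc ^= f(x ^ offset)
    return acc


def f(x):
    """Toy Boolean function of algebraic order 2 on 4 bits (assumption)."""
    b = [(x >> i) & 1 for i in range(4)]
    return (b[0] & b[1]) ^ (b[1] & b[2]) ^ b[3]


second = {derivative(f, [1, 2], x) for x in range(16)}      # d-th order
third = {derivative(f, [1, 2, 4], x) for x in range(16)}    # (d+1)-th order
print(second, third)
```

The second-order differential takes a single value for every x, and the third-order differential is identically 0, as the two properties state for d = 2.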

Security against the higher-order differential attack is assured by showing that there is no efficient set of plaintexts. However, as it is difficult to assure analytically that this condition is satisfied, heuristic conditions are applied in the design to reduce the applicability of the attack. The following are such fundamental conditions.

• The algebraic property of the S-boxes, which are the only nonlinear components, is optimized.
• The property of the diffusion layers, which are linear, is optimized.

In Hierocrypt–3, the algebraic order of the S-box is seven, which is the highest value for 8-bit surjective functions, and a bit permutation is inserted to increase the complexity of the algebraic structure. Furthermore, the diffusion layer is an MDS map in which the data is sufficiently mixed, and it is chosen such that the number of terms in the polynomial expression of its combination with the S-box is maximum. Therefore, we consider that the applicability of the higher-order differential attack is sufficiently low.

2.5 Interpolation attack

The interpolation attack is an attack in which the encryption function is guessed by determining all the coefficients of its polynomial expression. Security against the interpolation attack is estimated by the number of terms in the polynomial expression of the encryption function. This attack is effective when the number of terms in the polynomial expression over GF($2^8$) is sufficiently small. In Hierocrypt–3, a power operation over GF($2^8$) is used to make the S-box. However, a bit permutation is also applied at the input side, so the number of terms in the polynomial expression is sufficiently large. Furthermore, the combined function of the S-box and the lower-level diffusion is designed so that the number of terms in the polynomial expressions is maximum. Thus, a simple application of the interpolation attack is considered to be ineffective against Hierocrypt–3.
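The effect of an input bit permutation on the number of polynomial terms can be illustrated by interpolating an 8-bit map over GF($2^8$) and counting nonzero coefficients. The sketch below makes two assumptions that are not taken from the specification: the field polynomial is 0x11B, and the bit permutation is a one-bit left rotation. It therefore shows the mechanism only and says nothing about the actual term count of the Hierocrypt–3 S-box.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8); the reduction polynomial is an assumption."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r


# PW[a][e] = a**e in GF(2^8), with 0**0 = 1.
PW = []
for a in range(256):
    row = [1]
    for _ in range(255):
        row.append(gf_mul(row[-1], a))
    PW.append(row)


def coefficients(f):
    """Coefficients c[0..255] of the unique polynomial of degree <= 255 equal to f."""
    c = [0] * 256
    c[0] = f[0]
    for j in range(1, 255):
        acc = 0
        for a in range(1, 256):
            acc ^= gf_mul(f[a], PW[a][255 - j])
        c[j] = acc
    for a in range(256):
        c[255] ^= f[a]
    return c


def count_terms(f):
    return sum(1 for coeff in coefficients(f) if coeff)


def rotl8(x):
    """Hypothetical bit permutation: rotate left by one bit."""
    return ((x << 1) | (x >> 7)) & 0xFF


power_map = [PW[x][247] for x in range(256)]
permuted = [PW[rotl8(x)][247] for x in range(256)]
print(count_terms(power_map), count_terms(permuted))
```

The pure power map interpolates to the single term $x^{247}$, whereas composing it with a bit permutation that is not a field monomial yields a polynomial with more than one term.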

2.6 Impossible differential attack

The impossible differential attack is an attack in which candidate extended-key patterns are narrowed down by discarding those which lead to impossible intermediate differential patterns. The number of impossible differential patterns tends to decrease rapidly when the connections of the diffusion layers are dense. The most important difference between Hierocrypt–3 and Rijndael is the higher-level diffusion layer. In Rijndael, a one-byte differential spreads to all bytes after two diffusion layers, but through only one path. Therefore, when one byte is active and the others are not, all bytes are active at the layer after two diffusion layers. In contrast, the diffusion layer MDSH of Hierocrypt–3 is designed such that one byte is connected with all bytes at the layer after two diffusion layers through more than one path. As all bytes there can take a zero differential, and there exist many possible differential patterns, Hierocrypt–3 is considered to be much more secure than Rijndael against the impossible differential attack. Therefore, Hierocrypt–3 is considered to be secure against the impossible differential attack, as the impossible differential attack cannot break full-round Rijndael.

2.7 Non-surjective attack

The non-surjective attack uses patterns which can be realized because of the non-surjective property of components of the encryption. As all components of Hierocrypt–3 are bijective (i.e. surjective), the attack is not applicable to Hierocrypt–3.

2.8 Mod n attack This attack uses the bias of possible bit patterns arising from the local non-surjectivity. As all components of Hierocrypt–3 are bijective (of course, surjective), the attack is not applicable to Hierocrypt–3.

2.9 χ² attack

In the χ² attack, the bias of the transition probability distribution between certain input and output bit sets is first searched for both theoretically and numerically. Then the feasibility of an estimated key is determined by the χ²-test for the bias. As Hierocrypt–3 does not use operations with a high bit correlation bias, such as the multiplication used in MARS, it is considered to be secure against the χ² attack.

3 Software Implementation Evaluation

In this section, we discuss the software implementation evaluation of Hierocrypt–3. More specifically, we describe the following items: encryption speeds, memory requirements (e.g. code size, work area), optimization, language and platforms for evaluation. In this evaluation, a Pentium III is used as the CPU, Microsoft Windows 95 or later is used as the OS, the C language is used for coding, and speed-oriented optimization (/O2) is used.

    #define CPUID __asm __emit 0fh __asm __emit 0a2h
    #define RDTSC __asm __emit 0fh __asm __emit 031h

    __asm {
        pushad
        CPUID
        RDTSC
        mov cycles_high1, edx
        mov cycles_low1, eax
        popad
    }
    for(i=0; i

Figure 1: Piece of speed evaluation program

3.1 Evaluation platform and implementation environment

Table 6 shows the platform used to evaluate the software implementation, including the language and the development environment.

Table 6: Evaluation platform specification

Machine                   EQUIUM 5000 P55
CPU                       Pentium III 550 MHz
                          CACHE (32KB), 2nd CACHE (512KB)
memory                    256MB (SDRAM 100MHz)
OS                        Windows 2000 Professional build 2195
language                  C
development environment   Visual C++ 6.0 SP3
optimization              speed (/O2)

3.2 Speed evaluation method

First, we briefly describe how the speed is evaluated. Using the Time Stamp Counter (TSC) embedded in the Pentium III, we measure the number of cycles required for each of the following processes: key scheduling, data encryption, and data decryption. To remove any ambiguity about the measurement, a piece of the speed evaluation program source is shown in Figure 1. The term “required CPU cycles/BENCH_COUNT” means the number of cycles needed for one block operation, including the function call; this gives the throughput corresponding to the CPU clock.
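The arithmetic behind “required CPU cycles/BENCH_COUNT” can be sketched as follows. This is a minimal illustration, not the evaluation program itself: the function names tsc_value and cycles_per_block are hypothetical, and it assumes that a second pair of counter readings is taken after the loop, which the truncated listing in Figure 1 does not show. RDTSC returns the 64-bit counter in EDX (high half) and EAX (low half), as the variables cycles_high1 and cycles_low1 in Figure 1 suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the two 32-bit halves returned by RDTSC
 * (EDX = high half, EAX = low half) into one 64-bit count. */
uint64_t tsc_value(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}

/* Average number of cycles per block operation: the elapsed
 * cycle count divided by the number of loop iterations.
 * bench_count plays the role of BENCH_COUNT (assumed to be the
 * loop bound of the measurement loop). */
double cycles_per_block(uint64_t start, uint64_t end, uint32_t bench_count)
{
    assert(bench_count > 0);
    assert(end >= start);
    return (double)(end - start) / (double)bench_count;
}
```

For instance, an elapsed count of 600,500,000 cycles over 1,000,000 iterations corresponds to 600.5 cycles per block.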

3.3 Speed evaluation

Table 7 shows, for each key length, the least number of cycles per block observed in ten trials, each processing 1,000,000 blocks of Hierocrypt–3 in ECB mode. Table 8 shows the corresponding throughputs on the Pentium III (550MHz). For decryption, the iterated (rolled) loop description was more effective.

Table 7: Speed evaluation (cycles)

                           key          unroll                  roll
algorithm      keylength   scheduling   encrypt   (decrypt)∗    encrypt   decrypt   remarks
Hierocrypt–3   128         370.0        600.5     (1012.0)∗     615.0     1012.0
               192         386.5        710.0     (1251.0)∗     722.5     1251.0
               256         468.0        808.0     (1420.0)∗     848.0     1420.0
               128         2402         952.0     914.0         –         –
               192         2449         952.0     914.0         –         –         [14]
               256         2349         952.0     914.0         –         –

(units: cycles)
∗: Loop expansion does not improve the speed

The decryption figures marked with ∗ require nearly twice as much computation as encryption, because the extended key for encryption is used. If a key schedule dedicated to decryption is used, decryption can be made faster.

Table 8: Speed evaluation (throughput)

            key scheduling   unroll               roll
keylength   (Mkeys/sec)      encrypt   decrypt    encrypt   decrypt
128         1.49             117.24    69.57      114.47    69.57
192         1.42             99.15     56.27      97.44     56.27
256         1.18             87.13     49.58      83.02     49.58

(unit: Mbps @ 550MHz)
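The figures in Table 8 appear to follow directly from the cycle counts in Table 7, the 128-bit block size, and the 550MHz clock. The sketch below shows that conversion; the function names throughput_mbps and key_rate_mkeys are hypothetical, and the formulas are an assumption that happens to reproduce the tabulated values (for example, 600.5 cycles gives about 117.24 Mbps and 370.0 cycles gives about 1.49 Mkeys/sec).

```c
#include <assert.h>

/* Throughput in Mbps for a cipher that needs cycles_per_block
 * cycles to process one block of block_bits bits on a CPU
 * clocked at clock_mhz MHz. */
double throughput_mbps(double cycles_per_block, double block_bits,
                       double clock_mhz)
{
    assert(cycles_per_block > 0.0);
    return block_bits * clock_mhz / cycles_per_block;
}

/* Key scheduling rate in Mkeys/sec when one key schedule
 * takes cycles_per_key cycles at clock_mhz MHz. */
double key_rate_mkeys(double cycles_per_key, double clock_mhz)
{
    assert(cycles_per_key > 0.0);
    return clock_mhz / cycles_per_key;
}
```

Calling throughput_mbps(1012.0, 128.0, 550.0) gives the 128-bit decryption entry, and the other rows of Table 8 can be checked in the same way.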

3.4 Memory evaluation

Table 9 shows the memory requirements. The key scheduling part contains the key generation code and the operation loops for the respective key sizes (128, 192, and 256 bits), and the loop of the round function is not expanded. The work area of each process is roughly estimated as 80 bytes, because 20 32-bit words are used in the C language description. These implementations hold some redundant tables, because the source code was designed so that the total memory needed stays within 32KB and the size for either encryption or decryption stays within 16KB, which is the cache size of the Pentium III. The table size can therefore be reduced.

Table 9: Memory requirements

operation        code size   table size   work area   remarks
key scheduling   7,360       1,120        80 + 216    unroll
encryption       1,456       13,312       80          roll
decryption       2,419       13,312       80          roll
total            11,235      25,648       456         duplication removed

(unit: bytes)
Note: Work area is roughly estimated by the number of used variables. The key scheduling part includes the storage region for extended keys. Code segmentation is neglected for code size estimation. Table size estimation includes duplication between the functions.

4 Hardware Performance Simulation

4.1 Implementation using Standard Cell Technology

4.1.1 High Speed Implementation (SC-1)

1. Semiconductor Technology: 0.14 µm 3 layer metal CMOS
2. Synthesis: SYNOPSYS Design Compiler 1999.10-3
3. Simulation Condition (Commercial Worst-case): 1.35V, 70 degrees C (1.5V, 25 degrees C, typical-case)
4. Throughput: 897Mb/s (126.1MHz, 18 clock, 8 Round)
5. Gate Count: 81.5K gates

4.1.2 Small Area Implementation (SC-2)

Small area implementation by sharing SBOX and MDSL.

1. Semiconductor Technology: 0.14 µm 3 layer metal CMOS
2. Synthesis: SYNOPSYS Design Compiler 1999.10-3
3. Simulation Condition (Commercial Worst-case): 1.35V, 70 degrees C (1.5V, 25 degrees C, typical-case)
4. Throughput: 84.6Mb/s (185.1MHz, 280 clock, 8 Round)
5. Gate Count: 26.7K gates

4.2 Implementation using FPGA

4.2.1 High Speed Implementation (FPGA-1)

1. Synthesis: ALTERA Max+plus II ver. 9.6
2. Throughput: 52.6Mb/s (7.39MHz, 18 clock, 8 Round)
3. Logic Cells: 22.7K Logic Cells (5 devices), ALTERA Flex 10K family (EPF10K130VGC599-2, EPF10K100ABC600-1, EPF10K50VBC-1, EPF10K250AGC599-1, EPF10K100AFC484-1)

4.2.2 Small Area Implementation (FPGA-2)

1. Synthesis: ALTERA Max+plus II ver. 9.6
2. Throughput: 4.1Mb/s (8.95MHz, 280 clock, 8 Round)
3. Logic Cells: 6.3K Logic Cells, ALTERA Flex 10K family (EPF10K250AGC599)

Table 10: Summary of hardware implementations

                              Round     Clock       Data                      Logic
Implementation   Block size   (Clock)   Frequency   Throughput   Gate Count   Cells
SC-1             128 bit      8 (18)    126.1 MHz   897 Mb/s     81.5K        –
SC-2             128 bit      8 (280)   185.1 MHz   84.6 Mb/s    26.7K        –
FPGA-1           128 bit      8 (18)    7.39 MHz    52.6 Mb/s    –            22.7K
FPGA-2           128 bit      8 (280)   8.95 MHz    4.1 Mb/s     –            6.3K
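The hardware throughput figures reported in this section are consistent with a simple relation: one 128-bit block is produced every fixed number of clocks at the stated clock frequency. The sketch below illustrates this under that assumption; the function name hw_throughput_mbps is hypothetical and not part of the original evaluation.

```c
#include <assert.h>

/* Throughput in Mb/s of a design that outputs one block of
 * block_bits bits every clocks_per_block clock cycles while
 * running at clock_mhz MHz. */
double hw_throughput_mbps(double block_bits, double clock_mhz,
                          double clocks_per_block)
{
    assert(clocks_per_block > 0.0);
    return block_bits * clock_mhz / clocks_per_block;
}
```

With the SC-1 parameters (128 bits, 126.1MHz, 18 clocks) this gives roughly 897 Mb/s, and with the FPGA-2 parameters (128 bits, 8.95MHz, 280 clocks) roughly 4.1 Mb/s, matching the values above.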

References

[1] E. Biham and A. Shamir. Differential cryptanalysis of DES-like cryptosystems. Journal of Cryptology, Vol. 4, No. 1, pp. 3–72, 1991.

[2] J. Daemen, L.R. Knudsen, and V. Rijmen. The block cipher SQUARE. In Fast Software Encryption (4), LNCS 1267, pp. 149–165, 1997.

[3] S. Hong, S. Lee, J. Lim, J. Sung, and D. Cheon. Provable security against differential and linear cryptanalysis for the SPN structure. In Fast Software Encryption 2000, LNCS 1636. Springer-Verlag, 2000.

[4] L.R. Knudsen and T.A. Berson. Truncated differentials of SAFER. In Fast Software Encryption (5), LNCS 1039, pp. 15–25, 1996.

[5] S. Lucks. Attacking seven rounds of Rijndael under 192-bit and 256-bit keys. In The Third AES Conference, 2000.

[6] M. Matsui. Linear cryptanalysis method for DES cipher. In Eurocrypt '93, LNCS 765, pp. 386–397. Springer-Verlag, 1994.

[7] M. Matsui. Cryptanalysis of a reduced version of the block cipher E2. In Fast Software Encryption '99, LNCS 1636, 1999.

[8] H. Muratani, K. Ohkuma, F. Sano, M. Motoyama, and S. Kawamura. Proposition of a 64-bit version of Hierocrypt. Technical Report of IPSJ (Japan) CSEC11-8, Vol. 11 (this article), No. 8, 2000.

[9] K. Ohkuma, H. Muratani, F. Sano, M. Motoyama, and S. Kawamura. A revised nested SPN cipher. Technical Report of IPSJ (Japan) CSEC11-7, Vol. 11 (in this volume), No. 7, 2000.

[10] K. Ohkuma, H. Muratani, F. Sano, and S. Kawamura. The block cipher Hierocrypt. In Selected Areas in Cryptography 2000, 2000.

[11] K. Ohkuma, H. Muratani, F. Sano, and S. Kawamura. Specification and assessment of the cipher Hierocrypt. Technical Report of IEICE (Japan) ISEC2000-7, 2000.

[12] H. Shimizu. Private communication, 2000.

[13] M. Sugita, K. Kobara, K. Uehara, S. Kubota, and H. Imai. Relationships among Differential, Truncated Differential, Impossible Differential Cryptanalyses against Word-Oriented Block Ciphers like Rijndael, E2. 2000. http://csrc.nist.gov/encryption/aes/round2/conf3/papers/32-msugita..

[14] B. Gladman. AES Algorithm Efficiency. http://www.btinternet.com/~brian.gladman/cryptography_technology/aes/index.html
