<<

On the Two sh Schedule

? ?? ??? y

Bruce Schneier , John Kelsey , Doug Whiting ,David Wagner , Chris

z x

Hall , and

Abstract. Two sh is a new blo ck cipher with a 128 bit blo ck, and a

key length of 128, 192, or 256 bits, which has b een submitted as an AES

candidate. In this pap er, we brie y review the structure of Two sh, and

then discuss the of Two sh, and its resistance to attack. We

close with some op en questions on the securityofTwo sh's key schedule.

1 Intro duction

NIST announced the Advanced Standard AES program in 1997

[NIST97a]. NIST solicited comments from the public on the prop osed standard,

and eventually issued a call for algorithms to satisfy the standard [NIST97b].

The intention is for NIST to make all submissions public and eventually, through

a pro cess of public review and comment, cho ose a new encryption standard to

replace DES.

Two sh is our submission to the AES selection pro cess. It meets all the

required NIST criteria|128-bit blo ck; 128-, 192-, and 256-bit key; ecienton

various platforms; etc.|and some strenuous design requirements, p erformance

as well as cryptographic, of our own.

Two sh was designed to meet NIST's design criteria for AES [NIST97b].

Sp eci cally, they are:

{ A 128-bit symmetric blo ck cipher.

{ Key lengths of 128 bits, 192 bits, and 256 bits.

{ No weak keys.

{ Eciency, b oth on the Intel Pentium Pro and other software and hardware

platforms.

{ Flexible design: e.g., accept additional key lengths; be implementable on

a wide varietyof platforms and applications; and be suitable for a stream

cipher, hash function, and MAC.

?

Counterpane Systems; 101 E Minnehaha Parkway, Minneap olis, MN 55419, USA;

[email protected].

??

Counterpane Systems; [email protected].

???

Hi/fn, Inc., 5973 Avenida Encinas Suite 110, Carlsbad, CA 92008, USA; dwhiting@

hifn.com.

y

University of California Berkeley, So da Hall, Berkeley, CA 94720, USA; daw@

cs.berkeley.edu.

z

Counterpane Systems; [email protected].

x

Counterpane Systems; [email protected].

{ Simple design, b oth to facilitate ease of analysis and ease of implementation.

A central feature of Two sh's security and exibility is its key schedule. In

this pap er, we will brie y review the design of Two sh, and then discuss the

security features of the key schedule. The remainder of the pap er is as follows:

First, we discuss the sp eci c design of Two sh. Next, we analyze the Two sh

key schedule in some detail. Finally, we p oint out some op en questions with

resp ect to the key schedule. Note that for space reasons, this pap er do es not

include a complete discussion of the Two sh design. Instead, we refer the reader

to http://www.counterpane.com.

2 Two sh

Two sh uses a 16-round Feistel-like structure with additional whitening of the

input and output. The only non-Feistel elements are the 1-bit rotates. The ro-

tations can b e moved into the F function to create a pure Feistel structure, but

this requires an additional rotation of the words just b efore the output whitening

step.

The plaintext is split into four 32-bit words. In the input whitening step, these

are xored with four key words. This is followed by sixteen rounds. In each round,

the twowords on the left are used as input to the g functions. One of them is

rotated by 8 bits rst. The g function consists of four byte-wide key-dep endent

S-b oxes, followed by a linear mixing step based on an MDS matrix. The results of

the two g functions are combined using a Pseudo-Hadamard Transform PHT,

and twokeywords are added. These two results are then xored into the words

on the right one of which is rotated left by 1 bit rst, the other is rotated right

afterwards. The left and right halves are then swapp ed for the next round. After

all the rounds, the swap of the last round is reversed, and the four words are

xored with four more key words to pro duce the .

More formally, the 16 bytes of plaintext p ;:::;p are rst split into4words

0 15

P ;:::;P of 32 bits each using the little-endian convention.

0 3

3

X

8j

P = p  2 i =0;:::;3

i

4i+j 

j =0

In the input whitening step, these words are xored with 4 words of the expanded

key.

R = P  K i =0;:::;3

0;i i i

In each of the 16 rounds, the rst twowords are used as input to the function F ,

which also takes the round numb er as input. The third word is xored with the

rst output of F and then rotated rightby one bit. The fourth word is rotated

left by one bit and then xored with the second output word of F . Finally, the

two halves are exchanged. Thus,

F ;F =F R ;R ;r

r;0 r;1 r;0 r;1 2

R =RORR  F ; 1

r +1;0 r;2 r;0

R =ROLR ; 1  F

r +1;1 r;3 r;1

R = R

r +1;2 r;0

R = R

r +1;3 r;1

for r =0;:::;15 and where ROR and ROL are functions that rotate their rst

argument a 32-bit word left or rightby the numb er of bits indicated by their

second argument.

The output whitening step undo es the `swap' of the last round, and xors

the data words with 4 words of the expanded key.

C = R  K i =0;:::;3

i i+4

16;i+2 mo d 4

The four words of ciphertext are then written as 16 bytes c ;:::;c using the

0 15

same little-endian conversion used for the plaintext.

 

C

bi=4c

8

mo d 2 i =0;:::;15 c =

i

8i mo d 4

2

2.1 The Function F

The function F is a key-dep endent p ermutation on 64-bit values. It takes three

arguments, two input words R and R , and the round number r used to select

0 1

the appropriate subkeys. R is passed through the g function, which yields T .

0 0

R is rotated left by 8 bits and then passed through the g function to yield

1

T . The results T and T are then combined in a PHT and twowords of the

1 0 1

expanded key are added.

T = g R 

0 0

T = g ROLR ; 8

1 1

32

F =T + T + K mod2

0 0 1 2r +8

32

F =T +2T + K mod2

1 0 1 2r +9

0

where F ;F  is the result of F .We also de ne the function F for use in our

0 1

0

analysis. F is identical to the F function, except that it do es not add anykey

blo cks to the output. The PHT is still p erformed.

2.2 The Function g

The function g forms the heart of Two sh. The input word X is split into four

bytes. Each byte is run through its own key-dep endent S-b ox. Each S-b ox is

bijective, takes 8 bits of input, and pro duces 8 bits of output. The four results

8

are interpreted as a vector of length 4 over GF2 , and multiplied by the 4  4 3

8

MDS matrix using the eld GF 2  for the computations. The resulting vector

is interpreted as a 32-bit word which is the result of g .

 

8i 8

x = X=2 mo d 2 i =0;:::;3

i

y = s [x ] i =0;:::;3

i i i

0 1 0 1

0 1

z y

0 0

  

B C B C

z y

B C

. .

1 1

B C B C

. .

= 

@ A

. . MDS

@ A @ A

z y

2 2

  

z y

3 3

3

X

8i

Z = z  2

i

i=0

where s are the key-dep endent S-b oxes and Z is the result of g .For this to b e

i

well-de ned, we need to sp ecify the corresp ondence b etween byte values and the

8 8

eld elements of GF2 . We representGF2  as GF2[x]=v x where v x=

8 6 5 3

x + x + x + x +1 is a primitive p olynomial of degree 8 over GF 2. The

P

7

i

eld element a = a x with a 2 GF2 is identi ed with the byte value

i i

i=0

P

7

i 8

a 2 . This is in some sense the \natural" mapping; addition in GF 2 

i

i=0

corresp onds to a xor of the bytes.

2.3 The Key Schedule

The key schedule has to provide 40 words of expanded key K ;:::;K , and the

0 39

4key-dep endent S-b oxes used in the g function. Two sh is de ned for keys of

length N = 128, N = 192, and N = 256. Keys of any length shorter than 256

bits can b e used by them with zero es until the next larger de ned key

length.

We de ne k = N=64. The key M consists of 8k bytes m ;:::;m . The

0 8k 1

bytes are rst converted into 2k words of 32 bits each

3

X

8j

M = m  2 i =0;:::;2k 1

i

4i+j 

j =0

and then into twoword vectors of length k .

M =M ;M ;:::;M 

e 0 2 2k 2

M =M ;M ;:::;M 

o 1 3 2k 1

A third word vector of length k is also derived from the key. This is done by

8

taking the key bytes in groups of 8, interpreting them as a vector over GF 2 ,

and multiplying them bya4 8 matrix derived from a Reed-Solomon co de. Each

result of 4 bytes is then interpreted as a 32-bit word. These words make up the 4

third vector.

1 0

m

8i

C B

m

8i+1

C B

1 0

1 0

C B

m s

8i+2 i;0

  

C B

C B C B

m s

C B

. .

8i+3 i;1

C B C B

. .

 =

A @

. .RS

C B

A @

m s

8i+4 i;2

C B

  

C B

m s

8i+5 i;3

C B

A @

m

8i+6

m

8i+7

3

X

8j

S = s  2

i i;j

j =0

for i =0;:::;k 1, and

S =S ;S ;:::;S 

k 1 k 2 0

Note that S lists the words in \reverse" order. For the RS matrix multiply,

8 8 6 3 2

GF2  is represented by GF2[x]=w x, where w x=x + x + x + x +1

is another primitive p olynomial of degree 8 over GF 2. The mapping b etween

8

byte values and elements of GF2  uses the same de nition as used for the MDS

matrix multiply.

Additional Key Lengths Two sh can accept keys of anybyte length up to

256 bits. For key sizes that are not de ned ab ove, the key is padded at the end

with zero bytes to the next larger length that is de ned. For example, an 80-bit

key m ;:::;m would be extended by setting m = 0 for i = 10;:::;15 and

0 9 i

treating it as a 128-bit key.

The Function h The function h takes two inputs|a 32-bit word X and a

list L = L ;:::;L  of 32-bit words of length k |and pro duces one word

0 k 1

of output. This function works in k stages. In each stage, the four bytes are

each passed through a xed S-b ox, and xored with a byte derived from the list.

Finally, the bytes are once again passed through a xed S-b ox, and the four

bytes are multiplied by the MDS matrix just as in g . More formally: we split the

words into bytes.

 

8j 8

l = L =2 mo d 2

i;j i

 

8j 8

x = X=2 mo d 2

j

for i = 0;:::;k 1 and j = 0;:::;3. Then the sequence of substitutions and

xors is applied.

y = x j =0;:::;3

k;j j 5

If k =4 wehave

y = [y ]  l

3;0 1 4;0 3;0

y = q [y ]  l

3;1 0 4;1 3;1

y = q [y ]  l

3;2 0 4;2 3;2

y = q [y ]  l

3;3 1 4;3 3;3

If k  3wehave

y = q [y ]  l

2;0 1 3;0 2;0

y = q [y ]  l

2;1 1 3;1 2;1

y = q [y ]  l

2;2 0 3;2 2;2

y = q [y ]  l

2;3 0 3;3 2;3

In all cases wehave

y = q [q [q [y ]  l ]  l ]

0 1 0 0 2;0 1;0 0;0

y = q [q [q [y ]  l ]  l ]

1 0 0 1 2;1 1;1 0;1

y = q [q [q [y ]  l ]  l ]

2 1 1 0 2;2 1;2 0;2

y = q [q [q [y ]  l ]  l ]

3 0 1 1 2;3 1;3 0;3

Here, q and q are xed p ermutations on 8-bit values that we will discuss shortly.

0 1

The resulting vector of y 's is multiplied by the MDS matrix, just as in the g

i

function.

0 0 1 1

0 1

z y

0 0

  

B B C C

z y

B C

. .

1 1

B B C C

. .

 =

@ A

. MDS .

@ @ A A

z y

2 2

  

z y

3 3

3

X

8i

Z = z  2

i

i=0

where Z is the result of h.

The Key-dep endent S-b oxes We can now de ne the S-b oxes in the function

g by

g X = hX; S 

That is, for i =0;:::;3, the key-dep endent S-b ox s is formed by the mapping

i

from x to y in the h function, where the list L is equal to the vector S derived

i i

from the key. 6

The Expanded Key Words K The words of the expanded key are de ned

j

using the h function.

24 16 8 0

 =2 +2 +2 +2

A = h2i; M 

i e

B =ROLh2i +1; M ; 8

i o

32

K =A + B mod2

2i i i

32

K =ROLA +2B mod2 ; 9

2i+1 i i

The constant  is used here to duplicate bytes; it has the prop erty that for

i =0;:::;255, the word i consists of four equal bytes, each with the value i.

The function h is applied to words of this typ e. For A the byte values are 2i,

i

and the second argumentofh is M . B is computed similarly using 2i + 1 as the

e i

byte value and M as the second argument, with an extra rotate over 8 bits. The

o

values A and B are combined in a PHT. One of the results is further rotated

i i

by 9 bits. The two results form twowords of the expanded key.

The Permutations q and q The p ermutations q and q are xed p ermuta-

0 1 0 1

tions on 8-bit values. They are constructed from four di erent 4-bit p ermutations

each. Wehaveinvestigated the resulting 8-bit p ermutations, q and q , exten-

0 1

sively, and b elieve them to b e at least no weaker than randomly selected 8-bit

p ermutations.

3 Analysis of The Key Schedule

The key schedule has b een designed to provide resistance to attack, while also

providing a great of exibility in implementation. For example, after S has

b een computed from M ;M , all remaining key scheduling can b e done \on the

e o

y" during encryption. This allows for very low-memory implementations, and

for implementations with excellent key agility. In implementations with more

memory, all the subkeys can b e precomputed for improved p erformance. In im-

plementations with still more memory, such as on mo dern high-end pro cessors

with a reasonable RAM cache size, the e ects of the key-dep endent S-b oxes

and the MDS matrix multiply can be precomputed, reducing the work per g

computation to four table lo okups and three xors.

Note that S is only half the size of the key. This was done so that precom-

putation of the S-b oxes and MDS matrix multiply would be suciently fast,

and so that low-memory implementations would not have to take to o large a

p erformance hit. This means that the g function is slightly di erent for longer

keys than for shorter keys. 7

3.1 Byte Sequences

The subkeys in Two sh are generated by using the h function, which can be

seen as four key-dep endent S-b oxes followed by an MDS matrix. The input to

the S-b oxes is basically a counter. In this section we analyze the sequences of

outputs that this construction can generate.

All key material is used to de ne key-dep endent S-b oxes in h, which are then

used to derive subkeys. Each S-b ox gets a sequence of inputs, 0; 2; 4;:::;38

or 1; 3; 5;:::;39. The S-b ox generates a corresp onding sequence of outputs.

The corresp onding outputs from the four S-b oxes are combined using the MDS

matrix multiply to pro duce the sequence of A and B words, and those words

i i

are pro cessed with the PHT with a couple of rotations thrown in to pro duce

a pair of subkey words. Analyzing these byte sequences thus gives us imp ortant

insights ab out the whole key schedule.

We can mo del each byte sequence generated by a key-dep endent S-b ox as

a randomly selected non-rep eating byte sequence of length 20. This allows us

to make many useful predictions ab out the likeliho o d of nding keys or pairs

of keys with various interesting prop erties. Because we will be analyzing the

key schedule using this assumption in the remainder of this section, we should

discuss how reasonable it is to treat this byte sequence as randomly generated.

Wehave not found any statistical deviations b etween our key-dep endent S-b oxes

and the random mo del in any of our extensive statistical tests.

We are lo oking at sequences of 20 bytes that are all distinct. There are

159

256!=236! of those sequences, which is close to 2 .

3.2 Equivalent S-b ox Keys

Wehaveveri ed that there are no equivalent S-b oxkeys that generate the same

sequence of 20 bytes. In the random mo del, the chance of this happ ening for the

63 159 96

N = 256 case is ab out 2  2 = 2 . This is the chance of such equiva-

lent S-b oxes existing at all. In fact, we recently completed an exhaustive search

demonstrating that no pair of key inputs to an S-b ox pro duces identical S-b ox

entries.

3.3 Byte Di erence Sequences

Let us consider the more general problem of howtogetagiven 20-byte di erence

sequence b etween a pair of S-b oxes. Supp ose wehavetwo S-b oxes, each de ned

using 32 bits of key material, which are not equal, but whichmust b e chosen to

give us a given di erence sequence in the xor of their byte sequences. We can

estimate the probability of a pair of 4-byte inputs existing with the desired xor

63 159 96

di erence sequence as 2  2 =2 . Note that this is the probability that

such a pair of inputs exists, not the probability that a random pair of keys will

have this prop erty. 8

3.4 The A and B Sequences

From the prop erties of the byte sequences, we can discuss the prop erties of the

A and B sequences generated by eachkey M .

A = MDS s i; M ;s i; M ;s i; M ;s i; M 

i 0 1 2 3

Since the MDS matrix multiply is invertible, and since i is di erent for each

round's subkey words generated, we can see that no A or B value can rep eat

itself.

Similarly,we can see from the construction of h that eachkey byte a ects

exactly one S-b ox used to generate A or B . Changing a single key byte always

alters every one of the 20 bytes of output from that S-b ox, and so always alters

every word in the 20-word A or B sequence to which it contributes.

Consider a single byte of output from one of the S-b oxes. If we cycle any one

of the key bytes that contributes to that S-b ox through all 256 p ossible values,

the output of the S-b ox will also cycle through all 256 p ossible values. If we take

four key bytes that contribute to four di erent S-b oxes, and we cycle those four

bytes through all p ossible values, then the result of h will also cycle through all

p ossible values. This proves that A and B are uniformly distributed for all key

lengths, assuming the key M is uniformly distributed.

3.5 Di erence Sequences in A and B

Let us also consider di erence sequences. If wehave a sp eci c di erence sequence

wewant to see in A,we are faced with an interesting problem: since the MDS

matrix multiply is xor-linear, each desired output xor from the matrix multiply

allows only one p ossible input xor. This means that:

1. A zero output xor di erence in A can o ccur only with a zero output xor

di erence in all four of the byte sequences used to build A.

32

2. Only 1020 p ossible output di erences out of the 2 inA can o ccur with

i

a single \active" altered S-b ox. Most di erences require all four S-b oxes

used to form A to b e active.

i

3. Each desired output xor in A requires a sp eci c output xor in each of

the four byte sequences used to form A. This means that getting any de-

sired di erence sequence into all 20 A values requires getting a desired xor

i

sequence into all four 20-byte sequences. Note that if the desired output

xor in A is an appropriate value, up to three of the four byte sequences

i

can b e identical without much trouble, simply by leaving their key material

unchanged. As mentioned ab ove, this is very unlikely to b e p ossible for a

randomly chosen di erence pattern in the A sequence. There are of course

di erence sequences of A 's that can o ccur.

i

The ab ove analysis is of course also valid for the B sequence. 9

3.6 The Sequence K ;K 

2i 2i+1

As A and B are uniformly distributed over all keys, so are all the K . As

i i i

all pairs A ;B  are distinct, all the pairs K ;K  are distinct, although it

i i 2i 2i+1

might happ en that K = K for any pair of i and j .

i j

3.7 Di erence Sequences in the Subkeys

Each di erence sequence in A and B translate into a di erence sequences in

K ;K . However, while it is natural to consider A and B di erence se-

2i 2i+1

quences in terms of xor di erences, subkeys can reasonably b e considered either

32

as xor di erences or as di erences mo dulo 2 .Thus, wemay discuss di erence

sequences:





D [i; M ; M ]=K K

i;M i;M





X [i; M ; M ]=K  K

i;M i;M



where the di erence is computed b etween the key value M and M .

3.8 XOR Di erences in the Subkeys

Each round, the subkeys are added to the results of the PHT of two g functions,

and the results of those additions are xored into half of the cipher blo ck. An

xor di erence in the subkeys has a fairly high probabilityof passing through

the addition op eration and ending up in the cipher blo ck. The probabilityof

this is determined by the Hamming weightofthexor di erence, not counting

the highest-order bit. However, to get into the subkeys, a xor di erence must

rst pass through the rst addition.

Consider

x + y = z

x   +y = z  

0 1

Let k b e the numb er of bits set in  , not counting the highest-order bit. Then,

0

the highest probabilityvalue for  is  , and the probability that this will hold

1 0

k

is 2 . This is true b ecause addition and xor are very closely related op erations.

The only di erence b etween the two is the carry b etween bit p ositions. If ipping

a given bit changes the carry into the next bit p osition, this alters the output

xor di erence. This happ ens with probability1=2 p er bit. The situation is more

complex for multiple adjacent bits, but the general rule still holds: for every bit

in the xor di erence not in the high-order bit p osition, the probability that the

di erence will pass through correctly is cut in half.

For the subkey generation, consider an xor di erence,  ,inA. This a ects

0

two subkey words:

K = A + B

2i i i

K =ROLA +2B ; 9

2i+1 i i 10

32

where the additions are mo dulo 2 .Ifwe assume these xor di erences propagate

indep endently in the two subkeys which app ears to b e the case, we see that this

k

leads to an xor di erence of  in the even subkey word with probability2 ,

0

and the xor di erence RO L ; 9 in the o dd subkey with the same probability.

0

The most probable xor di erence in the round's subkey blo ckthus o ccurs with

2k

probabiity 2 . A desired xor di erence sequence for all 20 pairs of subkey

words is thus quite dicult to get to work when k  3, assuming the desired

xor di erence sequence can b e created in the A sequence at all.

When the xor di erence is in B , the result is slightly more complicated; the

most probable xor di erence in a round's pair of subkey words may b e either

2k 1 2k

2 or 2 , dep ending on whether or not the xor di erence in B covers

the next-to-highest-order bit.

An xor di erence in A or B is easy to analyze in terms of additive di erences

32 k

mo dulo 2 :anxor di erence with k active bits has 2 equally likely additive

di erences. Note that if we have a additive di erence in A, we get it in b oth

subkey words, just rotated left nine bits in the odd subkey word. Thus, k -bit

xor di erences lead to a given additive di erence in a pair of subkey words with

k

probability2 . The rotation do es not really complicate things much for the

attacker, who knows where the changed bits are.

32

Note that when additive subkey di erences mo dulo 2 are used in an attack,

they survive badly through the xor with the plaintext blo ck. We estimate that

xor di erences are much more likely to b e directly useful in mounting an attack.

3.9 Key-dep endent Characteristics and Weak Keys

The concept of a key-dep endent characteristic seems to have b een intro duced

in [BB93] in their of , and also app ears in [DGV94a] in

1

an analysis of IDEA. The idea is that certain iterative prop erties of the blo ck

cipher useful to an attacker b ecome more e ective against the cipher for a sp eci c

subset of keys.

A di erential attackonTwo sh may consider xor-based di erences, addi-

tive di erences, or b oth. If an attacker sends xor di erences through the PHT

and subkey addition steps, his di erential characteristic probabilities will b e de-

p endent on the subkey values involved. In general, low-weight subkeys will give

an attacker some advantage, but this advantage is relatively small. Zero bits in

the subkeys improve the probabilities of cleanly getting xor-based di erential

characteristics through the subkey addition. Since there app ears to b e no sp e-

cial waytocho ose the key to make the subkey sequence esp ecially lowweight,

we do not b elieve this kind of key-dep endent di erential characteristic will have

any relevance in attacking Two sh.

A much more interesting issue in terms of key-dep endent characteristics is

whether the key-dep endent S-b oxes are ever generated with esp ecially high prob-

1

See [Haw98 ] for further cryptanalysis of IDEA weak keys. 11

ability di erential or high bias linear characteristics. The statistical analysis pre-

sented earlier shows that the b est linear and di erential characteristics over all

p ossible keys are still quite unlikely.

Note that the structure of b oth di erential and linear attacks in Two sh is

such that such attacks app ear to generally require go o d characteristics through

at least three of the four key-dep endent S-b oxes if not all four, so a single

high-probability di erential or linear characteristic for one S-b ox will not create

a weakness in the cipher as a whole. Our statistical testing has allowed us to

estimate that few or no keys result in a single S-b ox with a di erential charac-

teristic of probability higher than 24=256 for any length key, and with a linear

characteristic with bias higher than 108=256. These probabilities do not allow for

practical di erential or linear attacks. Further, for an attacker to mount a dif-

ferential or linear attack, it app ears to b e necessary to get very high-probability

di erential or linear characteristics in all four S-b oxes at once.

3.10 Related-key Cryptanalysis

Related-key cryptanalysis [Bih94,KSW96,KSW97] uses a cipher's key schedule

to break plaintexts encrypted with related keys. In its most advanced form, dif-

ferential related-key cryptanalysis, b oth plaintexts and keys with chosen di er-

entials are used to recover the keys. This typ e of analysis has had considerable

success against ciphers with simplistic key schedules|e.g., GOST and 3-Way

[DGV94b]|and is a realistic attack in some circumstances. A conventional at-

tack is usually judged in terms of the numb er of plaintexts or needed

for the attack, and the level of access to the cipher needed to get those texts e.g.,

known plaintext, chosen plaintext, adaptivechosen plaintext; in a related-key

attack, wemust add the requirement for to o ccur under two di er-

ent, but related, keys.

3.11 Resistance to Related-key Slide Attacks

A \slide" attack o ccurs in an iterated cipher when the encryption of one blo ck

for rounds 1 through n is the same as the encryption of another blo ck for rounds

s +1 to s + n. An attacker can lo ok at two encryptions, and can slide the rounds

forward in one of them relative to another. S-1 [Anon95] can b e broken with a

[Wag95a]. Travois [Yuv97] has identical round functions, and can

also b e broken with a slide attack. Conventional slide attacks allow one to break

the cipher with only known- or chosen-plaintext queries; however, as we shall

see next, there is a generalization to related-key attacks as well.

Related-key slide attacks were rst discovered by Biham in his attackona

DES variant [Bih94]. To mount a related-key slide attackonTwo sh, an attacker



must nd a pair of keys M; M such that the key-dep endent S-b oxes in g are

unchanged, but the subkey sequences slide down one round. This amounts to

nding, for each of the eight byte-p ermutations used for subkey generation, a 12

change in the keys such that:



s j; M =s j +2s; M 

i i

for n values of j . In total, this requires 8n of these relations to hold.

Let us lo ok in more detail for a xed key M . Let m 2f5;:::;8g b e the number

of S-b oxes used to compute the round keys that are a ected by the di erence



between M and M . Observe that m  5 due to the restriction that S cannot

change and the prop erties of the RS matrix that at least 5 inputs must change



8

32m128

to keep the output constant. There are at most 2 p ossible choices of

m



M .Wehave a total of nm 8-bit relations that need to b e satis ed. The exp ected



8

 8nm+32m128

number of M that satisfy these relations is thus  2 .For n  4

m

this is dominated by the case m =5;we will ignore the other cases for now. So

3840n 

for each M we can exp ect ab out 2 keys M that supp ort a slide attack for

n  4. This means that any sp eci c key is unlikely to supp ort a slide attack with

29340n 

n  4. Over all p ossible key pairs, we exp ect 2 pairs M; M for whicha

slide of n  4 o ccurs. Thus, it is unlikely that a pair exists at all with n  8.

Resistance to Related-key Di erential Attacks A related-key di erential

attack seeks to mount a di erential attack on a blo ck cipher through the key,as

well as or instead of through the plaintext/ciphertext p ort. Against Two sh, such

an attackmust control the subkey di erence sequence for at least the rounds in

the middle. For the sake of simplifying discussions of the attack, let us consider

an attacker who wants to put a chosen subkey di erence into the middle twelve



rounds' subkeys. That is, he wants to change M to M , and control D [i; M ; M ]

for i =12::35. At the same time, he needs to keep the g function, and thus the

key S , from changing. All else b eing equal, the longer the key, the more freedom

an attacker has to mount a related-key di erential attack. Wethus will assume

the use of 256 bit keys for the remainder of this section. Note that a successful

related key attack on 128 or 192 bit keys that gets only zero subkey di erences

in the rounds whose subkey di erences it must control translates directly to an

equivalent related key attack on 256 bit keys.

Consider the p osition of the attacker if he attempts a related-key di erential

attack with di erent S keys. This must result in di erent g outputs for all inputs,

since we know that there are no pairs of S values that lead to identical S-b oxes.

Assuming the pair of S values do es not lead to linearly-related S-b oxes, it will

not b e p ossible to comp ensate for this change in S with changes in the subkeys

in single rounds. The added diculty is approximately that of adding 24 active

S-b oxes to the existing related-key attack. For this reason, we b elieve that any

useful related-key attack will require a pair of keys that keeps S unchanged.

The Zero Di erence Case The simplest related-key attack to analyze is the

one that keeps b oth S and also the middle twelve rounds' subkeys unchanged. It

thus seeks to generate identical A and B sequences for twelve rounds, and thus

to keep the individual byte sequences used to derive A and B identical. 13

The RS co de used to derive S from M strictly limits the ways an attacker

can change M without altering S . The attacker must try to keep the number

of active subkey generating S-b oxes as low as p ossible, since each active S-b ox

is another constraint on his attack. The attacker can keep the numb er of active

S-b oxes down to ve without altering S , and so this is what he should do. With

only the key bytes a ecting these ve subkey generation S-b oxes active, he can

alter b etween one and four bytes in all ve S-b oxes; the nature of the RS matrix

is that if he needs to alter four bytes in any one of these S-b oxes, he must alter

bytes in all ve. In practice, in order to maximize his control over the byte

sequences generated by these S-b oxes, he must alter four bytes in all ve active

S-b oxes.

To get zero subkey di erences, the attacker must get zero di erences in the

byte sequences generated by all ve active S-b oxes. Consider a single suchbyte

sequence: The attacker tries to nd a pair of four-byte key inputs such that they

lead to identical byte sequences in the middle twelve rounds, which means the

63

middle twelvebytes. There are 2 pairs of key inputs from whichtocho ose, and

95

ab out 2 p ossible byte sequences available. If the byte sequences b ehave more-

or-less like random functions of the key inputs, this implies that it is extremely

unlikely that an attacker can nd a pair of key inputs that will get identical byte

sequences in these middle twelve rounds. We discuss this kind of analysis of byte

sequences in section 3.1. From this analysis, wewould not exp ect to see a pair

of keys for even one S-b ox with more than eight successive bytes unchanged,

and wewould exp ect even eight successivebytes of unchanged byte sequence to

require control of all four key bytes into the S-b ox. Wewould exp ect a sp eci c

pair of key bytes to b e required to generate these similar byte sequences.

To extend this to ve active S-b oxes, we exp ect there to b e, at b est, a single

pair of values for the twenty activekey bytes that leave the middle eight subkeys

unchanged.

Other Di erence Sequences An attacker who has control of the xor di er-

ence sequences in A ;B do es not necessarily have great control over the xor or

i i

32

mo dulo 2 di erence sequence that app ears in the subkeys.

First, wemust consider the context of a related-key di erential attack. The

attacker do es not generally know all of the key bytes generating either A or B .

i i

Instead, he knows the xor di erence sequence in A and B .

i i

Consider an A value with an xor di erence of  . If the Hamming weight

i

of  is k , not including the high-order bit, then the b est estimate for the xor

di erence that ends up in the two subkey words for a given round generally has

2k

probability ab out 2 . Control of the A ;B xor di erence sequence do es not

i i

make controlling the subkey xor di erences substantially easier.

Consider an A value with an xor di erence of  . If the Hamming weightof

i

32

 is k , then the b est estimate for the mo dulo 2 di erence of the two subkey

k

words for a given round has probability ab out 2 . 14

This p oints out one of the diculties in mounting any kind of successful

related-key attack with nonzero A ;B di erence sequences. If an attacker can

i i

nd a di erence sequence for A ;B that keeps k = 3, and needs to control the

i i

72

subkey di erences for twelve rounds, he has a probability of ab out 2 of getting

36

the most likely xor subkey di erence sequence, and ab out 2 of getting the

32

most likely mo dulo 2 di erence sequence.

Probability of a Successful Attack With One Related-Key Query We

consider the use of the RS matrix in deriving S from M to b e a p owerful defense

against related-key di erential attacks, b ecause it forces an attacker to keep at

least vekey generation S-b oxes active. Our analysis suggests that any useful

control of the subkey di erence sequence requires that each active S-b ox in the

attackhave all four key bytes changed.

Further, our analysis suggests that, for nearly any useful di erence sequence,

each active S-b ox in the attack has a sp eci c pair of de ning key bytes it needs

to work. At attacker sp ecifying his key relation in terms of bytewise xor has

ve pairs of sequences of four key bytes each, whichhewants to get. This leaves

him with a probability of a pair of keys with his desired relation actually leading

115

to the desired attack of ab out 2 , whichmoves the attack totally outside the

realm of practical attacks.

So long as an attacker is unable to improve this, either by nding a way to get

useful di erence sequences into the subkeys without having so many activekey

bytes, or by nding a way to mount related-key attacks with di erent S values

for the di erentkeys, we do not b elieve that any kind of related key di erential

attack is feasible.

Note the implication of this: Clever ways to control a couple extra rounds'

subkey di erences are not going to make the attacks feasible, unless they also

allow the attacker to use far fewer activekey bytes. For reference, note that with

one altered key byte p er active subkey generation S-b ox, the attacker ends up

39

with a 2 probability that a pair of related keys will yield an attack; with

78

twokey bytes p er active S-b ox, this increases to 2 ; with three key bytes p er

117

active S-b ox, it increases to 2 . In practice, this means that anykey relation

requiring more than one byte of key changed per active S-b ox app ears to be

impractical.

3.12 Conclusions

Our analysis suggests that related-key attacks against the full Two sh are not

workable. Note, however, that wehavespent less time working on resistance to

chosen key attacks, such as will b e available to an attacker if Two sh is used in

the straightforward way to de ne a hash function. For this reason, we recommend

that more analysis b e done b efore Two sh is used in the straightforward way

as a hash function, and we note that it app ears to b e much more secure to use 15

Two sh in this way with 128-bit keys than with 256-bit keys, despite the fact

that this also slows the sp eed of a hash function down by a factor of two.

4 Op en Questions

Several questions remain op en regarding the strength of the Two sh key sched-

ule. These include for following:

1. Wehave discussed di erential related key attacks within a certain set of as-

sumptions, including the assumption that the subkey generation mechanism

has certain more-or-less random prop erties. We do not have a stronger ar-

gument than our intuition and statistical tests that this is the case. A pro of

or stronger argument in either direction would b e of great interest.

2. Wehave done some analysis not re ected here for space reasons on partial

chosen key attacks on Two sh. Still remaining are issues raised by the desire

to use Two sh in some Davies-Meyer hashing mo de. What kind of collision

resistance mightwe exp ect in this case.

3. Wehave assumed that the derivation of q and q intro duces no weaknesses.

0 1

Further analysis of this construction, as well as our larger S-b ox construction

metho ds, would b e of interest.

4. Wehave discussed related-key slide attacks. There are many other ways to

reorder the round subkeys. Do any of these ways lead to attacks on the

cipher?

References

[Anon95] Anonymous, \this lo oked like it mightbeinteresting," sci.crypt Usenet p ost-

ing, 9 Aug 1995.

[BB93] I. Ben-Aroya and E. Biham, \Di erential Cryptanalysis of Lucifer," Ad-

vances in Cryptology | CRYPTO '93 Proceedings, Springer-Verlag, 1994,

pp. 187{199.

[Bih94] E. Biham, \New Typ es of Cryptanalytic Attacks Using Related Keys," Jour-

nal of Cryptology, v. 7, n. 4, 1994, pp. 229{246.

[Bih95] E. Biham, \On Matsui's ," Advances in Cryptology |

EUROCRYPT '94 Proceedings, Springer-Verlag, 1995, pp. 398{412.

[DGV94a] J. Daemen, R. Govaerts, and J. Vandewalle, \Weak Keys for IDEA," Ad-

vances in Cryptology | EUROCRYPT '93 Proceedings, Springer-Verlag,

1994, pp. 159{167.

[DGV94b] J. Daemen, R. Govaerts, and J. Vandewalle, \A New Approach to Blo ck

Cipher Design," Fast Software Encryption, Cambridge Security Workshop

Proceedings, Springer-Verlag, 1994, pp. 18{32.

[Haw98] P.Hawkes, \Di erential-Linear Classes of IDEA," Advances in

Cryptology | EUROCRYPT '98 Proceedings, Springer-Verlag, 1998, pp.

112{126.

[KSW96] J. Kelsey, B. Schneier, and D. Wagner, \Key-Schedule Cryptanalysis of

IDEA, G-DES, GOST, SAFER, and Triple-DES," Advances in Cryptology

|CRYPTO '96 Proceedings, Springer-Verlag, 1996, pp. 237{251. 16

[KSW97] J. Kelsey, B. Schneier, and D. Wagner, \Related-Key Cryptanalysis of 3-

WAY, Biham-DES, CAST, DES-X, NewDES, RC2, and TEA," Information

and Communications Security, First International ConferenceProceedings,

Springer-Verlag, 1997, pp. 203{207.

[NIST97a] National Institute of Standards and Technology, \Announcing Development

of a Federal Information Standard for Advanced Encryption Standard," Fed-

eral Register, v. 62, n. 1, 2 Jan 1997, pp. 93{94.

[NIST97b] National Institute of Standards and Technology, \Announcing Request for

Candidate Algorithm Nominations for the Advanced Encryption Standard

AES," Federal Register, v. 62, n. 117, 12 Sep 1997, pp. 48051{48058.

[Wag95a] D. Wagner, \Cryptanalysis of S-1," sci.crypt Usenet p osting, 27 Aug 1995.

[Yuv97] G. Yuval, \Reinventing the Travois: Encrytion/MAC in 30 ROM

Bytes," Fast Software Encryption, 4th International Workshop Proceedings,

Springer-Verlag, 1997, pp. 205{209. 17