On the Two sh Key Schedule
? ?? ??? y
Bruce Schneier , John Kelsey , Doug Whiting ,David Wagner , Chris
z x
Hall , and Niels Ferguson
Abstract. Two sh is a new blo ck cipher with a 128 bit blo ck, and a
key length of 128, 192, or 256 bits, which has b een submitted as an AES
candidate. In this pap er, we brie y review the structure of Two sh, and
then discuss the key schedule of Two sh, and its resistance to attack. We
close with some op en questions on the securityofTwo sh's key schedule.
1 Intro duction
NIST announced the Advanced Encryption Standard AES program in 1997
[NIST97a]. NIST solicited comments from the public on the prop osed standard,
and eventually issued a call for algorithms to satisfy the standard [NIST97b].
The intention is for NIST to make all submissions public and eventually, through
a pro cess of public review and comment, cho ose a new encryption standard to
replace DES.
Two sh is our submission to the AES selection pro cess. It meets all the
required NIST criteria|128-bit blo ck; 128-, 192-, and 256-bit key; ecienton
various platforms; etc.|and some strenuous design requirements, p erformance
as well as cryptographic, of our own.
Two sh was designed to meet NIST's design criteria for AES [NIST97b].
Sp eci cally, they are:
{ A 128-bit symmetric blo ck cipher.
{ Key lengths of 128 bits, 192 bits, and 256 bits.
{ No weak keys.
{ Eciency, b oth on the Intel Pentium Pro and other software and hardware
platforms.
{ Flexible design: e.g., accept additional key lengths; be implementable on
a wide varietyof platforms and applications; and be suitable for a stream
cipher, hash function, and MAC.
?
Counterpane Systems; 101 E Minnehaha Parkway, Minneap olis, MN 55419, USA;
??
Counterpane Systems; [email protected].
???
Hi/fn, Inc., 5973 Avenida Encinas Suite 110, Carlsbad, CA 92008, USA; dwhiting@
hifn.com.
y
University of California Berkeley, So da Hall, Berkeley, CA 94720, USA; daw@
cs.berkeley.edu.
z
Counterpane Systems; [email protected].
x
Counterpane Systems; [email protected].
{ Simple design, b oth to facilitate ease of analysis and ease of implementation.
A central feature of Two sh's security and exibility is its key schedule. In
this pap er, we will brie y review the design of Two sh, and then discuss the
security features of the key schedule. The remainder of the pap er is as follows:
First, we discuss the sp eci c design of Two sh. Next, we analyze the Two sh
key schedule in some detail. Finally, we p oint out some op en questions with
resp ect to the key schedule. Note that for space reasons, this pap er do es not
include a complete discussion of the Two sh design. Instead, we refer the reader
to http://www.counterpane.com.
2 Two sh
Two sh uses a 16-round Feistel-like structure with additional whitening of the
input and output. The only non-Feistel elements are the 1-bit rotates. The ro-
tations can b e moved into the F function to create a pure Feistel structure, but
this requires an additional rotation of the words just b efore the output whitening
step.
The plaintext is split into four 32-bit words. In the input whitening step, these
are xored with four key words. This is followed by sixteen rounds. In each round,
the twowords on the left are used as input to the g functions. One of them is
rotated by 8 bits rst. The g function consists of four byte-wide key-dep endent
S-b oxes, followed by a linear mixing step based on an MDS matrix. The results of
the two g functions are combined using a Pseudo-Hadamard Transform PHT,
and twokeywords are added. These two results are then xored into the words
on the right one of which is rotated left by 1 bit rst, the other is rotated right
afterwards. The left and right halves are then swapp ed for the next round. After
all the rounds, the swap of the last round is reversed, and the four words are
xored with four more key words to pro duce the ciphertext.
More formally, the 16 bytes of plaintext p ;:::;p are rst split into4words
0 15
P ;:::;P of 32 bits each using the little-endian convention.
0 3
3
X
8j
P = p 2 i =0;:::;3
i
4i+j
j =0
In the input whitening step, these words are xored with 4 words of the expanded
key.
R = P K i =0;:::;3
0;i i i
In each of the 16 rounds, the rst twowords are used as input to the function F ,
which also takes the round numb er as input. The third word is xored with the
rst output of F and then rotated rightby one bit. The fourth word is rotated
left by one bit and then xored with the second output word of F . Finally, the
two halves are exchanged. Thus,
F ;F =F R ;R ;r
r;0 r;1 r;0 r;1 2
R =RORR F ; 1
r +1;0 r;2 r;0
R =ROLR ; 1 F
r +1;1 r;3 r;1
R = R
r +1;2 r;0
R = R
r +1;3 r;1
for r =0;:::;15 and where ROR and ROL are functions that rotate their rst
argument a 32-bit word left or rightby the numb er of bits indicated by their
second argument.
The output whitening step undo es the `swap' of the last round, and xors
the data words with 4 words of the expanded key.
C = R K i =0;:::;3
i i+4
16;i+2 mo d 4
The four words of ciphertext are then written as 16 bytes c ;:::;c using the
0 15
same little-endian conversion used for the plaintext.
C
bi=4c
8
mo d 2 i =0;:::;15 c =
i
8i mo d 4
2
2.1 The Function F
The function F is a key-dep endent p ermutation on 64-bit values. It takes three
arguments, two input words R and R , and the round number r used to select
0 1
the appropriate subkeys. R is passed through the g function, which yields T .
0 0
R is rotated left by 8 bits and then passed through the g function to yield
1
T . The results T and T are then combined in a PHT and twowords of the
1 0 1
expanded key are added.
T = g R
0 0
T = g ROLR ; 8
1 1
32
F =T + T + K mod2
0 0 1 2r +8
32
F =T +2T + K mod2
1 0 1 2r +9
0
where F ;F is the result of F .We also de ne the function F for use in our
0 1
0
analysis. F is identical to the F function, except that it do es not add anykey
blo cks to the output. The PHT is still p erformed.
2.2 The Function g
The function g forms the heart of Two sh. The input word X is split into four
bytes. Each byte is run through its own key-dep endent S-b ox. Each S-b ox is
bijective, takes 8 bits of input, and pro duces 8 bits of output. The four results
8
are interpreted as a vector of length 4 over GF2 , and multiplied by the 4 4 3
8
MDS matrix using the eld GF 2 for the computations. The resulting vector
is interpreted as a 32-bit word which is the result of g .
8i 8
x = X=2 mo d 2 i =0;:::;3
i
y = s [x ] i =0;:::;3
i i i
0 1 0 1
0 1
z y
0 0
B C B C
z y
B C
. .
1 1
B C B C
. .
=
@ A
. . MDS
@ A @ A
z y
2 2
z y
3 3
3
X
8i
Z = z 2
i
i=0
where s are the key-dep endent S-b oxes and Z is the result of g .For this to b e
i
well-de ned, we need to sp ecify the corresp ondence b etween byte values and the
8 8
eld elements of GF2 . We representGF2 as GF2[x]=v x where v x=
8 6 5 3
x + x + x + x +1 is a primitive p olynomial of degree 8 over GF 2. The
P
7
i
eld element a = a x with a 2 GF2 is identi ed with the byte value
i i
i=0
P
7
i 8
a 2 . This is in some sense the \natural" mapping; addition in GF 2
i
i=0
corresp onds to a xor of the bytes.
2.3 The Key Schedule
The key schedule has to provide 40 words of expanded key K ;:::;K , and the
0 39
4key-dep endent S-b oxes used in the g function. Two sh is de ned for keys of
length N = 128, N = 192, and N = 256. Keys of any length shorter than 256
bits can b e used by padding them with zero es until the next larger de ned key
length.
We de ne k = N=64. The key M consists of 8k bytes m ;:::;m . The
0 8k 1
bytes are rst converted into 2k words of 32 bits each
3
X
8j
M = m 2 i =0;:::;2k 1
i
4i+j
j =0
and then into twoword vectors of length k .
M =M ;M ;:::;M
e 0 2 2k 2
M =M ;M ;:::;M
o 1 3 2k 1
A third word vector of length k is also derived from the key. This is done by
8
taking the key bytes in groups of 8, interpreting them as a vector over GF 2 ,
and multiplying them bya4 8 matrix derived from a Reed-Solomon co de. Each
result of 4 bytes is then interpreted as a 32-bit word. These words make up the 4
third vector.
1 0
m
8i
C B
m
8i+1
C B
1 0
1 0
C B
m s
8i+2 i;0
C B
C B C B
m s
C B
. .
8i+3 i;1
C B C B
. .
=
A @
. .RS
C B
A @
m s
8i+4 i;2
C B
C B
m s
8i+5 i;3
C B
A @
m
8i+6
m
8i+7
3
X
8j
S = s 2
i i;j
j =0
for i =0;:::;k 1, and
S =S ;S ;:::;S
k 1 k 2 0
Note that S lists the words in \reverse" order. For the RS matrix multiply,
8 8 6 3 2
GF2 is represented by GF2[x]=w x, where w x=x + x + x + x +1
is another primitive p olynomial of degree 8 over GF 2. The mapping b etween
8
byte values and elements of GF2 uses the same de nition as used for the MDS
matrix multiply.
Additional Key Lengths Two sh can accept keys of anybyte length up to
256 bits. For key sizes that are not de ned ab ove, the key is padded at the end
with zero bytes to the next larger length that is de ned. For example, an 80-bit
key m ;:::;m would be extended by setting m = 0 for i = 10;:::;15 and
0 9 i
treating it as a 128-bit key.
The Function h The function h takes two inputs|a 32-bit word X and a
list L = L ;:::;L of 32-bit words of length k |and pro duces one word
0 k 1
of output. This function works in k stages. In each stage, the four bytes are
each passed through a xed S-b ox, and xored with a byte derived from the list.
Finally, the bytes are once again passed through a xed S-b ox, and the four
bytes are multiplied by the MDS matrix just as in g . More formally: we split the
words into bytes.
8j 8
l = L =2 mo d 2
i;j i
8j 8
x = X=2 mo d 2
j
for i = 0;:::;k 1 and j = 0;:::;3. Then the sequence of substitutions and
xors is applied.
y = x j =0;:::;3
k;j j 5
If k =4 wehave
y = q [y ] l
3;0 1 4;0 3;0
y = q [y ] l
3;1 0 4;1 3;1
y = q [y ] l
3;2 0 4;2 3;2
y = q [y ] l
3;3 1 4;3 3;3
If k 3wehave
y = q [y ] l
2;0 1 3;0 2;0
y = q [y ] l
2;1 1 3;1 2;1
y = q [y ] l
2;2 0 3;2 2;2
y = q [y ] l
2;3 0 3;3 2;3
In all cases wehave
y = q [q [q [y ] l ] l ]
0 1 0 0 2;0 1;0 0;0
y = q [q [q [y ] l ] l ]
1 0 0 1 2;1 1;1 0;1
y = q [q [q [y ] l ] l ]
2 1 1 0 2;2 1;2 0;2
y = q [q [q [y ] l ] l ]
3 0 1 1 2;3 1;3 0;3
Here, q and q are xed p ermutations on 8-bit values that we will discuss shortly.
0 1
The resulting vector of y 's is multiplied by the MDS matrix, just as in the g
i
function.
0 0 1 1
0 1
z y
0 0
B B C C
z y
B C
. .
1 1
B B C C
. .
=
@ A
. MDS .
@ @ A A
z y
2 2
z y
3 3
3
X
8i
Z = z 2
i
i=0
where Z is the result of h.
The Key-dep endent S-b oxes We can now de ne the S-b oxes in the function
g by
g X = hX; S
That is, for i =0;:::;3, the key-dep endent S-b ox s is formed by the mapping
i
from x to y in the h function, where the list L is equal to the vector S derived
i i
from the key. 6
The Expanded Key Words K The words of the expanded key are de ned
j
using the h function.
24 16 8 0
=2 +2 +2 +2
A = h2i; M
i e
B =ROLh2i +1; M ; 8
i o
32
K =A + B mod2
2i i i
32
K =ROLA +2B mod2 ; 9
2i+1 i i
The constant is used here to duplicate bytes; it has the prop erty that for
i =0;:::;255, the word i consists of four equal bytes, each with the value i.
The function h is applied to words of this typ e. For A the byte values are 2i,
i
and the second argumentofh is M . B is computed similarly using 2i + 1 as the
e i
byte value and M as the second argument, with an extra rotate over 8 bits. The
o
values A and B are combined in a PHT. One of the results is further rotated
i i
by 9 bits. The two results form twowords of the expanded key.
The Permutations q and q The p ermutations q and q are xed p ermuta-
0 1 0 1
tions on 8-bit values. They are constructed from four di erent 4-bit p ermutations
each. Wehaveinvestigated the resulting 8-bit p ermutations, q and q , exten-
0 1
sively, and b elieve them to b e at least no weaker than randomly selected 8-bit
p ermutations.
3 Analysis of The Key Schedule
The key schedule has b een designed to provide resistance to attack, while also
providing a great deal of exibility in implementation. For example, after S has
b een computed from M ;M , all remaining key scheduling can b e done \on the
e o
y" during encryption. This allows for very low-memory implementations, and
for implementations with excellent key agility. In implementations with more
memory, all the subkeys can b e precomputed for improved p erformance. In im-
plementations with still more memory, such as on mo dern high-end pro cessors
with a reasonable RAM cache size, the e ects of the key-dep endent S-b oxes
and the MDS matrix multiply can be precomputed, reducing the work per g
computation to four table lo okups and three xors.
Note that S is only half the size of the key. This was done so that precom-
putation of the S-b oxes and MDS matrix multiply would be suciently fast,
and so that low-memory implementations would not have to take to o large a
p erformance hit. This means that the g function is slightly di erent for longer
keys than for shorter keys. 7
3.1 Byte Sequences
The subkeys in Two sh are generated by using the h function, which can be
seen as four key-dep endent S-b oxes followed by an MDS matrix. The input to
the S-b oxes is basically a counter. In this section we analyze the sequences of
outputs that this construction can generate.
All key material is used to de ne key-dep endent S-b oxes in h, which are then
used to derive subkeys. Each S-b ox gets a sequence of inputs, 0; 2; 4;:::;38
or 1; 3; 5;:::;39. The S-b ox generates a corresp onding sequence of outputs.
The corresp onding outputs from the four S-b oxes are combined using the MDS
matrix multiply to pro duce the sequence of A and B words, and those words
i i
are pro cessed with the PHT with a couple of rotations thrown in to pro duce
a pair of subkey words. Analyzing these byte sequences thus gives us imp ortant
insights ab out the whole key schedule.
We can mo del each byte sequence generated by a key-dep endent S-b ox as
a randomly selected non-rep eating byte sequence of length 20. This allows us
to make many useful predictions ab out the likeliho o d of nding keys or pairs
of keys with various interesting prop erties. Because we will be analyzing the
key schedule using this assumption in the remainder of this section, we should
discuss how reasonable it is to treat this byte sequence as randomly generated.
Wehave not found any statistical deviations b etween our key-dep endent S-b oxes
and the random mo del in any of our extensive statistical tests.
We are lo oking at sequences of 20 bytes that are all distinct. There are
159
256!=236! of those sequences, which is close to 2 .
3.2 Equivalent S-b ox Keys
Wehaveveri ed that there are no equivalent S-b oxkeys that generate the same
sequence of 20 bytes. In the random mo del, the chance of this happ ening for the
63 159 96
N = 256 case is ab out 2 2 = 2 . This is the chance of such equiva-
lent S-b oxes existing at all. In fact, we recently completed an exhaustive search
demonstrating that no pair of key inputs to an S-b ox pro duces identical S-b ox
entries.
3.3 Byte Di erence Sequences
Let us consider the more general problem of howtogetagiven 20-byte di erence
sequence b etween a pair of S-b oxes. Supp ose wehavetwo S-b oxes, each de ned
using 32 bits of key material, which are not equal, but whichmust b e chosen to
give us a given di erence sequence in the xor of their byte sequences. We can
estimate the probability of a pair of 4-byte inputs existing with the desired xor
63 159 96
di erence sequence as 2 2 =2 . Note that this is the probability that
such a pair of inputs exists, not the probability that a random pair of keys will
have this prop erty. 8
3.4 The A and B Sequences
From the prop erties of the byte sequences, we can discuss the prop erties of the
A and B sequences generated by eachkey M .
A = MDS s i; M ;s i; M ;s i; M ;s i; M
i 0 1 2 3
Since the MDS matrix multiply is invertible, and since i is di erent for each
round's subkey words generated, we can see that no A or B value can rep eat
itself.
Similarly,we can see from the construction of h that eachkey byte a ects
exactly one S-b ox used to generate A or B . Changing a single key byte always
alters every one of the 20 bytes of output from that S-b ox, and so always alters
every word in the 20-word A or B sequence to which it contributes.
Consider a single byte of output from one of the S-b oxes. If we cycle any one
of the key bytes that contributes to that S-b ox through all 256 p ossible values,
the output of the S-b ox will also cycle through all 256 p ossible values. If we take
four key bytes that contribute to four di erent S-b oxes, and we cycle those four
bytes through all p ossible values, then the result of h will also cycle through all
p ossible values. This proves that A and B are uniformly distributed for all key
lengths, assuming the key M is uniformly distributed.
3.5 Di erence Sequences in A and B
Let us also consider di erence sequences. If wehave a sp eci c di erence sequence
wewant to see in A,we are faced with an interesting problem: since the MDS
matrix multiply is xor-linear, each desired output xor from the matrix multiply
allows only one p ossible input xor. This means that:
1. A zero output xor di erence in A can o ccur only with a zero output xor
di erence in all four of the byte sequences used to build A.
32
2. Only 1020 p ossible output di erences out of the 2 inA can o ccur with
i
a single \active" altered S-b ox. Most di erences require all four S-b oxes
used to form A to b e active.
i
3. Each desired output xor in A requires a sp eci c output xor in each of
the four byte sequences used to form A. This means that getting any de-
sired di erence sequence into all 20 A values requires getting a desired xor
i
sequence into all four 20-byte sequences. Note that if the desired output
xor in A is an appropriate value, up to three of the four byte sequences
i
can b e identical without much trouble, simply by leaving their key material
unchanged. As mentioned ab ove, this is very unlikely to b e p ossible for a
randomly chosen di erence pattern in the A sequence. There are of course
di erence sequences of A 's that can o ccur.
i
The ab ove analysis is of course also valid for the B sequence. 9
3.6 The Sequence K ;K
2i 2i+1
As A and B are uniformly distributed over all keys, so are all the K . As
i i i
all pairs A ;B are distinct, all the pairs K ;K are distinct, although it
i i 2i 2i+1
might happ en that K = K for any pair of i and j .
i j
3.7 Di erence Sequences in the Subkeys
Each di erence sequence in A and B translate into a di erence sequences in
K ;K . However, while it is natural to consider A and B di erence se-
2i 2i+1
quences in terms of xor di erences, subkeys can reasonably b e considered either
32
as xor di erences or as di erences mo dulo 2 .Thus, wemay discuss di erence
sequences:
D [i; M ; M ]=K K
i;M i;M
X [i; M ; M ]=K K
i;M i;M
where the di erence is computed b etween the key value M and M .
3.8 XOR Di erences in the Subkeys
Each round, the subkeys are added to the results of the PHT of two g functions,
and the results of those additions are xored into half of the cipher blo ck. An
xor di erence in the subkeys has a fairly high probabilityof passing through
the addition op eration and ending up in the cipher blo ck. The probabilityof
this is determined by the Hamming weightofthexor di erence, not counting
the highest-order bit. However, to get into the subkeys, a xor di erence must
rst pass through the rst addition.
Consider
x + y = z
x +y = z
0 1
Let k b e the numb er of bits set in , not counting the highest-order bit. Then,
0
the highest probabilityvalue for is , and the probability that this will hold
1 0