THE ADVANCED COMPUTING SYSTEMS ASSOCIATION

The following paper was originally published in the USENIX Workshop on Smartcard Technology

Chicago, Illinois, USA, May 10–11, 1999

Efficient Block Ciphers for Smartcards

Joan Daemen Proton World International

Vincent Rijmen K.U.Leuven

© 1999 by The USENIX Association All Rights Reserved

Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.

For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org

Efficient Block Ciphers for Smartcards

Joan Daemen

Proton World Int'l

Zweefvliegtuigstraat 10

B-1130 Brussel,

[email protected]



Vincent Rijmen

K.U.Leuven, Dept. ESAT

Kard. Mercierlaan 94

B-3001 Heverlee, Belgium

[email protected] e

April 1, 1999

Abstract

We present a family of block ciphers that can be implemented very efficiently on cheap Smartcard processors. The ciphers use a very small amount of RAM and a reasonable amount of ROM. Both cipher execution and key setup/key change are very fast. The ciphers resist theoretical and practical cryptanalytic attacks, and timing and power analysis attacks have been taken into account in their design.

1 Introduction

In many applications Smartcards are used as portable secure devices. For their security the applications make use of MAC generation/verification and encryption/decryption using a block cipher. We present a family of block ciphers that are suited for this purpose. Additionally, all these ciphers can be used as efficient one-way functions, and the variants with a block size of 192 bits or higher are efficient compression functions to form an iterated cryptographic hash function. The family is named after its first member that was designed and published: [1].

Currently, the Square family consists of three ciphers: Square, with a block length and a key length of 128 bits; BKSQ, with a block length of 96 bits and a variable key length of 96, 144 or 192 bits; and Rijndael, with a variable block length and key length (both can be independently specified at 128, 192 or 256 bits). The three ciphers are designed to be secure against all known cryptographic attacks. For a treatment of cryptographic security and the design rationale we refer to [1,2,3]. This paper treats implementation aspects, in particular those specific to Smartcards.

In Section 2 we present the common cipher structure of the Square family. Section 3 discusses the implementation of the ciphers on typical Smartcard processors. Section 4 treats the features of the presented ciphers that thwart attacks exploiting typical weaknesses of cipher implementations on Smartcards. Section 5 lists concrete performance figures.



F.W.O. Postdoctoral researcher, sponsored by the Fund for Scientific Research - Flanders (Belgium).

2 Cipher structure

A Square cipher is an iterated block cipher: it consists of the repeated application of a round transformation that is parameterized by a round key. The round keys are derived from the cipher key by means of a key schedule. The block length is indicated by n, the cipher key length by m and the number of rounds by r.

2.1 The round transformation

The round transformation is composed of four invertible uniform transformations, called steps. These steps can be described most easily by thinking of the input as a rectangular byte array. The dimensions of this byte array vary for the different members of the family, and depend on the block size. The four steps are described as follows (cf. Figure 1).

The diffusion step: Every byte is replaced by a linear combination of the bytes within the same column. The bytes are considered as elements in the field $\mathrm{GF}(2^8)$.

The dispersion step: A permutation of the bytes over different columns. This is done by shifting the rows of the byte array over different amounts, or by a transposition of the byte array (for Square).

The nonlinear step: A substitution of the bytes by means of a nonlinear lookup table.

The round key addition: The bytes are EXORed with an n-bit round key.

[Figure 1: Graphical illustration of some basic operations of the Square ciphers. Panels: diffusion step, nonlinear step (each byte a is replaced by S[a]), dispersion step.]

The choice of operations in the different steps has been influenced by our wish to make the cipher efficiently implementable on Smartcards. The key addition, the dispersion and the nonlinear step can all be implemented using operations on individual bytes, the natural "unit" on an 8-bit processor. In the diffusion step, inter-byte diffusion has to be realised. On a 32-bit processor, this can be done by using operations like 32-bit rotations, multiplications, ..., but the use of these operations complicates Smartcard implementations. In the Square family, the diffusion step can be described as a matrix multiplication (cf. Section 3). The coefficients of the multiplication matrix have been selected carefully to provide diffusion that is optimal in a very definite, mathematical sense, while at the same time allowing very efficient implementation on standard Smartcard processors.

2.2 The key schedule

The round keys have length n and $r+1$ round keys are required: one for every round and a final key addition. The key schedule can be thought to occur in two phases.

Generation of the expanded key: The expanded key is initialized by taking the m-bit cipher key. It is expanded by iteratively attaching m-bit blocks that are computed from the previously attached block by means of an LFSR-like computation. This is repeated until the expanded key has length $n(r+1)$.

Extraction of round keys: Round key i is taken from the expanded key by taking the i-th n-bit block.

The LFSR computations in the key expansion ensure that any pair of different cipher keys results in a pair of expanded keys with a large Hamming difference. The addition of round constants removes symmetry between the rounds. This is necessary in order to provide resistance against related-key attacks and attacks where the cipher key is known, e.g., if the cipher is used as the compression function of a hash function.

3 Specific Smartcard implementation aspects

In this section we discuss the implementation of the cipher on 8-bit processors with a limited amount of RAM and ROM available, i.e., typical Smartcard processors.

3.1 The round transformation

The round transformation can be implemented by serially performing the different steps.

The nonlinear step consists of a table lookup operation that is the same for all input bytes. The 256-byte lookup table is hard-coded in the cipher program and the table lookup can be implemented with a simple load accumulator instruction in indexed mode. The round key addition is implemented with the EXOR instruction. The byte dispersion step does not take dedicated instructions but is embodied in the way input bytes are loaded and stored in the preceding/succeeding steps. These three steps can be implemented in the following way: the byte is loaded into the index register, an indexed load accumulator instruction is performed, the round key byte is EXORed and the accumulator is stored to the hard-coded location.

Implementing the diffusion step is less straightforward. It takes the computation of additions and multiplications in the field $\mathrm{GF}(2^8)$. Addition over this field corresponds to the readily available EXOR instruction. The multiplication factors are the elements of $\mathrm{GF}(2^8)$ represented by byte values 1, 2 and 3. The multiplication by these factors can be done as follows:

• 1 is the identity in $\mathrm{GF}(2^8)$ and multiplication by it does not require any computation at all.

• Multiplication by 2 in the finite field could be implemented as a left shift, followed by a reduction. However, the execution time and/or the power consumption pattern of a reduction depend on the value of the operand. If the MSB of the operand is 1, the reduction takes place; if it is 0, the reduction can be skipped. This can be done in constant time by executing dummy instructions (e.g., NOP) in the case the reduction is skipped. However, this gives rise to two different sequences of operations. The operation can be implemented with a fixed series of instructions by implementing the multiplication by 2 as a table lookup with a dedicated lookup table 2mult[], that is defined as

\[ \texttt{2mult}[a] = 2 \cdot a. \]

The fact that the execution time is independent of the argument makes this table lookup implementation timing attack resistant. We explain in Section 4 how it can be protected against power analysis.

• Multiplication by 3 can be obtained by performing multiplication with 2 and adding (EXORing) the argument itself: $3 \cdot a = (2 \oplus 1) \cdot a = \texttt{2mult}[a] \oplus a$.

In Rijndael and Square the columns consist of 4 bytes each and the diffusion step applied to a column can be described by a matrix multiplication, that is given by:

\[
\begin{bmatrix} \mathrm{out}_0 \\ \mathrm{out}_1 \\ \mathrm{out}_2 \\ \mathrm{out}_3 \end{bmatrix}
=
\begin{bmatrix} 2 & 3 & 1 & 1 \\ 1 & 2 & 3 & 1 \\ 1 & 1 & 2 & 3 \\ 3 & 1 & 1 & 2 \end{bmatrix}
\cdot
\begin{bmatrix} \mathrm{in}_0 \\ \mathrm{in}_1 \\ \mathrm{in}_2 \\ \mathrm{in}_3 \end{bmatrix}.
\]

This can be efficiently implemented as:

\[
\begin{aligned}
p &= \mathrm{in}_0 \oplus \mathrm{in}_1 \oplus \mathrm{in}_2 \oplus \mathrm{in}_3; \\
\mathrm{out}_0 &= \texttt{2mult}[\mathrm{in}_0 \oplus \mathrm{in}_1]; & \mathrm{out}_0 &\mathrel{\oplus=} p \oplus \mathrm{in}_0; \\
\mathrm{out}_1 &= \texttt{2mult}[\mathrm{in}_1 \oplus \mathrm{in}_2]; & \mathrm{out}_1 &\mathrel{\oplus=} p \oplus \mathrm{in}_1; \\
\mathrm{out}_2 &= \texttt{2mult}[\mathrm{in}_2 \oplus \mathrm{in}_3]; & \mathrm{out}_2 &\mathrel{\oplus=} p \oplus \mathrm{in}_2; \\
\mathrm{out}_3 &= \texttt{2mult}[\mathrm{in}_3 \oplus \mathrm{in}_0]; & \mathrm{out}_3 &\mathrel{\oplus=} p \oplus \mathrm{in}_3;
\end{aligned}
\]

This implementation takes only 15 EXORs and 4 table lookups per column. It requires temporary storage for 2 bytes: the variables $p$ and $\mathrm{in}_0$ if the output buffer is equal to the input buffer.

In BKSQ the columns consist of 3 bytes and the diffusion step is given by

\[
\begin{bmatrix} \mathrm{out}_0 \\ \mathrm{out}_1 \\ \mathrm{out}_2 \end{bmatrix}
=
\begin{bmatrix} 3 & 2 & 2 \\ 2 & 3 & 2 \\ 2 & 2 & 3 \end{bmatrix}
\cdot
\begin{bmatrix} \mathrm{in}_0 \\ \mathrm{in}_1 \\ \mathrm{in}_2 \end{bmatrix}.
\]

This can be implemented with only 5 EXORs and a single table lookup per column, and only one byte of temporary storage is needed.

3.2 The inverse round transformation

The Square ciphers do not have the Feistel structure of, e.g., the DES. Whereas for Feistel ciphers the operation of the cipher can be inverted by simply reordering the round keys, this is not possible for the Square ciphers. Therefore, if the inverse operation of the cipher has to be implemented, it is necessary to implement the inverse round operation separately.

The inverse of the round key addition is the same as the round key addition: EXOR with the round key. The inverse of the nonlinear step is implemented like the nonlinear step, but uses a different lookup table. The byte dispersion step is again embodied in the way input bytes are loaded and stored in the other steps.

The inverse of the diffusion step consists of a multiplication with the inverse matrix. For Rijndael and Square, the inverse of the diffusion step is given by:

\[
\begin{bmatrix} \mathrm{out}_0 \\ \mathrm{out}_1 \\ \mathrm{out}_2 \\ \mathrm{out}_3 \end{bmatrix}
=
\begin{bmatrix} \mathrm{E} & \mathrm{B} & \mathrm{D} & 9 \\ 9 & \mathrm{E} & \mathrm{B} & \mathrm{D} \\ \mathrm{D} & 9 & \mathrm{E} & \mathrm{B} \\ \mathrm{B} & \mathrm{D} & 9 & \mathrm{E} \end{bmatrix}
\cdot
\begin{bmatrix} \mathrm{in}_0 \\ \mathrm{in}_1 \\ \mathrm{in}_2 \\ \mathrm{in}_3 \end{bmatrix}.
\]

Here B, D and E denote hexadecimal values. It is easy to see that this multiplication will require more operations than the original diffusion step, namely 21 EXORs and 8 table lookups using only the previously defined table 2mult[]. If additional tables are used, the performance loss can be reduced. The storage requirements do not increase.

Note that most applications do not require the inverse operation of the cipher to be executed on the Smartcard. First of all, most of the applications use the cipher for the generation and verification of MACs and as a one-way function in the generation of session keys. In the cases where encipherment is actually used, the CFB or OFB mode can be used, where the inverse cipher is not used.
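The column operations of Sections 3.1 and 3.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The reduction polynomial 0x11B is the one used by Rijndael and is assumed here, since the text does not state which polynomial defines the field. The names MULT2, diffuse4, inv_diffuse4 and diffuse3 are hypothetical. The inverse uses a well-known decomposition (add four times in_0 ⊕ in_2 and four times in_1 ⊕ in_3 to the column, then apply the forward step), which is one way to reach the stated cost of 21 EXORs and 8 lookups in 2mult[]; the paper does not say which method the authors used.

```python
# Reduction polynomial x^8 + x^4 + x^3 + x + 1 (Rijndael's). This is an
# assumption: the text does not state which polynomial defines GF(2^8).
POLY = 0x11B

# MULT2[a] = 2 * a in GF(2^8), i.e. the paper's 2mult[] table. The branch on
# the MSB happens only while building the table, never during encryption.
MULT2 = [((a << 1) ^ (POLY if a & 0x80 else 0)) & 0xFF for a in range(256)]


def diffuse4(col):
    """Forward diffusion of a 4-byte column: 15 EXORs and 4 table lookups."""
    in0, in1, in2, in3 = col
    p = in0 ^ in1 ^ in2 ^ in3
    return [
        MULT2[in0 ^ in1] ^ p ^ in0,  # 2*in0 + 3*in1 + 1*in2 + 1*in3
        MULT2[in1 ^ in2] ^ p ^ in1,  # 1*in0 + 2*in1 + 3*in2 + 1*in3
        MULT2[in2 ^ in3] ^ p ^ in2,  # 1*in0 + 1*in1 + 2*in2 + 3*in3
        MULT2[in3 ^ in0] ^ p ^ in3,  # 3*in0 + 1*in1 + 1*in2 + 2*in3
    ]


def inv_diffuse4(col):
    """Inverse of diffuse4 using only MULT2: 6 + 15 EXORs, 4 + 4 lookups."""
    in0, in1, in2, in3 = col
    u = MULT2[MULT2[in0 ^ in2]]  # 4 * (in0 + in2)
    v = MULT2[MULT2[in1 ^ in3]]  # 4 * (in1 + in3)
    # Pre-processing followed by the forward step yields the inverse matrix.
    return diffuse4([in0 ^ u, in1 ^ v, in2 ^ u, in3 ^ v])


def diffuse3(col):
    """BKSQ-style diffusion of a 3-byte column: 5 EXORs and 1 table lookup."""
    in0, in1, in2 = col
    t = MULT2[in0 ^ in1 ^ in2]  # 2 * (in0 + in1 + in2)
    return [t ^ in0, t ^ in1, t ^ in2]
```

Under the assumed polynomial, diffuse4([0xDB, 0x13, 0x53, 0x45]) gives [0x8E, 0x4D, 0xA1, 0xBC], and inv_diffuse4 maps that result back to the original column. None of the three functions branches on its input, which is the property the table-lookup formulation is meant to provide.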

3.3 The key schedule

In a Smartcard implementation, computing the expanded key in a single shot and storing it for use during the actual encryption would require too much RAM. Therefore, the key expansion has been defined in such a way that it can be implemented in a cyclic buffer with a size no larger than the size of the cipher key. The expansion operation has been kept very simple and efficient to make fast just-in-time key generation possible.

In the case that the cipher key length is equal to the block length, the round key is simply updated in between rounds. In the other cases, a small amount of additional sequence control is required. All operations in the key update can be efficiently implemented using EXORs and table lookups.

3.4 32-bit processors

On a 32-bit processor, an efficient implementation of the round transformation will use four large tables (1k each) that combine the effect of the nonlinear and the diffusion step. The tables will fit nicely in the cache of most modern 32-bit processors.

The performance is independent of the value of the multiplication factors in the diffusion step. The inverse operation of the cipher has the same performance as the forward operation, but uses a different set of tables.

4 Smartcard-specific attacks

Recently, several attacks have been demonstrated that exploit weaknesses of cipher implementations, rather than the inherent mathematical properties of the actual cipher [4]. These attacks exploit information such as timing and power consumption to obtain information on the cipher key or plaintext. In this section we will explain that the ciphers of the Square family lend themselves to implementations that provide resistance against this type of attack typical for Smartcards.

If programmed as explained in the previous section, the cipher execution consists of a series of instructions that is completely fixed: there are no conditional branches whose execution depends on the cipher key and input. This thwarts the following attacks:

Timing attack: This attack extracts key/plaintext information from the CPU time consumed by the cipher. For the Square ciphers the CPU time is independent of the cipher key and plaintext.

Simple power analysis: This attack extracts key/plaintext information by observing the power consumption during the cipher computation. Different CPU instructions have different power consumption and this attack allows one to determine whether one or another conditional branch is taken in a computation. For the Square ciphers the series of instructions is fixed.

A more powerful attack is differential power analysis. This attack exploits the fact that the power consumed by the CPU not only depends on the instruction that is executed, but also on the values of the operands. It combines the application of cryptanalytic techniques, statistics and power analysis. Basically, it allows the determination of the value of individual bits of intermediary computation results of the cipher by an attacker that does not even need to know the sequence of instructions. The basic flaw that is exploited is that the power consumed by an instruction depends on the values of the bits that are handled by the command. To thwart these attacks, two mechanisms are proposed:

scrambling: To complicate the exploitation of power consumption bias, e.g., by using programming tricks, such as insertion of a variable amount of NOPs, or better, unnecessary instructions in between rounds. Scrambling can be applied reasonably independently of the cipher structure.

curing: To make the power consumption of the relevant instructions used in a cipher implementation much less dependent on the value of the treated bits. One plausible way to cure instructions is by the introduction of symmetry. For example: if a bit is stored in (loaded from) RAM, also store (load) its complement. If two bits are EXORed, execute all four different combinations: $a \oplus b$; $\bar{a} \oplus b$; $a \oplus \bar{b}$; $\bar{a} \oplus \bar{b}$. Typically, additional hardware has to be introduced for every sensitive instruction. Therefore it is an advantage to have few different instructions that handle key- or plaintext-dependent bits.

The instructions that handle key- or plaintext-dependent bits in our implementations of the ciphers of the Square family are:

• EXOR with accumulator, direct addressing;

• store accumulator, direct addressing;

• load accumulator, direct addressing;

• load accumulator, indexed addressing (offset + index register).

These are only four instructions. Most other modern ciphers [5] use an instruction set that is substantially larger, due to the use of arithmetic operations. Moreover, the balancing of arithmetic operations is likely to be more complicated than the balancing of the EXOR. It can be seen that there is arithmetic addition in the indexed addressing. However, if the lookup tables are positioned at physical addresses that are a multiple of 256, the address computation can be reduced to a mere concatenation of index and offset. Obviously, this implies a modification of the ALU hardware.

5 Performance

We implemented the Square ciphers on two different types of microprocessors that are representative for Smartcards in use today. These implementations have been optimized towards minimal RAM usage and execution time while guaranteeing a fixed execution time. Table 1 shows that besides the storage of the current round key and the intermediate ciphertext, only four bytes of RAM are used. The numbers compare very favorably with the figures for the other AES candidate algorithms.

The timings given include the key setup and algorithm setup time. Only the forward operation of the ciphers has been implemented; backwards operation is expected to be slower because the inverse of the diffusion step cannot be implemented as efficiently (cf. Section 3.2). The inverse diffusion step is between 1.5 and 2 times slower than the original diffusion step. Since the diffusion step is dominant in the execution of the ciphers on a Smartcard, about the same performance loss can be expected for the full cipher.

The implementations on the Motorola 68HC08 microprocessor have been done using the 68HC08 development tools by P&E Microcomputer Systems, Woburn, MA USA, the IASM08 68HC08 Integrated Assembler and the SIML8 68HC08 simulator. No optimization of code length has been attempted for this processor. Execution time, code size and required RAM for a number of implementations are given in Table 1 (1 cycle = 1 oscillator period = 0.25 µsec).

Rijndael has also been implemented on the Intel 8051 microprocessor, using 8051 Development tools of Keil Elektronik: µVision IDE for Windows and dScope Debugger/Simulator for Windows. Execution time for several code sizes is given in Table 2 (1 cycle = 12 oscillator periods = 1 µsec).

6 Availability

Several implementations of Square in C and Java are available from the URL http://www.esat.kuleuven.ac.be/~rijmen/square.

More information on Rijndael and a reference implementation are available from the URL http://www.esat.kuleuven.ac.be/~rijmen/rijndael.

Cipher                Code size   Required RAM   Number of   Execution time
(key, block length)   (bytes)     (bytes)        cycles      (msec)
BKSQ(96,96)           900         28             6500        1.6
Square(128,128)       919         36             6800        1.7
Rijndael(128,128)     919         36             8390        2.1
Rijndael(192,128)     1170        44             10780       2.7
Rijndael(256,128)     1135        52             12490       3.1

Table 1: Code size, required RAM and execution time for the Square ciphers in Motorola 68HC08 Assembler.

Key, block length   Code size   Number of   Execution time
                    (bytes)     cycles      (msec)
(128,128)           768         4065        4.1
(128,128)           826         3744        3.7
(128,128)           1016        3168        3.2
(192,128)           1125        4512        4.5
(256,128)           1041        5221        5.2

Table 2: Code size and execution time for Rijndael in Intel 8051 assembler.
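The figures in Tables 1 and 2 can be cross-checked against the cycle times quoted in Section 5 with a few lines of Python. This is only a sanity check on the published numbers. The helper names are hypothetical, and the RAM accounting is an assumption: it takes the key buffer to be as large as the cipher key (as Section 3.3 suggests), adds the block-sized intermediate state, and adds the four scratch bytes mentioned in Section 5.

```python
# Table 1 (Motorola 68HC08): (cipher, key bits, block bits, RAM bytes, cycles, msec)
TABLE1 = [
    ("BKSQ", 96, 96, 28, 6500, 1.6),
    ("Square", 128, 128, 36, 6800, 1.7),
    ("Rijndael", 128, 128, 36, 8390, 2.1),
    ("Rijndael", 192, 128, 44, 10780, 2.7),
    ("Rijndael", 256, 128, 52, 12490, 3.1),
]

# Table 2 (Intel 8051, Rijndael): (cycles, msec)
TABLE2 = [(4065, 4.1), (3744, 3.7), (3168, 3.2), (4512, 4.5), (5221, 5.2)]

USEC_PER_CYCLE_68HC08 = 0.25  # 1 cycle = 1 oscillator period
USEC_PER_CYCLE_8051 = 1.0     # 1 cycle = 12 oscillator periods


def exec_time_msec(cycles, usec_per_cycle):
    """Convert a cycle count to milliseconds."""
    return cycles * usec_per_cycle / 1000.0


def ram_bytes(key_bits, block_bits, scratch=4):
    """Assumed RAM model: key buffer + intermediate state + scratch bytes."""
    return key_bits // 8 + block_bits // 8 + scratch


def check_tables(tolerance=0.051):
    """Return True if every row matches the cycle times and the RAM model."""
    ok = True
    for _name, key_bits, block_bits, ram, cycles, msec in TABLE1:
        ok = ok and ram_bytes(key_bits, block_bits) == ram
        ok = ok and abs(exec_time_msec(cycles, USEC_PER_CYCLE_68HC08) - msec) < tolerance
    for cycles, msec in TABLE2:
        ok = ok and abs(exec_time_msec(cycles, USEC_PER_CYCLE_8051) - msec) < tolerance
    return ok
```

Calling check_tables() returns True for the rows as printed: every execution time equals the cycle count times the quoted cycle length, rounded to one decimal, and every RAM figure in Table 1 equals key bytes plus block bytes plus four.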

References

[1] J. Daemen, L.R. Knudsen, V. Rijmen, "The Square encryption algorithm," Dr. Dobb's Journal, Vol. 22, No. 10, October 1997, pp. 54–56.

[2] J. Daemen and V. Rijmen, "The Rijndael block cipher," presented at the First Advanced Encryption Standard Conference, Ventura, California, 1998, available from URL http://www.nist.gov/aes.

[3] J. Daemen and V. Rijmen, "The Block Cipher BKSQ," Proc. of CARDIS'98, LNCS, Springer-Verlag, to appear.

[4] P.C. Kocher, "Timing attacks on implementations of Diffie-Hellman, RSA, DSS and other systems," Advances in Cryptology, Proceedings Crypto'96, LNCS 1109, N. Koblitz, Ed., Springer-Verlag, 1996, pp. 146–158.

[5] NIST's AES Homepage: http://www.nist.gov/aes.