[11] Sho ji Miyaguchi, Akira Shiraishi, Akihiro Shimizu, Fast Data

Algorithm FEAL-8, Review of electrical communications lab oratories, Vol. 36,

No. 4, pp. 433{437, 1988.

[12] Luke O'Connor, On the Distribution of Characteristics in Bijective Mappings,

Lecture Notes in Computer Science, Advances in Cryptology, pro ceedings of

EUROCRYPT'93, to app ear.

[13] Luke O'Connor, On the Distribution of Characteristics in Composite Permuta-

tions, Lecture Notes in Computer Science, Advances in Cryptology, pro ceedings

of CRYPTO'93, to app ear.

[14] , On the Security of DES, Lecture Notes in Computer Science,

Advances in Cryptology, pro ceedings of CRYPTO'85, pp. 280{281, 1985.

[15] Akihiro Shimizu, Sho ji Miyaguchi, Fast Data Encryption Algorithm FEAL,

Lecture Notes in Computer Science, Advances in Cryptology, pro ceedings of

EUROCRYPT'87, pp. 267{278, 1987.

Technion - Computer Science Department Technical Report CS0813 1994 17

to linear . We also describ ed how to sum up characteristics (which also

hold in di erential cryptanalysis).

8 Acknowledgments

Acknowledgment: This research was supp orted by the fund for the promotion of

research at the Technion. Some of the results in this pap er were found using variants

of programs, whose originals were written by Ishai Ben-Aroya.

References

[1] Eli Biham, Adi Shamir, Di erential Cryptanalysis of the Data Encryption

Standard, Springer-Verlag, 1993.

[2] Eli Biham, Adi Shamir, Di erential Cryptanalysis of DES-like Cryptosystems,

Journal of Cryptology,Vol. 4, No. 1, pp. 3{72, 1991.

[3] Eli Biham, Adi Shamir, Di erential Cryptanalysis of , Khafre, REDOC-

II, LOKI and Lucifer, technical rep ort CS91-18, Department of Applied

Mathematics and Computer Science, The Weizmann Institute of Science, 1991.

The extended abstract app ears in Lecture Notes in Computer Science, Advances

in Cryptology, pro ceedings of CRYPTO'91, pp. 156{171, 1991.

[4] Don Copp ersmith, The (DES) and its Strength

Against Attacks, technical rep ort, IBM Thomas J. Watson Research Center, RC

18613 (81421), Decemb er 1992.

[5] D. W. Davies, Investigation of a Potential Weakness in the DES Algorithm,

1987, private communication.

[6] Xuejia Lai, James L. Massey, Sean Murphy, Markov Ciphers and Di erential

Cryptanalysis, Lecture Notes in Computer Science, Advances in Cryptology,

pro ceedings of EUROCRYPT'91, pp. 17{38, 1991.

[7] Xuejia Lai, On the Design and Security of Block Ciphers, Ph.D. thesis, Swiss

Federal Institue of Technology, Zurich, 1992.

[8] , Atsuhiro Yamagishi, A New Method for Known Plaintext Attack

of FEAL Cipher, Lecture Notes in Computer Science, Advances in Cryptology,

pro ceedings of EUROCRYPT'92, pp. 81{91, 1992.

[9] M. Matsui, Linear Cryptanalysis Method for DES Cipher, Abstracts of

EUROCRYPT'93, pp. W112{W123, May 1993.

Technion - Computer Science Department Technical Report CS0813 1994

[10] Ralph C. Merkle, Fast Software Encryption Functions, Lecture Notes in Computer

Science, Advances in Cryptology, pro ceedings of CRYPTO'90, pp. 476{501,

1990. 16

characteristics is:

= 86 81 00 00 86 81 00 00

P x

0 0

A = 86 81 00 00 a = 81 00 00 00 with probability 1=2 1=8 x

x F

0 0

B = 07 81 00 00 b = 81 00 00 00 with probability 1=2 1=4 x

x F

0 0

C = 07 81 00 00 c = 81 00 00 00 with probability 1=2 1=4 x

x F

0 0

D = 86 81 00 00 d = 81 00 00 00 with probability 1=2 1=8 x

x F

= 86 81 00 00 86 81 00 00 :

T x

11

The iteration of this characteristic to seven rounds have probability 1=2 2 . A

similar characteristic exist with a reverse order of the bytes in each word. From the

112 24

tables in [9] we can see that ab out 4  2 = 2 known plaintexts are required to

25

attack Feal-8, with success rate ab out 78% and that 2 known plaintexts are required

for success rate ab out 97%. This characteristic can b e used to attack Feal-N with up

to 20 rounds, with a complexity (and known plaintexts) smaller than of exhaustive

search. The attack on Feal-8 was applied successfully on a p ersonal computer. It

24

takes ab out 10 minutes to encrypt the 2 required known plaintexts and to nd the

key.

7 Summary

In this pap er we studied Matsui's linear cryptanalysis. We showed that the formalism

of di erential cryptanalysis can b e adopted to linear cryptanalysis. In particular, we

showed that characteristics can b e de ned, concatenated, and used in a very similar

manner as in di erential cryptanalysis. Constraints on the size of S b oxes were

Technion - Computer Science Department Technical Report CS0813 1994

describ ed. Matsui's characteristic used to attack DES in his pap er is shown to b e the

b est characteristic which has only up to one active S b ox at each round; on the other

24

hand, we improved his results on Feal. We attack Feal-8 using 2 known plaintexts

with linear cryptanalysis. Davies' attack on DES[5] was shown to b e closely related 15

found two ve-round characteristic with probability 1=2 + 1=32. One of them is:

= 04 01 00 00 11 00 04 00

P x

0 0

A = 04 01 00 00 a = 01 00 00 00 with probability 1 x

x F

0 0

B = 10 00 04 00 b = 04 01 00 00 with probability 1=2 + 1=8 x

x F

0 0

= 0 c = 0 with probability 1

C F

0 0

D = 10 00 04 00 d = 04 01 00 00 with probability 1=2 + 1=8 x

x F

0 0

E = 04 01 00 00 e = 01 00 00 00 with probability 1 x

x F

= 04 01 00 00 11 00 04 00 :

T x

We have found several iterative characteristics of Feal, which can b e used to attack

24

Feal-8 using ab out 2 known plaintexts with a smaller computation complexity. This

is a much b etter tradeo than in Matsui's attacks on Feal-8, which required either

28 50 15

2 known plaintexts, for which the complexity of the analysis is 2 , or 2 known

64

plaintexts, for which the complexity of the analysis is 2 . One of these iterative

Technion - Computer Science Department Technical Report CS0813 1994 14

probabilities, among them at least eight one-round characteristics with probability

1=2  1=4.

In his attack[8] Matsui uses linearities which can b e formalized by the following

three-round characteristic with probability 1:

= 04 04 04 00 00 01 00 00

P x

0 0

A = 04 04 04 00 a = 00 01 00 00 with probability 1=2 1=2 = 0 x

x F

0 0

= 0 b = 0 with probability 1

B F

0 0

C = 04 04 04 00 c = 00 01 00 00 with probability 1=2 1=2 = 0 x

x F

= 04 04 04 00 00 01 00 00

T x

In his attack he sets this characteristic in rounds 3{5 and tries exhaustively values of

bits of the subkeys in rounds 1{2 and 6{8, with some auxiliary techniques. We have

Technion - Computer Science Department Technical Report CS0813 1994 13

0

= (00 01 04 00 ;R )

P x

0 0

A = 00 01 04 00 a = 00 00 01 01 with probability 1=2 + 1=2 = 1 x

x F

0

= (00 01 04 00 ;R  00 00 01 01 ):

T x x

and

0

= (00 00 01 04 ;R )

P x

0 0

A = 00 00 01 04 a = 00 00 00 01 with probability 1=2 1=2 = 0 x

x F

0

= (00 00 01 04 ;R  00 00 00 01 ):

T x x

The other 11 one-round characteristics with probability 1=2  1=2 can b e derived by

0

combining any numb er of these four characteristics by XORing the values of their a

0 0 0

into the new a and XORing the values of their A into the new A . For example,

the following characteristics results from a combination of the rst three of the ab ove

four characteristics:

0

= (04 04 04 00 ;R )

P x

0 0

A = 04 04 04 00 a = 00 01 00 00 with probability 1=2 1=2 = 0 x

x F

0

= (04 04 04 00 ;R  00 01 00 00 ):

T x x

These combinations are valid since no S b ox is active in two or more of the original

Technion - Computer Science Department Technical Report CS0813 1994

characteristics. Such combinations are also applicable in di erential cryptanalysis,

whenever they do not involve the same S b ox active in more than one characteris-

tic. We have also found several additional linear characteristics of Feal with smaller 12

24

characteristic with probability ab out 1=2 + 2 . We have exhaustively veri ed that

this iterative characteristic is the b est among all the characteristics with at most one

active S b ox at each round, and that Matsui's 16-round characteristic is the b est

characteristics under the same restriction (Matsui claims that his characteristic is the

b est without any restriction).

6 Application to Feal

In [8] Matsui describ ed a preliminary version of linear cryptanalysis and used it to

attack Feal[15,11 ]. For Feal there are 15 (non-trivial) one-round characteristics with

probability 1=2  1=2, based on the linearity of the least signi cant bits in the addition

op eration (a similar e ect o ccurs also in di erential cryptanalysis of Feal, in which

characteristics with probability 1 are based on the elimination of the carry from the

most signi cant bit). The Four basic one-round characteristics with probability 1=2 

1=2 are:

0

= (04 01 00 00 ;R )

P x

0 0

A = 04 01 00 00 a = 01 00 00 00 with probability 1=2 + 1=2 = 1 x

x F

0

= (04 01 00 00 ;R  01 00 00 00 ):

T x x

0

= (00 04 00 00 ;R )

P x

0 0

A = 00 04 00 00 a = 01 01 01 01 with probability 1=2 1=2 = 0 x

x F

0

= (00 04 00 00 ;R  01 01 01 01 ):

T x x

Technion - Computer Science Department Technical Report CS0813 1994 11

5 Application to DES

Matsui's 16-round linear approximation can b e viewed as a 16-round characteristic.

This characteristic is based on the following eight-round iterative characteristic:

= 21 04 00 80 00 00 00 00

P x

0 0

A = 21 04 00 80 = a = 00 00 80 00 with probability 1=2 20=64 x

x F

P (00 00 F 0 00 ) one a ected key bit

x

0 0

B = 00 00 80 00 = b = 20 00 00 00 with probability 1=2 2=64 x

x F

P (40 00 00 00 ) one a ected key bit

x

0 0

C = 01 04 00 80 = c = 00 00 80 00 with probability 1=2 + 10=64 x

x F

P (00 00 E 0 00 ) one a ected key bit

x

0 0

= 0 d = 0 always

D F

no a ected key bits

0 0

E = 01 04 00 80 = e = 00 00 80 00 with probability 1=2 + 10=64 x

x F

P (00 00 E 0 00 ) one a ected key bit

x

0 0

F = 00 00 80 00 = f = 20 00 00 00 with probability 1=2 2=64 x

x F

P (40 00 00 00 ) one a ected key bit

x

0 0

G = 21 04 00 80 = g = 00 00 80 00 with probability 1=2 20=64 x

x F

P (00 00 F 0 00 ) one a ected key bit

x

0 0

= 0 h = 0 always

H F

no a ected key bits

Technion - Computer Science Department Technical Report CS0813 1994

= 00 00 00 00 21 04 00 80 :

T x

27

This characteristic has probability ab out 1=2 + 2 . By iterating it to 16 rounds

and replacing the rst and last rounds by lo cally b etter ones, Matsui got a 16-round 10

themselves (in the rst m columns), and the output values of the S b ox (in the other

n columns).

Each column of M contains one bit from each input/output pair of the S b ox.

Linear combinations of subsets of the input/output bits of the S b ox are represented

by linear combinations of the columns. We say that a subset of bits of the input and

output of the S b ox form a linear combination if for all inputs the linear combination

of these bits is zero. We say that a subset of bits of the input and output of the S

b ox form an ane combination if for all inputs the linear combination of these bits is

a constant (either all zero, or all one). Equivalently, a subset of the bits of the input

and output of the S b ox form a linear combination if the columns of M are linearly

dep endent, and a subset of the bits of the input and output of the S b ox form an

ane combination if the columns of M and the all one vector are linearly dep endent.

0

De ne M to b e the matrix formed by M with one additional column with all the

0

values ones: M = [Mj1]. Thus, if the rank of M equals the numb er of its columns

0

m + n, there are no linear combinations in the S b ox, and if the rank of M equals the

numb er of its columns m + n + 1, there are no ane combinations in the S b ox. The

0

S b ox has an ane combination of its input and output bits if rank(M ) < m + n + 1.

0 m m

Since the numb er of rows of M and M is 2 , the maximal rank is 2 . Therefore,

m

if n  2 m the S b ox must have an ane combination of the input/output

bits. These ane combinations cause entries with probability 1=2  1=2 in the linear

approximation table, which can b e a ma jor threat to the security of the cryptosystem.

m

Similarly, if n  2 the S b ox must have an ane combination of a subset of only

output bits, which do es not dep end on the input bits at all! Such combinations cause

(in many cases of DES-like cryptosystems) the existence of a two-round iterative

characteristic with probability 1=2  1=2 (of the form 0 ! X ), and thus enable

attacks which require only a few known plaintexts!

These ane combinations also hold as ane combinations of the bits of the di er-

ences predicted in di erential cryptanalysis. We do not know whether in di erential

cryptanalysis these linearities also p ose a ma jor threat.

Technion - Computer Science Department Technical Report CS0813 1994 9

S b oxes Probability Known Plaintexts Davies' Attack

33 66 66

S1{S2 1=2 + 2 2 2

36 73 69

S2{S3 1=2 + 2 2 2

44 89 86

S3{S4 1=2 + 2 2 2

36 73 71

S4{S5 1=2 + 2 2 2

36 73 72

S5{S6 1=2 + 2 2 2

33 66 66

S6{S7 1=2 + 2 2 2

28 57 57

S7{S8 1=2 + 2 2 2

40 79 77

S8{S1 1=2 + 2 2 2

Table 2. Results of Linear Cryptanalysis of DES using Davies' Characteristics.

the attack based on linear cryptanalysis are given in Table 2, along with the numb er

of known plaintexts required for the original Davies' attacks based on the same pairs

2

of S b oxes . Notice that the results of these two attacks are very similar.

4 Constraints on the Size of the S Boxes

In this section we show new constraints on the size of S b oxes. Researchers have

already studied the di erential b ehavior of the size of S b oxes. For example, Luke

O'connor[12,13 ] analyzed the di erential b ehavior of bijective S b oxes and of comp os-

ite S b oxes, and concluded that for large enough S b oxes, even random S b oxes are

immune against di erential cryptanalysis. However, there was no result on required

relationships b etween the input size of the S b oxes and their output size. In this

section we show such a relationship.

In di erential cryptanalysis we can easily reduce the probability of all the entries

in the di erence distribution tables of the S b oxes by increasing the numb er of output

bits of the S b oxes. Whenever the numb er of output bits of an S b ox is (suciently)

larger than the numb er of its input bits, it is very likely that the entries in the

di erence distribution table will have only values 0 and 2; thus all the p ossible entries

have the same low probability.

Examples of cryptosystems which use such S b oxes are [10].

The attack on Khafre[3,1] used exactly these prop erties, but still it used the sp eci c

structure of Khafre.

Linear cryptanalysis adds a new criteria for this relationship. We identi ed that

whenever the numb er of output bits is large enough, there must b e linear and ane

relations b etween these bits, which hold for all the p ossible inputs of the S b ox.

Denote the numb er of input bits by m, and the numb er of output bits by n. We

m

can now describ e the S b ox by a binary matrix M with 2 rows, corresp onding to m

Technion - Computer Science Department Technical Report CS0813 1994

the 2 inputs of the S b ox, and with m + n columns, which contain the input values

2

The numb er of known plaintexts required for Davies' attack were calculated using the equations

given in [5]. 8

analysis, the internal information contains the information ab out the subset of key

bits participating in the linearization. Thus, if two characteristics with the same val-

ues of and and with a similar probability exist, they might cancel the e ect

P T

of each other if the parity of the subset of the key bits is not the same (or if their

probabilities are the complement of each other and the parity of the subset of their

key bits is the same). Therefore, we should b e much more careful when we claim for

linear cryptanalysis' characteristics. However, if the attacker knows of all the di erent

characteristics whose e ect might b e canceled, he can nd one (parity) bit of the key

whenever he identi es that the e ect is canceled.

Davies[5] investigated an attack against DES based on the non-uniform distri-

bution of the outputs of pairs of adjacent S b oxes, when their inputs are uniformly

distributed. He assumes the uniform distribution in the inputs to the even rounds (or

alternatively the o dd rounds), and studies the resultant distribution in the outputs of

these rounds. As a result, he receives a key-dep endent distribution, which dep ends on

the parity of several key bits. Using a large sample of known plaintexts, he can nd

this bit. His algorithm can b e applied to any pair of adjacent S b oxes, and to even

or o dd rounds, thus he can nd up to 16 p otential parity bits of the key. His attack

1

is strongly related to linear cryptanalysis , and has a linear cryptanalysis variant. In

the even rounds (o dd rounds) the characteristics have zero values in the input and

non-zero values in the outputs, since the inputs are not involved in the linearization,

but the output are involved. In the other rounds, b oth inputs and outputs have zero

values. Thus, we receive the following two-round iterative characteristic for the S

b oxes S7/S8 (and similar characteristics for other adjacent S b oxes):

= 0A 10 0C 21 00 00 00 00

P x

0 0

A = 0A 10 0C 21 = a = 0 with probability 1=2 3=64

x F

P (00 00 00 FD ) four a ected key bits

x

0 0

= 0 b = 0 always

B F

no a ected key bits

= 00 00 00 00 0A 10 0C 21 :

T x

Each of the S b oxes has a linear approximation b etween the two common bits to a

subset of the four output bits. In S7: 03 ! F with probability 1=2 + 8=64 and

x x

Technion - Computer Science Department Technical Report CS0813 1994

in S8: 30 ! D with probability 1=2 12=64. The total probabilities of these

x x

characteristics iterated to 16 rounds and the required numb er of known plaintexts for

1

Davies studies the overall distribution of the output bits of the S b oxes, while linear cryptanalysis

studies only the parity of these bits. Thus, Davies' attack is not a sp ecial case of linear cryptanalysis. 7

the average b etween all the p ossible random keys. In practice, in DES the probability

holds for all the keys, due to the design rules of the S b oxes[4 ].

We can also concatenate characteristics (and de ne n-round characteristics recur-

sively):

1 1 1 1

De nition 3 An n-round characteristic = ( ; ; ; 1=2 + p ) can be con-

1

P T K

2 2 2 2 1

catenated with an m-round characteristic = ( ; ; ; 1=2 + p ) if equals

2

P T K T

2

the swapp ed value of the two halves of . The concatenation of the character-

P

1 2

istics and (if they can b e concatenated) is the (n + m)-round characteristic

1 2 1 2

= ( ; ;  ; 1=2 + 2  p  p ).

1 2

P T K K

When we concatenate l characteristics (that can b e concatenated) the probability of

Q

l

l1

the resultant characteristic is 1=2 + p = 1=2 + 2  p .

i

i=1

A strange situation o ccurs for n-round characteristics: Whenever an XOR op era-

tion exists in the cryptosystem (excluding XORs with subkeys within the F -function),

the values of b oth its arguments in the characteristic must b e the same, and this value

is also the output of the \XOR op eration". Whenever the data is duplicated (when

the right half of the data is input to the F -function and also b ecomes the left half

of the next round), b oth \duplicated" outputs may not b e the same as the input,

only their XOR value should b e the same as the input. This phenomena is just the

opp osite to the usual op erations in the cryptosystem, where an XOR op eration XORs

the inputs and duplications duplicates the input; in our case, an XOR op eration du-

plicates the input, and duplications XOR the input with one of the original outputs

to form the second output. This phenomena causes a basic di erence b etween linear

cryptanalysis and di erential cryptanalysis, which can b e easily viewed in the one-

round characteristic with probability 1: the free variable in the linear cryptanalysis

characteristic is at the right half, while in di erential cryptanalysis it is at the left

half.

This phenomena is easily understo o d when we remind the meaning of the values

in the characteristics: they are not actual values, neither XORs of actual values; They

only describ e the subset of bits whose parity is statistically known. In order to know

the parity of bits of the output of an XOR op eration, we should know the parities

of the same subsets of bits b oth inputs, and then we known the parity of the same

subset of the output.

When we duplicate the data, we may know parity of a subset of bits. However,

since we do not wish to use these bits twice (in which case one use will cancel the

other use by the parity), we should use each set bit once, either in one output or in

the other output. It is also p ossible to use a bit which is not set in the input to the

duplication, in which case a zero bit b ecome one in b oth outputs. In this case, b oth

usages cancel each other by their parity, and thus the same e ect as of the original

zero remains.

An imp ortant di erence b etween linear cryptanalysis and di erential cryptanal-

Technion - Computer Science Department Technical Report CS0813 1994

ysis is the ability to use di erentials[6,7], in which only the values of and

P T

matter. In di erential cryptanalysis, whenever several characteristics have the same

values for and , they are develop ed on top of each other: they can b e viewed

P T

as one di erential, and their internal information can b e ignored. In linear crypt- 6

a random input blo ck P , and its one-round encryption C under a random subkey K

satis es P   C   K  = 0, where `' denotes binary scalar pro duct of two

P T K

binary vectors, is the subset of bits of the data b efore the round, is the subset

P T

of bits of the data after the round, and is the subset of bits of the key whose

K

parity is approximated.

As in di erential cryptanalysis, it is quite easy to derive one-round characteristics

with one active S b ox: we only have to cho ose a non-zero entry in one of the S b oxes,

and cho ose the subsets , , as the round-function requires. The following

P T K

one-round characteristic has only one active S b ox, and it was chosen to maximize

the probability, thus it uses the maximal entry in S5:

0

= (21 04 00 80 ;R )

P x

0 0

A = 21 04 00 80 = a = 00 00 80 00 with probability 1=2 20=64 x

x F

P (00 00 F 0 00 ) one a ected key bit

x

0

= (21 04 00 80 ;R  00 00 80 00 ):

T x x

The b est one-round characteristic do es not have any active S b ox. This characteristic

has probability 1:

0

= (0;R )

P

0 0

= 0 a = 0 with probability 1

A F

no a ected key bits

0

= (0;R ):

T

We can also derive one-round characteristics with more than one active S b oxes:

in this case we should cho ose the entries in two or more S b oxes. However, unlike in

di erential cryptanalysis, we do not need to have the same values in common input

bits of b oth S b oxes (due to the E expansion), so if we a ect bits common to two

Technion - Computer Science Department Technical Report CS0813 1994

S b oxes, it is not necessary that b oth S b oxes would b e active. Moreover, if b oth

S b oxes are active, the value of the common input bits b ecomes the XOR of their

values from b oth S b oxes, since we use the same bit twice in a linear equation, and

thus it cancels itself. Note that in theory, the probability we receive in that way is 5

0

the probabilities with the notation p by their distance from half p = 1=2 + p . Then,

i i

i

the combined probability is 1=2 + p = 1=2 + 2  p  p . In general, if an approximation

1 2

Q

l

l1

p . consists of l S b oxes, the combined probability is 1=2 + p = 1=2 + 2 

i

i=1

When a linear approximation with probability 1=2 + p is known to the attacker,

2

he can mount an attack which requires ab out p known plaintexts; these plaintexts

can b e randomly chosen, but all of them must b e encrypted under the same key, and

the ciphertexts should b e known to the attacker as well.

The basic metho d of linear cryptanalysis nds only one bit of the key, which is

a parity of a subset of the key bits. Auxiliary techniques of reducing the numb er

of rounds of the approximations, by eliminating the rst and/or last rounds, and

counting on all the key bits a ecting the data at the rounds not in the approximation

can reduce the numb er of plaintexts required, and increase the numb er of key bits

that the attack nds.

3 A Study of Linear Cryptanalysis

Before we formalize the linear approximations by de ning characteristics, we feel it

is very imp ortant to mention that the bits we set in the characteristics are not the

actual values of bits (or bit-di erences as in di erential cryptanalysis); the bits we

set denote the subset of bits whose parity is approximated. The exp ected parity itself

is not directly denoted; however, the reader can easily identify the exp ected parity

from the probability of the characteristic: if the probability is more than half, the

exp ected parity is zero, and if the probability is less than half, the exp ected parity is

one.

Another very imp ortant topic is the key space used in the analysis of linear crypt-

analysis. There is a di erence b etween the key space of the analyzed cryptosystem

and the key space that the attack can handle. In di erential cryptanalysis it was

mentioned that the attacks assume that indep endent keys are used. The indep endent

keys were de ned as follows[1]:

De nition 1 An independent key is a list of subkeys which is not necessarily deriv-

able from some key via the key scheduling algorithm.

Each key in the cryptosystem's key space has an equivalent indep endent key derived

by the key scheduling algorithm. We observe that linear cryptanalysis also assumes

the use of indep endent keys. The theoretical analysis of systems with dep endent keys

are much harder. However, in practice it can b e very well estimated by the analysis

of the indep endent key variants. Therefore, Matsui's metho d to nd 14 bits of the

subkeys still hold even if indep endent keys are used. Other auxiliary metho ds can

then b e used to nd the other bits of the rst and the last subkeys (p ossibly using

additional characteristics), and to reduce the cryptosystem to a cryptosystem with a

Technion - Computer Science Department Technical Report CS0813 1994

smaller numb er of rounds, which is easier to analyze.

De nition 2 A one-round characteristic is a tuple ( ; ; ; 1=2 + p), in which

P T K

( ) = ( ) = A, ( )  ( ) = a, and in which 1=2 + p is the probability that

P L T L P R T R 4

Input Output subset

subset 0 1 2 3 4 5 6 7 8 9 A B C D E F

x x x x x x x x x x x x x x x x

0 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

x

1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

x

2 0 4 -2 2 -2 2 -4 0 4 0 2 -2 2 -2 0 -4

x

3 0 0 -2 6 -2 -2 4 -4 0 0 -2 6 -2 -2 4 -4

x

4 0 2 -2 0 0 2 -2 0 0 2 2 4 -4 -2 -2 0

x

5 0 2 2 -4 0 10 -6 -4 0 2 -10 0 4 -2 2 4

x

6 0 -2 -4 -6 -2 -4 2 0 0 -2 0 -2 -6 -8 2 0

x

7 0 2 0 2 -2 8 6 0 -4 6 0 -6 -2 0 -6 -4

x

8 0 0 2 6 0 0 -2 -6 -2 2 4 -12 2 6 -4 4

x

9 0 -4 6 -2 0 -4 -6 -6 6 -2 0 -4 2 -6 -8 -4

x

A 0 4 0 0 -2 -6 2 2 2 2 -2 2 4 -4 -4 0

x

B 0 4 4 4 6 2 -2 -2 -2 -2 -2 2 0 -8 -4 0

x

C 0 2 0 -2 0 2 4 10 -2 4 -2 -8 -2 4 -6 -4

x

D 0 6 0 2 0 -2 4 -10 -2 0 -2 4 -2 8 -6 0

x

E 0 -2 -2 0 -2 4 0 2 -2 0 4 2 -4 6 -2 -4

x

F 0 -2 -2 8 6 4 0 2 2 4 8 -2 8 -6 2 0

x

10 0 2 -2 0 0 -2 -6 -8 0 -2 -2 -4 0 2 10 -20

x

11 0 2 -2 0 4 2 -2 -4 4 2 2 0 -8 -6 2 4

x

12 0 -2 0 -2 2 -4 -2 -8 4 6 4 6 -2 4 -6 0

x

13 0 -6 0 2 -2 4 2 0 4 -6 4 2 -6 4 -2 0

x

14 0 4 -4 0 0 0 0 0 -4 -4 4 4 0 4 -4 0

x

15 0 4 0 -4 -4 4 -8 -8 0 0 -4 4 8 4 0 4

x

16 0 0 6 6 2 -2 4 0 4 0 6 2 2 2 0 0

x

17 0 4 -6 -2 6 -2 -4 4 4 -4 -6 2 -2 2 0 4

x

18 0 6 0 2 4 -10 -4 2 2 0 -2 0 2 4 -2 -4

x

19 0 2 4 -6 0 -2 4 -2 6 8 6 4 10 0 2 -4

x

1A 0 2 2 -8 -2 4 0 2 -2 0 4 2 0 -2 -2 0

x

1B 0 2 6 -4 -6 0 0 2 6 8 0 -2 -4 -6 -2 0

x

1C 0 0 -2 2 4 0 -6 2 -2 6 -4 0 2 -2 0 0

x

1D 0 4 -2 6 -8 0 -2 2 10 -2 -8 -8 2 2 0 4

x

1E 0 -4 -8 0 -2 -2 -2 2 -2 2 -2 6 4 4 4 0

x

1F 0 -4 8 -8 2 -6 -6 -2 -2 2 -2 -2 -8 0 0 -4

x

20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

x

21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

x

22 0 -4 -2 2 -2 2 -4 8 -4 0 -6 6 2 -2 -16 -12

x

23 0 0 -2 -2 6 -2 -4 4 0 0 -2 -2 -2 6 4 -4

x

24 0 -2 6 4 0 6 -2 4 4 -6 -2 4 0 14 2 0

x

25 0 6 2 0 0 6 2 0 -4 -6 2 -8 0 -2 6 -4

x

26 0 2 4 -2 -2 0 2 -4 4 -2 -4 -2 6 0 -2 0

x

27 0 -10 0 -2 6 4 6 -4 0 6 -12 2 2 0 6 -4

x

28 0 4 -2 -2 0 4 -6 2 2 -6 4 0 6 -2 -4 0

x

29 0 0 2 6 0 0 6 2 2 -2 -8 0 -2 -6 0 0

x

2A 0 0 -4 -8 6 6 6 -6 6 2 -2 -2 -8 4 -4 4

x

2B 0 8 0 4 6 -2 -6 6 2 6 -2 6 -4 0 4 4

x

2C 0 2 4 -6 0 -6 0 6 -2 -4 2 -4 -2 4 6 0

x

2D 0 -2 -4 -2 0 -2 -8 2 -2 0 -6 -8 -2 0 -2 4

x

2E 0 6 2 -4 6 4 4 -2 -10 -8 0 -2 4 -2 2 0

x

2F 0 6 -6 -4 6 -4 4 -2 2 4 4 -6 0 2 -2 -4

x

30 0 2 -2 0 -4 -6 -2 -4 4 2 2 0 0 2 2 4

x

31 0 2 -2 0 0 -2 2 0 0 -2 -2 -4 0 2 2 4

x

32 0 6 0 -2 -2 8 2 4 0 10 0 2 -2 4 2 0

x

33 0 -6 0 10 2 0 -2 -4 0 6 0 -10 2 4 -2 0

x

34 0 0 -12 4 -4 0 4 -8 -4 0 -4 0 -4 -4 0 0

x

35 0 -8 0 0 8 -4 4 0 0 -4 -4 0 4 4 -4 4

x

36 0 4 -2 -6 -2 -2 8 0 4 -4 -2 -2 6 2 -4 0

x

37 0 -8 -6 -6 -6 6 0 4 12 0 2 -2 2 2 4 -4

x

38 0 2 4 -6 0 -2 4 -2 -6 4 -6 0 6 4 -2 0

x

39 0 -2 8 2 -4 6 -4 -6 -2 -4 2 4 -2 0 2 0

x

3A 0 6 -10 0 2 4 0 -2 6 -4 0 2 4 -2 -2 -4

x

3B 0 -2 -6 -4 -10 0 -8 -2 -10 4 4 -2 0 2 -2 4

x

3C 0 -8 -6 -2 0 -4 2 2 -6 2 4 0 10 -2 4 4

x

3D 0 4 2 2 4 4 -2 2 -2 10 0 0 2 2 4 0

x

3E 0 -4 4 -4 2 2 -2 2 2 -2 -2 -2 4 -4 0 4

x

3F 0 -4 -4 -4 14 6 -6 -2 2 -2 6 -2 0 0 -4 0

x

Table 1. The Linear Approximation Table of S5.

Technion - Computer Science Department Technical Report CS0813 1994 3

Since all the op erations in DES, except the S b oxes, are linear, it suces to derive

linear relations of the S b oxes. These relations are derived for each S b ox by cho osing

a subset of the input bits and the output bits, calculating the parity (exclusive-or) of

these bits for each of the p ossible inputs of the S b ox, and counting the numb er of

inputs whose subset's parity is zero. If the S b ox is linear in the bits of the subset, all

the inputs must have a zero parity of the subset. If the S b ox is ane in the bits of

the subset, all the inputs must have parity 1. Usually, a subset will have many inputs

with parity 0 and many inputs with parity 1. As the numb er of zero es is closer to

the numb er of ones, we will say that the subset is more non-linear. The least linear

subset under this de nition is one whose half of the inputs have parity zero, and the

other half inputs have parity 1.

Matsui has calculated the numb er of zero parities for each of the 64  16 = 1024

p ossible subsets of the input and the output bits of each S b ox. To represent the

subsets' linearity in a simple manner, he subtracts from these numb ers the numb er of

half of the inputs. This way, zero values denote non-linear subsets, and high absolute

values denote linear/ane or close to linear/ane subsets. A table which describ es

all these values for all the p ossible subsets of an S b ox is called a linear approximation

table of the S b ox. Table 1 is the linear approximation table of S5 of DES. In this

linear approximation table, we can see that 30% of the entries have value zero.

The highest absolute value in the linear approximation table of S5 is 20 in entry

(10 ,F ). Therefore, only in 12 out of the 64 inputs, the parity of the four output bits

x x

is the same as the value of the second input bit! This entry was actually describ ed by

Shamir[14] in 1985, but it was later describ ed as a necessity from the design criteria

of DES, and nob o dy knew to p oint out whether it weakens DES. This sp eci c entry,

which is the most linear entry of all the S b oxes of DES, is actually one of the three

entries used in Matsui's attack.

Matsui's solution was to nd a statistical linear expression consisting of a parity

of subsets of the plaintext, ciphertext and the key, which is derived from similar

expressions of the various rounds. Thus, the parity of some set of data bits in each

round is known as a function of the parity of the previous set of bits in the previous

round and the parity of several key bits. The round-linearization is based on the

linearization of the S b oxes. If we would XOR the same value to the two halves of

the data, we would remain with the same parity as b efore the XOR. Since the subset

of the input bits is statistically linear/ane to the subset of the output bits, the

parity of the data after the XOR is usually the parity b efore the XOR XORed with

a particular key-dep endent constant.

The probability that the approximation in an S b ox is valid is given as the distance

0

from half; for example the probability of the ab ove entry with value 20 is p =

0

12=64 = 1=2 20=64. An entry with value 0 has probability p = 1=2; such an entry

is useless to attack an cryptosystem. Any other non-zero value (either p ositive or

negative) can b e used in attacks. An approximation may involve more than one S

Technion - Computer Science Department Technical Report CS0813 1994

b ox. We will follow Copp ersmith[4] and call the S b oxes involved in the linearization

active S b oxes. The probability of an approximation with two active S b oxes is

0 0 0 0

), since the parity is even if either b oth parities of the )  (1 p + (1 p  p p

2 1 2 1

approximations of the two S b oxes are zero, or b oth are one. For simplicity we denote 2

On Matsui's Linear Cryptanalysis

Eli Biham

Computer Science Department

Technion - Institute of Technology

Haifa 32000, Israel

Abstract

In [9] Matsui intro duced a new metho d of cryptanalysis, called Linear Crypt-

47

analysis. This metho d was used to attack DES using 2 known plaintexts. In

this pap er we formalize this metho d and show that although in the details level

this metho d is quite di erent from di erential cryptanalysis, in the structural

level they are very similar. For example, characteristics can b e de ned in lin-

ear cryptanalysis, but the concatenation rule has several imp ortant di erences

from the concatenation rule of di erential cryptanalysis. We show that the

attack of Davies on DES is closely related to linear cryptanalysis. We describ e

constraints on the size of S b oxes caused by linear cryptanalysis. New results

to Feal are also describ ed.

1 Intro duction

In EUROCRYPT'93 Matsui intro duced a new metho d of cryptanalysis, called Linear

47

Cryptanalysis [9]. This metho d was used to attack DES using 2 known plaintexts.

In this pap er we formalize this metho d and show that although in the details level

this metho d is quite di erent from di erential cryptanalysis[2,1], in the structural

level they are very similar. For example, characteristics can b e de ned in linear

cryptanalysis, but the concatenation rule has several imp ortant di erences from the

concatenation rule of di erential cryptanalysis. We show that the attack of Davies[5]

on DES is closely related to linear cryptanalysis. We describ e constraints on the

size of S b oxes caused by linear cryptanalysis. New results to Feal[15,11] are also

describ ed.

2 Overview of Linear Cryptanalysis

Technion - Computer Science Department Technical Report CS0813 1994

Linear cryptanalysis studies statistical linear relations b etween bits of the plaintexts,

the ciphertexts and the keys they are encrypted under. These relations are used

to predict values of bits of the key, when many plaintexts and their corresp onding

ciphertexts are known. 1