On Matsuis Linear

Eli Biham

Computer Science Department

Technion Israel Institute of Technology

Haifa Israel

Abstract

In Matsui introduced a new metho d of cryptanalysis called Linear Crypt

47

analysis This metho d was used to attack DES using known In

this pap er we formalize this metho d and show that although in the details level

this metho d is quite dierent from dierential cryptanalysis in the structural

level they are very similar For example characteristics can b e dened in lin

ear cryptanalysis but the concatenation rule has several imp ortant dierences

from the concatenation rule of dierential cryptanalysis We show that the

attack of Davies on DES is closely related to We describ e

constraints on the size of S b oxes caused by linear cryptanalysis New results

to Feal are also describ ed

Introduction

In EUROCRYPT Matsui introduced a new metho d of cryptanalysis called Linear

Cryptanalysis This metho d was used to attack DES using known plaintexts

In this pap er we formalize this metho d and show that although in the details level

this metho d is quite dierent from dierential cryptanalysis in the structural

level they are very similar For example characteristics can b e dened in linear

cryptanalysis but the concatenation rule has several imp ortant dierences from the

concatenation rule of dierential cryptanalysis We show that the attack of Davies

on DES is closely related to linear cryptanalysis We describ e constraints on the

size of S b oxes caused by linear cryptanalysis New results to Feal are also

describ ed

Overview of Linear Cryptanalysis

Technion - Computer Science Department Technical Report CS0813 1994

Linear cryptanalysis studies statistical linear relations b etween bits of the plaintexts

the and the keys they are encrypted under These relations are used

to predict values of bits of the when many plaintexts and their corresp onding

ciphertexts are known

Since all the op erations in DES except the S b oxes are linear it suces to derive

linear relations of the S b oxes These relations are derived for each S b ox by choosing

a subset of the input bits and the output bits calculating the parity exclusiveor of

these bits for each of the p ossible inputs of the S b ox and counting the number of

inputs whose subsets parity is zero If the S b ox is linear in the bits of the subset all

the inputs must have a zero parity of the subset If the S b ox is ane in the bits of

the subset all the inputs must have parity Usually a subset will have many inputs

with parity and many inputs with parity As the number of zero es is closer to

the number of ones we will say that the subset is more nonlinear The least linear

subset under this denition is one whose half of the inputs have parity zero and the

other half inputs have parity

Matsui has calculated the number of zero parities for each of the

p ossible subsets of the input and the output bits of each S b ox To represent the

subsets linearity in a simple manner he subtracts from these numbers the number of

half of the inputs This way zero values denote nonlinear subsets and high absolute

values denote linearane or close to linearane subsets A table which describ es

all these values for all the p ossible subsets of an S b ox is called a linear approximation

table of the S b ox Table is the linear approximation table of S of DES In this

linear approximation table we can see that of the entries have value zero

The highest absolute value in the linear approximation table of S is in entry

F Therefore only in out of the inputs the parity of the four output bits

x x

is the same as the value of the second input bit This entry was actually describ ed by

Shamir in but it was later describ ed as a necessity from the design criteria

of DES and nob o dy knew to p oint out whether it weakens DES This sp ecic entry

which is the most linear entry of all the S b oxes of DES is actually one of the three

entries used in Matsuis attack

Matsuis solution was to nd a statistical linear expression consisting of a parity

of subsets of the and the key which is derived from similar

expressions of the various rounds Thus the parity of some set of data bits in each

round is known as a function of the parity of the previous set of bits in the previous

round and the parity of several key bits The roundlinearization is based on the

linearization of the S b oxes If we would XOR the same value to the two halves of

the data we would remain with the same parity as b efore the XOR Since the subset

of the input bits is statistically linearane to the subset of the output bits the

parity of the data after the XOR is usually the parity b efore the XOR XORed with

a particular keydependent constant

The probability that the approximation in an S b ox is valid is given as the distance

0

from half for example the probability of the ab ove entry with value is p

0

An entry with value has probability p such an entry

is useless to attack an Any other nonzero value either p ositive or

negative can b e used in attacks An approximation may involve more than one S

Technion - Computer Science Department Technical Report CS0813 1994

b ox We will follow Copp ersmith and call the S b oxes involved in the linearization

active S b oxes The probability of an approximation with two active S b oxes is

0 0 0 0

since the parity is even if either b oth parities of the p p p p

approximations of the two S b oxes are zero or b oth are one For simplicity we denote

Input Output subset

subset A B C D E F

x x x x x x x x x x x x x x x x

x

x

x

x

x

x

x

x

x

x

A

x

B

x

C

x

D

x

E

x

F

x

x

x

x

x

x

x

x

x

x

x

A

x

B

x

C

x

D

x

E

x

F

x

x

x

x

x

x

x

x

x

x

x

A

x

B

x

C

x

D

x

E

x

F

x

x

x

x

x

x

x

x

x

x

x

A

x

B

x

C

x

D

x

E

x

F

x

Table The Linear Approximation Table of S

Technion - Computer Science Department Technical Report CS0813 1994

0

the probabilities with the notation p by their distance from half p p Then

i i

i

the combined probability is p p p In general if an approximation

Q

l

l

consists of l S b oxes the combined probability is p p

i

i

When a linear approximation with probability p is known to the attacker

he can mount an attack which requires ab out p known plaintexts these plaintexts

can b e randomly chosen but all of them must b e encrypted under the same key and

the ciphertexts should b e known to the attacker as well

The basic metho d of linear cryptanalysis nds only one bit of the key which is

a parity of a subset of the key bits Auxiliary techniques of reducing the number

of rounds of the approximations by eliminating the rst andor last rounds and

counting on all the key bits aecting the data at the rounds not in the approximation

can reduce the number of plaintexts required and increase the number of key bits

that the attack nds

A Study of Linear Cryptanalysis

Before we formalize the linear approximations by dening characteristics we feel it

is very imp ortant to mention that the bits we set in the characteristics are not the

actual values of bits or bitdierences as in dierential cryptanalysis the bits we

set denote the subset of bits whose parity is approximated The exp ected parity itself

is not directly denoted however the reader can easily identify the exp ected parity

from the probability of the characteristic if the probability is more than half the

exp ected parity is zero and if the probability is less than half the exp ected parity is

one

Another very imp ortant topic is the key space used in the analysis of linear crypt

analysis There is a dierence b etween the key space of the analyzed cryptosystem

and the key space that the attack can handle In dierential cryptanalysis it was

mentioned that the attacks assume that indep endent keys are used The indep endent

keys were dened as follows

Denition An independent key is a list of subkeys which is not necessarily deriv

able from some key via the key scheduling algorithm

Each key in the cryptosystems key space has an equivalent indep endent key derived

by the key scheduling algorithm We observe that linear cryptanalysis also assumes

the use of indep endent keys The theoretical analysis of systems with dep endent keys

are much harder However in practice it can b e very well estimated by the analysis

of the indep endent key variants Therefore Matsuis metho d to nd bits of the

subkeys still hold even if indep endent keys are used Other auxiliary metho ds can

then b e used to nd the other bits of the rst and the last subkeys p ossibly using additional characteristics and to reduce the cryptosystem to a cryptosystem with a

Technion - Computer Science Department Technical Report CS0813 1994

smaller number of rounds which is easier to analyze

Denition A oneround characteristic is a tuple p in which

P T K

A a and in which p is the probability that

P L T L P R T R

a random input blo ck P and its oneround C under a random subkey K

satises P C K where denotes binary scalar pro duct of two

P T K

binary vectors is the subset of bits of the data b efore the round is the subset

P T

of bits of the data after the round and is the subset of bits of the key whose

K

parity is approximated

As in dierential cryptanalysis it is quite easy to derive oneround characteristics

with one active S b ox we only have to choose a nonzero entry in one of the S b oxes

and choose the subsets as the roundfunction requires The following

P T K

oneround characteristic has only one active S b ox and it was chosen to maximize

the probability thus it uses the maximal entry in S

0

R

P x

0 0

A a with probability x

x F

P F one aected key bit

x

0

R

T x x

The b est oneround characteristic do es not have any active S b ox This characteristic

has probability

0

R

P

0 0

a with probability

A F

no aected key bits

0

R

T

We can also derive oneround characteristics with more than one active S b oxes

in this case we should choose the entries in two or more S b oxes However unlike in

dierential cryptanalysis we do not need to have the same values in common input

bits of b oth S b oxes due to the E expansion so if we aect bits common to two

Technion - Computer Science Department Technical Report CS0813 1994

S b oxes it is not necessary that b oth S b oxes would b e active Moreover if b oth

S b oxes are active the value of the common input bits b ecomes the XOR of their

values from b oth S b oxes since we use the same bit twice in a linear equation and

thus it cancels itself Note that in theory the probability we receive in that way is

the average b etween all the p ossible random keys In practice in DES the probability

holds for all the keys due to the design rules of the S b oxes

We can also concatenate characteristics and dene nround characteristics recur

sively

Denition An nround characteristic p can be con

P T K

catenated with an mround characteristic p if equals

P T K T

the swapped value of the two halves of The concatenation of the character

P

istics and if they can b e concatenated is the n mround characteristic

p p

P T K K

When we concatenate l characteristics that can b e concatenated the probability of

Q

l

l

the resultant characteristic is p p

i

i

A strange situation o ccurs for nround characteristics Whenever an XOR op era

tion exists in the cryptosystem excluding XORs with subkeys within the F function

the values of b oth its arguments in the characteristic must b e the same and this value

is also the output of the XOR op eration Whenever the data is duplicated when

the right half of the data is input to the F function and also b ecomes the left half

of the next round b oth duplicated outputs may not b e the same as the input

only their XOR value should b e the same as the input This phenomena is just the

opp osite to the usual op erations in the cryptosystem where an XOR op eration XORs

the inputs and duplications duplicates the input in our case an XOR op eration du

plicates the input and duplications XOR the input with one of the original outputs

to form the second output This phenomena causes a basic dierence b etween linear

cryptanalysis and dierential cryptanalysis which can b e easily viewed in the one

round characteristic with probability the free variable in the linear cryptanalysis

characteristic is at the right half while in dierential cryptanalysis it is at the left

half

This phenomena is easily understo o d when we remind the meaning of the values

in the characteristics they are not actual values neither XORs of actual values They

only describ e the subset of bits whose parity is statistically known In order to know

the parity of bits of the output of an XOR op eration we should know the parities

of the same subsets of bits b oth inputs and then we known the parity of the same

subset of the output

When we duplicate the data we may know parity of a subset of bits However

since we do not wish to use these bits twice in which case one use will cancel the

other use by the parity we should use each set bit once either in one output or in

the other output It is also p ossible to use a bit which is not set in the input to the

duplication in which case a zero bit b ecome one in b oth outputs In this case b oth

usages cancel each other by their parity and thus the same eect as of the original

zero remains

An imp ortant dierence b etween linear cryptanalysis and dierential cryptanal

Technion - Computer Science Department Technical Report CS0813 1994

ysis is the ability to use dierentials in which only the values of and

P T

matter In dierential cryptanalysis whenever several characteristics have the same

values for and they are developed on top of each other they can b e viewed

P T

as one dierential and their internal information can b e ignored In linear crypt

analysis the internal information contains the information ab out the subset of key

bits participating in the linearization Thus if two characteristics with the same val

of and and with a similar probability exist they might cancel the eect

P T

of each other if the parity of the subset of the key bits is not the same or if their

probabilities are the complement of each other and the parity of the subset of their

key bits is the same Therefore we should b e much more careful when we claim for

linear cryptanalysis characteristics However if the attacker knows of all the dierent

characteristics whose eect might b e canceled he can nd one parity bit of the key

whenever he identies that the eect is canceled

Davies investigated an attack against DES based on the nonuniform distri

bution of the outputs of pairs of adjacent S b oxes when their inputs are uniformly

distributed He assumes the uniform distribution in the inputs to the even rounds or

alternatively the o dd rounds and studies the resultant distribution in the outputs of

these rounds As a result he receives a keydependent distribution which dep ends on

the parity of several key bits Using a large sample of known plaintexts he can nd

this bit His algorithm can b e applied to any pair of adjacent S b oxes and to even

or o dd rounds thus he can nd up to p otential parity bits of the key His attack

is strongly related to linear cryptanalysis and has a linear cryptanalysis variant In

the even rounds o dd rounds the characteristics have zero values in the input and

nonzero values in the outputs since the inputs are not involved in the linearization

but the output are involved In the other rounds b oth inputs and outputs have zero

values Thus we receive the following tworound iterative characteristic for the S

b oxes SS and similar characteristics for other adjacent S b oxes

A C

P x

0 0

A A C a with probability

x F

P FD four aected key bits

x

0 0

b always

B F

no aected key bits

A C

T x

Each of the S b oxes has a linear approximation b etween the two common bits to a

subset of the four output bits In S F with probability and

x x

Technion - Computer Science Department Technical Report CS0813 1994

in S D with probability The total probabilities of these

x x

characteristics iterated to rounds and the required number of known plaintexts for

1

Davies studies the overall distribution of the output bits of the S b oxes while linear cryptanalysis

studies only the parity of these bits Thus Davies attack is not a sp ecial case of linear cryptanalysis

S b oxes Probability Known Plaintexts Davies Attack

SS

SS

SS

SS

SS

SS

SS

SS

Table Results of Linear Cryptanalysis of DES using Davies Characteristics

the attack based on linear cryptanalysis are given in Table along with the number

of known plaintexts required for the original Davies attacks based on the same pairs

of S b oxes Notice that the results of these two attacks are very similar

Constraints on the Size of the S Boxes

In this section we show new constraints on the size of S b oxes Researchers have

already studied the dierential b ehavior of the size of S b oxes For example Luke

Oconnor analyzed the dierential b ehavior of bijective S b oxes and of comp os

ite S b oxes and concluded that for large enough S b oxes even random S b oxes are

immune against dierential cryptanalysis However there was no result on required

relationships b etween the input size of the S b oxes and their output size In this

section we show such a relationship

In dierential cryptanalysis we can easily reduce the probability of all the entries

in the dierence distribution tables of the S b oxes by increasing the number of output

bits of the S b oxes Whenever the number of output bits of an S b ox is suciently

larger than the number of its input bits it is very likely that the entries in the

dierence distribution table will have only values and thus all the p ossible entries

have the same low probability

Examples of which use such S b oxes are

The attack on Khafre used exactly these prop erties but still it used the sp ecic

structure of Khafre

Linear cryptanalysis adds a new criteria for this relationship We identied that

whenever the number of output bits is large enough there must b e linear and ane

relations b etween these bits which hold for all the p ossible inputs of the S b ox

Denote the number of input bits by m and the number of output bits by n We

m

can now describ e the S b ox by a binary matrix M with rows corresp onding to m

Technion - Computer Science Department Technical Report CS0813 1994

the inputs of the S b ox and with m n columns which contain the input values

2

The number of known plaintexts required for Davies attack were calculated using the equations

given in

themselves in the rst m columns and the output values of the S b ox in the other

n columns

Each column of M contains one bit from each inputoutput pair of the S b ox

Linear combinations of subsets of the inputoutput bits of the S b ox are represented

by linear combinations of the columns We say that a subset of bits of the input and

output of the S b ox form a linear combination if for all inputs the linear combination

of these bits is zero We say that a subset of bits of the input and output of the S

b ox form an ane combination if for all inputs the linear combination of these bits is

a constant either all zero or all one Equivalently a subset of the bits of the input

and output of the S b ox form a linear combination if the columns of M are linearly

dep endent and a subset of the bits of the input and output of the S b ox form an

ane combination if the columns of M and the all one vector are linearly dep endent

0

Dene M to b e the matrix formed by M with one additional column with all the

0

values ones M Mj Thus if the rank of M equals the number of its columns

0

m n there are no linear combinations in the S b ox and if the rank of M equals the

number of its columns m n there are no ane combinations in the S b ox The

0

S b ox has an ane combination of its input and output bits if rankM m n

0 m m

Since the number of rows of M and M is the maximal rank is Therefore

m

if n m the S b ox must have an ane combination of the inputoutput

bits These ane combinations cause entries with probability in the linear

approximation table which can b e a ma jor threat to the security of the cryptosystem

m

Similarly if n the S b ox must have an ane combination of a subset of only

output bits which do es not dep end on the input bits at all Such combinations cause

in many cases of DESlike cryptosystems the existence of a tworound iterative

characteristic with probability of the form X and thus enable

attacks which require only a few known plaintexts

These ane combinations also hold as ane combinations of the bits of the dier

ences predicted in dierential cryptanalysis We do not know whether in dierential

cryptanalysis these linearities also p ose a ma jor threat

Technion - Computer Science Department Technical Report CS0813 1994

Application to DES

Matsuis round linear approximation can b e viewed as a round characteristic

This characteristic is based on the following eightround iterative characteristic

P x

0 0

A a with probability x

x F

P F one aected key bit

x

0 0

B b with probability x

x F

P one aected key bit

x

0 0

C c with probability x

x F

P E one aected key bit

x

0 0

d always

D F

no aected key bits

0 0

E e with probability x

x F

P E one aected key bit

x

0 0

F f with probability x

x F

P one aected key bit

x

0 0

G g with probability x

x F

P F one aected key bit

x

0 0

h always

H F

no aected key bits

Technion - Computer Science Department Technical Report CS0813 1994

T x

This characteristic has probability ab out By iterating it to rounds

and replacing the rst and last rounds by lo cally b etter ones Matsui got a round

characteristic with probability ab out We have exhaustively veried that

this iterative characteristic is the b est among all the characteristics with at most one

active S b ox at each round and that Matsuis round characteristic is the b est

characteristics under the same restriction Matsui claims that his characteristic is the

b est without any restriction

Application to Feal

In Matsui describ ed a preliminary version of linear cryptanalysis and used it to

attack Feal For Feal there are nontrivial oneround characteristics with

probability based on the linearity of the least signicant bits in the addition

op eration a similar eect o ccurs also in dierential cryptanalysis of Feal in which

characteristics with probability are based on the elimination of the carry from the

most signicant bit The Four basic oneround characteristics with probability

are

0

R

P x

0 0

A a with probability x

x F

0

R

T x x

0

R

P x

0 0

A a with probability x

x F

0

R

T x x

Technion - Computer Science Department Technical Report CS0813 1994

0

R

P x

0 0

A a with probability x

x F

0

R

T x x

and

0

R

P x

0 0

A a with probability x

x F

0

R

T x x

The other oneround characteristics with probability can b e derived by

0

combining any number of these four characteristics by XORing the values of their a

0 0 0

into the new a and XORing the values of their A into the new A For example

the following characteristics results from a combination of the rst three of the ab ove

four characteristics

0

R

P x

0 0

A a with probability x

x F

0

R

T x x

These combinations are valid since no S b ox is active in two or more of the original

Technion - Computer Science Department Technical Report CS0813 1994

characteristics Such combinations are also applicable in dierential cryptanalysis

whenever they do not involve the same S b ox active in more than one characteris

tic We have also found several additional linear characteristics of Feal with smaller

probabilities among them at least eight oneround characteristics with probability

In his attack Matsui uses linearities which can b e formalized by the following

threeround characteristic with probability

P x

0 0

A a with probability x

x F

0 0

b with probability

B F

0 0

C c with probability x

x F

T x

In his attack he sets this characteristic in rounds and tries exhaustively values of

bits of the subkeys in rounds and with some auxiliary techniques We have

Technion - Computer Science Department Technical Report CS0813 1994

found two veround characteristic with probability One of them is

P x

0 0

A a with probability x

x F

0 0

B b with probability x

x F

0 0

c with probability

C F

0 0

D d with probability x

x F

0 0

E e with probability x

x F

T x

We have found several iterative characteristics of Feal which can b e used to attack

Feal using ab out known plaintexts with a smaller computation complexity This

is a much b etter tradeo than in Matsuis attacks on Feal which required either

known plaintexts for which the complexity of the analysis is or known

plaintexts for which the complexity of the analysis is One of these iterative

Technion - Computer Science Department Technical Report CS0813 1994

characteristics is

P x

0 0

A a with probability x

x F

0 0

B b with probability x

x F

0 0

C c with probability x

x F

0 0

D d with probability x

x F

T x

The iteration of this characteristic to seven rounds have probability A

similar characteristic exist with a reverse order of the bytes in each word From the



tables in we can see that ab out known plaintexts are required to

attack Feal with success rate ab out and that known plaintexts are required

for success rate ab out This characteristic can b e used to attack FealN with up

to rounds with a complexity and known plaintexts smaller than of exhaustive

search The attack on Feal was applied successfully on a p ersonal computer It

takes ab out minutes to encrypt the required known plaintexts and to nd the

key

Summary

In this pap er we studied Matsuis linear cryptanalysis We showed that the formalism

of dierential cryptanalysis can b e adopted to linear cryptanalysis In particular we

showed that characteristics can b e dened concatenated and used in a very similar

manner as in dierential cryptanalysis Constraints on the size of S b oxes were

Technion - Computer Science Department Technical Report CS0813 1994

describ ed Matsuis characteristic used to attack DES in his pap er is shown to b e the

b est characteristic which has only up to one active S b ox at each round on the other

hand we improved his results on Feal We attack Feal using known plaintexts

with linear cryptanalysis Davies attack on DES was shown to b e closely related

to linear cryptanalysis We also describ ed how to sum up characteristics which also

hold in dierential cryptanalysis

Acknowledgments

Acknowledgment This research was supp orted by the fund for the promotion of

research at the Technion Some of the results in this pap er were found using variants

of programs whose originals were written by Ishai BenAroya

References

Adi Shamir Dierential Cryptanalysis of the Data Encryption

Standard SpringerVerlag

Eli Biham Adi Shamir Dierential Cryptanalysis of DESlike Cryptosystems

Journal of Cryptology Vol No pp

Eli Biham Adi Shamir Dierential Cryptanalysis of Snefru Khafre REDOC

II LOKI and technical rep ort CS Department of Applied

Mathematics and Computer Science The Weizmann Institute of Science

The extended abstract app ears in Lecture Notes in Computer Science Advances

in Cryptology pro ceedings of CRYPTO pp

Don Copp ersmith The DES and its Strength

Against Attacks technical rep ort IBM Thomas J Watson Research Center RC

December

D W Davies Investigation of a Potential Weakness in the DES Algorithm

private communication

Xuejia Lai James L Massey Sean Murphy Markov Ciphers and Dierential

Cryptanalysis Lecture Notes in Computer Science Advances in Cryptology

pro ceedings of EUROCRYPT pp

Xuejia Lai On the Design and Security of Block Ciphers PhD thesis Swiss

Federal Institue of Technology Zurich

Mitsuru Matsui Atsuhiro Yamagishi A New Method for Known Plaintext Attack

of FEAL Cipher Lecture Notes in Computer Science Advances in Cryptology

pro ceedings of EUROCRYPT pp

M Matsui Linear Cryptanalysis Method for DES Cipher Abstracts of

EUROCRYPT pp WW May

Technion - Computer Science Department Technical Report CS0813 1994

Ralph C Merkle Fast Software Encryption Functions Lecture Notes in Computer

Science Advances in Cryptology pro ceedings of CRYPTO pp

Sho ji Miyaguchi Akira Shiraishi Akihiro Shimizu Fast Data Encryption

Algorithm FEAL Review of electrical communications lab oratories Vol

No pp

Luke OConnor On the Distribution of Characteristics in Bijective Mappings

Lecture Notes in Computer Science Advances in Cryptology pro ceedings of

EUROCRYPT to app ear

Luke OConnor On the Distribution of Characteristics in Composite Permuta

tions Lecture Notes in Computer Science Advances in Cryptology pro ceedings

of CRYPTO to app ear

Adi Shamir On the Security of DES Lecture Notes in Computer Science

Advances in Cryptology pro ceedings of CRYPTO pp

Akihiro Shimizu Sho ji Miyaguchi Fast Data Encryption Algorithm FEAL

Lecture Notes in Computer Science Advances in Cryptology pro ceedings of

EUROCRYPT pp

Technion - Computer Science Department Technical Report CS0813 1994