<<

LIKELIHOOD ESTIMATION FOR BLOCK

CIPHER KEYS

Sean MurphyFred Pip er Michael Walkery Peter Wild

Information Security Group yVo dafone Limited

Royal Holloway The Courtyard

University of London London Road

Egham Newbury

Surrey TW EX UK Berkshire RG JL UK

May

Abstract

In this pap er we give a general framework for the analysis of blo ck

ciphers using the statistical technique of likeliho o d estimation We

showhowvarious recent successful cryptanalyses of blo ck ciphers can

b e regarded in this framework By analysing the SAFER blo ck cipher

in this framework we exp ose a cryptographic weakness of that cipher

Key Words Statistical Inference Likeliho o d Estimation Blo ck Ciphers

DES SAFER Dierential Cryptanalysis

This author acknowledges the supp ort of the Nueld Foundation

Intro duction

In this pap er we set up a general framework for analysing blo ck ciphers In

this framework the plaintext and spaces are partitioned into a

numb er of classes We consider the probabilities of a plaintext in a given

plaintext class b eing encrypted to a ciphertext in a given ciphertext class

under dierentkeys For a judicious choice of partitions of plaintext and

ciphertext spaces these probabilities give a partition of the space into

key classes whichallows the technique of likeliho o d estimation to b e used

to nd the key class of the true key We explain this idea more fully in

Section and showhow it applies to iterated blo ck ciphers in Section In

the rest of the pap er we showhow the recent cryptanalytic techniques of

Sb ox pairs analysis linear crytanalysis dierential cryptanalysis and linear

structures t naturally into this framework so providing a comparison of

these techniques Finally in Section weseehow this metho d can b e used

to show a cryptographic weakness of the SAFER blo ck cipher

Statistical Estimation of the Key

We consider a blo ck cipher to consist of a nite set of plaintexts M a nite

set of C and a nite set ff jk K g of invertible functions from

k

M onto C indexed bya setofkeys K The plain text set M the ciphertext

set C the key set K and the set of invertible functions ff M C g are all

k

public The ciphertext c corresp onding to the plaintext m under a particular

private key k is then

c f m

k

The cryptanalysts task is to nd the particular key k given some information

ab out a numb er of corresp onding plaintextciphertext pairs In particular if

the plaintexts are known then wehave a known plaintext attack and if the

plaintexts can b e chosen by the cryptanalyst then wehavea chosen plaintext

attack

We use the following framework to mo del the cryptanalysts knowledge of

the plaintexts and corresp onding ciphertexts Let M X b e a function

from the plaintext set M onto a set X and C Y a function from

the ciphertext set C onto a set Y We call and partition functions

The functions and partition the plaintext and ciphertext spaces into

equivalence classes indexed by the elements of X and Y resp ectivelyandit

is convenient to think of X and Y as sets of equivalence classes of M and

C Supp ose that the cryptanalyst observes a pair x y where x mis

the result of applying to some plaintext m and y c is the result of

applying to the ciphertext c f m obtained by enciphering m under

k

the true key k Thus x y is a pair of plaintextciphertext equivalence

classes which the cryptanalyst uses to estimate the true key k

Weshowhow the true key k can b e estimated using the statistical tech

nique of maximum likeliho o d estimation as describ ed in any textb o ok on

statistical inference for example Silvey Let P x y denote the proba

k

bility that plaintext class x and ciphertext class y o ccur with key k P

k

denes a probability mass function on the set of p ossible plaintext and ci

phertext classes X Y parameterised by elements of the key set K However

we can regard P x y as a function on the key set K which is parameterised

by elements of the set of p ossible plaintext and ciphertext classes X Y This

function is known as the likelihood function Lx y k ofthekey k corre

sp onding to the data x y and it is a measure of the plausibilitythatk is

the true key k after wehave observed the data x y Thus wehave

L x y k P x y

k

and Lx y k logLx y k log P x y

k

where Lx y k isthe loglikelihood functionFor anyxed k we can think

of the likeliho o d function Lx y k as a random variable whose distribution

is determined by the true distribution on the plaintextciphertext classes

given by the true key k Thus we can dene the exp ected value of the

loglikeliho o d function at key k as

k E fLx y k g E flog P x y g

k

The following theorem whichw e state in terms of plaintext and ciphertext

classes is a standard result for loglikeliho o d functions

Theorem For all keys k K k k with equality if and only

x y for every plaintextciphertext class pair x y if P x y P

k k

Theorem states that k attains its maximum at k and if the dis

tributions corresp onding to dierentkeys on the plaintextciphertext classes

are dierent then k is unique In any case we can dene an equivalence

relation on the key space K in whichtwokeys are equivalentiftheyhave

the same distribution on the plaintextciphertext classes and then estimate

the unique key class which maximises the likeliho o d Let K Z be a

partition function from the key space K onto a set Z such that k k

if and only if Lx y k Lx y k for all x y Z can b e regarded as

a set of key equivalence classes induced by the likeliho o d function Wecan

thus dene the likeliho o d function Lx y z ofthekey class z corresp onding

to the data x y The exp ected value of the loglikeliho o d function at key

class z is then given by

z E fLx y z g E flog P x y g

z

where P x y P x y for any k K for which k z Theorem

z k

shows that is uniquely maximised by z k the true key class

Supp ose nowthatwehave N pairs of plaintext m m m with

N

corresp onding ciphertexts c c c thatgiv eplaintext classes x

N

x x and ciphertext classes y y y resp ectively The joint

N N

likeliho o d function is given by

N

Y

P x y Lx y k

k i i

i

so the joint loglikeliho o d function is given by

N N

X X

log P x y Lx y k Lx y k

k i i i i

i i

We prop ose to estimate the true key k from the data x yby the metho d

of maxim um likeliho o d so wehave the following denition

Denition A maximum likelihood estimate MLE of the true key k is any

k K for which Lx y k orequivalently Lx y k is maximal

We can express k in terms of exp ected value of the joint loglikeliho o d

function since

E Lx y k E flog P x y g k

k

N

Thus from Theorem the exp ected value of the joint loglikeliho o d function

is maximised by the true key k Ifwe dene the key partition function

K Z as ab ove then we can dene the jointlikeliho o d function Lx y z

of the key class z corresp onding to the data x y The exp ected value of

the joint loglikeliho o d function at key class z is given by

E Lx y z E flog P x y g z

z

N

and so is uniquely maximised bythetruekey class z

Wenowgive a brief description of the prop erties of the maximum likeli

ho o d estimate that make it the optimal estimate of the key

Denition Supp ose fz g is a sequence of estimates for z Thenz is

n n

consistent ifz z in the appropriate sto chastic sense

n

Theorem The maximum likeliho o d estimate of z is consistent

Sketch Pro of We are essentially estimating z with Lx y z The

N

Lx y z law of large numb ers ensures that for large N and most x y

N

is near z Ifz is the maximum likeliho o d estimate of z based on N

N

plaintextciphertext classes then this showsz z in the appropriate

N

sto chastic sense

Denition A statistic t is sucient for z if the distribution of a sample

x ygiven the value of tx y do es not dep end on z

Equivalently t is a sucient statistic for z if the distribution within an

equivalence class of the partition induced by t is indep endentofz Thus

the distribution of t contains all the information relevant to estimating z

A necessary and sucient condition for t to b e sucientisgiven bythe

following factorisation theorem

Theorem t is a sucient statistic for the family fP jz Z g

z

if and only if P x y can b e expressed as P x y g tx y hx y

z z z

where h do es not dep end on z

Denition A statistic t is minimalsucient for z if the partition induced

by t contains every other sucient partition

Equivalently t is a minimalsucient statistic for z if it is a function of

every other sucient statistic for z Therefore a minimalsucient statistic

contains the minimum information relevant to estimating z The following

theorem concerning the maximum likeliho o d estimator is a corollary of the

ab ove theorem

Theorem The maximum likeliho o d estimate is a function of a

minimalsucient statistic

Thus the maximum likeliho o d estimate dep ends only on the minimal

relevant information in the sample Hence the maximum likeliho o d estimate

of the key is the optimal estimate of the key since it is b oth consistentand

minimalsucient

It is often convenient to express the likeliho o d in a dierent form Supp ose

P x denotes the probability that plaintext class x o ccurs then we can write

the likeliho o d function in the form

Lx y k P x y P y jxP x

k k

where P y jx denotes the probability that a plaintext in class x is encrypted

k

to a ciphertext in class y under key k In any particular attack the dis

tribution of the plaintexts induces a distribution on the plaintext classes

Thus in a ciphertextonly attack where we had some information ab out the

plaintexts for example they were in ASCI I or a natural language we could

use this information to calculate the values of P x For manyknown plain

text attack the distribution of plaintexts can b e assumed to b e uniform

and so if the plaintext classes all have equal size then P x is a constant

and Lx y k P y jx Forachosen plaintext attack P x is essentially

k

chosen by the cryptanalyst In many attacks the plaintexts are all chosen to

b e in one class in which case wehavealikeliho o d function L y k P y

k

Wesawabovehow the likeliho o d function denes an equivalence rela

tion on the key space K in whichtwokeys are equivalent if their likeliho o d

functions are identical Therefore to nd the key bymaximum likelihood es

timation we p erform a divide and conquer cryptanalytic attack By using

the plaintext and ciphertext classes we rst nd the equivalence class of the

key space which maximises the likeliho o d function We then search this key

class using the plaintexts and ciphertexts for the true key The complexity

of calculating the likeliho o d for a general pair of plaintextciphertext classes

determines the computational complexity of nding the true key class as the

jointlikeliho o d can easily b e calculated from this The numb er of plaintext

ciphertext pairs determines the error probability of estimating the true key

class The complexity of the cryptanalysis is then determined by the ab ove

factors and the sizes of the key classes

For a simple example of a cryptanalytic attack consider a known plaintext

exhaustivekey search We dene and b oth to b e identity functions so

m m and c cThus

P m if f mc

k

Lm c k P cjmP m

k

otherwise

so the maximum value of the lik eliho o d function distinguishes precisely those

keys which transform the known plaintext m to the known ciphertext c

In this example it is necessary to p erform an or decryption

for eachkey in order to evaluate the likeliho o d function so that evaluating

the likeliho o d requires jK j However in various recent successful

cryptanalytic attacks defects in certain blo ck ciphers have b een discovered

which allow and to b e chosen in suchaway that it is p ossible to nd

the key class that maximises the likeliho o d function with an acceptable error

probability and then searchthiskey class in order to determine the key more

quickly than would b e exp ected in an exhaustivekey search

For certain blo ck ciphers it is also p ossible to maximise or calculate the

likeliho o d function very eciently Andelman and Reeds considered

such a situation for b oth rotor machines stream ciphers and SP networks

blo ck ciphers For an SP network with a small numb er of rounds and an

nbit key k k k the likeliho o d function for k is wellapproximated

n

hinvolving only one key bit as by a pro duct of functions eac

n

Y

Lm c k l m c k

i i

i

Andelman and Reeds showed how to nd the true key bits k by the fast max

i

imisation of this pro duct For a general blo ck cipher including SP networks

with many rounds suchanapproximation of the likeliho o d do es not hold

Bihams related key chosen plaintext cryptanalysis exploits key scheduling

defects in certain ciphers In the likeliho o d framework this corresp onds to

sets of related keys having related likeliho o d functions given related plain

texts The calculation of the likeliho o d function can thus b e carried out more

eciently

Maximum likeliho o d estimation of the true key class can also b e thought

ofasakey partition inducing a partition on the plaintextciphertext classes

and this is the usual statistical formulation of such problems Let K Z

b e a function from the key set K onto a set Z so partitions K into equiv

alence classes indexed by elements of Z Supp ose wenow wish to estimate

the value of z k the equivalence class of k the true key given some

plaintextciphertext data m c The maximum likeliho o d estimate of z

which is optimal in the sense describ ed ab ove is necessarily a function of a

minimalsucient statistic of the data m c This minimalsucient statis

tic gives the minimal partition of the plaintextciphertext spaceM C that

allows the estimation of k The minimal partition enables us to dene

optimal partition functions M X and C Y

We conclude this Section by discussing other ways of viewing likeliho o d

estimation for blo ck cipher keys The problem of estimating the true key k

hasaBayesian formulation If the cryptanalyst has a prior distribution on

the set of keys then Bayes Theorem provides a metho d for estimating the

keyIfwehavea plaintext class x and a ciphertext class y thenwe can use

an inference form of Bayes Theorem

Bayes Theorem Supp ose p is the prior mass function on the elements

of the key space K pjx y is the p osterior mass function on the elements

of the key space K given the data x y and Lxyk is the likeliho o d of

the key k corresp onding to the data x y then

k jx y Lx y k pk p

Usually eachkey is equally likely so the prior mass function piscon

stant In this case Theorem shows that the p osterior probability pk jx y

is maximal or equivalently k is a mo de of the p osterior distribution This

is a standard result of Bayesian statistics but whatever the prior distribu

tion the mo de of the p osterior is the Bayes estimate with resp ect to the

zeroone loss function Note that if wehavemany corresp onding pairs of

plaintext and ciphertext classes the p osterior distribution obtained by using

a joint likeliho o d for many pairs is the same as that obtained by iteratively

calculating p osterior distributions one pair at a time and using them as prior

distributions for the next iteration The Bayesian approach has often b een

used in the analysis of blo ck and stream ciphers b oth explicitly for example

and implicitly in key ranking techniques for example

The problem of testing whether a key class is the true key class can

also b e expressed in terms of hyp othesis testing another routine statistical

technique Hyp othesis testing has b een used by Brynielsson to nd

keys Supp ose we wish to test the null hyp othesis H z z



that z is the true key class z against the alternativehyp othesis H z z

A

that z is not the true key class The likelihoodratio test is based on the ratio

Lx y z

x y sup

Lx y z

z z

Intuitivelywewould reject the null hyp othesis H if the ratio x y is to o



large Indeed the likeliho o d ratio test has a critical region for rejecting H



of the form fx yjx y cg where c is xed by the error probability

P x y cjz z of rejecting the true key class z Such a test is

called a test of size A full analysis of this likeliho o d test would involve

the evaluation of the power function H whichisdenedby

A

P Accept z is true key class jz H z

A

so z is the probability of rejecting the false key class z We should like

to cho ose a test of size that is optimal in some sense over all tests of

size In the case when H consists of a single key class that is H

A A

is a simple hyp othesis the NeymanPearson Lemma shows that a test

based on a critical region fx yjx y cg where c is determined by

P x y c is the most powerful test of size Even when H is a

A

comp osite hyp othesis for most cryptographic purp oses testing H against H

 A

is equivalent to a onesided test involving normal distributions with dierent

h tests the likeliho o d ratio test is uniformly most powerful means For suc

that is it has maximal p ower z over all tests of size for every z H

A

Note that wemaybeabletoperforma hyp othesis test more eciently by

p erforming a sequential likelihoodratio test

Iterated Blo ck Ciphers

Most blo ck ciphers are constructed by iterating a relatively simple crypto

graphic function a numb er of times Consider suchan nround blo ck cipher

th

For such a cipher M C and the i round transformation under a key

th

k is an invertible function from the i round message space to the

th

i round message space whichwe denote by k M M Thus for a

i

plaintext m and key k we can regard the encryption pro cess as pro ducing



a sequence of message states m m m wherem is the ciphertext

 n n

and m m k Thus the encryption function k M M from

i i i

plaintext to ciphertext is given by

n

Y

k k

i

i

For such an iterated blo ck cipher we generalise the approach outlined in

th

the previous Section by partitioning all the i round message spaces i

naswell as the plaintext and ciphertext spaces For each i n

let M Y b e a function from the message space M onto a set Y The

i i i

th

partition function partitions the i round message space into equivalence

i

classes indexed by the elements of Y i n

i

Supp ose nowthatwehave a sequence of message states m m m

 n

obtained by encrypting under the key k We can then obtain a sequence

of message classes y y y where y m This sequence can b e

 n i i i

regarded as a realisation of a random pro cess on Y Y Y Supp ose the

 n

matrix of transition probabilities from Y to Y under key k is k The

i i i

entries of the transition matrix Q k can b e easily calculated as

i

jfm y jm k y gj

i i i

Q k P Y y jY y

i y y k i i i i

i i

jy j

i

We wish to calculate the matrix Qk of transition probabilities from the

plaintext classes Y to the ciphertext classes Y If the random pro cess on

 n

the message classes Y Y under key k is wellapproximated bya rst

 n

order Markov pro cess this matrix Qk is given by the pro duct

n

Y

Qk Q k

i

i

A rst order Markov random pro cess is one in which the distribution of the

th

i message class y conditional on all the previous message classes y y

i i 

is identical to the distribution of y conditional on the most recent message

i

class y As part of the mo delling approach wemake the assumption that

i

the random pro cess on the message classes Y Y under key k is well

 n

approximated by a rst order Markov pro cess For the Sb ox pairs analysis

considered in Section this is shown to b e a valid assumption whereas

it has b een implicitly assumed in the other cryptanalyses we consider This

assumption could b e justied empirically in the dierent cryptanalyses we

consider but wegive the following heuristic justication The round partition

functions used in the various cryptanalyses are algebraic homomorphisms so

the set of round message classes form an algebraic quotient space The round

functions contain transformations which are highly nonhomomorphic with

resp ect to these quotient spaces Therefore the algebraic structure of an

th th

i round message class y is not inherited byan i message class

i

y that is the elements of y are suciently randomised within y In

i i i

cases where the round functions are partially homomorphic with resp ect to

these quotient spaces for example the dierential analysis of DES discussed

in Section therandomisation prop erty can often b e restored by using the

partial homomorphism to construct a ner partition

We can thus calculate the likeliho o d function as ab ovewith and



since the relevantentry of Qk gives the probability P y jx For

n k

such ciphers the equivalence classes of the key space K induced bythe

likeliho o d function are indexed by the set of matrices fQk jk K gthatis

k and k b elong to the key class if Qk Qk

SBoxPairs Cryptanalysis of DES

For a general example of a cryptanalysis that can b e thoughtofintheabove

framework consider the cryptanalysis of DES byDavies and Murphy

using pairs of Sb oxes Consider a pair of adjacent DES Sb oxes with



resp ective bit inputs a a and b b Over all p ossible inputs

 

to this pair of DES Sb oxes each p ossible bit output o ccurs equally often

times However the expansion phase of DES E means that the values

of a b and a b are determined solely bythekeyIfwe x these bits then

  



over the remaining p ossible inputs we obtain a nonuniform distribution



of the p ossible outputs Furthermore this nonuniform distribution has one

of two forms dep ending on the value of a a b b the XOR of the middle

  

four bits To calculate the distribution of the XOR of the outputs of n pairs

of Sb oxes given random inputs wehave to calculate the nfold convolution of

this distribution with itself and this distribution is also one of only twonon

uniform distributions dep ending on the total XOR of a a b b from

  

the input of each pair of Sb oxes From a plaintextciphertext pair it is easy

to obtain a realisation of the XOR of the outputs of pairs of Sb oxes with

approximately random inputs With many suchplaintextciphertext pairs

we can construct an estimate of the XOR of all the common key bits We

can do this for all pairs of Sb oxes on b oth the left and righthalves of the

cipher so this p otentially gives us bits of key information However only

Sb ox pair gives us this information faster than an exhaustivekey search

so this attack only yields two bits of key information Weshowhow this

cryptanalysis can b e viewed in the likeliho o d framework For the purp oses

of this cryptanalysis of DES it is convenient to ignore the initial and nal

permutations of DES IP and IP as there is no loss of generality We

actually analyse a variant of the DES given by Desmedt

whichwe call EXPDES This variantisablock cipher which can b e regarded

as DES with bit registers rather than bit registers If E S and P denote

the usual expansion Sb oxandpermutation of the encryption function f of

 

DES then we can dene T Z Z by T SP E with maps on the

 



th

right EXPDES has message space M Z andwe denote the i round





message byl r for l r Z An encryption round of EXPDES is given

i i i i



by

l r l r k T r i

i i i i i i

l r l r l k T i

i i i i i i

th

where k is the usual bit i round DES subkey EXPDES encrypts plain

i

text l r tol r and by analogy with DES the ciphertext is r l

     

Note that if we encrypt a bit message by rst expanding b oth bit halves

to bits using the expansion function E then encrypting with EXPDES

and nally unexpanding the bits to a bit cryptogram using E this

is equivalent to encrypting the original bit message using DES

Atany particular stage EXPDES is divided into a left and a right reg

ister and we regard each of them as a dimensional vector space over Z



Let denote the pro jection of this space onto an dimensional subspace

corresp onding to the bit output of the adjacentpairofSboxes after the

functions P and E have b een applied Let I I and J J denote

 

the unit vectors whichhave a in the appropriate p osition corresp onding to

the bit inputs to a pair of adjacent Sb oxes and let denote the pro jec

tion of the dimensional space onto the dimensional subspace generated

by I I J J Note that the linear transformation E from a

  

dimensional to a dimensional space satises E and so the nonlinear

transformation T of the dimensional space satises T SP E

We are going to use the pro jection to track the bit XOR of the output of

the adjacent pair of Sb oxes and to trac k the xed value a a b b

  

given ab ove Let Y b e the dimensional vector space which is the direct

i

sum Im Im and dene partition functions X Y by

i i i

l r l r i

i i i i i

th th

The encryption transformation from the i round to the i round

leaves the message classes unaltered since

l r l r

i i i i i

l k r Tr

i i i i

l r l r i

i i i i i

which means that the transition matrix is the identity matrix

th th

The encryption transformation from the i round to the i

round p ermutes the message classes according to the output of the twoad

jacent Sb oxes Wehave

l r l r

i i i i i

l r k l T

i i i i

l r k l T i

i i i i i

giving a blo ck diagonal transition matrix of the form

M k

 i

M k

i

where the subscript denotes the value of l T is the output of the pair of

i

adjacent Sb oxes and given that the value of l is known the distribution

i

of T dep ends on k It can b e shown as each DES Sb ox consists of

i

 

four p ermutations that the matrices M k can b e expressed as

i

k k

i i

M k E W M k E W

 i i

  

where E denotes the matrix with entries and W w hasro w

ij

and column sum zero constant diagonals ie w w for all

ij dd ij



d Z and is easily calculated Thus we can write



k

i

E W

M k

i

k

i

E W

th th

as the transition matrix from the i round to the i round message

classes

The random pro cess on round message classes is wellapproximated bya

Markov pro cess so the overall transition matrix is wellapproximated by

the pro duct of the round transition matrices Now EW WE sofor

any n n wehave

j

j

n

i

l

Y

W E

M n

n i

i

l

W E

 i

Thus we can write the transition matrix from plaintext classes to ciphertext

classes as



Y

 

Qk M k M k M k

i i i

i i

i

s



E W



so Qk where s k

s

i

 i

E W

Our analysis of DES has used an expanded form of DES EXPDES How

ever the generalisation of DES to EXPDES means that we use plaintext

ciphertext classes corresp onding to the top left blo ckofQk giving a

s



transition matrix b etween plaintext and ciphertext classes of E W



where s k We can thus dene two classes on the key space de

i

i



p ending on the value of k and since we need only to discriminate

i

i

between two cases we can use a likeliho o d ratio hyp othesis test for twosim

ple hyp othesis as describ ed in Section The two approaches are equivalent



since this hyp othesis test selects the value of k giving the higher

i

i

value of the likeliho o d function We can also dene a function

Y Y Y Im

 

from the plaintextciphertext classes onto a set Y by addition of the plaintext

and ciphertext classes so

y y y y r r

     

This denes a partition of the plaintextciphertext classes into equivalence

classes on which the likeliho o d function is constant for eachkey since E

s



has constant diagonals This partition corresp onds to the minimal W



sucient statistic for estimating s k

i

i

Linear Cryptanalysis

In this Section we showhowthetechnique of linear cryptanalysis applied by

Matsui and Yamagishi to FEAL and by Matsui to DES can b e

viewed in the likeliho o d framework Linear cryptanalysis is very similar to the

analysis of Sb ox pairs analysed in the previous Section Linear cryptanalysis

essentially considers pro jections onto linear dimensional subspaces rather

than the pro jection onto the dimensional subspace used in the previous

Section In our analysis we consider the analysis of DES given by Matsui

and for consistency we use the notation of that pap er We regard the



and for left and right registers as a dimensional binary vector space Z





th

X Z we dene X i to b e the pro jection onto the i p osition of X and



X i i X i X i

j j

In the annex to Matsui the following equation is given

P P C C k

H L H L

where P P C C are the plaintext left and right halves and the cipher

H L H L

text left and right halves resp ectivelyand k is the sum of a certain collection



of key bits This equation holds with probability pwherep



In the notation of Section if we dene the partition function on the plain

text by P P P P and the partition function

H L H L

on the ciphertext by C C C C

H L H L

to b e the ciphertext class then the transition matrix b etween plaintext and

ciphertext classes is given by

k k

p p

 

k k

p p

 

th

This transition matrix can b e calculated by using i round message classes as

explained in Section and this is essentially how the probabilistic equation

given ab ovewas derived We need to test whether k ork The

loglikeliho o d ratio statistic for this test is given by

N np for small p

N

This is the statistic used by Matsui Therefore wecho ose k ifn and



k otherwise This gives for example for a onesided test of size



a sample size N of plaintextciphertexts pairs the value given by

Matsui also gives another probabilistic equation namely

P P C C F C k k

H L H L  L 

where k is a certain sum of key bits and F C k denotes the output of

 L 

the DES F function in bit p osition This equation holds with probability



p where p Since this probabilistic equation dep ends



on k and the bits of K used as input to Sb oxwenowhavethe



p otential to estimate bits of key information These are k the true value

of k andh the true key input bits of K to Sb oxwhich aect the



th

value of F C K This probabilistic equation can b e derived using i

 l 

round message classes as describ ed in Section with plaintext and ciphertext

  

resp ectivelywhere Z Z and Z classes dened by Z



  

P P P P

H L H L

C C C C C

H L H L

L

and C denotes the input bits from C to Sb oxWecannow estimate

L

L

the true key class k h bylikeliho o d estimation Matsuis estimate of

the true key class is the key class k hwhich has the largest count for the

numb er of times the ab ove equation holds

In Matsui improves the eciency of this cryptanalysis by using the

technique of key ranking In this technique the key classes are ranked ac

cording to their count for the numb er of times the equation holds The key

classes are then tested in rank order As mentioned in Section this tech

nique is the calculation of an empirical Bayesian p osterior distribution on

the key classes

Kaliski and Robshaw describ e a multiple approximations technique

whichusesn linear approximations simultaneously This corresp onds to

dening the partition functions to b e pro jections onto certain ndimensional

subspaces rather than dimensional ones

Dierential Cryptanalysis

In this Section we showhow dierential cryptanalysis rst describ ed by

Biham and Shamir and generalised by Lai Massey and Murphycan

b e expressed in the likeliho o d framework Supp ose wehave a blo ck cipher

th

with message space M andi round subkey k M in which m is the

i 

th

plaintext and m is the ciphertext and the i round encryption is given b y

n

m m k T i n

i i i

for some invertible function T on the message space M where M isa

group We note that whilst many iterated blo ck ciphers can b e describ ed

in this form DES cannot b ecause of the expansion phase E However

the expanded form of DES describ ed in Section EXPDES do es t this

formulation so results for DES can b e obtained by analysing EXPDES

We can dene another double blo ck cipher based on the original cipher

with message space A M M and key space K by running a pair of

th

encryptions in parallel For this cipher the i round encryption is given by

k T m k T m m m

i i i i

i i

th

Thus we can dene a group A M M an i round subkey

k k k and an invertible function S on A suchthatforx x A an

i i i i i

encryption round can b e dened by

x x k S

i i i

th

An i round dierential can b e regarded as a surjective function A

i

th

M M M on the i round message classes of the double cipher dened

by

m m m m

i i i

i i

Supp ose now that and were used as partition functions in a likeliho o d

i i

cryptanalysis We rst note that for any constant d M

d d m d m m d m m m m m

i i i i i i

i i i i

Thus in particular the intro duction of a subkey k k k to this double

i i i

cipher on A leaves the value of unaltered Hence the transition probabili

i

th th

ties b etween the i round and the i round message classes dened by

and are keyindep endent Biham and Shamir term such dieren

i i

tial classes characteristicsFor many ciphers though not EXPDES it is a

reasonable assumption as discussed in Section that the random pro cess on

the round dierential classes forms a rst order Markov pro cesses Thus we

can calculate keyindep endent transition probabilities b etween dierential

characteristics several rounds apart

If we dened all the partition functions i n to b e the dif

i

ferential functions wew ould obtain a keyindep endent transition matrix

i

between the plaintext classes and the ciphertext classes Whilst this would

give us information ab out the correlation b etween plaintext and ciphertext

dierentials it would give no information ab out the key Supp ose we dene

partition functions by

i

Y M i n l

i i i

Indicator for D Y fD M n D g

nl nl

Identity Y A i n l n

i i

where D M and m m D D With if and only if m m

nl i

i

these partition functions we obtain a keyindep endent transition matrix

th

from the plaintext classes to the n l round message classes This enables

us to nd a transition matrix dep ending on a small set of key classes for the

th

transition from the n l round message classes to the ciphertext We

can use the likeliho o d estimation pro cess outlined earlier to nd the true key

class and then search this class for the true key In practice for a dierential

attackwewould use a chosen plaintext attackby sp ecifying a given plaintext

dierence This gives keyindep endent probabilities p and p for the

D D

th

n l round message classes D M n D The plaintext dierence and the

set D would b e chosen so as to maximise the transition probability p This

D

makes the likeliho o d estimation pro cess more ecient and may reduce the

number of key classes In this case if we observe the ciphertext y Awe

can write the likeliho o d and the loglikeliho o d function as

Ly k P y Ly k logP y

k k

If we observe N ciphertexts y y y then the joint loglikeliho o d is

N

given by

N

X

Ly k log P y

k i

i

Wenowshowhow some of the metho ds used in the recent dierential attacks

may b e regarded in the likeliho o d framework

In order to simplify the following discussion we assume that l This

corresp onds to what Biham and Shamir call a Rattack For a given plain

th

text equivalence class dierential the n round message class D oc

th

curs with probability p and the other n round message class M n D

D

o ccurs with probability p For y Awe can dene

D

D yS k uk P

n k

n

which can b e expressed in terms of the following two probabilities

q k P y D j y k D and r k P y D j y k D

n n n n

th

denote the true value of the n round message If we let w yS k

n

wehave

yS uk P k D

n k

n

w k P k D

k

n n

Pw k k D jw D P w D

k

n n

Pw k k D jw D P w D

k

n n

r k k p q k k p

D D

n n n n

Thus wehave

D y S k D P EP y EP y j y S k

i k k i k i i

n n

y S k D EP y jy S k D P

i k i i k

n n

u k

n

if y S k D

i

n

jD j

u k

n

if y S k D

i

n

jM jjD j

and so the exp ected value of the joint loglikeliho o d for N ciphertexts y

y y isgiven by

N

P

N

Elog P y ELy k

k i

i

u k u k

n n

Em k log N Em k log

D n D n

jD j jM jjD j

jM jjD j u k

uk n

n

N log Em k log

D n

jM jjD j u k jD j

n

th

where m k isthenumb er of times the n round message class o ccurs

D n

th

by decrypting the ciphertexts under n round subkey k

n

For dierential cryptanalyses jD j jM j uk p andwe can usu

D

n

jD j

ally make the simplifying assumption that for k k uk Thus

n n

n

jM j

for k k

ELy k N log jM j

whereas for the true key k

jM j p

D

ELy k N log jM j Em k log

D

p jD j

D

However Em k is maximal when k k so maximising m k gives an

D D

approximate maximum likeliho o d estimate

Gilb ert and Chasses dierential analysis of FEAL is one in which

the true key is found by calculating whichkey gives rise to a distribution

that b est ts a theoretical distribution We note that this attack is obtained

if we take in the ab ove mo del

nl nl

Biham and Shamir use m k to estimate the true key class Their

D n

estimate of the sample size is based on a signal to noise ratio SN In

likeliho o d terms this is the ratio of the likeliho o d under the true key class

to the average value of the likeliho o d In the theory of statistical hyp othesis

tests outlined in Section we consider the ratio of the likeliho o d under

the true key class to the largest likeliho o d under all the other key classes

Thus the signal to noise ratio SN doesnotallow for correlated counts for

incorrect key classes and so is an approximation whichmay underestimate the

sample size in situations where counts for dierentkeys are highly correlated

As wementioned ab ove the random pro cess on the dierential classes

for EXPDES is not approximated by a rst order Markov pro cess This is

b ecause the invertible function S on the double cipher with message space

A is partially homomorphic due to the DES Sb oxes mapping bits to

with resp ect to the dierential classes By using S to construct a ner

partition by pro jecting onto certain bit p ositions as well as the dierence

we obtain an approximate rst order pro cess Biham and Shamir termed

such partition functions for DES enhanced characteristics The transition

probabilities b etween enhanced characteristics are therefore keydep endent

but the analysis of enhanced characteristics is similar to that describ ed ab ove

The probabilities used for ordinary characteristics for DES are averages across

keys of the probabilities of the enhanced characteristics

Linear Structures

In a series of pap ers Reeds and Manferdelli Chaum and Evertse and

Evertse consider how to analyse a blo ck cipher if the encryption function

has some partial linearity asso ciated with it The basic idea is to nd linear or

ane transformations of the plaintext ciphertext and key spaces resp ectively

such that the mapp ed ciphertext is a function of the mapp ed plaintext and

mapp ed key If these transformations have less than full rank then it may

b e p ossible to cryptanalyse the smaller mapp ed cryptosystem and determine

a subspace of the key space con taining the true key

If a blo ck cipher has this partial linearity then we can dene partition

functions based on the various linear or ane transformations and obtain

a set of keydep endentpermutation matrices describing the transitions b e

tween the plaintext and ciphertext classes In terms of the likeliho o d frame

work these attacks can b e regarded as the sp ecial case when the transition

matrix is a p ermutation matrix In this case the likeliho o d function only

takes the values zero or one and the algebraic structure mayallowusto

evaluate the likeliho o d function quickly

The likeliho o d framework however allows us to relax the condition that

the transition matrices must b e p ermutation matrices and still enables us to

nd the keyThus for a general blo ck cipher when the partition functions

have b een dened in terms of linear or ane transformations we are really

considering probabilistic linear structures This is certainly the case with

the three examples Sb ox pairs linear and dierential cryptanalysis that

are considered ab ove In the analysis of SAFER in the next Section the

partition functions are based on a mo dule homomorphism

Analysis of the SAFER Blo ckCipher

In this Section we consider the SAFER blo ck cipher and showthat

viewing SAFER in the likeliho o d framework exp oses a cryptographic weak

ness This cryptographic weakness is describ ed more fully in

SAFER K is a blo ckcipherthatoperatesonbyte bit blo cks

under the control of an byte bit key It is a byteoriented cipher

in that all of its basic op erations are on bytes or pairs of bytes Thus we

consider the message space to b e a Z mo dule of rank V say SAFER is



an r round cipher that uses r byte bit subkeys K K The

r

th

key scheduling is such that the j byte of any subkey dep ends only on the

th th

j byte of the keyThus the subkey bytes are indep endent The i round

function splits naturally into two parts

Keyed ByteSeparatedLayer Eachofthebytes is pro cessed sepa

th

rately from the other bytes A byte is obtained from the j byte using

th

the j bytes from subkeys K and K bitXOR addition mo dulo

i i

andabyte function

Pseudo HadamardTransform PHT Layer This is a mo dule homo

morphism of V sothebytes are mixed linearlyIf has matrix M

with resp ect to the standard basis then

B C

B C

B C

B C

B C

B C

B C

M B C

B C

B C

B C

B C

B C

A

th

After r rounds there is a nal output transformation for whichthej

th th

byte output dep ends only on the j byte input and the j byte from subkey

K

r

In the likeliho o d framework weareinterested in dening partition func

tions on V which determine a partition on the key space suchthatthe

i

likeliho o d function on the key classes is easily calculated and can b e used to

nd the true key class of the true keyWe therefore consider the following

two submo dules of V

R h e e e e e e e e i

      

S h e e e e e e e e i

     

th

where e denotes the element with as its i co ordinate and everywhere

i

else Now V R S and R and S are in variant submo dules of rank

Wethus dene partion functions to b e the mo dule homomorphisms V

i

R VS which are dened by the matrix

C B

C B

C B

C B

C B

C B

C B

C B

C B

C B

C B

C B

C B

A

with resp ect to the standard basis The class of an element v V is its

pro jection onto the submo dule R

th

Wenow consider the eect of the i round function on the classes For

an input in a given class to the keyed byteseparated layer the probability

st th

of an output class is indep endentofthe and bytes of subkey K

i

and K as neither e nor e is included in any of the basis elements of R

i 

Thus the output class of the keyed byteseparated layer is indep endentof

st th

the and bytes of the keyThePHTlayer p ermutes the classes as R

is an invariant subspace Therefore the transition probability from any

th st

input class to the i round to any output class is indep endentofthe

th

and bytes of the key By using this argument iteratively and handling

the nal output transformation similarly wehaveshown that the transition

st

probability from plaintext class to ciphertext class is indep endentofthe

th

and bytes of the k ey In the language of Section wehave found a

probabilistic algebraic structure

 

We can therefore dene a key partition function K Z Z on

 

the key space K by

k k k k k k k k k k k k k k

          

For plaintext class x and ciphertext class y the loglikeliho o d function is

given by

Lx y k Lx y k for all k

 

There are key classes and so at most evaluations of the loglikeliho o d

function Thus if there are any nonnegligible correlations b etween plaintext

and ciphertext classes wehave a reduced key search A detailed analysis

is given in In any case wehavegiven partitions of the plaintext and

ciphertext spaces which are indep endent of a quarter of the key bits

Shannons principle of confusion which is to make the relationship

between simple statistics of the ciphertext and simple statistics of the key

avery complex and involved one is one of SAFERs design criteria

Our analysis of SAFER in the likeliho o d framework has given partitions of

the plaintext and ciphertext spaces which are indep endent of a quarter of

the key bits Wehavethus exp osed a cryptographic weakness of the SAFER

algorithm and shown that it do es not satisfy one of its own design criteria

References

D Andelman Maximum Likelihood Estiamtion Applied to Cryptanaly

sis PhD thesis Stanford University

D Andelman and J Reeds On the Cryptanalysis of Rotor Machines

and SubstitutionPermutation Networks IEEE Transactions on Infor

mation Theory IT

B Preneel M Nuttin V Rijmen and J Buelens Cryptanalysis of the

CFB mo de of DES with a Reduced Numb er of Rounds In Advances

in Cryptology Proceedings of CRYPTO pages Springer

Verlag LNCS

E Biham New Typ es of Cryptanalytic Attacks using Related Keys

Journal of Cryptology

E Biham and A Shamir Dierential Cryptanalysis of DESlike Cryp

tosystems Journal of Cryptology

L Brynielsson Hyp othesenprufung in der Kryptologie Personal Com

munication

D Chaum and JH Evertse Cryptanalysis of DES with a Reduced

Numb er of Rounds Sequences of Linear Factors in Blo ck Ciphers In

Advances in Cryptology Proceedings of CRYPTO pages

SpringerVerlag LNCS

D Davies and S MurphyPairs and Triplets of DES SBoxes Journal

of Cryptology

Y Desmedt Analysis of the Security and New Algorithms for Modern

Industrial Cryptogr aphy PhD thesis KatholiekeUniversiteit of Leuven

JH Evertse Linear Structures in Blo ck Ciphers In Advances in

Cryptology Proceedings of EUROCRYPT pages Springer

Verlag LNCS

H Gilb ert and G Chasse A Statistical Attack of the FEAL Cryp

tosystem In Advances in Cryptology Proceedings of CRYPTO pages

SpringerVerlag LNCS

BS Kaliski and MJB Robshaw Linear Cryptanalysis using Multiple

Approximations In Advances in Cryptology Proceedings of CRYPTO

pages SpringerVerlag LNCS

JL Lai X Massey and S Murphy Markov Ciphers and Dierential

Crytpanalysis In Advances in Cryptology Proceedings of EUROCRYPT

pages SpringerVerlag LNCS

JL Massey SAFER K A ByteOriented Blo ckCiphering Algo

rithm In Fast Software Encryption Proceedings of Cambridge Security

Workshop pages SpringerVerlag LNCS

M Matsui Linear Cryptanalysis Metho d for DES Cipher In Ad

vances in Cryptology Proceedings of EUROCRYPT pages

SpringerVerlag LNCS

M Matsui The First Exp erimental Cryptanalysis of the Data Encryp

tion Standard In Advances in Cryptology Pr oceedings of CRYPTO

pages SpringerVerlag LNCS

M Matsui and A Yamagishi A new Metho d of Known Plaintext At

tack of the FEAL cipher In Advances in Cryptology Proceedings of

EUROCRYPT pages SpringerVerlag LNCS

MJ Mihaljevic and JD Golic Convergence of a Bayesian Iterative

Errorcorrection Pro ceedure on a Noisy Shift Register In Advances in

Cryptology Proceedings of EUROCRYPT pages Springer

Verlag LNCS

S Murphy An Analysis of SAFER Journal of Cryptology submitted

National Bureau of Standards US De

partment of Commerce FIPS pub

JA Reeds and JL Manferdelli DES has no Per Round Linear Factors

In Advances in Cryptology Proceedings of CRYPTO pages

SpringerVerlag LNCS

CE Shannon Communication Theory of Secrecy Systems Bel l System

Technical Journal

SD Silvey Statistical Inference Chapman and Hall