<<

ECE596C: Handout #3

Cryptanalysis of Classical Cryptosystems

Electrical and Computer Engineering, University of Arizona, Loukas Lazos

Abstract. In this lecture we illustrate possible attacks against the substitution , the Vigen´erecipher and the .

1 Cryptanalysis of the Vigen´ereCipher

1.1 Known Plaintext Attack

If Eve knows at least m (cipher,plaintext) pairs then by subtracting plaintext from cipher she can get the vector of m keys.

1.2 Chosen plaintext attack

Choose aa..a as plaintext and get K as cipher. Note that one need not choose x=aa...a as plaintext,

a a a ... a 0 0 0 ... 0 + K1 K2 K3 ... Km K1 K2 K3 ... Km

but any known plaintext will reveal the K.

1.3 Chosen Attack

Choose AAA..A as cipher and the plaintext obtained is then the negative of the key K.

A A A ... A 0 0 0 ... 0 - K1 K2 K3 ... Km −K1 −K2 −K3 ... −Km

Again, one need not choose AAA..A as the ciphertext but any known ciphertext will do. 2 ECE 596C: for Secure Communications with Applications to Network Security

1.4 Ciphertext only attack We left this attack last as it is the hardest to launch. In general, an exhaustive search is very slow to launch due to the big cardinality of the keyspace. We can however perform statistical analysis based on the structure of the English language. The statistical analysis is more difficult than the affine and cases since a) the Vigen´erecipher is a polyalphabetic cryptosystem and b) the length of the key m is not known to Eve.

Conside the following example where the plaintext is x=weed:

PLAINTEXT: 22 4 4 3 KEY: 2 4 6 7 CIPHER: 24 8 10 10 YIKK

In the above example, e is mapped to I as well as K. Furthermore, e and d both map to the same cipher K. For long text, we can expect that all the letters may have equal frequency of occurrence and hence, the letter frequencies may not be particularly useful. Eve can still attempt to break the cryptosystem by the following attack. Break the problem to (a) Finding the key vector length m; (b) Finding the key vector K.

Finding key vector length m : The main idea is to use trial and error in increasing length of vector to find the correct frequency. Find plaintext + key = cipher. Then arrange the cipher as a vector of size m as follows.

y1 = y1 ym+1 y2m+1... y2 = y2 ym+2 y2m+2... . . ym = ym y2m y3m...

Every element of yi is shifted by the same key Ki of the vector key K. That means that each yi is the result of with a monoalphabetic cryptosystem. of x: Let x = (x1, ...., xn) be a string of alphabetic characters. Let the number of times A, B, C,...,Z appearing in x be denoted as f0, f1, .., f25. The Index of coincidence Ic(x) is the probability that two random elements of x are identical. We can write the index of coincidence as:   fi 25 X 2 I (x) = (1) c  n  i=0 2

P25 2 Ideally, Ic(x) = i=0 pi where pi is the empirical probability (Table 1) for alphabet with index i.

So if we have the correct key vector length m, the elements of y1 = y1 ym+1 y2m+1...; will have P25 2 P25 1 2 1 Ic(y1)≈ i=0 pi ≈ 0.065. Else it will be close to i=0 26 = 26 = 0.038, which is the index of Handout # 3 3 coincidence for random character distribution.

Kasiski Test: The idea behind the Kasiski test is that it is quite infrequent to find pairs of identical segments of ciphertext of length at least three, unless these segments are the result of the encryption of the same plaintext. In that case, the distance δ of occurrence of the identical segment must be a multiple of m. That is: δ ≡ 0 (mod 0). To find the period of the Vigen´erecipher using the Kasiski test we execute the following steps: 1. Search ciphertext for pairs of identical segments with length at least 3. 2. Record distances between the starting positions of the segments. 3. Take Greatest Common Divisor (gcd) of these distances as the key vector length m. Let us illustrate the use of these techniques with an example. The following is a ciphertext obtained from Vigen`ere Cipher. CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI PEEWEVKAKOEWADREMXMTBHHCHRTKDNVRZCHRCLQOHP WQAIIWXNRMGWOIIFKEE CHR cipher appears at 1, 166, 236, 276, 286 start locations. So the distances from 1st occurence to other four occurences are 165, 235, 275, 285 respectively. According to the Kasiski Test the gcd of these distances being 5, is the most likely length of the key vector.

Using the Index of Coincidence method, when m = 1, Ic(y) = 0.045. For the case m = 2, Ic(y1) = 0.046 and Ic(y2) = 0.041. With m = 5 we get the indices as 0.063,0.068,..,0.072. Hence, when m = 5, the indices are close to the expected value of 0.065. We can therefore provide strong evidence to conclude that the key vector length is 5.

Finding the key vector K : n 0 ˆ ˆ ˆ Look at yj = yj ym+j ...y n . Let = n . Let f0, f1, .., f25 be the number of times ( m −1)m+j m (frequency) A, B, ..., Z appear in yj. Then the probability of alphabet corresponding to index i is:

fˆ fˆ m pˆ = i = i . (2) i n0 n ˆ Each yj has been associated with fixed key Kj. i.e. fi = fi+Kj where i, i + Kj correspond to the indices of alphabets. Let 0 ≤ g ≤ 25 and define Mg to be:

25 X pifi+g M = . (3) g n0 i=0

Based on this we can try different shifts Kj ∈ {0, ..., 25} and check if:

25 25 X pifi+Kj X M = ≈ p2 = 0.065. (4) g n0 i i=0 i=1 4 ECE 596C: Cryptography for Secure Communications with Applications to Network Security where pi is the empirical probability (Table 1) for alphabet with index i. The resulting shift satis- fying eq. (4), is the value for key Kj. Similarly we can find values for the remaining keys to finally get the key vector K.

Continuing with the example above, we can find the value of the keys (K1,K2,K3,K4,K5) as shown in Table 1. As seen, upon trying all the 26 values of shift, we find 0.061 as the closest match corresponding to K1 = 9. Similarly, we can finally get K = (9, 0, 13, 4, 19).

j Mg

.035 .031 .036 .037 .035 .039 .028 .028 .048 1 .061 .039 .032 .040 .038 .038 .045 .036 .030 .042 .043 .036 .033 .049 .043 .042 .036 .069 .044 .032 .035 .044 .034 .036 .033 .029 2 .031 .042 .045 .040 .045 .046 .042 .037 .032 .034 .037 .032 .034 .043 .032 .026 .047 .048 .029 .042 .043 .044 .034 .038 .035 .032 3 .049 .035 .031 .035 .066 .035 .038 .036 .045 .027 .035 .034 .034 .036 .035 .046 .040 .045 .032 .033 .038 .060 .034 .034 .034 .050 4 .033 .033 .043 .040 .033 .029 .036 .040 .044 .037 .050 .034 .034 .039 .044 .038 .035 .034 .031 .035 .044 .047 .037 .043 .038 .042 5 .037 .033 .032 .036 .037 .036 .045 .032 .029 .044 .072 .037 .027 .031 .048 .036 .037

p f P25 i i+Kj Table 1. Qj = i=0 n0 .

2 Cryptanalysis of the Hill Cipher The Hill cipher is difficult to break with a ciphertext only attack, but a known plaintext attack can be easily launched. Assume that Eve, knows that m = 2 and the plaintext is friday yielding a ciphertext PQCFKU. Given that Eve knows at least two (plaintext, ciphertext) pair, it can create a matrix equation Y = XK and solve for K by inverting matrix X. For our example  5 17  X = . (5) 8 3 and the inverse is

 9 1  X−1 = . (6) 2 15 We can then compute the key K to be:  9 1   15 16   7 19  K = = . (7) 2 15 2 5 8 3 If m is unknown, Eve can proceed using trial and error for different values of m.