EE 418 Network Security and Cryptography Lecture #6 October 18, 2016

Cryptanalysis. Lecture notes prepared by Professor Radha Poovendran. Tamara Bonaci Department of Electrical Engineering University of Washington, Seattle

Outline:

1. Review: Introduction to Cryptanalysis
2. Remarks on Letter Distribution of the English Language
3. Cryptanalysis of the Affine Cipher
4. Cryptanalysis of the Vigenère Cipher
5. Cryptanalysis of the Hill Cipher

1 Review: Introduction to Cryptanalysis

Last lecture, we began discussing how secure cryptosystems are and how one could go about breaking them. In doing so, we turned to cryptanalysis and started with one of the most important assumptions in modern cryptography, namely Kerckhoffs's principle, which states that, in assessing the security of a cryptosystem, one should always assume that an attacker knows the details of the cryptosystem being used. Therefore, the security of the system should always rest on the key, and not on the obscurity of the cryptographic algorithm.

1.1 Attack models

We then considered different goals that an attacker can have when attacking a channel between communicating parties. For example, an attacker may wish to:

1. Read one specific message.
2. Find the encryption/decryption key, and thus read all of the exchanged messages.
3. Corrupt Alice’s message into another message in such a way that Bob thinks that Alice has sent the altered message.
4. Masquerade as Alice in order to communicate with Bob such that Bob believes he is communicating with Alice.

For each of these goals, there are four main types of attacks that an attacker can use, and those types differ in the amount of information an attacker has available when trying to determine the key. Those four attack types are as follows.

Type of attack              Description
Ciphertext only attack      Eve only observes the ciphertext y.
Known plaintext attack      Eve knows the ciphertext y corresponding to plaintext x.
Chosen plaintext attack     Eve has temporary access to an encryption box. The encryption box takes as input any chosen plaintext x and outputs the ciphertext y.
Chosen ciphertext attack    Eve has temporary access to a decryption box. The decryption box takes as input any chosen ciphertext y and outputs the plaintext x.

Based on these models, we can analyze the security of every cryptosystem.

2 Cryptanalysis of the Shift Cipher

– Ciphertext only: Let K = 3 and the plaintext be shift. Shifting each letter three places to the right gives the ciphertext VKLIW. Assume Eve knows only the ciphertext VKLIW, and that a shift cipher was used for encryption. Given the small cardinality of the key space, Eve can try all 26 possible shifts, undoing the encryption by shifting the ciphertext to the left one place at a time:

vkliw → (1st left shift) ujkhv → (2nd left shift) tijgu → (3rd left shift) shift, and so on.

Since “shift” is the only dictionary word in the list of 26 candidate plaintexts, Eve assumes that it is indeed the plaintext that was encrypted. Therefore, Eve can also infer the original key K = 3.

– Known plaintext: If Eve knows a (plaintext, ciphertext) pair, then Eve can find the key by subtracting the plaintext from the ciphertext mod 26. For instance, if Eve knows that plaintext b corresponds to ciphertext E, then Eve can determine that K = 3.

– Chosen plaintext: Choose letter a as plaintext; the resulting ciphertext will be the key. For example, if the ciphertext is P then K = 15.

– Chosen ciphertext: Choose A as the ciphertext. The resulting plaintext is then the negative of the key, −K mod 26.
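The four attacks on the shift cipher described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names and the one-word `TOY_DICTIONARY` are assumptions made for the example, standing in for a real word list.

```python
import string

ALPHABET = string.ascii_lowercase

def shift_decrypt(ciphertext, key):
    """Undo a shift cipher: move every letter `key` places to the left (mod 26)."""
    return "".join(ALPHABET[(ALPHABET.index(c) - key) % 26] for c in ciphertext.lower())

def brute_force(ciphertext, dictionary):
    """Ciphertext only: try all 26 keys and keep the candidates found in `dictionary`."""
    candidates = ((key, shift_decrypt(ciphertext, key)) for key in range(26))
    return [(key, text) for key, text in candidates if text in dictionary]

def key_from_pair(plain_letter, cipher_letter):
    """Known or chosen plaintext: K = y - x mod 26."""
    x = ALPHABET.index(plain_letter.lower())
    y = ALPHABET.index(cipher_letter.lower())
    return (y - x) % 26

# Hypothetical stand-in for a real dictionary of English words.
TOY_DICTIONARY = {"shift"}

print(brute_force("VKLIW", TOY_DICTIONARY))  # ciphertext only
print(key_from_pair("b", "E"))               # known plaintext
print(key_from_pair("a", "P"))               # chosen plaintext: the ciphertext letter is the key
print(shift_decrypt("A", 3))                 # chosen ciphertext: the plaintext letter is -K mod 26
```

With a real word list, the brute-force step may return more than one candidate for very short ciphertexts, in which case Eve would have to choose among them by context.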

3 Remarks on Letter Distribution of the English Language

English text has different frequencies for different alphabetic characters. An estimate of the relative frequencies (probabilities) of the 26 letters is presented in Table 1. Note that the letter e has the maximum relative frequency, 0.127.

Table 1. Probabilities of occurrence of the 26 letters of the English language alphabet.

A      B      C      D      E      F      G      H      I      J      K      L      M
0.082  0.015  0.028  0.043  0.127  0.022  0.020  0.061  0.070  0.002  0.008  0.040  0.024

N      O      P      Q      R      S      T      U      V      W      X      Y      Z
0.067  0.075  0.019  0.001  0.060  0.063  0.091  0.028  0.010  0.023  0.001  0.020  0.001

Similarly, we can define frequencies of digrams, trigrams, initial letters, final letters, etc. More generally, we can use the statistical properties of the English language to perform cryptanalysis. A key observation here is that the vowels “a, e, i, o” and the consonants “t, n, s, h, d” have a relatively high probability of appearance compared to other characters. Table 2 gives the rank order of the vowels based on their frequencies, and Table 3 the rank order of the consonants “t, n, s, h, d” based on their frequencies.

Table 2. Rank order of the probabilities of occurrence of the vowels.

E 0.127
A 0.082
O 0.075
I 0.070
U 0.028

Table 3. Probabilities of most frequently occurring consonants.

T 0.091
N 0.067
S 0.063
H 0.061
D 0.043
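As a quick check on the rank orders, the values of Table 1 can be sorted programmatically. The sketch below simply encodes Table 1 as a Python dictionary; the names `ENGLISH_FREQ` and `rank` are illustrative choices, not part of the notes.

```python
# Relative letter frequencies from Table 1, in order A..Z.
ENGLISH_FREQ = dict(zip(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
     0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
     0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001],
))

def rank(letters):
    """Return the given letters sorted from most to least frequent."""
    return sorted(letters, key=lambda c: ENGLISH_FREQ[c], reverse=True)

print(rank("AEIOU"))  # vowels, by the Table 1 values
print(rank("TSNHD"))  # the consonants listed in Table 3, by the Table 1 values
print(max(ENGLISH_FREQ, key=ENGLISH_FREQ.get))  # most frequent letter overall
```

By the Table 1 values, O (0.075) ranks ahead of I (0.070), and N (0.067) ahead of S (0.063).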

4 Cryptanalysis of the Affine Cipher

– Ciphertext only attack: Let’s assume that Eve has intercepted the following ciphertext:

FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH

The most frequent letters are R with 8 occurrences, D with 7, E, K, H with 5 each, and F, V, S with 4 each. The first guess is that R = e and D = t, i.e., that plaintext e (4) encrypts to R (17) and plaintext t (19) encrypts to D (3). Given the encryption function

\[ e_K(x) = ax + b \tag{1} \]

we get the following linear system:

\[ 4a + b = 17 \tag{2} \]
\[ 19a + b = 3. \tag{3} \]

Solving the system, we obtain the unique solution a = 6, b = 19 (note that a solution must be in Z26). But for the affine cipher, a has to be relatively prime to 26. Given that gcd(26, 6) = 2, the pair a = 6, b = 19 is not a valid key. The second guess is R = e and E = t. Solving the corresponding linear system yields a = 13, which again is not a legal key, since gcd(26, 13) = 13. The third guess is R = e and K = t, which yields a = 3 and b = 5. Since this is a valid key, we decrypt the entire ciphertext to see if we get meaningful English text:

algorithms are quite general definitions of arithmetic processes

Note: Besides the statistical analysis, Eve could have tried all possible 312 pairs (a, b) that constitute a valid key for the affine cipher.

– Known plaintext attack: Suppose Eve knows that the plaintext uw = (20, 22) has ciphertext KQ = (10, 16). She can then set up the following system of linear equations:

\[ 10 = 20a + b \pmod{26}, \tag{4} \]
\[ 16 = 22a + b \pmod{26}. \tag{5} \]

Subtracting Equation 4 from Equation 5 gives 6 = 2a (mod 26), i.e., 2a = q × 26 + 6, so a = 3 or a = 16. But gcd(16, 26) ≠ 1, so a = 3. From Equation 4 we can now get b as follows:

\[ 10 = 20 \times 3 + b \pmod{26}, \tag{6} \]
\[ -50 = b \pmod{26}, \tag{7} \]
\[ b = q \times 26 + (-50) \;\Rightarrow\; q = 2 \;\Rightarrow\; b = 2. \tag{8} \]

Hence Eve only needs to know two (plaintext, ciphertext) pairs.

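– Chosen plaintext: If Eve chooses ab = (0, 1) as the plaintext, the ciphertext will be:

\[ 0 \times a + b \equiv b \pmod{26}, \tag{9} \]
\[ 1 \times a + b \equiv a + b \pmod{26}. \tag{10} \]

The first ciphertext letter gives b directly, and subtracting it from the second gives a, so Eve can easily find the key K = (a, b).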

– Chosen ciphertext: Eve chooses AB as the ciphertext, and proceeds as above.
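The key-recovery step shared by the attacks above (two letter correspondences give two linear equations in a and b) can be sketched as follows. The function names are illustrative assumptions, and the system is solved by simply trying all 26 values of a rather than by modular elimination. The ciphertext constant includes the final H implied by the stated count of five H's and by the 57-letter decryption.

```python
from math import gcd

def to_nums(text):
    """Map letters to 0..25 (a/A = 0, ..., z/Z = 25)."""
    return [ord(c) - ord("a") for c in text.lower()]

def solve_affine_key(x1, y1, x2, y2):
    """All valid keys (a, b) with a*x1 + b = y1 and a*x2 + b = y2 in Z_26."""
    keys = []
    for a in range(26):
        b = (y1 - a * x1) % 26
        if (a * x2 + b) % 26 == y2 and gcd(a, 26) == 1:
            keys.append((a, b))
    return keys

def affine_decrypt(ciphertext, a, b):
    """d_K(y) = a^{-1} (y - b) mod 26."""
    a_inv = pow(a, -1, 26)  # modular inverse; needs Python 3.8+
    return "".join(chr(a_inv * (y - b) % 26 + ord("a")) for y in to_nums(ciphertext))

CIPHERTEXT = "FMXVEDKAPHFERBNDKRXRSREFMORUDSDKDVSHVUFEDKAPRKDLYEVLRHHRH"
e, t = 4, 19                 # plaintext letters e and t
R, D, E, K = 17, 3, 4, 10    # ciphertext letters

print(solve_affine_key(e, R, t, D))  # first guess: no valid key
print(solve_affine_key(e, R, t, E))  # second guess: no valid key
print(solve_affine_key(e, R, t, K))  # third guess: one valid key
a, b = solve_affine_key(e, R, t, K)[0]
print(affine_decrypt(CIPHERTEXT, a, b))

# Known plaintext attack: uw -> KQ.
print(solve_affine_key(20, 10, 22, 16))
```

Filtering on gcd(a, 26) = 1 inside the solver is what discards a = 6, a = 13, and a = 16 in the examples above.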

5 Cryptanalysis of the Vigenère Cipher

5.1 Known Plaintext Attack

If Eve knows at least m (plaintext, ciphertext) letter pairs, one for each key position (for example, m consecutive letters), then by subtracting the plaintext from the ciphertext mod 26 she can get the vector of m key elements.

5.2 Chosen plaintext attack

Choose the m-letter string aa...a as the plaintext, and get K as the ciphertext:

  a    a    a    ...  a
  0    0    0    ...  0
+ K_1  K_2  K_3  ...  K_m
  K_1  K_2  K_3  ...  K_m

Note 1: One does not need to choose x = aa...a (m letters) as the plaintext, as any known plaintext will also reveal the key K.

5.3 Chosen Ciphertext Attack

Choose the m-letter string AAA...A as the ciphertext; the obtained plaintext is then the negative of the key K:

  A     A     A     ...  A
  0     0     0     ...  0
− K_1   K_2   K_3   ...  K_m
  −K_1  −K_2  −K_3  ...  −K_m

Note 2: Again, one does not need to choose AAA...A (m letters) as the ciphertext. Any chosen ciphertext will do.
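The known plaintext, chosen plaintext, and chosen ciphertext attacks all reduce to a letter-by-letter subtraction mod 26. The sketch below illustrates this with a toy key K = (2, 4, 6, 7) and the plaintext weed; the function names are illustrative assumptions.

```python
def to_nums(text):
    """Map letters to 0..25 (a/A = 0, ..., z/Z = 25)."""
    return [ord(c) - ord("a") for c in text.lower()]

def to_text(nums):
    """Map numbers (reduced mod 26) back to lowercase letters."""
    return "".join(chr(n % 26 + ord("a")) for n in nums)

def vigenere_encrypt(plaintext, key):
    """y_i = x_i + K_(i mod m) mod 26; ciphertext is returned in upper case."""
    nums = to_nums(plaintext)
    return to_text(x + key[i % len(key)] for i, x in enumerate(nums)).upper()

def vigenere_decrypt(ciphertext, key):
    """x_i = y_i - K_(i mod m) mod 26; plaintext is returned in lower case."""
    nums = to_nums(ciphertext)
    return to_text(y - key[i % len(key)] for i, y in enumerate(nums))

def recover_key(plaintext, ciphertext):
    """Known plaintext: subtract plaintext from ciphertext, letter by letter."""
    return [(y - x) % 26 for x, y in zip(to_nums(plaintext), to_nums(ciphertext))]

key = [2, 4, 6, 7]  # toy key of length m = 4, chosen only for illustration

print(recover_key("weed", vigenere_encrypt("weed", key)))  # known plaintext
print(to_nums(vigenere_encrypt("aaaa", key)))              # chosen plaintext: ciphertext is K
print(to_nums(vigenere_decrypt("AAAA", key)))              # chosen ciphertext: plaintext is -K mod 26
```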

5.4 Ciphertext only attack

We left this attack for last, as it is the hardest to launch. In general, an exhaustive search is very slow due to the large cardinality of the key space (26^m keys for key length m). We can, however, perform a statistical analysis based on the structure of the English language. The statistical analysis is more difficult than in the affine and shift cases because: (a) the Vigenère cipher is a polyalphabetic cryptosystem, and (b) the length of the key m is not known to Eve.

Consider the following example, where the plaintext is x = weed:

PLAINTEXT:  22   4   4   3
KEY:         2   4   6   7
CIPHER:     24   8  10  10   (YIKK)

In the given example, the letter e is mapped to I the first time, and to K the second time. Moreover, the letters e and d both map to the same ciphertext letter K. For a long text, we can expect the ciphertext letters to have nearly equal frequencies of occurrence, and hence the letter frequencies of the ciphertext as a whole may not be particularly useful. Eve can still attempt to break the cryptosystem by executing the following attack in two stages:

1. Finding the key vector length m;
2. Finding the key vector K.

Finding key vector length m using Kasiski Test: The key length m can be found using the Kasiski test. The idea behind the Kasiski test is that it is quite improbable to find pairs of identical ciphertext segments of length at least three, unless these segments are the result of encrypting the same plaintext with the same part of the key. In that case, the distance δ between the starting positions of the identical segments must be a multiple of m. That is, δ ≡ 0 (mod m). To find the period of the Vigenère cipher using the Kasiski test, we execute the following steps:

1. Search the ciphertext for pairs of identical segments of length at least 3.
2. Record the distances between the starting positions of the segments.
3. Take the greatest common divisor (gcd) of these distances as the key vector length m.

Let us illustrate the use of these techniques with an example. The following is a ciphertext obtained from the Vigenère cipher.

CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW
RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK
LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX
VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR
ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT
AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI
PEEWEVKAKOEWADREMXMTBHHCHRTKDNVRZCHRCLQOHP
WQAIIWXNRMGWOIIFKEE

The ciphertext segment CHR appears at starting positions 1, 166, 236, 276, and 286. So the distances from the first occurrence to the other four occurrences are 165, 235, 275, and 285, respectively. The gcd of these distances is 5, and so the most likely length of the key vector is 5 according to the Kasiski test.
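The position and gcd computation for the segment CHR can be reproduced with a short script. This is a minimal sketch under the assumption that the repeated segment has already been spotted; the helper names are illustrative. A fully automatic search over all repeated segments would also have to cope with accidental repetitions, which can drag the gcd down.

```python
from functools import reduce
from math import gcd

CIPHERTEXT = (
    "CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW"
    "RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK"
    "LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX"
    "VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR"
    "ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT"
    "AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI"
    "PEEWEVKAKOEWADREMXMTBHHCHRTKDNVRZCHRCLQOHP"
    "WQAIIWXNRMGWOIIFKEE"
)

def segment_positions(text, segment):
    """1-based starting positions of every occurrence of `segment` in `text`."""
    last_start = len(text) - len(segment)
    return [i + 1 for i in range(last_start + 1) if text.startswith(segment, i)]

def kasiski(text, segment):
    """Distances from the first occurrence of `segment`, and their gcd."""
    positions = segment_positions(text, segment)
    distances = [p - positions[0] for p in positions[1:]]
    return distances, reduce(gcd, distances)

print(segment_positions(CIPHERTEXT, "CHR"))
print(kasiski(CIPHERTEXT, "CHR"))
```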

Finding the key vector K leveraging language statistics: Let y_i be the i-th character of the ciphertext, and let x_i be the corresponding character of the plaintext. If m is the key length, then for a sufficiently long plaintext, the characters x_i, x_{i+m}, x_{i+2m}, ... will have the distribution of the English language. Furthermore, since, under the Vigenère cipher, y_i = x_i + K_i, y_{i+m} = x_{i+m} + K_i, and so on (all mod 26), the characters y_i, y_{i+m}, y_{i+2m}, ... will have the distribution of the English language, shifted by the fixed amount K_i. Hence finding the correct shift is a matter of finding the K_i such that subtracting K_i from y_i, y_{i+m}, ... results in a string with the distribution of English. A formal description of this approach is presented in Figure 1.

VIGENERE CRYPTANALYSIS ALGORITHM
Input: Ciphertext y = y_1 y_2 ... y_n encrypted using the Vigenère cipher
Output: Plaintext x = x_1 x_2 ...
m ← KasiskiTest(y)    // Key length m
for all i = 1, ..., m
    y^i ← y_i y_{i+m} y_{i+2m} ...
    Generate rank ordering of letters in y^i, denoted y^i_1, y^i_2, ...
    Solve shift cipher equation x^i_l + K^i_l ≡ y^i_l mod 26 for each l
end for
z ← 0
while z == 0
    Pick l_1, l_2, ..., l_m
    K ← K^1_{l_1} K^2_{l_2} ... K^m_{l_m}
    x ← d_K(y)
    if x resembles English text
        z ← 1
    end if
end while
return x

Fig. 1. Algorithm for cryptanalysis of the Vigenère cipher.
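The first loop of the algorithm (splitting the ciphertext into m substrings, ranking their letters, and solving the shift equation) can be sketched in Python as follows. This is a simplified illustration with assumed function names: it takes m = 5 from the Kasiski test and only computes the very first candidate key, in which the most frequent letter of every substring is paired with plaintext e. The "resembles English" loop is left to the human reader here.

```python
from collections import Counter

CIPHERTEXT = (
    "CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW"
    "RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK"
    "LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX"
    "VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR"
    "ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT"
    "AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI"
    "PEEWEVKAKOEWADREMXMTBHHCHRTKDNVRZCHRCLQOHP"
    "WQAIIWXNRMGWOIIFKEE"
)

def substrings(text, m):
    """y^i = y_i y_{i+m} y_{i+2m} ... for i = 1, ..., m."""
    return [text[i::m] for i in range(m)]

def ranked_letters(substring):
    """Letters of one substring, most frequent first (ties broken alphabetically)."""
    counts = Counter(substring)
    return sorted(counts, key=lambda c: (-counts[c], c))

def candidate_shift(cipher_letter, plain_letter):
    """Solve x + K = y mod 26 for K."""
    return (ord(cipher_letter) - ord("A") - (ord(plain_letter) - ord("a"))) % 26

def decrypt(text, key):
    """Vigenère decryption d_K(y), returning lowercase plaintext."""
    m = len(key)
    return "".join(chr((ord(c) - ord("A") - key[i % m]) % 26 + ord("a"))
                   for i, c in enumerate(text))

# First guess: the most frequent letter of each substring stands for plaintext e.
first_guess = [candidate_shift(ranked_letters(s)[0], "e") for s in substrings(CIPHERTEXT, 5)]
print(first_guess)
print(decrypt(CIPHERTEXT, first_guess)[:20])
```

Other candidate keys are obtained by pairing lower-ranked ciphertext letters with e, or the top-ranked letters with other frequent plaintext letters such as t or a, via the same `candidate_shift` call.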

Let’s analyze the presented approach by continuing the example from above. We begin by generating the vectors y^i:

y^1 = CVABWEBQBUAWWQRWWXANTBDPXXRDWBFAXCWMNJJFAIACNRNCATBWKDMCDCQQXWK
y^2 = HOEITESEWOOEGMFTIFUDSTNSNVTNDPASNHESBGSEGEMRDRSHEAIEORTHNHOANOE
y^3 = RARANOBQRASAJNVRAPTCXUGRJRUQTHLVGRLJHNGYNQRRGINRYQPVEEBRVRHIRIE
y^4 = EHAXXPQEVKXHMKGZKSEMMIMEEVLWYXJBLZEIWMLPRJVELMRQEEEKWMHTRCPIMI
y^5 = EMTXBHMRXXXBMGXXLKMGXAGLLPHTGTHFLBKKRGXHBTLMXGWHVBEAAXHKZLWWGF

For y^1, the most frequent letters are W (9 times), A (7 times), B (6 times), and C (6 times). Based on the information about the letter frequencies, we make the first guess that ciphertext W corresponds to plaintext e. Using the encryption rule y_i = K_i + x_i mod 26 (1 ≤ i ≤ m), we get K_1 = 22 − 4 = 18 mod 26. For y^2, the most common ciphertext letters are E (10 times), S (7 times), O (6 times), and N (6 times). We therefore guess that ciphertext E corresponds to plaintext e, giving K_2 = 0. For y^3, the most common ciphertext letters are R (13 times), N (5 times), A (5 times), and V (4 times), and so we guess that ciphertext R corresponds to plaintext e, giving K_3 = 17 − 4 = 13. In the ciphertext string y^4, the most common letters are E (10 times), M (8 times), X (4 times), and L (4 times). Hence we guess that ciphertext E corresponds to plaintext e for y^4, giving K_4 = 0. Finally, in y^5, the most common letters are X (10 times), G (7 times), L (6 times), and H (6 times), and so we guess that ciphertext X corresponds to plaintext e in y^5, giving K_5 = 23 − 4 = 19. Our initial guess of the key is therefore K_1 = 18, K_2 = 0, K_3 = 13, K_4 = 0, and K_5 = 19. Decrypting with this key, we obtain the plaintext:

kheeldonhtieeaajinxeetaximebpojsoqtyedeyjweveconkeiofxeeenhiegwmtymak
nzfigeetezeeinksffcsriugetvdpmnbskmejthihlntmnxseesfnwesfvevwzthlolnd
waedgynjpuxanayjoisibmfntlskhezieeyeruswirvbuwyrgamnrstlenelpoigariqe
djaimevskreetvdtlezrvmnvsardkheqoielecbadeijiceleeikhsorwhlrrmeutohok
hetrlnirgkhecsyoupdyavidfnemneovimser

So far it doesn’t look like English. Suppose we were wrong about the ciphertext letter W being e. Let’s try mapping A to plaintext e instead. Under this mapping, the key is K_1 = 0 − 4 = −4 = 22 mod 26. The decryption then begins “gheel...”, which also does not look like regular English. We keep trying this and find that none of the ciphertext letters W, A, B, C can map to plaintext letter e. We then check whether the ciphertext letters W, A, B, C can map to the second most frequent plaintext letter t. After checking each of them, we find that mapping ciphertext letter C to t works well. The key in this case satisfies K_1 + 19 = 2 mod 26, leading to K_1 = 2 − 19 = −17 = 9 mod 26. Now the text reads as:

theelmonhtreeaasinxentaxivebpossoqthedeyswevelonkerofxenenhingwmthmak
nifigentezeninksofcsrrugetedpmnkskmesthihuntmngseesonwesovevwithlound
wandgynspuxawayjorsibmontlsthezineyerdswirebuwyagamnastlewelporgarize
djarmevstreetedtleirvmnesardtheqoreleckadeisicelneikhborwhurrmedtohot
hetrunirgthecshoupdhavidonemnnovimber

Now the message makes a bit more sense. But the letters coming from the fourth vector y^4 do not seem to have been decrypted into regular English. It seems that the most frequent ciphertext letter E is not mapped to plaintext letter e. If we try to map it to the plaintext letter t, the text does not make sense either. But if we map the ciphertext E to plaintext a, then the key is K_4 = 4 − 0 = 4, the text reads as “thealmondtree...”, and it looks like regular English.

thealmondtreewasintentativeblossomthedayswerelongeroftenendingwithmag
nificenteveningsofcorrugatedpinkskiesthehuntingseasonwasoverwithhound
sandgunsputawayforsixmonthsthevineyardswerebusyagainasthewellorganize
dfarmerstreatedtheirvinesandthemorelackadaisicalneighborshurriedtodot
hepruningtheyshouldhavedoneinnovember

We now have a message that reads as regular English, based on the key K = (9, 0, 13, 4, 19). With correct spacing and punctuation, the message looks like:

The almond tree was in tentative blossom. The days were longer, often ending with magnificent evenings of corrugated pink skies. The hunting season was over with hounds and guns put away for six months. The vineyards were busy again as the well-organized farmers treated their vines and the more lackadaisical neighbors hurried to do the pruning they should have done in November.
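The trial-and-error search above can also be automated. The sketch below swaps in a different selection rule from the rank-order procedure of the notes: for each of the m substrings, it scores all 26 shifts by summing the Table 1 probabilities of the resulting plaintext letters and keeps the best-scoring shift. The key length m = 5 is taken from the Kasiski test, and all function names are illustrative assumptions.

```python
# Relative letter frequencies from Table 1, in order a..z.
ENGLISH_FREQ = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070,
                0.002, 0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
                0.063, 0.091, 0.028, 0.010, 0.023, 0.001, 0.020, 0.001]

CIPHERTEXT = (
    "CHREEVOAHMAERATBIAXXWTNXBEEOPHBSBQMQEQERBW"
    "RVXUOAKXAOSXXWEAHBWGJMMQMNKGRFVGXWTRZXWIAK"
    "LXFPSKAUTEMNDCMGTSXMXBTUIADNGMGPSRELXNJELX"
    "VRVPRTULHDNQWTWDTYGBPHXTFALJHASVBFXNGLLCHR"
    "ZBWELEKMSJIKNBHWRJGNMGJSGLXFEYPHAGNRBIEQJT"
    "AMRVLCRREMNDGLXRRIMGNSNRWCHRQHAEYEVTAQEBBI"
    "PEEWEVKAKOEWADREMXMTBHHCHRTKDNVRZCHRCLQOHP"
    "WQAIIWXNRMGWOIIFKEE"
)

def shift_score(substring, shift):
    """Sum of English probabilities of the letters obtained by undoing `shift`."""
    return sum(ENGLISH_FREQ[(ord(c) - ord("A") - shift) % 26] for c in substring)

def best_shift(substring):
    """The shift whose decryption looks most like English under `shift_score`."""
    return max(range(26), key=lambda g: shift_score(substring, g))

def recover_key(text, m):
    """Pick the best shift independently for each of the m substrings."""
    return [best_shift(text[i::m]) for i in range(m)]

def decrypt(text, key):
    """Vigenère decryption d_K(y), returning lowercase plaintext."""
    m = len(key)
    return "".join(chr((ord(c) - ord("A") - key[i % m]) % 26 + ord("a"))
                   for i, c in enumerate(text))

key = recover_key(CIPHERTEXT, 5)
print(key)
print(decrypt(CIPHERTEXT, key)[:34])
```

Scoring whole substrings uses all of the letter counts at once, which is why it is less easily misled than matching only the single most frequent letter to e. For short ciphertexts the scores become unreliable, and a human check of the output is still advisable.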

6 Cryptanalysis of the Hill Cipher

The Hill cipher is difficult to break with a ciphertext only attack, but a known plaintext attack can be easily launched.

6.1 Known Plaintext Attacks

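Assume that Eve knows that m = 2 and that the plaintext friday yields ciphertext PQCFKU. Given that Eve knows at least two (plaintext, ciphertext) pairs, she can create a matrix equation Y = XK and solve for K by inverting the matrix X, so that K = X^{-1}Y. For our example, the plaintext pairs fr = (5, 17) and id = (8, 3) form the rows of

\[ X = \begin{pmatrix} 5 & 17 \\ 8 & 3 \end{pmatrix}, \tag{11} \]

and the inverse is

\[ X^{-1} = \begin{pmatrix} 9 & 1 \\ 2 & 15 \end{pmatrix}. \tag{12} \]

The corresponding ciphertext pairs PQ = (15, 16) and CF = (2, 5) form the rows of Y. We can then compute the key K to be:

\[ K = \begin{pmatrix} 9 & 1 \\ 2 & 15 \end{pmatrix} \begin{pmatrix} 15 & 16 \\ 2 & 5 \end{pmatrix} = \begin{pmatrix} 7 & 19 \\ 8 & 3 \end{pmatrix}. \tag{13} \]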

If m is unknown, Eve can proceed using trial and error for different values of m.
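The computation K = X^{-1}Y for m = 2 can be checked with a short script. This is a minimal sketch restricted to 2 × 2 matrices over Z26, with illustrative function names; it uses the first two letter pairs to build X and Y and then confirms the recovered key on the full ciphertext, including the third pair ay → KU.

```python
def to_nums(text):
    """Map letters to 0..25 (a/A = 0, ..., z/Z = 25)."""
    return [ord(c) - ord("a") for c in text.lower()]

def inverse_2x2_mod26(M):
    """Inverse of a 2x2 matrix over Z_26 via the adjugate formula."""
    (a, b), (c, d) = M
    det = (a * d - b * c) % 26
    det_inv = pow(det, -1, 26)  # raises ValueError if det is not invertible mod 26
    return [[det_inv * d % 26, -det_inv * b % 26],
            [-det_inv * c % 26, det_inv * a % 26]]

def multiply_mod26(A, B):
    """Product of two 2x2 matrices over Z_26."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) % 26 for j in range(2)]
            for i in range(2)]

def hill_key(plaintext, ciphertext):
    """Known plaintext attack for m = 2: K = X^{-1} Y from the first two pairs."""
    x, y = to_nums(plaintext), to_nums(ciphertext)
    X = [x[0:2], x[2:4]]
    Y = [y[0:2], y[2:4]]
    return multiply_mod26(inverse_2x2_mod26(X), Y)

def hill_encrypt(plaintext, K):
    """Encrypt row vectors (x1, x2) as (x1, x2) K mod 26."""
    x = to_nums(plaintext)
    out = []
    for i in range(0, len(x), 2):
        p, q = x[i], x[i + 1]
        out.append((p * K[0][0] + q * K[1][0]) % 26)
        out.append((p * K[0][1] + q * K[1][1]) % 26)
    return "".join(chr(n + ord("A")) for n in out)

K = hill_key("friday", "PQCFKU")
print(inverse_2x2_mod26([[5, 17], [8, 3]]))
print(K)
print(hill_encrypt("friday", K))
```

If the matrix X built from the first two pairs is not invertible mod 26, Eve would need to pick a different set of known pairs; the sketch signals this case with a `ValueError`.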

Sources for Today’s Lecture:

1. Douglas R. Stinson, Cryptography: Theory and Practice, 3rd edition. CRC Press, 2005, p. 1–39.
2. Wade Trappe and Lawrence C. Washington, Introduction to Cryptography with Coding Theory. Prentice Hall, 2002, p. 1–26 and 59–74.
3. Neil Daswani, Christoph Kern, and Anita Kesavan, Foundations of Security: What Every Programmer Needs to Know. Apress, 2007, p. 203–221.
