
Cryptology: An Historical Introduction DRAFT

Jim Sauerberg

February 5, 2013

Copyright 2013 All rights reserved
Jim Sauerberg
Saint Mary’s College

Contents

List of Figures

1 Caesar Ciphers
    1.1 Saint Cyr Slide
    1.2 Running Down the Alphabet
    1.3 Frequency Analysis
    1.4 Linquist’s Method
    1.5 Summary
    1.6 Topics and Techniques
    1.7 Exercises

2 Cryptologic Terms

3 The Introduction of Numbers
    3.1 The Remainder Operator
    3.2 Modular Arithmetic
    3.3 Decimation Ciphers
    3.4 Deciphering Decimation Ciphers
    3.5 Multiplication vs. Addition
    3.6 Koblitz’s Kid-RSA and Public Codes
    3.7 Summary
    3.8 Topics and Techniques
    3.9 Exercises

4 The Euclidean Algorithm
    4.1 Linear Ciphers
    4.2 GCD’s and the Euclidean Algorithm
    4.3 Multiplicative Inverses
    4.4 Deciphering Decimation and Linear Ciphers
    4.5 Breaking Decimation and Linear Ciphers
    4.6 Summary
    4.7 Topics and Techniques
    4.8 Exercises

5 Monoalphabetic Ciphers
    5.1 Keyword Ciphers
    5.2 Keyword Mixed Ciphers
    5.3 Keyword Transposed Ciphers
    5.4 Interrupted Keyword Ciphers
    5.5 Frequency Counts and Exhaustion
    5.6 Basic Letter Characteristics
    5.7 Aristocrats
    5.8 Summary
    5.9 Topics and Techniques
    5.10 Exercises

6 Decrypting Monoalphabetic Ciphers
    6.1 Letter Interactions
    6.2 Decrypting Monoalphabetic Ciphers
    6.3 Sukhotin’s Method for Finding Vowels
    6.4 Final Monoalphabetic Tricks
    6.5 Summary
    6.6 Topics and Techniques
    6.7 Exercises

7 Vigenère Ciphers
    7.1 Alberti, the Father of Western Cryptology
    7.2 Trithemius, the Father of Bibliography
    7.3 Belaso, the Unknown and Porta, the Great
    7.4 Vigenère Ciphers
    7.5 Variants and Beaufort
    7.6 How to Break Vigenère Ciphers
    7.7 The Kasiski Test
    7.8 Summary
    7.9 Topics and Techniques
    7.10 Exercises

8 Polyalphabetic Ciphers
    8.1 Coincidences
    8.2 The Measure of Roughness
    8.3 The Friedman Test
    8.4 Multiple Encipherings
    8.5 Vigenère’s Auto Key
    8.6 Perfect Secrecy
    8.7 Summary
    8.8 Terms and Topics
    8.9 Exercises

9 Digraphic Ciphers
    9.1 Polygraphic Ciphers
    9.2 Hill Ciphers
    9.3 Recognizing and Breaking Polygraphic Ciphers
    9.4 Playfair
    9.5 Summary
    9.6 Topics and Techniques
    9.7 Exercises

10 Transposition Ciphers
    10.1 Route Ciphers
    10.2 Geometrical Ciphers
    10.3 Turning Grilles
    10.4 Columnar Transposition
    10.5 Transposition vs. Substitution
    10.6 Letter Connections
    10.7 Breaking the Columnar
    10.8 Double Transposition
    10.9 Transposition during the Civil War
    10.10 The Battle of the Civil War Ciphers
    10.11 Summary
    10.12 Topics and Techniques
    10.13 Exercises

11 Knapsack Ciphers
    11.1 The Knapsack Problem
    11.2 A Related Knapsack Problem
    11.3 An Easy Knapsack Problem
    11.4 The Knapsack Cipher System
    11.5 Public Key Cipher
    11.6 Summary
    11.7 Topics and Techniques
    11.8 Exercises

12 RSA
    12.1 Fermat’s Theorem
    12.2 Complication I: a small one
    12.3 Complication II: a substantial one
    12.4 Complication III: a mini one
    12.5 Complication IV: the last one
    12.6 Putting It All Together
    12.7 Exponential Problems (and answers)
    12.8 RSA
    12.9 RSA and Public Keys
    12.10 How to break RSA
    12.11 Authenticity – Proof of Authorship
    12.12 Summary
    12.13 Topics and Techniques
    12.14 Exercises

Bibliography

Index

List of Figures

    1.1 Saint Cyr Slide
    1.2 Decrypting a Caesar cipher by running down the alphabet
    1.3 Letter Frequency, in percent. From Sinkov.
    1.4 English Letter Frequency Chart
    1.5 Letter Frequency Charts for Several Languages

    2.1 Alice, Bob and Eve: the three names of

    3.1 The Standard Translation of Letters into Numbers
    3.2 Enciphering/Deciphering pairs modulo 26.

    5.1 Letter Frequencies – Anywhere.
    5.2 Letter Frequencies – Initial Letters.
    5.3 Letter Frequencies – Final Letters.
    5.4 Characteristics of etaoinshr.
    5.5 Most Common Short Words
    5.6 The 100 Most Common Words in English

    6.1 Some Basic Letter Behaviors
    6.2 Digraph Table
    6.3 Digraph Table
    6.4 Letter Behaviors

    7.1 Trithemius’
    7.2 Beaufort’s Tableaux
    7.3 A Kasiski Table

    8.1 Frequency Counts: Same quote, different keylengths.
    8.2 Larrabee’s Cipher

    9.1 A Simple Digraphic Substitution Chart
    9.2 A More Complicated Digraphic Substitution Chart
    9.3 18 Most Frequent , in percent
    9.4 Repetitions in the unknown cipher.

    10.1 A 3 × 3 turning
    10.2 Appearances before and after the given letter, in %.
    10.3 Frequencies (in %), from the Brown Corpus.
    10.4 Codewords for keyword McClellan.
    10.5 Five similar dispatches from 1876.

    11.1 Binary Equivalents for the Alphabet

Chapter 1

Caesar Ciphers

There are also letters of his to Cicero, as well as to his intimates on private affairs, and in the latter, if he had anything confidential to say, he wrote it in cipher, that is, by so changing the order of the letters of the alphabet, that not a word could be made out. If anyone wishes to decipher these, and get at their meaning, he must substitute the fourth letter of the alphabet, namely D, for A, and so with the others. Suetonius De Vita Caesarum (The Lives of the Caesars)

The first true use of secret writing in recorded history is due to Julius Caesar, at least as explained by the Roman historian Suetonius. There had been earlier uses of what David Kahn, in his masterwork The Codebreakers: The History of Secret Writing, calls “proto-cryptography,” such as complex Egyptian hieroglyphics and certain stories in the biblical book of Jeremiah (see verses 25:26 and 51:41).1 But it is the Roman general Julius Caesar who apparently invented cryptography, the art and science of designing methods to send secret messages.

Caesar’s method for making his messages secret is straightforward: replace every letter in a message with the one three letters down the alphabet. So, as Suetonius explains, a is replaced by D, b is replaced by E, etc. Of course, Caesar would probably have sent his messages in Latin, but the point is clear. This rather simple idea, replacing the letters in your message by other letters according to some rule, constitutes a cipher. One enciphers a message to make it (hopefully!) secret, and deciphers a secret message to make it readable.

1 It is all but impossible to write a book involving the history of cryptology without making extensive use of The Codebreakers [Kahn]. In fact, most uncredited references in this book come from [Kahn].


Examples: Encipher or Decipher the following names.

(1) Encipher Julius. For each letter in the name, we count three letters down the alphabet and replace the original letter by this new letter.
    j – k – l – M. j is replaced by M.
    u – v – w – X. u is replaced by X.
    l – m – n – O. l is replaced by O.
Finish the rest of this example, and then check your answer by looking at footnote 3 part (1).

(2) Encipher Caesar.

(3) Decipher EUXWXV. Since enciphering is counting forwards, deciphering must be counting backwards.
    E – d – c – b.
    U – t – s – r.
    X – w – v – u.
Finish this example, and again check your answer against the footnote.2

(4) Decipher FLFHUR. 3

When one must encipher messages of more than a couple of words, it becomes quite bothersome to count three forwards and three backwards over and over. To somewhat automate the process we write out the usual alphabet, the plaintext alphabet, and underneath it the alphabet we use for enciphering, the ciphertext alphabet:

plaintext alphabet   a b c d e f g h i j k l m n o p q r s t u v w x y z
ciphertext alphabet  D E F G H I J K L M N O P Q R S T U V W X Y Z A B C

Then to encipher replace the plaintext letter by the ciphertext letter (underneath it), and to decipher replace the ciphertext letter by the plaintext letter (above it). Thinking “read down” or “read up” will lead to trouble later. Think instead about moving from plaintext letters to ciphertext letters, or from ciphertext letters back to plaintext letters.4
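The plaintext/ciphertext alphabet pair above can be mirrored by a translation table. What follows is only a minimal sketch in Python, not something from the historical sources; the names `PLAIN`, `CIPHER`, `caesar_encipher` and `caesar_decipher` are our own illustrative choices. It keeps the convention of lower case plaintext and upper case ciphertext.

```python
import string

# Plaintext alphabet (lower case) and Caesar's ciphertext alphabet
# (upper case, starting at D), exactly as in the two rows above.
PLAIN = string.ascii_lowercase
CIPHER = string.ascii_uppercase[3:] + string.ascii_uppercase[:3]

# Translation tables: plaintext -> ciphertext, and ciphertext -> plaintext.
ENCIPHER_TABLE = str.maketrans(PLAIN, CIPHER)
DECIPHER_TABLE = str.maketrans(CIPHER, PLAIN)


def caesar_encipher(plaintext: str) -> str:
    """Replace each plaintext letter by the ciphertext letter paired with it."""
    return plaintext.lower().translate(ENCIPHER_TABLE)


def caesar_decipher(ciphertext: str) -> str:
    """Replace each ciphertext letter by the plaintext letter paired with it."""
    return ciphertext.upper().translate(DECIPHER_TABLE)


print(caesar_encipher("julius"))   # MXOLXV
print(caesar_decipher("EUXWXV"))   # brutus
```

Characters that are not letters (spaces, punctuation) pass through unchanged, which is a choice of this sketch rather than anything Caesar specified.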

2 Some of the examples in this book will be completely worked out. But some will be only partially finished, and others not even begun. These latter types are for you to work out. Do so with paper and pencil, and then check your answers against the answer in the footnote, and you will find the exercises to be much easier.

3 (1) MXOLXV, (2) FDHVDU, (3) Brutus, (4) Cicero

4 The fonts used in the alphabets just above are not accidental: we will always use lower case letters like these ones to represent plaintext letters, and upper case letters LIKE THESE ONES for ciphertext letters.

Examples: Encipher or decipher the following names.

(1) Encipher Gaius. To encipher we move from plaintext alphabet to ciphertext alphabet, that is, replace each of the plaintext letters gaius by the corresponding ciphertext letters (underneath them):

g a i u s
J D L X V

So JDLXV is the answer.

(2) Encipher Cleopatra.

(3) Decipher SRPSHB. To decipher we move from the ciphertext alphabet back to the plaintext alphabet.

S R P S H B
p o m p e y

So pompey is the answer.

(4) Decipher FUDVVXV. 5



In Caesar’s time his cipher was likely a good one. After all, it was the first one ever invented!6 But once it is known that shifting three forwards and three back is the key, the secrecy is lost. To try to regain it we might instead agree ahead of time on a secret number that tells the amount we will shift each letter.

Examples: Encipher or decipher using the given shift amount.

(1) Encipher Augustus using a shift amount of 1. The counting here is simple – replace each letter by the next one, so a by B, u by V, etc.7

(2) Encipher Quintillis using a shift amount of 10. Quintillis, meaning “fifth” in Latin, was the name of the 5th month used until Julius Caesar named it after himself.

5 (2) FOHRSDWUD, (3) Pompey, (4) Crassus

6 There is no reason, however, to believe that Caesar used these ciphers for long, or for important messages. Cicero, with whom Caesar would have used the system, changed political sides, making the system no longer secret. [ATTRIBUTION?]

7 Suetonius: “When Augustus wrote in cipher he simply substituted the next letter of the alphabet for the one required, except that he wrote AA for x” (the last letter of the Roman alphabet). [Kahn, pg 84]

(3) Decipher ZLEAPSSH. It was enciphered with a shift of 7. This is the same way we deciphered before, we just count more:
    Z – y – x – w – v – u – t – s
    L – k – j – i – h – g – f – e
So the word starts se. (It is the name of the sixth month of the year before it was named for Caesar Augustus.)

(4) Decipher FTKRMZLJ. It was enciphered using a shift amount of 17. (It may be easiest to write down the plaintext alphabet with ciphertext alphabet underneath.) 8

 Ciphers such as these are called Caesar Ciphers or Shift Ciphers, and the shift amount is called the key. While these ciphers are quite simple, they are the foundation upon which most ciphers are built.
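Counting by hand generalizes directly to any shift amount. The sketch below is one possible way to express it, assuming a key number between 0 and 25; the function names `shift_encipher` and `shift_decipher` are hypothetical and chosen only for this illustration.

```python
import string

ALPHABET = string.ascii_lowercase


def shift_encipher(plaintext: str, key: int) -> str:
    """Move each letter `key` places down the alphabet, wrapping z around to a."""
    out = []
    for ch in plaintext.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + key) % 26].upper())
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)


def shift_decipher(ciphertext: str, key: int) -> str:
    """Move each letter `key` places back up the alphabet, wrapping a around to z."""
    out = []
    for ch in ciphertext.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - key) % 26])
        else:
            out.append(ch)
    return "".join(out)


print(shift_encipher("augustus", 1))   # BVHVTUVT
print(shift_decipher("ZLEAPSSH", 7))   # sextilla
```

The `% 26` is what lets the count run off the end of the alphabet and start again at the beginning, just as we do when counting by hand.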

1.1 Saint Cyr Slide

So far we have two methods for implementing a shift cipher: either count through the alphabet letter after letter, or write down the plaintext alphabet with a new ciphertext alphabet for each shift amount. While not exactly difficult, the first method is prone to silly errors and the second is very tedious. We remedy this by building a Saint Cyr Slide9, a simple device that will allow us to use any Caesar cipher we wish.

Figure 1.1: Saint Cyr Slide

To make one, first, on a thin strip of paper, or on something stronger like tag board, write the following 51 upper case letters: the alphabet followed by a second copy of itself (the Z in the second alphabet is unnecessary).

ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXY

We need to have the letters a constant distance apart, so it is best to use lined paper turned sideways and put one letter per space. Leave a couple inches of empty paper on each end of the letters. Under the first B through Z put the corresponding shift numbers: 1 below B, 2 below C, ... 25 below Z. Write on the left edge that this is the “Ciphertext Alphabet”.

On a larger rectangular piece of paper write in lower case the letters of the alphabet

abcdefghijklmnopqrstuvwxyz

Also write the numbers 1 through 26 above the corresponding letters. Make sure to spread these letters at the same constant width that the ciphertext letters were spread. This paper should be entitled “Plaintext Alphabet”. Then below and just to the left of the a, and below and just to the right of the z, make incisions in the paper of the same height as the cipherstrip. (It also wouldn’t hurt to add an arrow entitled “key” on the plaintext page pointing to the spot under the plaintext a to remind us where the key will appear.)

The cipherstrip can then be inserted into the plaintext piece and slid back and forth, making it easy to form the Caesar alphabet of choice. Simply choose a key number, slide the ciphertext strip so that this number, together with the letter above it, is under the plaintext a, and then read from plaintext to ciphertext and back. This allows us to relatively quickly and easily encipher and decipher using any key.

Because of the way the Saint Cyr Slide is constructed, we may also use letters to indicate keys. The key as a number we understand: a key of 5 says move each letter five letters down the alphabet. It is actually more common to use a key letter. The key letter is the letter the plaintext a is enciphered into. So a key of R says that a will become R in the ciphertext (and b becomes S, c becomes T, etc.). In either case, we line up the key under the plaintext a and use the plaintext alphabet – ciphertext alphabet pair the Saint Cyr Slide displays.

8 (1) BVHVTUVT, (2) AESXDSVVSC, (3) Sextilla, (4) Octavius

9 Slides similar to this have been used for centuries, but their modern name dates from 1883 when Auguste Kerckhoffs named them for the famed French Military Academy [Kahn, pg 289].
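The Saint Cyr Slide itself can be imitated in a few lines. This is a rough sketch only: `key_to_shift`, `slide`, `encipher` and `decipher` are names invented here, and the sketch assumes a key is either a number or a single letter.

```python
import string

PLAIN = string.ascii_lowercase
UPPER = string.ascii_uppercase


def key_to_shift(key) -> int:
    """Accept a key number or a key letter and return the shift amount."""
    if isinstance(key, int):
        return key % 26
    return UPPER.index(key.upper())  # the key letter is what plaintext a becomes


def slide(key) -> str:
    """The ciphertext alphabet that sits under a..z once the strip is positioned."""
    s = key_to_shift(key)
    return UPPER[s:] + UPPER[:s]


def encipher(plaintext: str, key) -> str:
    """Read from the plaintext alphabet to the ciphertext strip."""
    return plaintext.lower().translate(str.maketrans(PLAIN, slide(key)))


def decipher(ciphertext: str, key) -> str:
    """Read from the ciphertext strip back to the plaintext alphabet."""
    return ciphertext.upper().translate(str.maketrans(slide(key), PLAIN))


print(PLAIN)
print(slide("R"))                       # RSTUVWXYZABCDEFGHIJKLMNOPQ
print(key_to_shift("S"))                # 18
print(encipher("roman holiday", "S"))   # JGESF ZGDAVSQ
```

Printing `PLAIN` above `slide(key)` reproduces what the paper slide shows for that key, so a key letter and its key number always lead to the same pair of alphabets.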

Examples:

(1) Encipher Roman Holiday using key S. What shift amount is this? To encipher, first line up the cipherstrip’s first S under the a of the plain alphabet. Under S you should have written 18, so the shift amount is 18. Now encipher the title by finding the letters of roman holiday in the plaintext alphabet and replacing them one by one with the letters of the ciphertext alphabet (underneath). You should see J under r, G under o and E under m. If so, keep going. If not, make sure you positioned your slide correctly and double check to see if you left out any letters of the alphabet on your slide!

(2) Encipher Marc Antony, key 6.

(3) Decipher PEXANEQO, key W. Line up the key letter W under the plaintext letter a. Remembering to read from ciphertext back to plaintext (“up”), you should see t above P and i above E.

(4) Decipher RXRTGD, key 15.

(5) Decipher SQJE, key Q. 10



Examples: Here are a few instances of a Caesar cipher turning one word into another.

(1) Encipher wheel with key H.

(2) Encipher jolly with key T.

(3) Decipher SLEEP with a key R.

(4) Decipher KNOW with key 10. 11



1.2 Running Down the Alphabet

Now let us look at ciphers from a new direction. Suppose our enemy has sent the message

PX PBEE FXXM TM FBWGBZAM

and somehow we capture it. How can we determine what it says when we don’t have the key? A simple way is to “decipher” the message 26 times, using each possible key once. Hopefully the deciphered message will make sense using one and exactly one of those keys. If so, we will have decrypted the message by determining the message and key without being told.

This is called the method of exhaustion, because we exhaust all possibilities. For Caesar ciphers this can be done quickly by running down the alphabet: under each ciphertext letter write the next 25 letters of the alphabet, following Z with A if need be. In at most twenty-five steps something legible will appear, and we will have determined the message.

In Figure 1.2 we do this for the “captured” ciphertext. Reading across the rows, We will meet at midnight must be the plaintext. Since it took us seven steps forward to find the plaintext, and we usually move backwards in the alphabet to decipher, the key number must be -7 or 19.12

PX PBEE FXXM TM FBWGBZAM
QY QCFF GYYN UN GCXHCABN
RZ RDGG HZZO VO HDYIDBCO
SA SEHH IAAP WP IEZJECDP
TB TFII JBBQ XQ JFAKFDEQ
UC UGJJ KCCR YR KGBLGEFR
VD VHKK LDDS ZS LHCMHFGS
WE WILL MEET AT MIDNIGHT
XF XJMM NFFU BU NJEOJHIU
YG YKNN OGGV CV OKFPKIJV

Figure 1.2: Decrypting a Caesar cipher by running down the alphabet

Why does running down the alphabet work? Because letters that are adjacent in the plaintext alphabet remain adjacent in the ciphertext alphabet. In our last example, the consecutive letters ghi, all of which appear in midnight, became ZAB, and under other keys would have become ABC or BCD, etc. There is no mixing of the alphabet – every letter moves the exact same distance. This is why Caesar ciphers are not very secure ones.

10 (1) JGESF ZGDAVSQ, (2) SGXI GTZUTE, (3) tiberius, (4) cicero, (5) cato

11 (1) DOLLS, (2) CHEER, (3) bunny, (4) adem

12 Yes, −7 = 19 when we are dealing with the 26 letters of the alphabet! We will explore this in Chapter 3.

Examples: Decrypt the following by running down the alphabet.

(1) NQYFP YIWFI WE.

(2) MBZXK. 13



Since before long we will have ways of enciphering that have hundreds of keys, rather than only twenty-six, the method of exhaustion will be too exhausting to do by hand. To develop a better method for decrypting ciphers we must think more carefully about the ciphertext message we are trying to read.
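For the twenty-six keys of a Caesar cipher, though, the method of exhaustion from Section 1.2 is easy to hand to a machine. Here is a small sketch of running down the alphabet; `run_down_alphabet` is a name made up for this illustration, and spotting the legible row is still left to the reader.

```python
import string

UPPER = string.ascii_uppercase


def run_down_alphabet(ciphertext: str) -> list[str]:
    """Return 26 rows; row k moves every ciphertext letter k steps forward."""
    rows = []
    for k in range(26):
        shifted = UPPER[k:] + UPPER[:k]
        rows.append(ciphertext.upper().translate(str.maketrans(UPPER, shifted)))
    return rows


rows = run_down_alphabet("PX PBEE FXXM TM FBWGBZAM")
for k, row in enumerate(rows):
    # k steps forward is the same as deciphering with key number (26 - k) % 26
    print(f"{k:2d}  key {(26 - k) % 26:2d}  {row}")
```

Row 0 is the ciphertext itself, and row 7 reads WE WILL MEET AT MIDNIGHT, which matches the key number 19 found above.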

1.3 Frequency Analysis

Let’s reconsider the situation in which we have captured a message written in cipher. What might we be able to figure out about this message without reading it? Well, first, we probably know that it is in English (as all of our messages will be) and we know a lot about English. For instance, we know that e, t, and a are all very common, and that x, z and q are among the least common letters in English. Abraham Sinkov, an important codebreaker during World War II, made a count of the letters appearing in 16410 words, and Figure 1.3 contains the results. What do we see in this chart? First, the two most common letters are e and t. Nearly 13% of all letters in an English text are e’s, and over 9% are t’s. Almost always, e or t will be the most common letter in a text.

13 (1) twelve o’clock. This took six steps, so the key is −6 = 20. (2) Sorry, but it’s a trick. Either a shift of 3 to produce pecan, or a shift of 7 to give tiger.

  8.2  1.5  2.8  4.3 12.7  2.2  2.0  6.1  7.0  0.2  0.8  4.0  2.4
    a    b    c    d    e    f    g    h    i    j    k    l    m

  6.7  7.5  1.9  0.1  6.0  6.3  9.1  2.8  1.0  2.3  0.1  2.0  0.1
    n    o    p    q    r    s    t    u    v    w    x    y    z

Figure 1.3: Letter Frequency, in percent. From Sinkov.

The next most common letters are a, o, i, n, s, h, and r, each occurring between 6% and 9% of the time. With e and t, they make up 70% of all letters in English. So if you know which ciphertext letters stand for the letters e, t, a, o, i, n, s, h and r then you have probably already figured out 7 out of every 10 letters in a text. To remember these high frequency letters, use a mnemonic, like a sin to er(r), trainhose or satinhero. Of the remaining letters, d and l appear about 4% of the time each, while the letters cumwfgypb occur between 1.5% and 3% each, for a total of 20%. Finally, vkjxqz each seldom occur, 1% of the time or less, and about 2.3% in total.

A useful way of representing Sinkov’s frequency information is with a frequency chart. Above each letter put one bar for each percentage point the letter has in the frequency count, rounding to the nearest whole number. So above e will be 13 bars, and above v there will be only one. We have done this to produce Figure 1.4.

[Figure 1.4: English Letter Frequency Chart. A bar chart with one bar per percentage point above each letter of abcdefghijklmnopqrstuvwxyz.]
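A chart of this kind can be regenerated from the numbers in Figure 1.3. The sketch below is one way to do so; the dictionary `SINKOV` simply copies Sinkov’s percentages from that figure, while the function name `frequency_chart` and the choice of `|` for a bar are assumptions made for this illustration.

```python
# Sinkov's letter frequencies in percent, copied from Figure 1.3.
SINKOV = {
    "a": 8.2, "b": 1.5, "c": 2.8, "d": 4.3, "e": 12.7, "f": 2.2, "g": 2.0,
    "h": 6.1, "i": 7.0, "j": 0.2, "k": 0.8, "l": 4.0, "m": 2.4, "n": 6.7,
    "o": 7.5, "p": 1.9, "q": 0.1, "r": 6.0, "s": 6.3, "t": 9.1, "u": 2.8,
    "v": 1.0, "w": 2.3, "x": 0.1, "y": 2.0, "z": 0.1,
}


def frequency_chart(percent: dict[str, float]) -> str:
    """Draw one '|' per rounded percentage point above each letter."""
    letters = sorted(percent)
    bars = {c: round(percent[c]) for c in letters}
    height = max(bars.values())
    lines = []
    for level in range(height, 0, -1):
        # A letter gets a mark on this line if its bar reaches this level.
        lines.append("".join("|" if bars[c] >= level else " " for c in letters))
    lines.append("".join(letters))
    return "\n".join(lines)


print(frequency_chart(SINKOV))
```

Letters whose frequency rounds to zero, such as j, q, x and z, get no bar at all, which makes the long low stretch at the end of the alphabet easy to see.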

Several patterns appear in this so-called normal profile. As David Kahn so dramatically put it

“The single most durable and detectable feature of the normal profile is the long, low peneplane of uvwxyz, which extends almost a quarter of the profile and is extremely depressed. This basin is sharply walled off by the rst cordillera at one end and the single peak of a at the other. The other features of the profile are more easily eroded by decreases in size of sample. The pinnacle of e normally soars midway between a and the double tower of hi, which is followed by the severe drop to jk. High-frequency n and o also rise to twin peaks. In short samples, however, the troughs of the profile are often more reliable indicators than the crests”. [Kahn, page 210]

A bit more concretely, a, e and i form a set of high frequency letters 4 steps or letters apart. Next, hi and no are high pairs and rst a high triplet. Finally, uvwxyz is a set of six very low values that directly follows the rst triplet and occurs directly before a. To see how these patterns are used let’s look at an example.

Example: Use frequency analysis to decrypt YPYH NBCM MBILN GYMMUAY XIYM HIN LYGUCH MYWLYN ZIL FIHA. We first construct a frequency chart by counting the number of times each letter appears:

count       2 2 2 0 0 1 2 4 5 0 0 4 6 4 0 1 0 0 0 0 2 0 1 1 8 1
ciphertext  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

The most common letters are M and Y, so one of these is probably e. (Of course, e might instead be N, H or L, but these are less likely.) If M were e, we’d expect to find the aei-triple of high counts separated at intervals of four at IMQ. There are several I’s, but no Q’s. Worse, H would be z, and four z’s is unlikely. On the other hand, if Y were e, the aei-triple would be at UYC. This message is very short, so a “high”-triple is not all that high, and so UYC might be a reasonable fit for aei. Where might rst fit? The highest consecutive triple is at LMN. This fits pretty well, since the low sextet uvwxyz would fit at OPQRST. Finally, just before LMN is a high pair HI, so we might guess that HI is no. Putting this plaintext guess above the frequency chart will help us see if everything fits:

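plaintext   g h i j k l m n o p q r s t u v w x y z a b c d e f
count       2 2 2 0 0 1 2 4 5 0 0 4 6 4 0 1 0 0 0 0 2 0 1 1 8 1
ciphertext  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z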

The fit is good – the peaks at e, no, and rst, and the long valley at uvwxyz give us confidence we’ve done this correctly. Finally, the keyletter is the ciphertext letter that a becomes, so the key is U. Now we simply decipher to find that the message is Even this short message does not remain secret for long. 
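The first step of the analysis above, counting letters and guessing that the tallest peak is e, can be sketched in code. This is only a partial illustration: as the example shows, the tallest peak is not always e, so the guess must still be tested by deciphering. The names `letter_counts`, `key_if_peak_is_e` and `decipher` are hypothetical.

```python
from collections import Counter
import string

UPPER = string.ascii_uppercase
PLAIN = string.ascii_lowercase


def letter_counts(ciphertext: str) -> Counter:
    """Count only the letters A-Z, ignoring spaces and punctuation."""
    return Counter(ch for ch in ciphertext.upper() if ch in UPPER)


def key_if_peak_is_e(ciphertext: str) -> str:
    """Key letter implied by guessing that the most common letter stands for e."""
    peak, _ = letter_counts(ciphertext).most_common(1)[0]
    shift = (UPPER.index(peak) - PLAIN.index("e")) % 26
    return UPPER[shift]  # the key letter is the letter that plaintext a becomes


def decipher(ciphertext: str, key_letter: str) -> str:
    """Decipher with the Caesar alphabet whose first letter is key_letter."""
    s = UPPER.index(key_letter)
    return ciphertext.upper().translate(str.maketrans(UPPER[s:] + UPPER[:s], PLAIN))


msg = "YPYH NBCM MBILN GYMMUAY XIYM HIN LYGUCH MYWLYN ZIL FIHA"
key = key_if_peak_is_e(msg)
print(key, decipher(msg, key))   # U even this short message does not remain secret for long
```

If the deciphered text is gibberish, the next most common ciphertext letter becomes the candidate for e, and the other hills and valleys of the profile are used to choose among the guesses.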

A message’s frequency chart will seldom have all of the patterns, but it should have good evidence of several. When trying to fit the patterns generally we start with the tallest peaks of e and t, and the long low valley of uvwxyz. Then we look for the aei and rst triples. Finally, we check for the no pair. If there are several “probably”’s when trying to fit these patterns, then the key likely is the ciphertext letter posing as a. Try this possible key by deciphering ten or so letters from the ciphertext. If a message seems to be appearing, we’ve found the key. If no message appears, try again, repeating until you are able to decipher the message.

This process of making a frequency count and then trying to fit the hills and valleys to the proper letters is called doing a frequency analysis. It is certainly the most powerful tool in all of cryptanalysis, the science of decrypting other people’s ciphers.

In fact, we don’t even really need to be able to read the language of the cipher to use frequency analysis. Consider the frequency charts for French, German, Italian and Spanish found in Figure 1.5 [Nichols, page 70]. Each of these languages has its own characteristics, the extreme peak of e in German, or the predominance of vowels in Italian, for example. But in each case the specifics of the frequency chart allow a cryptanalyst to attack and break Caesar ciphers in that language. The use of frequency analysis, really nothing more than counting letters, makes Caesar ciphers very easy to decrypt, no matter the original language of the sender or the breaker.

[Figure 1.5: Letter Frequency Charts for Several Languages. Four bar charts over the letters a through z, one each for French, German, Italian and Spanish.]

Having developed a method for decrypting Caesar ciphers, we are said to have broken that cipher system. Notice the difference between breaking and decrypting – breaking a cipher system means producing a method that allows you to decrypt most messages sent in that system. Conversely, one may be able to decrypt a particular message without knowing how to decrypt messages enciphered using that system in general.

Examples: Use Frequency Analysis to decrypt the following Caesar ciphers.

(1) ZU SGQK G VXUVKX YKTZKTIK KBKXE RKZZKX SAYZ VRGE OZY VXUVKX XURK We begin by making a frequency count:

count        1  1  0  0  2  0  3  0  1  0 11  0  0  0  1  0  1  3  2  2  4  5  0  7  3  6
ciphertext   A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z

The giant peak at K surrounded by a vast wasteland jumps out at us. If we guess that the six letters LMNOPQ are uvwxyz, then the peak of K is t, which would be fine, and RVXYZ=aeghi. Putting our guess at the plaintext alphabet next to the ciphertext alphabet we may compare.

plaintext?   j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z  a  b  c  d  e  f  g  h  i
count        1  1  0  0  2  0  3  0  1  0 11  0  0  0  1  0  1  3  2  2  4  5  0  7  3  6
ciphertext   A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z

The peaks and valleys seem reasonable. To test, we decipher the message. Since our prediction is that a became R, we use the key of R. Unfortunately, the message then begins id bpzt p egdetg. This is not right. We must try again. Perhaps the low sextet belongs at the beginning of the ciphertext alphabet. If we guess that K = e, then a must be G, and before G is a sextet of nothingness. This is likely uvwxyz. Let’s again add the plaintext alphabet to compare:

plaintext?   u  v  w  x  y  z  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t
count        1  1  0  0  2  0  3  0  1  0 11  0  0  0  1  0  1  3  2  2  4  5  0  7  3  6
ciphertext   A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z

The lmnop and rst sets are both relatively high, since in such a short cipher appearing even three or four times counts as “many”. We ought to be quite confident of our plaintext placement. As always, the key is the letter a becomes, which is G. To finish, we decipher the first few words using the key G. This gives to make a proper, which seems correct. We’ve found the key! (Now finish up the decryption by deciphering the rest of the message using the key G.)

(2) MVIP WVN JVTIVK DVJJRXVJ TRE JLIMZMV NYVE RKKRTBVU SP WIVHLVETP RERCPJZJ Here the frequency count is given to you:

count        0  1  1  1  4  0  0  1  4  7  3  2  3  2  0  4  0  6  1  4  1 11  2  1  1  2
ciphertext   A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z

Look for high aei and rst sets bracketing the low uvwxyz.

(3) Decrypt the following message. Don’t be bothered by the five letter groups, the analysis remains the same.
HMALY HDOPS LPAIL JVTLZ CLYFL HZFAV YLJVN UPGLD OHAAO LZOPM APZMY VTAOL MYLXB LUJFA HISL.

(4) Here is a trickier one.
HDBIH DYHDB SMUHI YEHLI HZEDD SXQHO HDBKH VODDO BCHSX HDROH WOCCK QO 14

It has become tradition to transmit the ciphertext with word spacing eliminated and the letters recombined in 5-letter segments. Doing this hides word beginnings and endings, which makes a cipher more difficult to decrypt. Further, it takes only a glance for a person to determine that a “word” consists of exactly five letters. In the days when messages were sent via telegraph, this helped prevent the person doing the actual transmission from accidentally forgetting letters from ciphertexts. (Why 5 rather than 4 or 6? The 1875 tariff regulations of the International Telegraph Union limited the length of a word to 10 letters [Kahn, pg 842]. Dividing a 10-letter word in half leaves two 5-letter words. Perhaps had these regulations limited word length to 8 or 12 then it would now be common to send ciphers in 4 or 6 letter segments.)
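Regrouping a ciphertext into five-letter segments is purely mechanical. As a small sketch (the function name `five_letter_groups` and its `size` parameter are our own):

```python
def five_letter_groups(ciphertext: str, size: int = 5) -> str:
    """Drop everything but letters, then regroup into blocks of `size` letters."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    blocks = ["".join(letters[i:i + size]) for i in range(0, len(letters), size)]
    return " ".join(blocks)


print(five_letter_groups("WE WILL MEET AT MIDNIGHT"))   # WEWIL LMEET ATMID NIGHT
```

The last block is simply left short when the number of letters is not a multiple of five.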

1.4 Linquist’s Method

Once a person has become comfortable with using frequency analysis to decrypt Caesar ciphers, he or she needs only about 50 letters of the ciphertext to accurately determine the key. What if we have significantly less than this, say only 15 or 25 letters?15 Then we use Linquist’s method. [Gaines, page 133]

Consider ZWUM PIA AMDMV PQTTA, a text of 17 letters. We start, as always, with the frequency count, and use it to determine the most common letters in the ciphertext. In this example, the only letters occurring more than once are P and T, twice each, and A and M, which appear three times each. Even for such a short message (when the frequencies can be quite messed up) it is likely that many of these letters come from etaoinshr.

Linquist’s idea is to see if there is one shift amount that would produce all or most of the most common ciphertext letters from the common plaintext letters. To do this, make a chart with the common plaintext letters across the top and common ciphertext letters down the side. For each of the etaoinshr letters determine what key produces the ciphertext letter from this plaintext letter. For instance, e must be shifted by 22 letters to become A, so we enter 22 in the e column and A row. To find the shift amount using the Saint Cyr slide, line up A under e and then see what the key is. Similarly, lining up M under e produces an 8, so 8 is the second entry in the first column.

     e   t   a   o   n   i   r   s   h
A   22   7   0  12  13  18   9   8  19
M    8  19  12  24  25   4  21  20   5
P   11  22  15   1   2   7  24  23   8
T   15   0  19   5   6  11   2   1  12

14 (1) to make a proper sentence every letter must play its proper role, key = G. (2) very few secret messages can survive when attacked by frequency analysis, key = R. (3) after a while it becomes very easy to recognize what the shift is from the frequency table, key = H. (4) xtry xto xtrick xyou xby xputting xextra xletters xin xthe xmessage, key = K.

15 This will often be the case later when we study the very important Vigenère cipher.

We are looking for one shift amount that would cause four of etaoinshr to become AMPT. So we are searching for numbers that appear in every row, or, failing that, occur in at least three of the four rows. No number appears in each row, but 8, 12 and 19 each appear in three of the four rows. It is most likely that one of these three is the proper shift amount. We simply decipher the message using each to see which one it is.16
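Building the chart and tallying the shift amounts can also be sketched in code. The version below assumes, as in the example, that the “common” ciphertext letters are those occurring more than once; the names `linquist_chart` and `candidate_shifts` are invented for this illustration and do not come from [Gaines].

```python
from collections import Counter
import string

UPPER = string.ascii_uppercase
COMMON_PLAIN = "etaonirsh"   # column order used in the chart above


def linquist_chart(ciphertext: str) -> dict[str, list[int]]:
    """One row per repeated ciphertext letter; each entry is the shift that
    takes that column's plaintext letter to the row's ciphertext letter."""
    counts = Counter(ch for ch in ciphertext.upper() if ch in UPPER)
    repeated = sorted(c for c, n in counts.items() if n > 1)
    return {
        c: [(UPPER.index(c) - UPPER.index(p.upper())) % 26 for p in COMMON_PLAIN]
        for c in repeated
    }


def candidate_shifts(ciphertext: str) -> list[int]:
    """Shift amounts that occur in the largest number of rows of the chart."""
    chart = linquist_chart(ciphertext)
    rows_hit = Counter(s for row in chart.values() for s in set(row))
    best = max(rows_hit.values())  # assumes at least one letter repeats
    return sorted(s for s, n in rows_hit.items() if n == best)


print(linquist_chart("ZWUM PIA AMDMV PQTTA")["A"])   # [22, 7, 0, 12, 13, 18, 9, 8, 19]
print(candidate_shifts("ZWUM PIA AMDMV PQTTA"))      # [8, 12, 19]
```

The candidates are only suggestions: each one still has to be tried by deciphering a few letters, exactly as we do by hand.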

Example: Decrypt WZRVM ZOCSD YZNGA HVMXC using Linquist’s method. The only letters that appear more than once are C, M, V and Z, so our chart is set up as

     e   t   a   o   n   i   r   s   h
C   24   .   .   .   .   .  11   .   .
M    .  19   .   .   .   4   .   .   .
V    .   .   .   .   8   .   .   3   .
Z    .   .  25   .   .  17   .   .   .

Some of the chart has been filled in for you. Complete it, find the most common numbers (i.e., the possible keys) and try them to determine the key. Finally, decipher the message.17

16 It is 8. Deciphering gives Rome has seven hills.

17 Beware the Ides of March, key = 21.

1.5 Summary

The Caesar Cipher, sometimes called the Shift Cipher, is history’s first example of a cipher. In the Caesar version the keyletter indicates what ciphertext letter the plaintext a becomes, while in the Shift version the keynumber tells how many letters down the alphabet each plaintext letter is moved. The Saint Cyr Slide is a simple device that allows for rapid enciphering and deciphering of either version of this cipher.

Decrypting a Caesar Cipher starts with a frequency count. If the ciphertext is long enough, comparing the frequency count with that of standard English and looking for the characteristic peaks and valleys leads one to the key needed to decipher the cipher. When one has only a small amount of ciphertext, say under thirty letters, Linquist’s chart will generally give the key, albeit with a bit more work. Since we have methods that allow for the routine decryption of Caesar Ciphers we can be said to have broken the cipher.

Caesar ciphers are very rigid; the alphabet is only shifted during enciphering and is not mixed. This is the reason this cipher system is easy to break. Despite their lack of security, they remain fundamental for almost all that we will do. The ideas of letter-for-letter replacement, keys, frequency count and frequency analysis will appear over and over again.

1.6 Topics and Techniques

1. Why would we use a secret writing system? Give at least two examples.

2. Why do we call it a Caesar Cipher? Why do we call it a Shift Cipher?

3. What is the difference between enciphering and deciphering?

4. What is a Saint Cyr Slide? How is it used to encipher? How is it used to decipher?

5. What is a key to a Caesar Cipher?

6. Are letter keys different than number keys? How?

7. What is the meaning of the letter in a key letter?

8. What is the meaning of the number in a key number?

9. What does it mean to decrypt a message? How does this differ from deciphering a message?

10. What does it mean to break a cipher system? How does this differ from decrypting a message?

11. What is the Method of Exhaustion? Why do we call it that?

12. What is the most common letter in English? Second most common?

13. What are the nine most common letters in English?

14. What are some of the least common letters in English?

15. What sorts of hills and valleys does the letter frequency count of a portion of normal English have?

16. How do we use a frequency count to break a Caesar Cipher?

17. Are Caesar ciphers secure or not? Why?

18. When would we use Linquist’s method?

19. How does Linquist’s method work? Why does it work?

1.7 Exercises

1. Encipher the following emperors’ names using a Caesar Cipher with the given key.

(a) Pompeii, key = D. (b) Vespasian, key = P. (c) Caligula, key = H. (d) Nero, key = T.

2. Encipher the following emperors’ names using a Shift Cipher with the given key.

(a) Domitian, key = 11. (b) Trajan, key = 5. (c) Hadrian, key = 19.

3. Decipher the following names. They have been enciphered using a Caesar cipher with the given key.

(a) GULWOM UOLYFCOM, key = U. (b) IRGAJOAY, key = G. (c) TYESBUJYQD, key = Q. (d) HFXXNZX, key = F.

4. Decipher the following names. They have been enciphered using a Shift Cipher with the given key.

(a) RDCHIPCIXCT, key = 15.

(b) MXVWLQLDQ, key = 3. (c) QAGNGM YDPGAYLSQ, key = 24. (d) GVOREVHF TENPPUHF, key = 13.

5. Encipher the following words using a Caesar Cipher with the given key.

(a) Appian Way, key = F. (b) Punic Wars, key = B. (c) Carthage, key = N. (d) Gladiator, key = W.

6. Encipher the following words using a Shift Cipher with the given key.

(a) Atrium, key = 1. (b) Colosseum, key = 25. (c) Po Valley, key = 18. (d) Equestrian Order, key = 9.

7. Decipher the following. They have been enciphered using a Shift Cipher with the given key.

(a) OMFMOAYNE, key = 12. (b) TJNXWNVM, key = 19. (c) LHAXEWJ, key = 22. (d) DBSLEXO, key = 10. (e) GDBPC ATVXDC, key =15.

8. Decipher the following. They have been enciphered using a Caesar Cipher with the given key.

(a) VKDSEW, key = K. (b) VTKTGOG, key = V. (c) RCVTKEKCP, key = C. (d) OJSBHWBS, key = O. (e) AYPBTCPYHAL, key = H.

9. Decipher the following. They have been enciphered using a Caesar Cipher with the given key.

(a) QSBHIFWCB, key = O. (b) CQN ADKRLXW ARENA, key = J. (c) ZKBYQD SQBUDTQH, key = Q.

(d) VTXLTKBTG LXVMBHG, key = T.

10. A part of a message of Caesar’s to Cicero (in Latin!) was MDEHV RSNQNRQNV PHDH XHVXNPRQNZP. Decipher it. (Remember, Latin has no J, K, W or Y.)

11. Use frequency analysis to decrypt the following messages.

(a) VG ZNL OR EBHAQYL NFFREGRQ GUNG UHZNA VATRAHVGL PNAABG PBAPBPG N PVCURE JUVPU UHZNA VATRAHVGL PNAABG ERFBYIR RQTNE NYYRA CBR

(b) DHVIV BZYVA OZMNJ HZRJM MTOJN JGQZO CZHZN NVBZV IYQZM TAZRO CDIBN DIVAO ZMGDA ZBVQZ HZVNH PXCKG ZVNPM ZVNYD YOCZP IMVQZ GDIBJ AOCVO XJYZC VMMTC JPYDI D

(c) STHTI JTWHN UMJWN XNRUW JLSFG QJYTF YYFHP ZSQJX XNYNX NSXTQ ZGQJG DYMJN SAJSY TWMNR XJQK

12. Use frequency analysis to decrypt the following messages.

(a) RKXXSLKV KXXYEXMON DYNKI DRKD RO KXN DRO OVOZRKXDC GYEVN LO KBBSFSXQ SXBYWO CYWODSWO XOHD GOOU

(b) GFWSZ RFSST ZSHJI YMFYM JFSIG FNQJD BTZQI FZINY NTSYM JJQJU MFSYX KTWUT XXNGQ JNSHQ ZXNTS NSYMJ NWHNW HZX

13. Decrypt the following ciphertexts using Linquist’s method.

(a) DROOX OWIUX YGCDR OCICD OWLOS XQECO NMVKE NOCRK XXYX

(b) WUDJB UCUDT EDEJH UQTUQ SXEJX UHICQ YB This is a quote of Henry L. Stimson, President Hoover’s Secretary of State, writing later about his shutting down the US Black Chamber, the government cryptanalytic service, in 1929.

(c) IJXMT KOJBM VHRVN ZQZMN JGQZY WTNOV MDIBV ODO

(d) BJUUH BNUUB NJBQN UUBKH CQNBN JBQXA N.

(e) FHWDU YTLWF RNXFR JXXFL JBWNY YJSNS HNUMJ W

(f) KVVDR ODRSX QCSXD ROGYB VNMYX CDSDE DOKMS ZROB

(g) “The goal of a cryptosystem is to: BPZTI WTBTH HPVTX CRDBE GTWTC HXQAT IDJCP JIWDG XOTSE TGHDC H

14. (a) What happens if we use a Shift Cipher with a shift of 26? (b) What happens if we use a number larger than 26 for the shift amount? (c) How many different Shift Ciphers are there?

15. According to David Gaddy, “on 1 January 1863 while touring in the west, [Confederate] President [Jefferson] Davis sent [a person a] telegram which used a simple slide of one” [Gaddy]. If the telegram began TFDSF UBSZP GXBSKBNF TBTFE EPOSJ DINPO E, to whom was it sent?

16. The Confederate Army used several rather insecure cipher systems during the Civil War [Gaddy]. For example,

“To Gen. Beauregard, 11 March 1862: Sir: Your dispatch just received. The day of the month on which it is written will indicate the letter of the alphabet corresponding with A. Yesterday 10th J-A. I repeat it that we may know if the operator conveyed it correctly. EFNTI FJJZE XIZMV IIVRI FEJRK LIURP NYVIV TREKY VKIFF GJAFZ EPFLN ZKYLK DFJKV WWVTK On the 27th of the month “A” will correspond to “C”. Gen. Albert Sidney Johnston (Decatur)”

(a) Use frequency analysis to find the key and decrypt the message. (b) What date was it sent on?

17. Some years after the Civil War, Joseph Willard Brown described the following incident [Brown, pages 210–2].

“One [message] from Gen. Beauregard just after the battle of Shiloh, (6-7 April 1862) giving the number and condition of his forces at Corinth, was formed by merely putting the last half of the alphabet first, that is, substituting M for A, N for B, O for C, etc. This dispatch fell into our hands and first reached Richmond in a northern newspaper.”

Perhaps this quote refers to a long (5000-ish) word report of Beauregard’s sent on April 11th. Here is a different, brief report of Beauregard’s, written during the battle, that has been enciphered via this method. Please decipher it. IQFTU EYADZ UZSMF FMOWQ PFTQQ ZQYKU ZEFDA ZSBAE UFUAZ UZRDA ZFARB UFFEN GDSMZ PMRFQ DMEQH QDQNM FFXQA RFQZT AGDEF TMZWE NQFAF TQMXY USTFK SMUZQ PMOAY BXQFQ HUOFA DKPDU HUZSF TQQZQ YKRDA YQHQD KBAEU FUAZX AEEAZ NAFTE UPQET QMHKU ZOXGP UZSAG DOAYY MZPQD UZOTU QRSQZ QDMXM EVATZ EFAZI TARQX XSMXX MZFXK XQMPU ZSTUE FDAAB EUZFA FTQFT UOWQE FARFT QRUST F

18. Here are quotes in French. Use frequency analysis to decrypt them.

(a) FU ALUHXYOL XY FBIGGY YMN ALUHXY YH WY KOCF MY WIHHUBUNCN GCMYLUVFY OH ULVLY HY MY WIHHUCN JUM GCMYLUVFY.

(b) DXOLH XGHPH SODLQ GUHGH FHTXH ODURV HDGHV HSLQH VMHPH IHOLF LWHGH FHTXH OHSLQ HHVWV XUPRQ WHHGH URVHV HWGHF HTXLH OHEXL VVRQS RUWHG HIOHX UV.

(c) VY AL N CNF ZBVAF QVAIRAGVBAF N OVRA NCCYVDHRE HAR CRAFRR DHR YBA GEBHIR QNAF HA YVIER DHN UNGRGER YR CERZVRE NHGRHE QR PRGGR CRAFRR.

(d) IKSUT JKIOT KYZWA ATKUK ABXKI USOWA KUAIN GIATL GOZYK YXNGZ URKYJ OLLKX KTZYR GYAXR GYIKT KKTNG HOZJX GSGZO WAKHX ORRKT ZVXKR GZYSO TOYZX KYIUT WAKXG TZYVU AXTUA YBORV KAVRK GYYOY GADJK XTOKX YXGTM YZXUA VKLAZ ORKKZ JKYMX GTJYX KHAZK KVGXT UAYJK THGYR GVOKI KKYZK IUAZK K.

19. Here are several quotes in German. Use frequency analysis to decrypt them.

(a) LQM XWTQBQS QAB SMQVM EQAAMVAKPINB EQM DQMTM LMZ PMZZMV XZWNMAAWZMV AQKP MQVJQTLMV AWVLMZV MQVM SCVAB.

(b) XL BLM UXLLXK WTL ZXKBGZLMX WBGZ OHG WXK PXEMD TEL XBGX ATEUX LMNGWX YNK ZXKBGZ ATEMXG.

(c) YCQBJ UHLUH IJUXJ CQDRU IIUHT YUKDW BKSAI VQBBU PKLUH XKJUD YDTUH ZKDWU DTIYU PKUHJ HQWUD.

20. Here are several quotes in Italian. Use frequency analysis to decrypt them.

(a) LQR JLLNLJCX MJUU JVKRIRXWN BR LXWMDLN RW UDXPX MXEN WXW YDX YRD JUCX BJURA N YXR LXW VJBBRVX MJWWX MR LJMNAN WNLNBBRCJCX.

(b) XTDPCL WL GZWRLCP P NTPNL RPYEP NSP AZY BFT DFP DAPCLYKP TY NZDP ELWT NSP W EPXAZ WP YP AZCEL DT CPAPYEP.

(c) ES TWSLJAUW KA TWDDS W JAVWFLW EA KA EGKLJG UZW LJS IMWDDW NWVMLW KA NMGD DSKUASJ UZW FGF KWYMAJ DS EWFLW.

21. What is the question that goes along with this quote? Think frequency analysis. Upon this basis I am going to show you how a bunch of bright young folks did find a champion; a man with boys and girls of his own; a man of so dominating and happy individuality that Youth is drawn to him as is a fly to a sugar bowl. It is a story about a small town. It is not a gossipy yarn; nor is it a dry, monotonous account, full of such customary “fill-ins” as “romantic moonlight casting murky shadows down a long, winding country road.” Nor will it say anything about twinklings lulling distant folds; robins carolling at twilight, nor any “warm glow of lamplight” from a distant cabin window. No. It is an account of up-and-doing activity; a vivid portrayal of Youth as it is today; and a practical discarding of that worn-out notion that “a child don’t know anything.”

22. Use frequency analysis to decrypt the following Caesar ciphers. They are a bit trickier.

(a) W LWNWCNWLD SEPD JK AO EO ZEBBEYQHP PK YKJOPNQYP WJZ YKJBQOEJC PK YNULPWJWHUV

(b) BLABX LOMZF DKFTL FDUOW ARDLB XMOUZ SMHLD KBABG XMDXL FFLDI UFTAZ LFTMF UEZAF ZLMDX KMEBA BGXMD NGFMO MDLRG XRDLC GLZOK MZMXK EUEOM ZEFUX XPLOA PLFTL YLEEM SL

(c) TWIPG TDUTG PCHBX TTXCV AITTI GHLXT WDJTA ITTXC VDTWI GHZCD LLWPT TWDHI AITTI GHPGI BPCNT XBIHX CKDAK IHTGX RZH

(d) GDQDIB JM IJO GDQDIB OCVO DN RCVO D VNF DA ODN V NOVHK JA CJIJPM OJ NPWHDO OJ NGDIBN VIY VMMJRN RVAOY PN WT DGG RDIYN JM WMVIYDNC VMHN VB VDINO V AGJJY JA VAAGDXODJIN RCDXC WT JPM JKKJNDODJI DN NPWYPY YTDIB YMJRNDIB RVFDIB IJO

 7  7  6 23  0  2  7  3 15 19  3  0  9 18 14  6  2  7  0  3  0 15  5  2  9  0
 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z

Here are two ciphertexts to be decrypted. They are not Caesar Ciphers, but they are Caesar-like.

(a) ZM ZNYZHHZWLI SZH ML MVVW LU HKRVH SRH XSZIZXGVI RH ZODZBH HZXIVW

(b) The following was written by George Washington in 1799, describing to his spy Samuel Woodhull the invisible ink he was sending him. Use careful frequency analysis to determine how to decrypt this cipher. (You do not necessarily need to decipher the entire thing, but decipher the first several words and explain how you determined how to decipher it.)

ETTLX AIXWL AWRUW RQIXE KA(WR BAABE TTLXE LLXAN AWMER GPNQM PACLQ ZYALL WRYMQ QR)WM MARLW RPXWE TRQ1D GCQTI ADDLX ATWOK WBWRR Q2WML XACQK RLANP ENLIX WCXNA RBANM LXAQL XANJW MWDTA GDIAL LWRYL XAPEP ANIWL XEZWR ADNKM XEZLA NLXAZ WNMLX EMDAA RKMAB ERBWM BNG

28  7  4  7 12  0  4  0  6  1  5 22 13 12  1  6 10 17  0  8  1  0 21 17  3  4
 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z

Chapter 2

Cryptologic Terms

Cryptography is the art or science of designing methods to send secret messages. (In Greek crypt or kryptos means “secret” or “hidden” and graphy means “writing”.) Cryptologia, meaning “secrecy in speech”, and Cryptographia, meaning “secrecy in writing”, were used first by John Wilkins in his 1641 book “Mercury, or the Secret and Swift Messenger,” the first English book about cryptography. Wilkins went on to be the first secretary of the Royal Society, which he co-founded with John Wallis. The words took their modern forms, cryptology and cryptography, in 1645 and 1658, respectively.

Cipher or cypher comes from the same Arabic word that also provides the root for zero, and has been used to mean a secret manner of writing in English since the 1500’s. Once we have a cipher algorithm, telling us which cipher is going to be used and how to use that cipher, and a key, we have a cipher system or cryptosystem. For example, in the Caesar cryptosystem, the algorithm is “shift the alphabet by a given amount” and the key is the amount. The sender composes the plaintext, the original message which is understandable to all, and uses the key to encipher it into the ciphertext, the “secret” version of the message that is, hopefully, understandable only to those who have the key and are able to decipher the message back into plaintext. The word cryptanalysis was coined by William Friedman in 1923 and is the name for the art or science of reading another person’s message without the key. One decrypts a message and breaks a cipher system. While cryptography held its meaning, cryptology no longer refers to speech but is the name for the subject that combines cryptography and cryptanalysis.1 It has become traditional to attach the names Alice, Bob and sometimes Eve

1 It will be clear that each of our secret messages is indeed a secret message. That is, we will not worry about keeping the existence of the secret message secret, but rather about keeping the meaning of the message secret. Steganography (Greek steganos, meaning covered, apparently first used by Trithemius in 1499) is the study of methods to keep the very existence of the secret message secret, and is clearly a very interesting subject to spies.

to cryptography. As is seen in Figure 2.1, Alice takes plaintext and enciphers it to produce ciphertext that she then transmits. Bob, the intended recipient, presumably has the knowledge needed to decipher the ciphertext to obtain his copy of the plaintext. Eve, the eavesdropper, is without this knowledge, and so will try to break the cryptosystem.2

Figure 2.1: Alice, Bob and Eve: the three names of Cryptography

A cipher system is somewhat analogous to a high-school locker. That system consists of (1) a lock of some sort (a padlock or a keyed lock) with instructions on how to use such a device (turn the knob, turn the key), (2) the exact information of how to lock and unlock (the combination, the correct key), and (3) the actual locker space holding books or gym shorts. In a cipher system (1) is the “algorithm”, (2) is the “key” and (3) is the message.

Traditional cryptosystems are like the keyed lock system in that the enciphering and deciphering methods are very similar, once you have the key. (Hence the name “key”.) One must have the key to either encipher or decipher. Further, if you can either encipher or decipher, then you, in fact, must have the key and can do both. This differs from the padlock system, in which one only needs the key (for example, 24 left - 12 right - 19 left) to open the lock, but anyone can close it. Later we will study several modern cryptosystems that work in this way. Although it perhaps seems intuitively obvious that if someone can encipher a message they should also be able to decipher it, that is, in fact, not always the case. But this must wait for a later chapter.

Experts differentiate between ciphers, where the message is made secret letter-by-letter, a→D, b→E, c→F, etc., and codes, where the message is made secret word-by-word, bad→1211, ball→3214, bat→4790, etc. (or, occasionally, phrase-by-phrase). So one should speak and write of enciphering and deciphering when using a cipher, and encoding and decoding when using a code. However, code is much more pervasive in English than cipher, and people tend to use encode or decode when they really mean encipher or decipher. For this reason, books and movies that involve “Breaking the Code” probably are actually about breaking a cipher.
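The difference between a cipher and a code can be made concrete with a small Python sketch. The shift of 3 and the three codebook entries come from the examples in the text; the function names are illustrative only.

```python
def encipher(message, shift=3):
    """Cipher: replace the message letter by letter (a->D, b->E, c->F, ...)."""
    out = []
    for ch in message.lower():
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("A")))
    return "".join(out)

# Code: a codebook replaces whole words. Only the three entries from the text.
CODEBOOK = {"bad": "1211", "ball": "3214", "bat": "4790"}

def encode(message):
    """Code: replace the message word by word using the codebook."""
    return " ".join(CODEBOOK[word] for word in message.lower().split())

print(encipher("bad ball"))  # EDGEDOO
print(encode("bad ball"))    # 1211 3214
```

The cipher works on any word at all, while the code can only handle words that appear in its codebook.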

2 Sometimes one speaks of Oscar, the opponent, rather than Eve.

Chapter 3

The Introduction of Numbers

“We shall see that cryptography is more than a subject permitting mathematical formulation, for indeed it would not be an exaggeration to state that abstract cryptography is identical with abstract mathematics.”
A. Adrian Albert, 1941
Professor of Mathematics
University of Chicago

The Caesar Ciphers we studied in Chapter 1 were very easy to use, and seemed to offer some secrecy. But we quickly found several ways to decrypt any message enciphered in this manner. What should we try to regain some secrecy? The Caesar method of “shift by three” or “add three” having failed, to regain some secrecy we are going to try “multiply by three.” Before we can study this new method, we need to first carefully analyze our Caesar ciphers. Suppose our message is Stop, Turn Back. To encipher it with a key of 3=D, we simply replace each letter in our message with the one three letters further down the alphabet. To be very explicit, we think of which position of the alphabet each letter is in (as in Figure 3.1), add three to that number, and use the letter in this latter number’s position. Of course, since we are comfortable with this process, we generally simply say “shift three” or “add three”.

 a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

Figure 3.1: The Standard Translation of Letters into Numbers


Example: Encipher Stop, Turn Back with a shift of 3. We make a chart, giving the plaintext letters and their numerical equivalents. Then we add 3 to each number and translate back into letters:

plaintext       s  t  o  p  t  u  r  n  b  a  c  k
plainnumbers   19 20 15 16 20 21 18 14  2  1  3 11
ciphernumbers  22 23 18 19 23 24 21 17  5  4  6 14
ciphertext      V  W  R  S  W  X  U  Q  E  D  F  N

The ciphertext is VWRS WXUQ EDFN. 

What if we had shifted by a larger number?

Example: Encipher Stop, Turn Back with a shift of 7.

plaintext       s  t  o  p  t  u  r  n  b  a  c  k
plainnumbers   19 20 15 16 20 21 18 14  2  1  3 11
ciphernumbers  26 27 22 23 27 28 25 21  9  8 10 18
ciphertext      Z  ?  V  W  ?  ?  Y  U  I  H  J  R

Which letters are the 27th and 28th of the alphabet? Since 27 follows 26, and 26 represents Z, we start over again, and so the 27th letter of the alphabet is A, the 28th is B, and so on. So a shift of 7 gives the ciphertext ZAVW ABYU IHJR.

We can now (tentatively) define the mathematical version of Shift ciphers.

Mathematical Caesar Ciphers: To use a Caesar cipher
1) Convert plainletters and keyletter into plainnumbers via Figure 3.1.
2) Add the keynumber to each plainnumber.
3) For ciphernumbers larger than 26, replace 27 by A, 28 by B, 29 by C, etc.
4) Convert the ciphernumbers into cipherletters.

Example: Encipher Julius with a shift of 9.

plaintext       j  u  l  i  u  s
plainnumbers   10 21 12  9 21 19
ciphernumbers  19 30 21 18 30 28
ciphertext      S  D  U  R  D  B

The ciphertext is SDURDB.
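The four steps of the mathematical Caesar cipher translate almost directly into Python. The sketch below assumes the a=1, ..., z=26 numbering of Figure 3.1 and a keynumber between 0 and 26; the function names are made up for illustration.

```python
def to_number(letter):
    """a=1, b=2, ..., z=26, as in Figure 3.1."""
    return ord(letter.lower()) - ord("a") + 1

def to_letter(number):
    """1=A, 2=B, ..., 26=Z."""
    return chr(number - 1 + ord("A"))

def caesar_encipher(plaintext, keynumber):
    """Add the keynumber to each plainnumber, wrapping past 26 back to A."""
    cipher = []
    for ch in plaintext:
        if not (ch.isascii() and ch.isalpha()):
            continue                      # skip spaces and punctuation
        c = to_number(ch) + keynumber     # step 2
        if c > 26:                        # step 3: 27 -> A, 28 -> B, ...
            c -= 26
        cipher.append(to_letter(c))       # step 4
    return "".join(cipher)

print(caesar_encipher("Julius", 9))           # SDURDB
print(caesar_encipher("Stop, Turn Back", 3))  # VWRSWXUQEDFN
```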

With the translation between letters and numbers now clear, we return to our goal of changing from “adding three” to “multiplying by three.”

Example: Encipher Stop Turn Back with a multiplication of 3. We start as usual, by listing the letters of the message as well as their numerical values. But rather than adding three, we multiply these values by three.

plaintext       s  t  o  p  t  u  r  n  b  a  c  k
plainnumbers   19 20 15 16 20 21 18 14  2  1  3 11
ciphernumbers  57 60 45 48 60 63 54 42  6  3  9 33
ciphertext      E  H  S  V  H  K  B  P  F  C  I  G

Turning these ciphernumbers back into letters takes a bit of work. Replacing 6 by F and 3 by C was easy. Replacing 42 and 45 by P and S demands some counting, and figuring out what to do with 54, 57, 60 and 63 means we have to count through the alphabet two-and-a-half times. It’s a good thing we only multiplied by 3 and not 13! The answer is EHSV HKBP FCIG.

Was all the counting worth it? Remember that a Caesar cipher is a weak cipher because of a lack of mixing – letters that are adjacent in the alphabet move to letters that are also adjacent. In the two Stop Turn Back examples the letters rstu from the message became UVWX and YZAB, respectively, and abc similarly became DEF and HIJ. (In fact, I picked these words because of the many consecutive letters.) A Caesar cipher succeeds only in shifting the aei, no, and rst-uvwxyz patterns, but does not destroy them. A better cipher should destroy these patterns and move the letters away from their neighbors. And our multiplication example did exactly this: the letters rstu became BEHK, and abc became CFI. Based on this one example, it appears that multiplication will provide more security than addition.1 So multiplication can produce a cipher system that is better than the Caesar ciphers – if we can find a quick way of turning large numbers into the equivalent letters of the alphabet. This must be our next goal.

3.1 The Remainder Operator

Which ciphernumbers will be converted to the cipherletter A? Of course 1. Also 27, since 27 is one more than once through the alphabet. Similarly 53, which is one more than twice through the alphabet. And 79, 105, ... In fact, we could do this for every letter of the alphabet to come up with a table like

A = 1 = 27 = 53 = 79 = ...
B = 2 = 28 = 54 = 80 = ...
C = 3 = 29 = 55 = 81 = ...
...
Y = 25 = 51 = 77 = 103 = ...
Z = 26 = 52 = 78 = 104 = ...

1 You may have wondered why Cryptology belongs to the field of Mathematics, rather than that of English. After all, a cipher is supposed to hide the meaning of the message, and surely English is about detecting the meanings of letters and words. Well, we’ve just seen the answer. The translation a=1, b=2, c=3, ... z=26, turns the subject of Cryptology into an area of mathematics. Similarly, once we’ve tried “add three” and “multiply by three”, almost any mathematical device that turns one number into another may be tried out as an enciphering method.

We could continue this chart to handle the numbers like 195 and 208 that will occur when we encipher by multiplying by 13, but this seems quite cumbersome. Further, although it may not be readily apparent, there are really two questions lurking here. The first is one of conversion: how can we convert a number, say 54, into a number from 1 to 26 to see which letter it represents? The second is one of equivalence: the numbers 54 and 28 and 2 are all equivalent because they all represent the letter b. What does this equivalence mean?

Let’s start with the conversion. Converting 54 to 2 is an operation, just as addition and multiplication are operations. What is involved is the number being operated on, 54, and the number doing the operating, which is 26 as there are twenty-six letters in the alphabet. To represent this operator we will borrow the symbol % from various programming languages and use it in the form 54%26 = 2. This is pronounced “fifty-four modulo twenty-six equals two.”

Now how did we find the 2 from the 54? We knew that 28 and 2 both represent b, since 28 is 2 more than once through the alphabet. Numerically, 28 = 2 + 26 or 28 − 26 = 2. Similarly, 54 = 28 + 26 is one more alphabet than 28, which was b. So 54 must also be b. Or, working backwards, 54 − 26 = 28, 28 − 26 = 2 and 2 = b, so 54 = b. Hence we found 2 from 54 by repeatedly subtracting 26’s until we obtained a number no bigger than 26. Of course, we can do this with any number.
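Repeated subtraction is easy to express as a loop. A minimal sketch, assuming a positive starting number and using an illustrative function name:

```python
def reduce_by_subtraction(number, modulus=26):
    """Subtract the modulus until the result is no bigger than the modulus.
    The result lies between 1 and modulus, matching a=1, ..., z=26."""
    while number > modulus:
        number -= modulus
    return number

print(reduce_by_subtraction(54))  # 2, so 54 represents b
print(reduce_by_subtraction(28))  # 2
```

Note that this version stops at 26 rather than 0, so that 26, 52, 78, ... all stand for Z.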

Examples: Compute the following numbers.

(1) 32%26 =? Since 32 − 26 = 6, the answer must be 6.

(2) 39%26 =?

(3) 55%26 =? Subtracting once gives 55−26 = 29. Since 29 is larger than 26 we subtract again, 29 − 26 = 3. So 55%26 = 3.

(4) 79%26 =?

(5) 144%26 =? 2



2 (2) 13, (4) 1, (5) 14.

Performing the %26 operation is not particularly difficult, but what if the number we are trying to reduce is 48924? Then repeatedly subtracting 26 is rather unappealing. Fortunately, we are not interested in the number of times we subtract 26, but rather in the number that is remaining when we are done. Just as multiplication is addition done quickly (4 × 3 just means 4 sets of 3, or 3 added to itself 4 times), subtracting 26 many times is very closely connected to dividing by 26. To illustrate, we divide 85 by 26 using the form we learned as children:

      3  r = 7
26 ) 85
     78
     --
      7

In multiplicative form this says 85 = 3 × 26 + 7. The 3 represents the quotient, how many times 26 goes into 85, and 7 is the remainder. Compare this with the subtraction method. 85%26 =? Subtracting once: 85 − 26 = 59, and 59 is larger than 26. Subtracting again: 59 − 26 = 33, still too large. Once more 33 − 26 = 7. Small enough. So 85%26 = 7. Notice that the quotient 3 is the same as the number of subtractions, and that the remainder 7 is the same as the modulus answer. Let us try again with 109%26.

       4  r = 5
26 ) 109
     104
     ---
       5

so 109 = 4 × 26 + 5. On the other hand,

109 − 26 = 83
83 − 26 = 57
57 − 26 = 31
31 − 26 = 5.

So 109%26 = 5. Again the quotient indicates how many 26’s need to be removed, and the remainder in the division gives the same result as the remainder operator. For numbers larger than about fifty the division method is generally quicker than the subtraction method. For example, to compute 219%26:

       8  r = 11
26 ) 219
     208
     ---
      11

So 219%26 = 11 in much less time than 8 subtractions would have taken. Of course, the only people that do division in this form are school children; the rest of us use a calculator. With a calculator we see 85 ÷ 26 = 3.2692 and 109 ÷ 26 = 4.1923 (rounding to four digits). The 3 and 4 represent the quotients. How do the fractional portions, .2692 and .1923, represent the remainders? Since 85 = 3 × 26 + 7, when we divide by 26 we have

85/26 = 3 + 7/26 = 3.2692.

So 7/26 = .2692 or 7 = .2692 × 26. Similarly, 109 = 4 × 26 + 5 so

109/26 = 4 + 5/26 = 4.1923.

Thus 5/26 = .1923 or 5 = .1923 × 26.

Notice that in both examples %26 gave the remainder when dividing by 26. For this reason we call % the remainder operator. We perform %26 by first dividing by 26 and then multiplying the decimal part of the answer by 26.
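The link between counting subtractions and dividing can be checked with a short Python sketch. The helper name is illustrative; `divmod` is Python's built-in function that returns the quotient and remainder together.

```python
def subtract_until_small(a, n=26):
    """Return (number of subtractions, what is left), stopping once the
    value is smaller than n. Assumes a is not negative."""
    count = 0
    while a >= n:
        a -= n
        count += 1
    return count, a

for a in (85, 109):
    # The count of subtractions is the quotient; what is left is the remainder.
    assert subtract_until_small(a) == divmod(a, 26)
    print(a, divmod(a, 26))

# The decimal part of the calculator answer, times 26, recovers the remainder.
print(round((85 / 26 - 3) * 26))  # 7
```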

Examples: Compute the following remainders.

(1) 121%26 =? 121 ÷ 26 = 4.6538. The decimal is .6538, so the remainder is .6538 × 26 = 17. Answer: 121%26 = 17.

(2) 888%31 =? What if the modulus is not 26? Then divide and later multiply by whatever the modulus is. 888 ÷ 31 = 28.6451, so the decimal is .6451 and remainder is .6451 × 31 = 20. Answer: 888%31 = 20.

(3) 624%17 =? 624 ÷ 17 = 36.7059 has decimal .7059 and .7059 × 17 = 12. Answer: 624%17 = 12.


As a bit of a time saver, notice that after doing the division you do not need to clear the screen (on your calculator) before punching in the decimal, but can instead subtract off the quotient. For example, redoing 888%31, we have 888 ÷ 31 = 28.6451. Subtract 28 to leave .6451 which we then multiply by 31.3

Examples: Compute the following remainders.

(1) 424%26 =? 424/26 = 16.3077. Subtracting 16 leaves .3077. Answer is .3077 × 26 = 8.

(2) 58%7 =? 58/7 = 8.2857 → .2857 × 7 = 2. Answer: 2.

(3) 101%11 =?

(4) 2045%21 =?

(5) 48924%30 =? 4



What about −29%12? Following our procedure (divide by 12, discard the quotient, multiply by 12) gives −5. Since the goal was a small positive number, we now simply add 12. So −29%12 = 7. In the shortcut notation, −29/12 = −2.4167 → −2.4167 + 2 = −.4167 → −.4167 × 12 = −5 → −5 + 12 = 7.

Examples: Compute the following remainders.

(1) −89%23 =? −89/23 = −3.869 → −3.869 + 3 = −.869 → −.869 × 23 = −20 → −20 + 23 = 3. Ans: 3

(2) −134%17 =?

(3) −748%73 =? 5



Let’s summarize our work.

3 The presence of division, and the comment about rounding, might make one concerned about the validity of this algorithm. Indeed, calculators only hold so many digits, so error is possible, especially when a is very large. These are valid concerns, but ones we will not worry about.
4 (3) 2, (4) 8, (5) 24
5 (2) 2, (3) 55

The Remainder Operator: The symbol % is called the remainder operator. For a positive number n, and any number a, a%n is shorthand for the remainder when a is divided by n. We pronounce a%n as “a modulo n” or “a mod n” and say that n is the modulus. To compute a%n:
1) Divide a by n.
2) Remove the integer part of this quotient.
3) Multiply this number by n.
4) If the resulting number is negative, add n.
5) The final number (perhaps rounded up or down to the obvious integer) is the answer a%n.
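The five steps can be followed literally in Python. This is a sketch of the calculator method, not a recommended way to compute remainders: it relies on floating-point division, so for very large a it could suffer from the rounding problems mentioned in the footnote. The function name is illustrative; `math.trunc` drops the integer part toward zero, as a person reading a calculator display would.

```python
import math

def remainder(a, n):
    """Compute a%n for positive n by the divide-and-multiply method."""
    quotient = a / n                             # 1) divide a by n
    fraction = quotient - math.trunc(quotient)   # 2) remove the integer part
    result = round(fraction * n)                 # 3) multiply by n, 5) round
    if result < 0:                               # 4) negative? add n
        result += n
    return result

print(remainder(-29, 12))  # 7
print(remainder(-89, 23))  # 3
```

For a positive modulus, Python's own `%` operator already returns the same nonnegative answer, so `remainder(a, n)` and `a % n` should agree for numbers of ordinary size.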

3.2 Modular Arithmetic

The translation of 54 to b had two related questions: how and what. Our work with the remainder operator answered how, so we now turn to what it means that 2 and 28 and 54 all represent b. Or, perhaps more accurately, the consequences of saying that 2 and 28 and 54 are all “equal.”

We say that two numbers A and B are equivalent if they represent the same letter. With our work on %, we now know this means that A%26 = B%26. This last mathematical statement is a bit clunky and fortunately we have a replacement available. It was the great scientist Carl Friedrich Gauss6 who chose the symbol ≡ to use in this situation: “We have adopted this symbol because of the analogy between equality and congruence.” Instead of A%26 = B%26 we will write A ≡ B (mod 26). The ≡ is the equivalence symbol, and this whole equation is pronounced “A is equivalent to B modulo 26” or “A is congruent to B modulo 26.”

In 1801 Gauss published his great work on Number Theory, Disquisitiones Arithmeticae. It begins

Section I. Congruent Numbers in General
1. If a number a divides the difference of the numbers b and c, b and c are said to be congruent relative to a; if not, b and c are noncongruent. The number a is called the modulus. . . .
Henceforth we shall designate congruence by the symbol ≡, joining to it in parentheses the modulus when it is necessary to do so; e.g. −7 ≡ 15 (mod. 11), −16 ≡ 9 (mod. 5).

6 Carl Friedrich Gauss (1777-1855), “The Prince of Mathematicians,” is probably the greatest mathematician of all time. Simply a list of his major accomplishments would take most of this page, so we will restrict ourselves to a varied few. He gave the first proof of the Law of Quadratic Reciprocity, the highlight of classical number theory; he calculated the orbit of the asteroid Ceres, inventing the method of least-squares to do so; he invented the heliotrope while surveying the state of Hanover; he founded the mathematics subjects of differential geometry and non-Euclidean geometry; and invented a primitive telegraph device. Of all his work, Gauss was most proud of his discovery that a regular 17-gon could be drawn using only a compass and straight-edge, apparently asking that a heptadecagon be carved on his tombstone. This was the first advance in this field since the time of the Greeks, and came on March 30th, 1796 when Gauss, only 18 at the time, was deciding between a career in mathematics or philology.

While % and ≡ are very closely related, they have an important difference: % performs an operation and ≡ is a statement. The symbol % is an operation, like addition. It turns two numbers into a third. But ≡ is a true/false statement, like equals. The mathematical statement 7 = 9 is a false one, while 7 = 9 − 2 is a true one. Similarly, 28 ≡ 2 (mod 26) is true and 29 ≡ 2 (mod 26) is false. If A%n = B then A ≡ B (mod n). For example, 78%10 = 8 so 78 ≡ 8 (mod 10). Conversely, if A ≡ B (mod n), then A%n = B%n. For example, 31 ≡ 17 (mod 7) since 31%7 = 3 = 17%7. Notice, however, that 31%7 ≠ 17 and 17%7 ≠ 31. So the “equivalence modulo” statement, ≡ (mod n), is slightly weaker than “equal remainder,” %n =. For the next several chapters the remainder operator will dominate and it will be a while until we see how powerful equivalence is. But since we’ve done most of the work, let us state the following.

Theorem 1 Suppose A and B are any integers, and n is a positive integer. We write A ≡ B (mod n) if any of the following equivalent statements are true:

1. A%n = B%n.

2. A and B have the same remainder when divided by n.

3. n divides B − A with remainder 0.

4. A and B differ by a multiple of n.
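As an informal check (not a proof) that the statements of the theorem agree, the sketch below tests statement 1 against statement 3 on a range of small integers. The function names are illustrative. It relies on the fact that Python's `%` returns a nonnegative remainder when the modulus is positive, which matches the remainder operator as defined above.

```python
def same_remainder(a, b, n):
    """Statement 1: A%n = B%n."""
    return a % n == b % n

def n_divides_difference(a, b, n):
    """Statement 3: n divides B - A with remainder 0."""
    return (b - a) % n == 0

# The two statements are true together or false together.
for n in range(1, 30):
    for a in range(-60, 61):
        for b in range(-60, 61):
            assert same_remainder(a, b, n) == n_divides_difference(a, b, n)

print(same_remainder(28, 2, 26))   # True
print(same_remainder(29, 2, 26))   # False
print(same_remainder(-7, 15, 11))  # True, one of Gauss's own examples
```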

Perhaps the language in the theorem is a bit unfamiliar. By “equivalent statements” we simply mean that when numerical values are substituted for A, B and n, then either all of the statements will be true, or all of them will be false. In less formal language, each statement contains the same information, they just present it in different ways. For example if A and B have the same remainder when divided by n, then A − B will have no remainder. So A − B will be a multiple of n, or A and B will differ by a multiple of n. Doing algebra using ≡ is very similar to the algebra you are used to. In fact, +, −, × and the Associative, Commutative, and Distributive Rules all work using ≡ just like they always did with =. Division, however, is more complicated, as the following examples show.

Examples: Examples of Division in Modular Arithmetic.

(1) 3x ≡ 9 (mod 7) has the usual solution x = 3.

(2) 3x ≡ 9 (mod 12) has the usual solution x = 3, but also x = 7 and x = 11.

(3) 3x ≡ 8 (mod 7) has the unusual solution x = 5.

(4) 3x ≡ 8 (mod 12) has no solutions.


Why this happens is a very interesting question, but one that we will leave unanswered for the moment. What we do need to take away from this example instead is simply that division is more complicated in modular arithmetic. That is not to say that it cannot be done, but rather that we will have to be careful when we do it.
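The four division examples can be confirmed by simply trying every possible value of x. This is a brute-force sketch, not a method from the text; the name `solve_congruence` is hypothetical.

```python
def solve_congruence(a, b, n):
    """Return every x in 0..n-1 with (a*x) % n equal to b % n, found by trial."""
    return [x for x in range(n) if (a * x) % n == b % n]

print(solve_congruence(3, 9, 7))    # [3]
print(solve_congruence(3, 9, 12))   # [3, 7, 11]
print(solve_congruence(3, 8, 7))    # [5]
print(solve_congruence(3, 8, 12))   # []
```

One solution, several solutions, or none at all: this is the sense in which division modulo n needs care.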

3.3 Decimation Ciphers

Before getting caught up with remainders and equivalence we were trying to build a cipher based on multiplication rather than addition, but had difficulty translating the ciphernumbers back into cipherletters. With our work with the remainder operator, this is now easy.

To use a Decimation Cipher:
1) Choose a proper keynumber k.
2) Convert the plaintext to plainnumbers.
3) Multiply each number by k to produce ciphernumbers.
4) Find the remainder %n of each ciphernumber.
5) Convert the reduced ciphernumbers back into letters.

(We will study the meaning of “proper” in Chapter 4.)

Examples: Enciphering using a Decimation⁷ Cipher.

(1) Encipher About Face using k = 5. (k for “key”.) We follow the steps given in the definition:

    plaintext               a    b    o    u    t    f    a    c    e
    plain numbers           1    2   15   21   20    6    1    3    5
    multiplied numbers      5   10   75  105  100   30    5   15   25
    %26                     5   10   23    1   22    4    5   15   25
    ciphertext              E    J    W    A    V    D    E    O    Y

The ciphertext is EJWAV DEOY.

(2) Encipher Midnight using k = 5.

(3) Encipher Revolution using k = 19.⁸
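The five steps of the Decimation Cipher translate almost line for line into a short program. The sketch below assumes the numbering a = 1, ..., z = 26 and modulus 26 used in the example; the function name `decimation_encipher` is invented for illustration, and it simply skips anything that is not a letter.

```python
import string

def decimation_encipher(plaintext, k):
    """Encipher with a Decimation Cipher modulo 26, using a=1, ..., z=26."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    out = []
    for ch in plaintext.lower():
        if ch not in lower:
            continue                      # drop spaces and punctuation
        p = lower.index(ch) + 1           # plainnumber, 1..26
        c = (p * k) % 26                  # multiply by the key, then reduce
        out.append(upper[c - 1])          # remainder 0 gives index -1, i.e. Z
    return "".join(out)

print(decimation_encipher("About Face", 5))   # EJWAVDEOY
```

The same function can be used to check answers to the two practice examples.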

⁷ Why “Decimation”? It is not clear. In Roman times, to decimate your troops meant to have the troops choose one-tenth of their numbers by lots, and then kill those soldiers. Hopefully this cipher will not be that painful.
⁸ (2) MSTRS INV, (3) DQBYT IPOYF

Recall that the weakness in the shift ciphers was that adjacent letters in the plaintext alphabet remained adjacent even after being enciphered. The previous about face example shows that decimation ciphers apparently will not have this weakness. The adjacent letters ef and tu were enciphered to YD and VA, respectively, and the adjacent triple abc to EJO. In fact, if we go ahead and encipher the entire alphabet with key k = 5 we find the following cipher alphabet:

    plaintext   abcdefghijklmnopqrstuvwxyz
    ciphertext  EJOTYDINSXCHMRWBGLQVAFKPUZ
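A cipher alphabet like this one can be generated for any key. The following is a minimal sketch under the same conventions (a = 1, ..., z = 26, remainder 0 read as Z); `cipher_alphabet` is a hypothetical helper name.

```python
import string

def cipher_alphabet(k):
    """Return the 26-letter cipher alphabet for decimation key k modulo 26."""
    upper = string.ascii_uppercase
    # p runs over the plainnumbers 1..26; a remainder of 0 becomes index -1, i.e. Z.
    return "".join(upper[(p * k) % 26 - 1] for p in range(1, 27))

print(string.ascii_lowercase)
print(cipher_alphabet(5))            # EJOTYDINSXCHMRWBGLQVAFKPUZ
print(len(set(cipher_alphabet(2))))  # 13
```

Counting the distinct letters in the output is a quick test of a key: with k = 5 all 26 letters appear, while with k = 2 only 13 do, so different plaintext letters collide.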

None of the letters are next to their former neighbors.⁹ Decimation ciphers destroy letter ordering!

We have not explained how to “properly” choose our keynumbers, but to see that there must be some care taken consider enciphering anna or bob using a decimation cipher with k = 2. The ciphertexts are BBBB and DDD, respectively. How would anyone decipher BBBB? It is impossible to tell which B is an a and which is an n. So no one would be able to decipher a message whose enciphering key was 2. Hence, only certain keys can be used to multiply. In this chapter you will be given only proper keys, so we can safely put off solving this problem until Chapter 4.

Thus far we have enciphered letters and only letters. What if we wish to keep spacing? Or to add punctuation in our messages? The next examples will do that.

Examples:

(1) Work modulo 28 with key k = 5 to encipher a space here. Use the usual plainnumbers with the addition that ␣ = 27, where the symbol ␣ means “space”. We do this as we did before, being careful to find the remainders using 28 rather than 26.

    plaintext               a    ␣    s    p    a    c    e    ␣    h    e    r    e
    plain numbers           1   27   19   16    1    3    5   27    8    5   18    5
    multiplied numbers      5  135   95   80    5   15   25  135   40   25   90   25
    %28                     5   23   11   24    5   15   25   23   12   25    6   25
    ciphertext              E    W    K    X    E    O    Y    W    L    Y    F    Y

The answer is EWKXE OYWLY FY.

(2) Encipher no one here using the same method with key k = 9.

⁹ The careful reader may notice an oddity in the last plaintext–ciphertext alphabet pair. (If you didn’t, try enciphering zzz using any key before reading further.) Since we think of a as the first letter of the alphabet, z is the 26th. But 26 times any keynumber will be a multiple of 26, and so when divided by 26 it will have a remainder of 0. Since 0 is just before 1, 0 must represent Z. Hence, z will always be enciphered into Z. Fortunately z is uncommon enough that this doesn’t really threaten the security of this cipher.

(3) Encipher one, two, three using the usual plainnumbers with ␣ = 27 and , = 28. Use a decimation cipher modulo 29 with key k = 8. The first part of this one is set up for you.

    plaintext               o    n    e    ,    ␣    t    w    o    ,    ␣    t    h    r    e    e
    plain numbers          15   14    5   28   27
    multiplied numbers    120  112   40  224  216
    %29                     4   25   11   21   13
    ciphertext              D    Y    K    U    M

(4) Encipher I’m his, he’s mine using k = 7 in a decimation cipher modulo 30, where ␣ = 27, , = 28 and ’ = 29.¹⁰
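Enlarging the alphabet changes only two things: the list of symbols and the modulus. The sketch below is one way to organize that, assuming the extra symbols are numbered 27, 28, ... in the order given and that the modulus is one more than the number of symbols, as in these examples. The function name and its `extra` parameter are illustrative choices, not notation from the text.

```python
import string

def decimation_encipher_symbols(plaintext, k, extra=" "):
    """Decimation cipher on letters plus extra symbols (numbered 27, 28, ...)."""
    plain_symbols = string.ascii_lowercase + extra
    cipher_symbols = string.ascii_uppercase + extra
    n = len(plain_symbols) + 1            # 28 for one extra symbol, 29 for two, ...
    out = []
    for ch in plaintext.lower():
        p = plain_symbols.index(ch) + 1   # raises ValueError on unknown symbols
        c = (p * k) % n
        if c == 0:
            raise ValueError("key %d is not proper modulo %d" % (k, n))
        out.append(cipher_symbols[c - 1])
    return "".join(out)

print(decimation_encipher_symbols("a space here", 5))              # EWKXEOYWLYFY
print(decimation_encipher_symbols("one, two, three", 8, " ,"))     # DYKUMOJDUMOF,KK
```

Note that a ciphertext symbol can itself be a space or a comma, as in the second output, so the usual grouping into blocks of five must be read with care.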



3.4 Deciphering Decimation Ciphers

How do we decipher a message enciphered with a decimation cipher? To decipher a shift cipher we simply subtracted the number that had been added. Since decimation ciphers involve multiplication, it seems that division must be the key to deciphering. Let’s look at the first example from Section 3.3.

Example: Decipher EJWAV DEOY, whose enciphering key was k = 5. We begin with the usual set-up. Since enciphering was done by multiplying by 5, we try here to decipher by dividing by 5.

    ciphertext                  E    J    W    A    V    D    E    O    Y
    ciphernumbers               5   10   23    1   22    4    5   15   25
    divided ciphernumbers       1    2  4.6  0.2  4.4  0.8    1    3    5
    plaintext                   a    b    ?    ?    ?    ?    a    c    e

Several of the plaintext letters are correct, but which is the 4.6th letter of the alphabet? Or the 0.2th? Simply dividing the ciphernumbers does not work. And the reason is clear: we shouldn’t be dividing the (reduced) ciphernumbers (5, 10, 23, 1, etc.), but rather the non-reduced ones (5, 10, 75, 105, etc.). And it’s not clear that we can determine what they were.

What we need is a way to “unmultiply,” or, to be more precise, to undo the multiplication modulo 26. We all know that to “unadd” we simply add the negative, also called the additive inverse. So to unmultiply we will need to multiply by the multiplicative inverse. Just as a number plus its additive inverse equals 0, the additive identity, a number times its multiplicative inverse equals 1, the multiplicative identity.

¹⁰ (2) NWSWN QSPQV Q, (3) DYKUM OJDUM OF,KK, (4) CWAIZ CMPIZ EWMIA CHE.

To decipher a decimation cipher with key K we will need to find the inverse of K modulo 26 (the number x that satisfies (K × x)%26 = 1). It turns out that the inverse of 5 when working modulo 26 is 21: (5 × 21)%26 = 1. Since multiplying by 1 changes nothing, following a multiplication by 5 with another by 21 brings us back to our original message. The same is true for several other pairs of numbers. (The numbers that have multiplicative inverses are “proper” in the sense of the definition of Decimation Ciphers in Section 3.3.) We will learn how to produce these pairs in Chapter 4. For now, we simply list them.

    enciphering key    1   3   5   7   9  11  15  17  19  21  23  25
    deciphering key    1   9  21  15   3  19   7  23  11   5  17  25

Figure 3.2: Enciphering/Deciphering pairs modulo 26.
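Until a better method is available, the pairs in Figure 3.2 can be found by exhaustion: for each candidate key, try every possible partner. The following sketch does exactly that; `inverse_pairs` is a name chosen here for illustration.

```python
def inverse_pairs(n=26):
    """Map each key k in 1..n-1 that has an inverse modulo n to that inverse."""
    pairs = {}
    for k in range(1, n):
        for d in range(1, n):
            if (k * d) % n == 1:     # d undoes multiplication by k
                pairs[k] = d
                break
    return pairs

for k, d in inverse_pairs().items():
    print(k, d)
```

Keys that never appear in the output (the even numbers and 13, for modulus 26) are the ones that have no deciphering partner.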

Notice that the deciphering key is seldom the same as the enciphering key. Do not make the mistake of simply reusing the enciphering key, or using its negative. A Decimation Cipher will only decipher properly when the correct key is used.

Examples:

(1) Decipher EJWAV DEOY, whose enciphering key was k = 5. From Figure 3.2 the multiplicative inverse of 5 is 21. So if we multiply the ciphernumbers by 21 and then find their remainders mod 26 we should have our message back. Let’s see.

    ciphertext         E    J    W    A    V    D    E    O    Y
    ciphernumbers      5   10   23    1   22    4    5   15   25
    ×21              105  210  483   21  462   84  105  315  525
    %26                1    2   15   21   20    6    1    3    5
    plaintext          a    b    o    u    t    f    a    c    e

Multiplying by 21 really did undo the multiplication by 5 and so did decipher our message.

(2) Decipher MKCCKFI, if the enciphering number was 7. The key was “multiply by 7.” However, if we try to divide by 7 to decipher we have troubles:

    ciphertext           M      K      C      C      K      F      I
    ciphernumbers       13     11      3      3     11      6      9
    divide by 7      1.857  1.571   .429   .429  1.571   .857  1.286
    plaintext            ?      ?      ?      ?      ?      ?      ?

Instead, we need to multiply by the inverse of 7, which from Figure 3.2 is 15.

    ciphertext           M      K      C      C      K      F      I
    ciphernumbers       13     11      3      3     11      6      9
    ×15                195    165     45     45    165     90    135
    %26                 13      9     19     19      9     12      5
    plaintext            m      i      s      s      i      l      e

This works much better. The answer is missile.

(3) Decipher JCPCJS if the enciphering key was k = 9.

(4) Decipher RDYRSCSPQ if the enciphering key was k = 19.

(5) Decipher AWVFWYKLC if the enciphering key was k = 11.¹¹
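Deciphering is the same multiply-and-reduce procedure, run with the deciphering key. The sketch below looks the inverse up by trial rather than from Figure 3.2; the helper names `find_inverse` and `decimation_decipher` are hypothetical.

```python
import string

def find_inverse(k, n=26):
    """Return d in 1..n-1 with (k*d) % n == 1, found by trying each candidate."""
    for d in range(1, n):
        if (k * d) % n == 1:
            return d
    raise ValueError("%d has no inverse modulo %d" % (k, n))

def decimation_decipher(ciphertext, k):
    """Decipher a modulo-26 Decimation Cipher whose enciphering key was k."""
    d = find_inverse(k)
    upper = string.ascii_uppercase
    lower = string.ascii_lowercase
    out = []
    for ch in ciphertext.upper():
        if ch not in upper:
            continue                         # ignore the spacing between blocks
        c = upper.index(ch) + 1              # ciphernumber, 1..26
        out.append(lower[(c * d) % 26 - 1])  # remainder 0 gives index -1, i.e. z
    return "".join(out)

print(decimation_decipher("EJWAV DEOY", 5))   # aboutface
print(decimation_decipher("MKCCKFI", 7))      # missile
```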



3.5 Multiplication vs. Addition

We have seen that decimation ciphers cause letters which are adjacent in the plaintext alphabet to be separated in the ciphertext alphabet. How much does this improve security? Consider the following ciphertext that was enciphered with a decimation cipher.

UQESF YFTGW SGPVS PPVQX QEDGR PMQFP YJSFG EORVQ DQBQF PVQWO MRTQW PUOTT SPPOM QWOFE TIXQS FIMLQ DYJUY FXQDO FCW

It has the letter frequency table

     0  1  1  4  4  9  4  0  2  2  0  1  4  0  6  9 13  3  6  5  3  4  5  3  4  0
     A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z

There are no obvious places to fit the aei, no, rst and uvwxyz patterns. Nor, for that matter, any non-obvious place! Clearly our efforts to construct and understand the Decimation Ciphers have been fruitful – the techniques that allow us to decrypt Caesar ciphers no longer suffice.
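A frequency table like the one above is tedious to tally by hand but easy to automate. This sketch uses `collections.Counter` from the Python standard library; the function name `letter_frequencies` is an illustrative choice.

```python
from collections import Counter
import string

CIPHERTEXT = ("UQESF YFTGW SGPVS PPVQX QEDGR PMQFP YJSFG EORVQ DQBQF PVQWO "
              "MRTQW PUOTT SPPOM QWOFE TIXQS FIMLQ DYJUY FXQDO FCW")

def letter_frequencies(text):
    """Return the 26 counts for A..Z, ignoring anything that is not a letter."""
    counts = Counter(ch for ch in text.upper() if ch in string.ascii_uppercase)
    return [counts[ch] for ch in string.ascii_uppercase]

freqs = letter_frequencies(CIPHERTEXT)
print(" ".join("%2d" % f for f in freqs))
print(" ".join("%2s" % ch for ch in string.ascii_uppercase))
```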

3.6 Koblitz’s Kid-RSA and Public Key Codes

To demonstrate the power of the ideas we’ve studied in this chapter we are now going to explain Neal Koblitz’s toy system, “Kid-RSA”. The RSA cryptosystem, due to Rivest, Shamir, and Adleman, is one of the most important systems in use today. We will study it in Chapter 12. The Kid-RSA is a Decimation Cipher with a slightly complicated setup. It offers no actual security, hence it is a “toy” system, but will allow us to introduce the concept of public keys.

¹¹ (3) divide, (4) propagate, (5) subjugate.

Set-up of Kid-RSA:
1. Choose four integers, calling them a, b, A and B.
2. Compute AB − 1 and call it M.
3. Compute aM + A and bM + B and call them e and d, respectively.
4. Compute (ed − 1)/M and call it n.
Then e and d will serve as the enciphering and deciphering keys in a Decimation Cipher modulo n.
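The set-up is pure arithmetic, so it is natural to let a program carry it out. This is a minimal sketch of the four steps; the function name `kid_rsa_setup` is made up here.

```python
def kid_rsa_setup(a, b, A, B):
    """Carry out the Kid-RSA set-up and return (M, e, d, n)."""
    M = A * B - 1
    e = a * M + A
    d = b * M + B
    # ed - 1 is always a multiple of M, so integer division is exact here.
    n = (e * d - 1) // M
    return M, e, d, n

M, e, d, n = kid_rsa_setup(5, 7, 4, 3)
print(M, e, d, n)        # 11 59 80 429
print((e * d) % n)       # 1
```

The last line shows why e and d work as a key pair: since ed − 1 = nM, the product ed leaves remainder 1 when divided by n.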

A couple of examples will help clarify.

Examples:

(1) Suppose we choose a = 5, b = 7, A = 4 and B = 3. Then we compute the other parameters:

    M = AB − 1 = 11,
    e = aM + A = 59,
    d = bM + B = 80, and
    n = (ed − 1)/M = (59 · 80 − 1)/11 = 429.

Now we encipher numerical as usual:

    plaintext           n      u      m      e      r      i      c      a      l
    plainnumbers       14     21     13      5     18      9      3      1     12
    ×59               826   1239    767    295   1062    531    177     59    708
    %429              397    381    338    295    204    102    177     59    279
    ciphertext        397    381    338    295    204    102    177     59    279

Because we are working modulo 429 we cannot at this point reduce modulo 26 to retrieve a ciphertext. Instead we simply use the reduced ciphernumbers as our ciphertext. So numerical is enciphered to 397, 381, 338, 295, 204, 102, 177, 59, 279.

To decipher we multiply by d and reduce modulo 429:

    ciphertext        397    381    338    295    204    102    177     59    279
    ×80             31760  30480  27040  23600  16320   8160  14160   4720  22320
    %429               14     21     13      5     18      9      3      1     12
    plaintext           n      u      m      e      r      i      c      a      l

So we retrieve our plaintext numerical.

(2) Given a = 3, b = 9, A = 5 and B = 4,
    1. Compute M, e, d, and n.
    2. Encipher public.
    3. Decipher {111, 310, 408}.¹²

(3) Given a = 7, b = 2, A = 3, B = 6,
    1. Compute M, e, d, and n.
    2. Encipher private.
    3. Decipher {29, 108, 79, 194}.¹³
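Once e, d and n are known, enciphering and deciphering are the familiar multiply-and-reduce steps, except that the ciphertext stays numerical. The following sketch assumes the plainnumbers a = 1, ..., z = 26; the two function names are illustrative.

```python
import string

def kid_rsa_encipher(plaintext, e, n):
    """Turn letters into ciphernumbers: multiply each plainnumber by e, reduce mod n."""
    lower = string.ascii_lowercase
    return [((lower.index(ch) + 1) * e) % n
            for ch in plaintext.lower() if ch in lower]

def kid_rsa_decipher(ciphernumbers, d, n):
    """Multiply each ciphernumber by d, reduce mod n, and read off letters 1..26."""
    lower = string.ascii_lowercase
    return "".join(lower[(c * d) % n - 1] for c in ciphernumbers)

cipher = kid_rsa_encipher("numerical", 59, 429)
print(cipher)                               # [397, 381, 338, 295, 204, 102, 177, 59, 279]
print(kid_rsa_decipher(cipher, 80, 429))    # numerical
```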



While the setup of Kid-RSA is a bit of work, there is a very nice payoff that we illustrate with names. Suppose Alice wishes others to be able to send her private messages. She (secretly) chooses values for a, b, A and B, and then computes M, e, d and n. (For this example we will assume e = 165, d = 121 and n = 868.) Then she adds these public keys to her business cards

    Alice Anderson
    Phone: 1-800-CALL-ALC
    Email: alice@mymail.com
    I use Koblitz’s Kid-RSA.
    My public keys are e = 165 and n = 868.

and web page and her company’s contact information sheet. The reason for the name is now obvious: a public key is a key that is made known to the public. Alice should keep the value d secret; it is called her private key.

If Bob wants to send Alice a secret message, he first looks up her enciphering key and modulus. Then to say “Hi”, he computes

    plaintext           h      i
    plainnumbers        8      9
    ×165             1320   1485
    %868              452    617

and sends Alice the ciphertext {452, 617}. Notice that Bob didn’t need a, b, A or B to do this. Further, because Alice has previously computed d, she can quickly decipher the message.

How does Kid-RSA differ from our earlier Decimation Ciphers? For Alice and Bob to use a standard Decimation Cipher they must agree upon e, d and n. And they must keep these values private, for anyone who knows these numbers can read the ciphertext messages just as easily as Alice and Bob can. They are private keys. In particular, Alice and Bob cannot simply transmit these numbers to one another for fear that someone will eavesdrop on that transmission. A bit of thought shows this implies that for Alice and Bob to set up a method for exchanging secret messages, they must already be able to secretly exchange messages!¹⁴

A public key cryptosystem removes this difficulty. Bob and Alice need never have met, or have had contact of any kind. When, for whatever reason, Bob decides to send Alice a secret message he looks up her public key and enciphers using it. Similarly, if Alice wants to reply, she can look up Bob’s public key and use it.

¹² 1. M = 19, e = 62, d = 175, n = 571, 2. {421, 160, 124, 173, 558, 186}, 3. key
¹³ 1. M = 17, e = 122, d = 40, n = 287, 2. {230, 187, 237, 101, 122, 144, 36}, 3. lock

Example: Bob sends Alice the message “Part 1: Using your public key, the enciphering key was 517. Part 2: Message is ‘Meet on WIFXSG’.” If Alice’s public key is based on a = 4, b = 5, A = 4 and B = 10, what day will they meet?

Doing the computations, d = 205 and n = 841. Since (205 × 517)%841 = 19, Bob used k = 19 to encipher WIFXSG. From Figure 3.2, using the key 11 will decipher the message. (You can now finish by deciphering the message.)¹⁵

In order for a public key system to be secure it must be impossible, or at least really difficult, for someone to use the public information to determine the private information. If Ed, Alice’s enemy, notices that Bob sent her the message {452, 617}, how might he break it? Ed can look up Alice’s public keys e and n just as Bob did. Can he use these numbers to compute her private key d? If so, he simply deciphers the message.

In this case, Ed knows that d is an integer, somewhere between 1 and n = 868. As in the method of exhaustion for Caesar Ciphers, Ed could successively try d = 1, d = 2, d = 3, ..., until the message pops out. And it would after at most 868 attempts. While this is too many computations to do by hand, it is a triviality for a computer. (Alice would be better off to choose larger values for the parameters: a = 195, b = 191, A = 184 and B = 177 would lead to n = 1,213,027,575, and a billion computations will take a bit more time, even by computer.)

Is there another method Ed might use? He knows the values of e and n, and wants a value d so that (d × e)%n = 1. The difficulty we had with the standard Decimation Ciphers (How do we determine the deciphering key from the enciphering key?) is now supplying the security in the Kid-RSA! Overcoming this difficulty (and hence breaking the Kid-RSA) is the main goal of Chapter 4.
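To see how little protection a small modulus offers, here is a sketch of the exhaustive search Ed could run. It uses only the public values e = 165 and n = 868 from Alice's business card and looks for the d satisfying (d × e)%n = 1; the function name `find_private_key` is hypothetical.

```python
def find_private_key(e, n):
    """Try d = 1, 2, 3, ... until (d*e) % n == 1; return None if no d works."""
    for d in range(1, n):
        if (d * e) % n == 1:
            return d
    return None

d = find_private_key(165, 868)
print(d)                                    # 121
print([(c * d) % 868 for c in (452, 617)])  # [8, 9], the plainnumbers of "hi"
```

The loop needs at most n − 1 multiplications, which is why a much larger n slows this particular attack down.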

¹⁴ Of course, Alice and Bob could either meet in person or allow a trusted third party to carry the numerical keys from one of them to the other. But these are hardly practical solutions in general.
¹⁵ Sunday.

3.7 Summary

The mathematical operation underlying Caesar and Shift Ciphers is addition. Since these ciphers are insecure, it seems natural to try to build a cipher based on multiplication. These so-called Decimation Ciphers are somewhat more complicated, as they involve numbers much larger than 26 and it is not obvious how to determine their deciphering keys from their enciphering keys. On the other hand, they do not immediately fall prey to the type of frequency analysis that makes Caesar Ciphers insecure.

To handle numbers larger than 26 we introduced the remainder operator %. For A and n integers, A%n is the remainder when A is divided by n. For positive A’s finding this value with a calculator is very quick: divide A by n, subtract the integer part of the result away, and then multiply by n.

Using the remainder operator, Decimation Ciphers seem to be a fine replacement for our broken Caesar Ciphers. At least, the weakness that plagues those latter ciphers, adjacent letters being enciphered into adjacent letters, seems to not be a problem with our new ciphers.

In Koblitz’s Kid-RSA, a somewhat fancy choice of the parameters produces a public key system. One party, “Alice”, makes certain keys public and keeps others private, and then other parties can use her public key(s) to send messages to Alice even if they have never met her.

The implications of Public Key Codes are many and far reaching. Although the ideas behind public key codes did not coalesce until the 1970’s¹⁶, their advantages quickly became clear. Nearly all modern uses of cryptography now involve some combination of private key and public key cryptosystems.

3.8 Topics and Techniques

1. Why do we replace the letters of the alphabet with the numbers 1 through 26?

2. Why must we consider numbers larger than 26? Give an example.

3. What does A%n mean? How do we determine which number this is?

4. What does A ≡ B (mod n) mean? How do we determine if this equiva- lence is true?

5. What does it mean for two numbers to be equivalent?

6. What is a modulus?

7. What is a Decimation Cipher?

¹⁶ The first clear remarks along this line were made by Martin Hellman and Whitfield Diffie in 1976.

8. How do we encipher with a Decimation Cipher?

9. Is the deciphering process for a Decimation Cipher the same process as enciphering, or different? Explain.

10. How are the enciphering key, deciphering key, and modulus in a Decimation Cipher related?

11. What is a multiplicative inverse?

12. Suppose we know a cipher is either a Caesar Cipher or a Decimation Cipher. How can a frequency count help us determine which?

13. Can a frequency count help break a Decimation Cipher? Explain.

14. How many Decimation Ciphers are there modulo 26?

15. What does it mean to be a Public Key? How does this differ from a Private Key?

16. What parts of the Kid-RSA are made public? What parts are kept private?

17. How do Public Key systems differ from Private Key systems?

3.9 Exercises

Show your work, and explain any non-standard steps.

1. Find the following remainders.
   (a) 41%12.
   (b) 53%9.
   (c) 110%43.
   (d) 332%23.

2. Find the following remainders.
   (a) 87%17.
   (b) 95%13.
   (c) 195%15.
   (d) 3389%201.

3. Find the following remainders.
   (a) −8%10.
   (b) −37%11.
   (c) −621%34.
   (d) −309%26.

4. Find the following remainders.
   (a) 192%18.
   (b) 1829%82.
   (c) −381%91.
   (d) 8391%5.

5. Test to see whether the following equivalence statements are true or not by computing the remainder of both sides.
   (a) 72 ≡ 44 (mod 19).
   (b) 87 ≡ 103 (mod 11).
   (c) 327 ≡ 199 (mod 32).
   (d) 411 ≡ 879 (mod 39).

6. Encipher the following, using a Decimation Cipher modulo 26 with the given key.

   (a) Quotient, key = 11.
   (b) Remainder, key = 23.
   (c) Fraction, key = 5.
   (d) Integer, key = 19.

7. Decipher the following words. They have been enciphered using a Decimation Cipher modulo 26 with the given key.
   (a) LOIAMCJ, key = 3.
   (b) UNGDL MADGK, key = 21.
   (c) QVOGH TQ, key = 17.
   (d) SOXSC XCY, key = 15.
   (e) MIRWBKP, key = 11.

8. Sending a message asking for 5293 men is a lot easier than sending for five thousand two hundred and ninety-three men. One way to simplify the transmission of numbers is to let the numbers 0 to 9 stand for themselves in the plaintext, save 10 for space, use the numbers 11 to 36 to represent the letters of the alphabet, and work modulo 37. So a = 11, b = 12, ..., z = 36, and for example, the numerical version of 10 to 15 people is then 1 0 10 30 25 10 1 5 10 26 15 25 26 22 15.
   (a) Encipher 1 if by land using a Caesar Cipher with key 10.
   (b) Decipher TU2AK 2326G R7F. It was enciphered using a Caesar Cipher with key 29.

   (c) Encipher 54 40 or fight using a Decimation Cipher with key 11.
   (d) Decipher EF2 G QOX3R. It’s a Decimation Cipher with deciphering key 5.
   (e) Decipher DRWEQ VAH. It’s a Decimation Cipher with deciphering key 16.
   (f) Decipher 832Q3 AS35M 7. It’s a Decimation Cipher with deciphering key 28.

9. The ASCII code (American Standard Code for Information Interchange) is a commonly used method for translating letters and characters into numeric equivalents. Lower-case and upper-case letters have their own values, as do the numbers, punctuation marks, and other useful symbols. (The numbers 0 to 31 represent computer controls, 9 is the tab key, for example, and do not much interest us.)

    ASCII  Symbol     ASCII  Symbol     ASCII  Symbol     ASCII  Symbol
        0  NUL           32  (space)       64  @             96  `
        1  SOH           33  !             65  A             97  a
        2  STX           34  "             66  B             98  b
        3  ETX           35  #             67  C             99  c
        4  EOT           36  $             68  D            100  d
        5  ENQ           37  %             69  E            101  e
        6  ACK           38  &             70  F            102  f
        7  BEL           39  '             71  G            103  g
        8  BS            40  (             72  H            104  h
        9  TAB           41  )             73  I            105  i
       10  LF            42  *             74  J            106  j
       11  VT            43  +             75  K            107  k
       12  FF            44  ,             76  L            108  l
       13  CR            45  -             77  M            109  m
       14  SO            46  .             78  N            110  n
       15  SI            47  /             79  O            111  o
       16  DLE           48  0             80  P            112  p
       17  DC1           49  1             81  Q            113  q
       18  DC2           50  2             82  R            114  r
       19  DC3           51  3             83  S            115  s
       20  DC4           52  4             84  T            116  t
       21  NAK           53  5             85  U            117  u
       22  SYN           54  6             86  V            118  v
       23  ETB           55  7             87  W            119  w
       24  CAN           56  8             88  X            120  x
       25  EM            57  9             89  Y            121  y
       26  SUB           58  :             90  Z            122  z
       27  ESC           59  ;             91  [            123  {
       28  FS            60  <             92  \            124  |
       29  GS            61  =             93  ]            125  }
       30  RS            62  >             94  ^            126  ~
       31  US            63  ?             95  _            127  DEL

Figure 9. ASCII Code

In the examples below, we use the ASCII code by first translating plaintext characters into their ASCII equivalents, perform some numerical operations on those equivalents modulo 128, and then send the numerical results as the ciphertext. For example, if doing a Caesar Cipher with key 3 on “$ = Dollars?” (ignoring the spaces), the ASCII form is 36 61 68 111 108 108 97 114 115 63, and shifting gives the ciphertext 39 64 71 114 111 111 100 117 118 66.
   (a) Encipher Stalag 17 using a Caesar Cipher with key 90.
   (b) Decipher 5 5 110 33 67 60 65 51 66 110 33 66 64 55 62. It was enciphered with a Caesar Cipher with key 78.
   (c) Encipher 12 O’Clock High using a Decimation Cipher with key 17.
   (d) Decipher 46 40 24 75 93 91 103 102 92 107 24 103 110 93 106 24 76 103 99 113 103. It was enciphered using a Caesar Cipher with key 120.
   (e) Decipher 7 61 112 32 90 45 47 97 74 32 119 100 74 97 97 100. It was enciphered using a Decimation Cipher. The deciphering key is 11.
   (f) Decipher 35 59 40 92. It was enciphered using a Decimation Cipher. The deciphering key is 27.
   (g) Decipher 13 7 58 98 3 109 120 124 96 43 124 96 124 120 87 96 101 122 96 57 122 97 69 102 102 43 36. It was enciphered using a Decimation Cipher. The deciphering key is 35.
   (h) Decipher 42 26 121 48 121 38 68 92 32 104 101 58 28 121 53 73 13 38 37 42 32 31 101 13 116 32 68 8 121 32 36 13 111 23 68 43 111 23 32 1 101 38. It was enciphered using a Decimation Cipher. The deciphering key is 77.

10. Roberta’s Kid-RSA has parameters a = 5, b = 1, A = 6 and B = 7.
   (a) Find the other parameters.
   (b) Encipher privacy.
   (c) Decipher 133, 187, 141, 50, 137, 112, 137, 17, 71.
   (d) Nikki has looked up Roberta’s public key and has sent her the following two part message. Decipher it.
       First part: 21 187 50 17 137 133 50 71 141 71 46 137 100 21 162 83 17 129 54 191 17 71 191 137 191 54
       Second part: QRCVC PWEMY OEYHE CZYIP HA.

11. Henry’s Kid-RSA has parameters a = 2, b = 6, A = 5 and B = 4.
   (a) Find the other parameters.

   (b) Encipher secrecy.
   (c) Decipher 129, 111, 68, 258, 120, 172, 68, 59, 120, 43, 249.
   (d) Frita has sent Henry the following two-part message. The first part used Henry’s public key. Decipher both parts.
       First part: 129, 43, 215, 16, 43, 240, 206, 215, 7, 188, 43, 16, 77.
       Second part: AOL IBASLY KPK PA.

Chapter 4

The Euclidean Algorithm

It is sometimes said that, next to the Bible, the “Elements” may be the most translated, published, and studied of all the books produced in the Western world. B. L. van der Waerden

In Chapter 3 we developed Decimation Ciphers. By means of multiplication and the remainder operator these ciphers seem to offer some degree of security. However, only some enciphering keys could be used, and it wasn’t clear how to determine the deciphering key from the enciphering key. Our work in this chapter is intended to surmount these difficulties. Let us start with a review in the form of a new cipher system.

4.1 Linear Ciphers

To try to make an even better cipher we might combine the two types of ciphers we’ve seen so far, by first multiplying and then adding.

Caesar Cipher: Pick a keynumber c. Then enciphering is “Add c mod 26”, that is, the message m becomes (m + c)%26.

Decimation Cipher: Pick a (proper) keynumber k. Then enciphering is “multiply by k mod 26”, that is, the message m becomes (k × m)%26.

Combining these, we define a linear cipher to be one that first multiplies and then adds.

Linear Cipher: Pick a (proper) keynumber k and (any) keynumber c. Then enciphering is “multiply by k then add c mod 26”, that is, the message m becomes (k × m + c)%26.


Examples:

(1) Encipher multiply with key 7m + 2. Rather than just adding or multiplying we combine the two.

    plaintext         m    u    l    t    i    p    l    y
    plainnumbers     13   21   12   20    9   16   12   25
    ×7               91  147   84  140   63  112   84  175
    +2               93  149   86  142   65  114   86  177
    %26              15   19    8   12   13   10    8   21
    ciphertext        O    S    H    L    M    J    H    U

So the ciphertext is OSHLMJHU.

(2) Encipher decimate with key 5m + 8.

(3) Encipher conquer with key 9m + 3.¹
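A Linear Cipher needs only one more step than a Decimation Cipher. The sketch below assumes the same numbering a = 1, ..., z = 26 with remainder 0 read as Z; `linear_encipher` is a name chosen for illustration.

```python
import string

def linear_encipher(plaintext, k, c):
    """Encipher with the Linear Cipher 'multiply by k, then add c' modulo 26."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    out = []
    for ch in plaintext.lower():
        if ch not in lower:
            continue                           # skip anything that is not a letter
        m = lower.index(ch) + 1                # plainnumber, 1..26
        out.append(upper[(k * m + c) % 26 - 1])  # remainder 0 gives index -1, i.e. Z
    return "".join(out)

print(linear_encipher("multiply", 7, 2))   # OSHLMJHU
```

Setting c = 0 recovers a Decimation Cipher, and setting k = 1 recovers a Caesar Cipher.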



4.2 GCD’s and the Euclidean Algorithm

Our Decimation and Linear Ciphers have two problems: the enciphering key must be chosen “properly” (a term I still haven’t defined) and once it is chosen the corresponding deciphering key must somehow be determined. It is perhaps not surprising that the solutions to these two problems are related. However it is probably surprising that the solutions involve greatest common divisors and were known to Euclid, some 2500 years ago.

Euclid’s (c. 350 B.C.E.) textbook, The Elements, is the most successful ever written, and, with The Bible, one of the most published books of all time, appearing in over 1000 editions. Euclid taught at the academy in Alexandria, but this is about all we know about his life. The Elements deals with plane and solid geometry and number theory, while other books of Euclid cover such topics as astronomy, mechanics, music, and optics.

The Greatest Common Divisor, or gcd, of two integers is exactly what the name suggests: it is the largest integer that divides both. For example, 14 is divisible by 1, 2, 7 and 14, while 10 is divisible by 1, 2, 5 and 10. The largest divisor that 14 and 10 have in common is 2, and so gcd(14, 10) = 2. For some reason most people seem able to rather automatically compute the gcd’s of small numbers.

¹ (2) BGWAU MDG, (3) DHYZJ VI.

Examples: Compute the gcd’s.

(1) gcd(35, 15). (2) gcd(20, 12). (3) gcd(21, 12). (4) gcd(16, 8). (5) gcd(22, 15).²

To hint that there is a connection between gcd’s and the enciphering keys, recall from Figure 3.2 that modulo 26 the proper enciphering keys are 1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, and 25. In other words, they are the numbers from 1 to 26 excluding the even numbers and 13, that is, excluding those numbers whose gcd with 26 is larger than 1.

Although the gcd’s of small numbers are easy to compute, it is not obvious how to quickly compute the gcd of larger numbers, say gcd(182, 217). We next work to develop a method to do this.

Euclid’s key observation was that gcd’s and the remainder operator are intimately related. Precisely, any number that divides both of the numbers a and b will also divide a%b. This is because, as we recall from Chapter 3, a%b = a − bq where q is the quotient, the integer part of a ÷ b. So if some number d divides both a and b, then it will divide a − bq = a%b. On the other hand, if a number d divides a%b and b then it will also divide a = a%b + bq. Since this is true for all divisors, it is certainly true for the largest divisor. Hence gcd(a, b) = gcd(b, a%b).

For example, gcd(35, 15) = gcd(15, 35%15) = gcd(15, 5). In trading a 35 for a 5 we’ve made our problem much simpler. Doing this again, gcd(15, 5) = gcd(5, 15%5) = gcd(5, 0). Since every number divides 0, the largest number that divides both 0 and 5 is the largest that divides 5, which is 5.

Example: Use this method to compute gcd(69, 27).

    gcd(69, 27) = gcd(27, 69%27) = gcd(27, 15)
                = gcd(15, 27%15) = gcd(15, 12)
                = gcd(12, 15%12) = gcd(12, 3)
                = gcd(3, 12%3)   = gcd(3, 0) = 3.

So gcd(69, 27) = 3. Of course, we’d probably abbreviate this as gcd(69, 27) = gcd(27, 15) = gcd(15, 12) = gcd(12, 3) = gcd(3, 0) = 3. 
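The rule gcd(a, b) = gcd(b, a%b) can be repeated mechanically until the second entry is 0. This is a short sketch of that loop; Python's standard library also offers `math.gcd`, but writing it out shows the swapping explicitly.

```python
def gcd(a, b):
    """Greatest common divisor by repeatedly replacing (a, b) with (b, a % b)."""
    while b != 0:
        a, b = b, a % b
    return a            # gcd(a, 0) = a

print(gcd(69, 27))      # 3
print(gcd(35, 15))      # 5
print(gcd(182, 217))    # 7
```

The order of the two inputs does not matter: if the smaller number is given first, the first pass of the loop simply swaps them.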

² (1) 5, (2) 4, (3) 3, (4) 8, (5) 1.

The individual computations are not difficult, but the continual swapping of gcd entries, as well as the rewriting of “gcd”, is a bit much. In addition, while we have kept the remainder information, we have lost the quotients that we will soon need. So we are going to introduce a more compact way of presenting this computation, perhaps due to [Glasby].

To see how this is built, consider gcd(69, 27) = gcd(27, 69%27). The remainder 69%27 is 15, and the quotient of 69 ÷ 27 is 2. So we write

    q         2
    r   69   27   15

where q and r remind us which line contains the quotient and which the remainder. Similarly, for gcd(27, 15) = gcd(15, 27%15) we write

    q         1
    r   27   15   12

since the quotient of 27 ÷ 15 is 1 and the remainder is 12. The advantage of this representation is that

    q         2                    q         1
    r   69   27   15      and      r   27   15   12

may be combined as

    q         2    1
    r   69   27   15   12

Adding the 15 ÷ 12 and 12 ÷ 3 results then gives

    q         2    1    1    4
    r   69   27   15   12    3    0

Examples:

(1) Compute gcd(15, 85). For this first example we work one division step at a time, adding one quotient and one remainder at each step.

    Enter the two numbers, larger one first:
        q
        r   85   15

    From 85 ÷ 15:
        q         5
        r   85   15   10

    From 15 ÷ 10:
        q         5    1
        r   85   15   10    5

    From 10 ÷ 5:
        q         5    1    2
        r   85   15   10    5    0

The gcd is the final non-zero remainder. So gcd(15, 85) = 5.

(2) Find gcd(79, 201).

    q          2    1    1    5    7
    r   201   79   43   36    7    1    0        gcd(79, 201) = 1.

(3) Find gcd(182, 217).

    q           1    5    5
    r   217   182   35    7    0        gcd(182, 217) = 7.

This process is called the Euclidean Algorithm,³ and the general rule is that at any stage of four numbers in a triangle

             q
        a    b    r

we have q = a ÷ b, the quotient of the division, and r = a%b, the remainder. Even for large numbers the Euclidean Algorithm takes relatively few steps.⁴

Example: Compute gcd(191, 156).

    q          1     4    2    5    3
    r   191   156   35   16    3    1    0        gcd(191, 156) = 1.

Since this is a math book, we should justify the algorithm, that is, explain how we know it always works. Fortunately this is easy. We know that gcd(a, b) = gcd(b, a%b), so by letting r = a%b, each consecutive pair of numbers in the “r line” has the same gcd. Since a > b > r ≥ 0, the values in the r line are never negative and keep shrinking, so the algorithm must end. And when it does, with a 0, because gcd(r, 0) = r the final non-zero entry equals the gcd of any two consecutive values in the line, in particular, of the first two values in the line.
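The compact two-line layout can also be produced by a program, which is handy for checking hand computations. The sketch below returns the q line and the r line as Python lists; `euclid_table` is a hypothetical name, and the printing of the layout is left out to keep it short.

```python
def euclid_table(a, b):
    """Return (q_line, r_line) for the Euclidean Algorithm, larger number first."""
    if a < b:
        a, b = b, a
    q_line = []
    r_line = [a, b]
    while r_line[-1] != 0:
        # Divide the last two entries of the r line; record quotient and remainder.
        q, r = divmod(r_line[-2], r_line[-1])
        q_line.append(q)
        r_line.append(r)
    return q_line, r_line

q_line, r_line = euclid_table(69, 27)
print("q", q_line)       # q [2, 1, 1, 4]
print("r", r_line)       # r [69, 27, 15, 12, 3, 0]
print("gcd", r_line[-2]) # gcd 3
```

The gcd is the entry just before the final 0 in the r line.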

4.3 Multiplicative Inverses

³ An algorithm is a step-by-step procedure that solves a particular problem or produces some desired outcome. The word comes from the name of Mohammed ibn-Musa al-Khwarizmi, a mathematician in the royal court of Baghdad c. 800 A.D. The word algebra comes from the title of one of his books.
⁴ A theorem named for Gabriel Lamé, a French engineer, physicist and mathematician, says that the number of divisions needed to find the greatest common divisor of two numbers is no more than five times the number of decimal digits in the smaller of the two numbers.

For us to fully understand the multiplicative ciphers, decimation and linear, we need to know how to get from enciphering key to deciphering key. An extension of the Euclidean Algorithm will do this for us.

For example, suppose we had used k = 9 as our multiplicative key (modulo 26). To decipher we need to use the key c satisfying (9c)%26 = 1, or, equivalently, 9 × c ≡ 1 (mod 26). To find c we extend Euclid’s Algorithm by adding a new line, a “coefficient line,” that begins with a 0 and a 1 and is continued by the formula

old c − current q × current c = new c.

Examples:

(1) Find gcd(9, 26) and the inverse of 9 modulo 26. For this example we will first work one step at a time.

Begin with the gcd entries and 0 and 1:

q
r    26   9
c     0   1

From 26 ÷ 9: q = 2, r = 8, and 0 − (2 × 1) = −2:

q         2
r    26   9   8
c     0   1  −2

From 9 ÷ 8: q = 1, r = 1, and 1 − (1 × −2) = 3:

q         2   1
r    26   9   8   1
c     0   1  −2   3

From 8 ÷ 1: q = 8, r = 0, and −2 − (8 × 3) = −26:

q         2   1   8
r    26   9   8   1    0
c     0   1  −2   3  −26

Just as the final (non-zero) entry in the remainder line is the gcd, the entry in the coefficient line under the gcd gives us the solution to 9 × x ≡ gcd (mod 26). So 9 × 3 ≡ 1 (mod 26). (The −26 at the end of the coefficient line is ignored.)

(2) Compute gcd(12, 45) and find a solution to 12x ≡ gcd (mod 45).

Setup:

q
r    45  12
c     0   1

From 45 ÷ 12: q = 3, r = 9, and 0 − (3 × 1) = −3:

q         3
r    45  12   9
c     0   1  −3

From 12 ÷ 9: q = 1, r = 3, and 1 − (1 × −3) = 4:

q         3   1
r    45  12   9   3
c     0   1  −3   4

From 9 ÷ 3: q = 3, r = 0, and −3 − (3 × 4) = −15:

q         3   1   3
r    45  12   9   3   0
c     0   1  −3   4

So gcd(12, 45) = 3 and x = 4 is a solution to 12x ≡ gcd (mod 45), which we can check by seeing that (12 × 4)%45 = 3%45. (The entry in the coefficient row under the final remainder of 0 is ignored, so we didn’t bother to enter it.)

(3) Compute gcd(27, 50) and find a solution to 27x ≡ gcd (mod 50).

q         1   1   5    1   3
r    50  27  23   4    3   1   0
c     0   1  −1   2  −11  13

So gcd(27, 50) = 1. The entry in the coefficient line under the gcd is 13, so we have 27 × 13 ≡ 1 (mod 50).



As we’ve seen, gcd(a, n) always divides (ab)%n, for any choice of integer b. That is, (ab)%n is always a multiple of gcd(a, n). So for there to be a solution x to ax%n = 1 we must have gcd(a, n) = 1. And if so, then the solution x is found in the coefficient line.
These concepts are important enough that they have names. If two numbers have a gcd of 1, they are said to be relatively prime. And when two numbers a and n are relatively prime then they each have a multiplicative inverse with respect to the other – there are solutions x and y to (ax)%n = 1 and (ny)%a = 1.

Before summarizing, let’s do a couple more examples.

Examples:

(1) Does 7 have a multiplicative inverse modulo 26? If so, what is it? This is asking us to find gcd(7, 26). If the gcd is 1, then we may solve 7x ≡ 1 (mod 26).

q         3   1   2    2
r    26   7   5   2    1   0
c     0   1  −3   4  −11

Since gcd(7, 26) = 1, 7 and 26 are relatively prime and 7 does have a multiplicative inverse modulo 26. It is −11 or its positive version 15 (since −11 ≡ 15 (mod 26)). Thus 7 may be used as the multiplicative enciphering key in a decimation or linear cipher, and 15 will be (part of) the deciphering key.

(2) Does 8 have a multiplicative inverse modulo 26? If so, what is it?

q         3   4
r    26   8   2   0
c     0   1  −3

Since gcd(8, 26) = 2 is not 1, 8 and 26 are not relatively prime, and so there is no multiplicative inverse for 8. Hence 8 may not be used as the multiplicative enciphering key in a Decimation or Linear Cipher.

(3) Find gcd(49, 13) and, if possible, solve 13x ≡ 1 (mod 49).

q         3   1   3    3
r    49  13  10   3    1   0
c     0   1  −3   4  −15

gcd(49, 13) = 1 so they are relatively prime and since −15 ≡ 34 (mod 49), the inverse of 13 modulo 49 is 34.



Euclid’s Extended Algorithm is quite simple. Nonetheless it is very powerful. Let’s end this section by summarizing it and some of its consequences. Because we will mostly be using it for computing a deciphering key from a known enciphering key k and modulus n, we state it in terms of k and n.

Theorem 2 (The Extended Euclidean Algorithm) Given two integers n and k with n > k, construct the table

q         ?  ···  ?
r     n   k  ···  g   0
c     0   1  ···  d   ∗

by, at each step, passing from

q                            q         q
r     a   b        to        r     a   b   r
c     α   β                  c     α   β   α − qβ

where q, the quotient, and r, the remainder, are computed from a ÷ b. Then

1. This process eventually terminates by producing a 0 in the “remainder” line.
2. The last non-zero entry in the remainder line is gcd(n, k).
3. At each step, β × k ≡ b (mod n). In particular, if d is the entry in the coefficient line directly below the gcd g = gcd(n, k), then k × d ≡ gcd(n, k) (mod n).
4. When n and k are relatively prime, d is the inverse of k modulo n.

We haven’t formally proven all of this important theorem, although all of the necessary ingredients have been stated. See any text on number theory, such as [KRosen], for the missing parts. Nonetheless, the theorem does explain the oddities in division we noticed at the end of Section 3.2.
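As an illustration of the three-line table, here is a minimal Python sketch. The names extended_euclid and inverse_mod are invented for this example, and it assumes 0 < k < n as in the theorem. The coefficient row is built with the rule “old c − current q × current c = new c.”

```python
def extended_euclid(n, k):
    """Build the q, r and c rows of the extended table for n > k > 0."""
    q_row, r_row, c_row = [], [n, k], [0, 1]
    while r_row[-1] != 0:
        q, r = divmod(r_row[-2], r_row[-1])
        q_row.append(q)
        r_row.append(r)
        c_row.append(c_row[-2] - q * c_row[-1])   # old c - q * current c
    return q_row, r_row, c_row


def inverse_mod(k, n):
    """Return the inverse of k modulo n, or None if gcd(n, k) is not 1."""
    _, r_row, c_row = extended_euclid(n, k)
    g, d = r_row[-2], c_row[-2]      # the gcd and the coefficient beneath it
    return d % n if g == 1 else None


q_row, r_row, c_row = extended_euclid(26, 9)
print("q", q_row)            # q [2, 1, 8]
print("r", r_row)            # r [26, 9, 8, 1, 0]
print("c", c_row)            # c [0, 1, -2, 3, -26]
print(inverse_mod(9, 26))    # 3
print(inverse_mod(7, 26))    # 15
print(inverse_mod(8, 26))    # None
```

Unlike the hand tables, the sketch also records the coefficient under the final 0; inverse_mod simply skips it by reading the second-to-last entry.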

Examples: Examples of Division in Modular Arithmetic.

(1) 3x ≡ 9 (mod 7) has the usual solution x = 3. Since gcd(7, 3) = 1 divides 9 there is exactly gcd = 1 solution.

(2) 3x ≡ 9 (mod 12) has the usual solution x = 3, but also x = 7 and x = 11. Since gcd(12, 3) = 3 divides 9 there are gcd = 3 solutions.

(3) 3x ≡ 8 (mod 7) has the unusual solution x = 5. Since gcd(7, 3) = 1 divides 8 there is gcd = 1 solution.

(4) 3x ≡ 8 (mod 12) has no solutions. Since gcd(12, 3) = 3 doesn’t divide 8 there are no solutions.
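These four examples can be checked by brute force. The sketch below is only a sanity check, not an efficient method; the function name solve_congruence is made up for this illustration. It tries every x from 0 to n − 1 and compares the number of solutions with gcd(n, a).

```python
from math import gcd


def solve_congruence(a, b, n):
    """List every x in 0..n-1 with a*x ≡ b (mod n), by direct search."""
    return [x for x in range(n) if (a * x - b) % n == 0]


for a, b, n in [(3, 9, 7), (3, 9, 12), (3, 8, 7), (3, 8, 12)]:
    solutions = solve_congruence(a, b, n)
    print(f"{a}x ≡ {b} (mod {n}): gcd = {gcd(n, a)}, solutions = {solutions}")
```

In each case the search finds gcd(n, a) solutions when the gcd divides b, and none otherwise.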



4.4 Deciphering Decimation and Linear Ciphers

We now finally understand what needs to be “proper” about the multiplying keynumbers in Decimation and Linear Ciphers. When we work modulo 26, in order for the multiplicative keynumber k to have a multiplicative inverse, its greatest common divisor with 26 must be 1. And if so, the Extended Euclidean Algorithm computes the multiplicative inverse of k. This allows us to finalize our description of the multiplicative ciphers.

Computing Keys for Decimation and Linear Ciphers:
1) Pick an integer k.
2) Use the Euclidean Algorithm to check that gcd(k, N) = 1.
3) If not, pick a new k.
4) If so, use the Euclidean Algorithm to find the inverse d. (If d is negative, use N + d instead.)

Examples: (1) Decipher OXXMT MEB. It was enciphered with key 3m + 12 modulo 26. A Linear Cipher first multiplies and then adds. To undo this we must go backwards: first subtract and then “divide.” Dividing, of course, means multiplying by the inverse of 3 modulo 26, which from Figure 3.2 is 9.

ciphertext       O    X    X    M    T    M    E    B
ciphernumbers   15   24   24   13   20   13    5    2
−12              3   12   12    1    8    1   −7  −10
×9              27  108  108    9   72    9  −63  −90
%26              1    4    4    9   20    9   15   14
plaintext        a    d    d    i    t    i    o    n

The answer is addition.

(2) Decipher FRGPN I. The enciphering key was 14 modulo 27. First we need to find the inverse of 14 modulo 27:

q         1   1  13
r    27  14  13   1   0
c     0   1  −1   2

The inverse is 2. So we decipher by multiplying by 2 modulo 27.

ciphertext       F    R    G    P    N    I
ciphernumbers    6   18    7   16   14    9
×2              12   36   14   32   28   18
%27             12    9   14    5    1   18
plaintext        l    i    n    e    a    r

The answer is linear.

(3) May 13 be used as the multiplicative part of a key in a Decimation or Linear Cipher modulo 33? Yes: gcd(33, 13) = 1. (We didn’t even need Euclid’s Algorithm for this one!)

(4) Decipher JVNIC G. It was enciphered with the Linear Cipher 13m + 4 modulo 33. Ok, now we do need Euclid’s algorithm:

q         2   1   1   6
r    33  13   7   6   1   0
c     0   1  −2   3  −5

Since −5%33 = 28 the inverse of 13 is 28. Now we may proceed by subtracting 4 and then multiplying by 28.⁵
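The “subtract, then multiply by the inverse” recipe can be sketched in Python for the modulus 26. The helper names below are invented for this illustration, letters are numbered a = 1, ..., z = 26 as in the text, and the built-in pow(k, -1, 26) (available in Python 3.8 and later) stands in for the Extended Euclidean Algorithm.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def to_number(ch):
    """a=1, b=2, ..., z=26 (which is 0 modulo 26)."""
    return ALPHABET.index(ch.lower()) + 1


def to_letter(x):
    """Reduce x modulo 26 and convert back to a letter (0 becomes z)."""
    return ALPHABET[(x - 1) % 26]


def encipher_linear(plaintext, k, c):
    """Apply the rule k*m + c (mod 26) to every letter."""
    return "".join(to_letter(k * to_number(ch) + c).upper()
                   for ch in plaintext if ch.isalpha())


def decipher_linear(ciphertext, k, c):
    """Undo k*m + c: subtract c, then multiply by the inverse of k."""
    d = pow(k, -1, 26)       # raises ValueError if gcd(k, 26) is not 1
    return "".join(to_letter(d * (to_number(ch) - c))
                   for ch in ciphertext if ch.isalpha())


secret = encipher_linear("addition", 19, 12)
print(secret)                             # EJJABAKR
print(decipher_linear(secret, 19, 12))    # addition
```

A key such as k = 8 makes decipher_linear raise an error, which is the computational version of “8 has no inverse modulo 26.”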



As one last example, recall from Chapter 3 that the Kid-RSA cipher has as public information an enciphering key e and modulus n. In order for this cipher to be worthwhile, it must be infeasible for an adversary to compute the secret deciphering key d from the public keys e and n. Because of the Euclidean Algorithm, not only is computing d not infeasible, it is simple. To illustrate, if Albert has published e = 893 and n = 8106 as his public keys, we simply compute gcd(8106, 893):

q            9   12    1    16     4
r    8106  893   69   65     4     1   0
c       0    1   −9  109  −118  1997

So in a very few steps, the “secret” deciphering key is d = 1997. Even though there is no real security in Koblitz’s system, it did allow us to introduce the important concepts related to public keys.
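The same recovery takes one line on a computer. The sketch below assumes Python 3.8 or later, where the built-in pow accepts an exponent of −1 together with a modulus; it is a stand-in for the table above, not a different attack.

```python
e, n = 893, 8106        # Albert's public keys from the example above

d = pow(e, -1, n)       # modular inverse of e modulo n
print(d)                # 1997
print((e * d) % n)      # 1, confirming that d undoes e
```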

4.5 Breaking Decimation and Linear Ciphers

Earlier we saw that decimation and linear ciphers cause letters which are adjacent in the plaintext alphabet to be separated in the ciphertext alphabet. How much does this improve security? Consider the ciphertext from Section 3.5 that was enciphered with a linear cipher.

UQESF YFTGW SGPVS PPVQX QEDGR PMQFP YJSFG EORVQ DQBQF PVQWO MRTQW PUOTT SPPOM QWOFE TIXQS FIMLQ DYJUY FXQDO FCW

It has the following letter frequency table.

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 0  1  1  4  4  9  4  0  2  2  0  1  4  0  6  9 13  3  6  5  3  4  5  3  4  0
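A frequency table like this one is easy to produce by machine. The following sketch is illustrative only (the function name letter_counts is an assumption); it uses Counter from Python's standard library on the ciphertext above.

```python
from collections import Counter
from string import ascii_uppercase

CIPHERTEXT = (
    "UQESF YFTGW SGPVS PPVQX QEDGR PMQFP YJSFG EORVQ DQBQF PVQWO "
    "MRTQW PUOTT SPPOM QWOFE TIXQS FIMLQ DYJUY FXQDO FCW"
)


def letter_counts(text):
    """Count each letter A-Z, ignoring spaces, punctuation and case."""
    counts = Counter(ch for ch in text.upper() if ch in ascii_uppercase)
    return {letter: counts[letter] for letter in ascii_uppercase}


counts = letter_counts(CIPHERTEXT)
print(" ".join(f"{letter:>2}" for letter in ascii_uppercase))
print(" ".join(f"{counts[letter]:>2}" for letter in ascii_uppercase))

# The three most frequent ciphertext letters (ties keep alphabetical order).
print(sorted(counts, key=counts.get, reverse=True)[:3])   # ['Q', 'F', 'P']
```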

⁵ cipher.

As we saw in Chapter 3 this is not a Caesar cipher; none of the standard hill and valley patterns can be found in the frequency count. Are there any new patterns that might replace our old ones? To find out, study carefully the following two plaintext/ciphertext alphabet pairs:

Key k = 5:
plaintext   a b c d e f g h i j k l m n o p q r s t u v w x y z
ciphertext  E J O T Y D I N S X C H M R W B G L Q V A F K P U Z

Key 3m + 4:
plaintext   a b c d e f g h i j k l m n o p q r s t u v w x y z
ciphertext  G J M P S V Y B E H K N Q T W Z C F I L O R U X A D

In the first, we have a → E, b → J, c → O, and while abc are consecutive, EJO are not; they are 5 letters apart. In fact, any two consecutive plaintext letters are sent to ciphertext letters five letters apart. The multiplicative key of 5 apparently spreads out the letters by a factor of 5. Why is this? Two consecutive plaintext letters will have plainnumbers that differ by one, say α and α + 1. When enciphered the (unreduced) ciphernumbers are 5α and 5α + 5, which differ by 5. Does the same work for the linear cipher 3m + 4? (Check it and see!)⁶
At first glance it seems that Decimation and Linear ciphers do a good job of “unordering” the plain alphabet, mixing it up well. But these examples should convince us that this is not so. In fact, the rule is that the distance consecutive plain alphabet letters are spread apart equals the multiplicative key of the cipher.
We can use this idea to decrypt our ciphertext. In our unsolved cipher, Q is the most common letter in the ciphertext, and P is tied with F for second: perhaps Q and P are e and t, respectively? Since a linear cipher has the formula km + c, guessing that Q = e tells us 5k + c ≡ 17 and guessing P = t gives us 20k + c ≡ 16, both equations modulo 26. So c ≡ 17 − 5k and c ≡ 16 − 20k. Setting these equal gives 17 − 5k ≡ 16 − 20k, or 15k ≡ −1 ≡ 25. From Figure 3.2 the inverse of 15 is 7. So if we multiply both sides of 15k ≡ 25 by 7, since 15 × 7 ≡ 1 and 25 × 7 ≡ 19, we are left with k ≡ 19. Finally, from 5k + c ≡ 17 and k = 19, we can substitute and solve to see that c ≡ 0. Thus the original cipher was a linear cipher with rule 19m + 0, that is, a decimation cipher with key 19. We can then simply decipher the message.⁷
Now this only worked because we were lucky enough to guess both e and t. However, a cipher that is easily broken once your enemy correctly guesses only two letters is probably not a very strong one.
Just as Caesar ciphers proved to be weak because they do not separate adjacent letters in the plaintext alphabet, Decimation and Linear ciphers are weak because the plaintext letters are enciphered using an easily recognized pattern.
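To see the regular spacing for any key, one can generate the ciphertext alphabet by machine. The sketch below is an illustration only (linear_alphabet is a made-up name) and uses the text's numbering a = 1, ..., z = 26.

```python
from string import ascii_uppercase


def linear_alphabet(k, c=0):
    """Ciphertext alphabet for the rule k*m + c (mod 26), with a=1, ..., z=26."""
    letters = []
    for m in range(1, 27):
        x = (k * m + c) % 26                    # 0 stands for Z
        letters.append(ascii_uppercase[x - 1])  # index -1 is Z
    return "".join(letters)


print(linear_alphabet(5))       # EJOTYDINSXCHMRWBGLQVAFKPUZ
print(linear_alphabet(7, 9))    # PWDKRYFMTAHOVCJQXELSZGNUBI
```

In each output, stepping from one ciphertext letter to the next moves k places down the alphabet (wrapping around at Z), which is the pattern an attacker can exploit.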

⁶ Yes. The letters are spread out by a factor of 3 and then shifted 4 more letters down the alphabet.
⁷ “We can only say that the decryptment of any cipher, even the simplest, will at times include a number of wonderings.” Helen Fouché Gaines.

To put it more bluntly, guessing one letter correctly allows us to break Caesar ciphers. Guessing two letters correctly allows us to break Decimation and Linear ciphers.
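The two-letter attack amounts to solving two congruences for k and c. Here is a hedged Python sketch: the function name recover_linear_key is invented for this illustration, letters are numbered a = 1, ..., z = 26 ≡ 0, and pow(x, -1, 26) requires Python 3.8 or later. As a neutral test case it uses the rule 7m + 9, which sends e to R and t to S.

```python
def number(ch):
    """a=1, ..., y=25, z=0 (that is, 26 reduced modulo 26)."""
    return (ord(ch.lower()) - ord("a") + 1) % 26


def recover_linear_key(pair1, pair2):
    """Given two (plain, cipher) letter pairs, solve k*p + c ≡ cipher (mod 26)."""
    (p1, c1), (p2, c2) = [(number(p), number(c)) for p, c in (pair1, pair2)]
    # Subtracting the two congruences gives k*(p1 - p2) ≡ c1 - c2.
    diff_inverse = pow((p1 - p2) % 26, -1, 26)   # ValueError if not invertible
    k = ((c1 - c2) * diff_inverse) % 26
    c = (c1 - k * p1) % 26
    return k, c


print(recover_linear_key(("e", "R"), ("t", "S")))   # (7, 9)
```

The guess fails with a ValueError when the two plaintext numbers differ by an even number or by 13, since that difference then has no inverse modulo 26; e and t differ by 15, so they always work.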

4.6 Summary

A Linear Cipher is a combination of a Decimation and a Shift Cipher, and so has both a multiplicative and an additive key. The additive key can be chosen at random, as in a Shift Cipher, but, as in a Decimation Cipher, the multiplicative key must be chosen so that it is relatively prime to the modulus.
A relatively quick way of computing greatest common divisors is with the Euclidean Algorithm. To compute the gcd of two numbers, the algorithm uses a process of repeated remainder operations in which the larger of the two numbers is continually replaced by the remainder of the larger divided by the smaller. By putting this into a table and making use of the quotients involved, the multiplicative inverse can also quickly be found. Since Koblitz’s Kid-RSA depends for its security on the inability of an adversary to compute multiplicative inverses, we see that it is not secure at all; it is a toy system, intended to demonstrate the concept of public keys.
Decimation and Linear Ciphers do a better job of mixing the ciphertext alphabet than a Caesar Cipher does. Letters that are adjacent in the plaintext alphabet are not adjacent once enciphered; they are k letters apart, where k is the multiplicative key of the cipher. While this prevents us from using the hill and valley patterns of the normal frequency count to break the cipher, it provides enough order that if two ciphertext letters can be matched with their plaintext counterparts either of these ciphers can be broken.

4.7 Topics and Techniques

1. How many keys does a Linear Cipher need?

2. How do we encipher with a Linear Cipher?

3. What is the greatest common divisor of two numbers?

4. What is the Euclidean Algorithm?

5. What does relatively prime mean? When are two numbers relatively prime?

6. How do we use the Euclidean Algorithm to find a gcd?

7. What is a multiplicative inverse?

8. When does a number have a multiplicative inverse modulo n?

9. How do we use the Euclidean Algorithm to find a multiplicative inverse?

10. How many Linear Ciphers are there modulo 26?

11. What information is needed to decipher a Decimation Cipher? A Linear Cipher?

12. Are Decimation Ciphers secure or not? Why?

13. Are Linear Ciphers secure or not? Why?

14. Can a frequency count help break a Decimation or Linear Cipher? Explain.

4.8 Exercises

1. Use the Euclidean Algorithm to find the following gcd’s.

(a) gcd(32, 54). (b) gcd(45, 39). (c) gcd(78, 53). (d) gcd(102, 78).

2. Use the Euclidean Algorithm to find the following gcd’s.

(a) gcd(61, 56). (b) gcd(121, 77). (c) gcd(735, 140). (d) gcd(297, 201).

3. Use the Euclidean Algorithm to find the following gcd’s, and then find a solution to the related modular equation.

(a) Find gcd(25, 85). Find a solution to 25x ≡ gcd (mod 85).
(b) Find gcd(27, 17). Find a solution to 17x ≡ gcd (mod 27).
(c) Find gcd(24, 102). Find a solution to 24x ≡ gcd (mod 102).
(d) Find gcd(149, 78). Find a solution to 78x ≡ gcd (mod 149).

4. Use the Euclidean Algorithm to find the following gcd’s, and then find a solution to the related modular equation.

(a) Find gcd(16, 110). Find a solution to 16x ≡ gcd (mod 110).
(b) Find gcd(31, 79). Find a solution to 31x ≡ gcd (mod 79).
(c) Find gcd(298, 79). Find a solution to 79x ≡ gcd (mod 298).
(d) Find gcd(306, 192). Find a solution to 192x ≡ gcd (mod 306).

5. Here we work with Decimation Ciphers modulo 28, using 27 = .

(a) Encipher no spaces allowed with key 19.
(b) Decipher CUDAZ KOTQW A. The enciphering key was 17.
(c) Decipher XDOMQM HKE Q JYBQM KT . The enciphering key was 11.
(d) Decipher P WHO ENOWL MKI. The enciphering key was 13.

6. Encipher the following words, using the given linear cipher modulo 26.

(a) Geometry, 3m + 7.
(b) Algebra, 11m + 19.
(c) Trigonometry, 7m + 2.

7. Decipher the following words. They were enciphered with a linear cipher, modulo 26.

(a) VFQDY DMBMP O, key 5m + 20.
(b) TKIAB CVFQ, key 23m + 4.
(c) YVZGN YNX, key 19m + 8.
(d) HGZGR HGRXH, key 25m + 1.

8. What happens if we follow a linear cipher by another linear cipher?

9. In this problem we work as we did in Chapter 3 Exercise 8: the numbers 0 to 9 stand for themselves in the plaintext, 10 is for spaces, 11 to 36 represent the letters of the alphabet, and we work modulo 37.

(a) Encipher 8 or 9 using a Caesar cipher with key 11.
(b) Encipher 25 cents using a Decimation cipher with key 18.
(c) Encipher Tea for 2 using a Linear Cipher with key 13m + 6.
(d) Decipher NAA52 5I5NB 75I52 5NAA. It was enciphered with a Decimation cipher with key k = 19.
(e) Decipher 9AIB ZE0AD ZAEJB A9D66 PL. It was enciphered with a Linear Cipher with key 15m + 9.
(f) Decipher REHQD HMHTQ H7. It was enciphered with a Linear Cipher with key 8m + 12.
(g) Decipher A0L6O ISS. It was enciphered with a Linear Cipher with key 16m + 7.
(h) Decipher 5UHUE XUKEL U5UH. It was enciphered with a Linear Cipher with key 31m + 17.
(i) Decipher L2 83 3 T20 JIQ. It was enciphered with a Linear Cipher with key 13m + 20.

10. In a Decimation Cipher the enciphering key must be smaller than and relatively prime to the modulus. Since there are 12 numbers between 1 and 26 whose gcd with 26 is 1, there are 12 Decimation Ciphers modulo 26.

(a) How many Decimation Ciphers are there modulo 7? (Hint: how many numbers are smaller than and relatively prime to 7?)
(b) How many Decimation Ciphers are there modulo 18?
(c) How many Decimation Ciphers are there modulo 28?
(d) How many Decimation Ciphers are there modulo 35? (It might be easier to first find the numbers that are not proper enciphering keys.)

11. The following is a decimation cipher. Use the techniques of this chapter to decrypt it.
PYETO MLWPQ QRVOT TWDQK IODQW WTYUR SEQSP JODWP UOTTO SMWVS AQWRQ SDQ.

12. The following is a linear cipher. With the hint that the three most common letters are eit (although not necessarily in that order), can you decrypt it?
JXYVK MDZYJ PNFJK NLXLD SHYNM NWZMD HGNXG JPLJW NWZLO NFMFQ TMNFW DMNPP LENJM LIVKL YXLKM NQIL

13. Alex’s public Kid-RSA gives e = 361 and N = 4063. Suppose you’ve captured the message “I used your PK. 630, 3518, 269, 3157, 722, 899, 3157, 177, 1352, 630, 1352, 1444, 3157, 177, 1805, 991, 3157, 899, 2166, 3249, 3879, 1805. GTTGM JNQJZ DMFF.” Can you decrypt the message?

14. Corrine’s public Kid-RSA lists e = 495 and N = 2644. You’ve intercepted a message for her: “Use 978, 2475, 314, 2475, 978, 1473, 2475, 1980, 495, 652, 2632, 1316, 495, 990, 2475, 1968. XZHS RM YLC MFNYVI GVM.” Can you decrypt the message?

Chapter 5

Monoalphabetic Ciphers

Here is an example of a plaintext – ciphertext alphabet pair for each type of cipher we have seen thus far.

1. A Caesar cipher with key 5:

plaintext alphabet   abcdefghijklmnopqrstuvwxyz
ciphertext alphabet  FGHIJKLMNOPQRSTUVWXYZABCDE

2. A Decimation Cipher modulo 26 with key 21:

plaintext alphabet   abcdefghijklmnopqrstuvwxyz
ciphertext alphabet  UPKFAVQLGBWRMHCXSNIDYTOJEZ

3. A Linear Cipher modulo 26 with key 7m + 9:

plaintext alphabet   abcdefghijklmnopqrstuvwxyz
ciphertext alphabet  PWDKRYFMTAHOVCJQXELSZGNUBI

These are all monoalphabetic ciphers, ciphers in which the same plaintext letters are always replaced by the same ciphertext letters. Mono, meaning one, indicates that each letter has a single substitute. In this chapter we look at other ways of creating monoalphabetic ciphers.¹
To construct a monoalphabetic cipher, we need to create some ordering of the alphabet, such as SOMERDINGXHBVLTUJWKYZFACPQ, and pair it with a plaintext alphabet,

plaintext alphabet   abcdefghijklmnopqrstuvwxyz
ciphertext alphabet  SOMERDINGXHBVLTUJWKYZFACPQ

We then encipher and decipher by translating from the plaintext to ciphertext alphabets and back, as usual. In this example alphabet becomes SBUNSORY. However it is not particularly easy to remember apparently random orderings of 26 letters. So we will concentrate on a couple of well-known methods that use a key to develop the ciphertext alphabet’s order.

¹ When the ciphertext alphabet is in the usual order just shifted, as in (1), it is said to be a regular or direct substitution alphabet. When the ciphertext alphabet is mixed up, as in (2) and (3), it is said to be a mixed substitution alphabet. A reversed alphabet is another possibility, and is exactly what it sounds like.

5.1 Keyword Ciphers

To use the Keyword Cipher method to construct the ciphertext alphabet, pick a keyword and write it down, ignoring repeated letters. Follow it with the letters of the alphabet that have not yet been used.

Example: Find the alphabet pairs for the keyword COLLEGE. Crossing out the letters that are making their second appearance leaves COLEG. To encipher then we use the pair of alphabets

plaintext   abcdefghijklmnopqrstuvwxyz
ciphertext  COLEGABDFHIJKMNPQRSTUVWXYZ

Enciphering university then gives UMFVGRSFTY. 

Clearly there is a problem with this cipher: it does a poor job of mixing the ciphertext alphabet. Around 1580 Giovanni Battista Argenti suggested that one also pick a keyletter and begin the keyword under that letter of the plaintext. The Argentis, Giovanni and his nephew Matteo, form one of the great cryptology families of the middle ages. After many years of trying, in 1590 Giovanni finally became papal secretary of ciphers in Rome, only to quickly weaken from the frequent necessary trips to Germany and France. Before dying on April 24, 1591, he passed his knowledge to his nephew Matteo who succeeded him and held the office during the reign of the next five popes. To use Giovanni’s method with keyletter p we would start COLEG under pqrst, giving

plaintext   abcdefghijklmnopqrstuvwxyz
ciphertext  JKMNPQRSTUVWXYZCOLEGABDFHI

Then university is enciphered as AYTBPLETGH. This method mixes the ciphertext alphabet better.
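Both versions of the Keyword Cipher can be sketched in Python as follows. The function names are made up for this illustration, and the keyletter is handled by rotating the alphabet so that the shortened keyword starts under that plaintext letter.

```python
from string import ascii_lowercase, ascii_uppercase


def shorten(keyword):
    """Drop repeated letters: COLLEGE becomes COLEG."""
    kept = []
    for ch in keyword.upper():
        if ch.isalpha() and ch not in kept:
            kept.append(ch)
    return "".join(kept)


def keyword_alphabet(keyword, keyletter="a"):
    """Shortened keyword, then the unused letters, started under keyletter."""
    key = shorten(keyword)
    sequence = key + "".join(ch for ch in ascii_uppercase if ch not in key)
    shift = ascii_lowercase.index(keyletter.lower())
    # Rotate right so that the keyword begins in position `shift`.
    return sequence[26 - shift:] + sequence[:26 - shift]


def encipher(plaintext, cipher_alphabet):
    """Substitute each plaintext letter by the letter beneath it."""
    table = dict(zip(ascii_lowercase, cipher_alphabet))
    return "".join(table[ch] for ch in plaintext.lower() if ch in table)


plain_key = keyword_alphabet("COLLEGE")
print(plain_key)                           # COLEGABDFHIJKMNPQRSTUVWXYZ
print(encipher("university", plain_key))   # UMFVGRSFTY
print(keyword_alphabet("COLLEGE", "p"))    # JKMNPQRSTUVWXYZCOLEGABDFHI
```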

Example: Encipher university using keyword xylophone and key F. 2 

Even with the added complication, significant parts of the alphabet are still enciphered as in a Caesar cipher. Compare the ciphertext alphabet using keyword COLLEGE and key P with the alphabet generated by the Caesar cipher with keyletter L:

plaintext      abcdefghijklmnopqrstuvwxyz
COLLEGE and P  JKMNPQRSTUVWXYZCOLEGABDFHI
Caesar key L   LMNOPQRSTUVWXYZABCDEFGHIJK

There are many overlaps and near overlaps in the two cipher alphabets, which makes a keyword cipher not much more secure than a Caesar cipher.

5.2 Keyword Mixed Ciphers

The cryptosystem we call Keyword Mixed Ciphers seems to have been invented in 1854 by Sir Charles Wheatstone, whom we will meet again [Bauer, page 48]. Pick a keyword and write it out, again ignoring repetitions of letters. Then write the remainder of the alphabet underneath, using the same number of columns as letters in the shortened keyword. Finally, pull off the columns in order and write the letters underneath the plaintext alphabet.

Example: Find the ciphertext alphabet using a Keyword Mixed Cipher and keyword COLLEGE. First we remove the repeated letters, so the shortened keyword is COLEG. Then we write it down, followed by the remainder of the alphabet.

C O L E G
A B D F H
I J K M N
P Q R S T
U V W X Y
Z

Finally we pull out the columns in order.

plaintext   abcdefghijklmnopqrstuvwxyz
ciphertext  CAIPUZOBJQVLDKRWEFMSXGHNTY

This results in a nicely mixed ciphertext alphabet without obvious pattern.

² JAOKZFGOIR

Examples: Use a Keyword Mixed Cipher.

(1) Encipher monoalphabetic using the keyword SIMPLE. The columnar arrangement of the alphabet is

S I M P L E
A B C D F G
H J K N O Q
R T U V W X
Y Z

Pulling the columns off in order, left-to-right, and putting them under the usual alphabet gives

plaintext   abcdefghijklmnopqrstuvwxyz
ciphertext  SAHRYIBJTZMCKUPDNVLFOWEGQX

From here the enciphering should be quick.

(2) Encipher cryptology using the keyword SECRET.

(3) Decipher DPCRD JPUTI using the keyword HOMOPHONE.

(4) Decipher COACH SHOHS EU using the keyword DIRECT. 3



Keyword mixed ciphers are among the best monoalphabetic ciphers. The key is simple to choose, remember and change, and the cipher itself is easy to set up and use. For these reasons there are a couple of modifications that people sometimes use: Keyword Transposed and Keyword Interrupted Ciphers.

5.3 Keyword Transposed Ciphers

In a Keyword Transposed Cipher we require the columns to be pulled off in alphabetical order. So if we use again COLLEGE as the keyword, from the array

C O L E G
A B D F H
I J K M N
P Q R S T
U V W X Y
Z

we first pull out the C column, followed by the E column, and then the G, L and finally O columns. This gives

plaintext   abcdefghijklmnopqrstuvwxyz
ciphertext  CAIPUZEFMSXGHNTYLDKRWOBJQV

as the substitution alphabet.

³ (1) KPUPS CDJSA YFTH, (2) HFQWP OCOEQ, (3) polyphonic, (4) substitution.

Examples: Use a Keyword Transposed Cipher.

(1) Encipher monoalphabetic using the keyword SIMPLE. The columns of the alphabet look like

S I M P L E
A B C D F G
H J K N O Q
R T U V W X
Y Z

Pulling the columns off in alphabetical order gives the ciphertext alphabet

plaintext   abcdefghijklmnopqrstuvwxyz
ciphertext  EGQXIBJTZLFOWMCKUPDNVSAHRY

From here the enciphering should again be quick.

(2) Encipher cryptology using the keyword SECRET.

(3) Decipher QFKLQ SFNYR using the keyword HOMOPHONE.

(4) Decipher RMHRF YFMFY BI using the keyword DIRECT. 4
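The column constructions for Keyword Mixed and Keyword Transposed Ciphers can be sketched together, since they differ only in the order in which the columns are pulled off. The function names below are assumptions made for this illustration.

```python
from string import ascii_lowercase, ascii_uppercase


def keyword_columns(keyword):
    """Write the shortened keyword, the rest of the alphabet beneath it,
    and return the columns, left to right."""
    key = "".join(dict.fromkeys(keyword.upper()))      # drop repeated letters
    letters = key + "".join(ch for ch in ascii_uppercase if ch not in key)
    width = len(key)
    return [letters[i::width] for i in range(width)]


def mixed_alphabet(keyword):
    """Keyword Mixed: pull the columns off left to right."""
    return "".join(keyword_columns(keyword))


def transposed_alphabet(keyword):
    """Keyword Transposed: pull the columns off in alphabetical order
    of their top letters."""
    return "".join(sorted(keyword_columns(keyword), key=lambda col: col[0]))


def encipher(plaintext, cipher_alphabet):
    table = dict(zip(ascii_lowercase, cipher_alphabet))
    return "".join(table[ch] for ch in plaintext.lower() if ch in table)


print(mixed_alphabet("SIMPLE"))         # SAHRYIBJTZMCKUPDNVLFOWEGQX
print(transposed_alphabet("SIMPLE"))    # EGQXIBJTZLFOWMCKUPDNVSAHRY
print(encipher("monoalphabetic", mixed_alphabet("SIMPLE")))        # KPUPSCDJSAYFTH
print(encipher("monoalphabetic", transposed_alphabet("SIMPLE")))   # WCMCEOKTEGINZQ
```

The slice letters[i::width] picks out every width-th letter starting at position i, which is exactly column i of the rectangle.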



5.4 Interrupted Keyword Ciphers

In an Interrupted Keyword Cipher instead of removing duplicated letters put in * as a placeholder. This time COLLEGE becomes

C O L * E G *
A B D F H I J
K M N P Q R S
T U V W X Y Z

⁴ (1) WCMCE OKTEG INZQ, (2) JHQSU XFXBQ, (3) polyphonic, (4) substitution.

Then read off the ciphertext alphabet column by column, as in a Keyword Mixed Cipher, ignoring the *’s:

plaintext   abcdefghijklmnopqrstuvwxyz
ciphertext  CAKTOBMULDNVFPWEHQXGIRYJSZ

Finally, one can use Interrupted Keyword Transposed ciphers, but we won’t.
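For completeness, here is a short Python sketch of the Interrupted Keyword construction with columns read left to right. The function name interrupted_alphabet is invented for this illustration; it keeps a * wherever a keyword letter repeats and skips the *’s when reading the columns.

```python
from string import ascii_uppercase


def interrupted_alphabet(keyword):
    """Keep a * placeholder for each repeated keyword letter, fill in the
    unused letters beneath, then read the columns, skipping the *'s."""
    seen, top_row = set(), []
    for ch in keyword.upper():
        top_row.append("*" if ch in seen else ch)
        seen.add(ch)
    rest = [ch for ch in ascii_uppercase if ch not in seen]
    grid = top_row + rest
    width = len(top_row)
    columns = [grid[i::width] for i in range(width)]
    return "".join(ch for column in columns for ch in column if ch != "*")


print(interrupted_alphabet("COLLEGE"))   # CAKTOBMULDNVFPWEHQXGIRYJSZ
```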

5.5 Frequency Counts and Exhaustion

Are these new monoalphabetic ciphers any more secure than the ones we saw earlier? Or will frequency analysis once again save (ruin?) the day? Let’s try to break a message enciphered with a keyword mixed cipher and see.

Example:

KNHHXKK QS PXTDQSB YQFJ NSISCYS HQEJXUK QK LXTKNUXP AO FJXKX RCNU FJQSBK QS FJX CUPXU STLXP EXUKXVXUTSHX HTUXRND LXFJCPK CR TSTDOKQK QSFNQFQCS DNHI FJX TAQDQFO TF DXTKF FC UXTP FJX DTSBNTBX CR FJX CUQBQSTD QK VXUO PXKQUTADX ANF SCF XKKXSFQTD

The letter frequency count of the ciphertext is

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 4  5 10 10  2 17  0  6  2  9 17  3  0  9  4  7 18  4 16 16 12  2  0 27  2  0

Does this look like a Caesar cipher? By far the most common letter is X, so this is likely e. But if so then S would be z, and it is not likely that a message would contain sixteen z’s. In fact, there is no low sextuple. This is not a Caesar cipher. The ciphertext’s letters have been well enough mixed that the standard frequency patterns have been totally destroyed.

This is the main advantage of the keyword ciphers: unless we know the method and the keyword, it is quite difficult to directly detect the pattern in the ordering of the ciphertext alphabet. In a Caesar cipher, once we knew where the ciphertext version of e was located we were basically done. In Decimation and Linear ciphers, knowing two letters allowed us to determine the rest. With a keyword cipher, the ciphertext alphabet is mixed enough that knowing the meanings of even five or ten ciphertext letters does not necessarily disclose the pattern of encryption.
What about exhaustion, one might ask? We might use a computer and simply try all the possibilities. Working modulo 26 there are 26 different Caesar ciphers. There are only 12 decimation ciphers modulo 26, and 26 × 12 = 312 linear ciphers. How many monoalphabetic ciphers are there? Well, we have 26 choices for which letter is substituted for a. Then we have 25 choices for the substitution for b, 24 for c’s substitute, etc. Thus there are

26! = 26 × 25 × 24 × · · · × 2 × 1 = 403,291,461,126,605,635,584,000,000

different monoalphabetic substitution ciphers. How large of a number is this? If we used a computer that could check one trillion different possibilities every second, we’d need nearly 13 million years to check all the possibilities!
Of course, no one simply uses brute force to break monoalphabetic ciphers. In our current example X almost positively must be e. And most of etaoinshr probably comes from CFKQSTU. This cuts down immensely on the number of possibilities. But to decrypt such a cipher in a truly finite amount of time we must go beyond simple frequency counts to consider the behaviors of the letters.
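The size of this key space is easy to confirm by machine. The sketch below assumes, as in the text, a computer testing one trillion keys per second, and takes a year to be 365.25 days.

```python
from math import factorial

keys = factorial(26)                       # number of ciphertext alphabets
print(f"{keys:,}")                         # 403,291,461,126,605,635,584,000,000

checks_per_second = 10**12                 # one trillion keys per second
seconds_per_year = 365.25 * 24 * 60 * 60
years = keys / checks_per_second / seconds_per_year
print(f"about {years / 1e6:.1f} million years")
```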

5.6 Basic Letter Characteristics

We’ve seen which letters are the most common (etaoinshr) and least common (vkjxqz). We next look at which letters appear first and last in words. We begin with the frequency information from earlier. (The initial and final letter percentages are from Sinkov’s study of 16410 words [Sinkov].)

  8.2   1.5   2.8   4.3  12.7   2.2   2.0   6.1   7.0   0.2   0.8   4.0   2.4
   a     b     c     d     e     f     g     h     i     j     k     l     m

  6.7   7.5   1.9   0.1   6.0   6.3   9.1   2.8   1.0   2.4   0.2   2.0   0.1
   n     o     p     q     r     s     t     u     v     w     x     y     z

Figure 5.1: Letter Frequencies – Anywhere.

 11.0   4.6   5.6   2.8   2.5   4.1   1.8   3.9   5.6   0.6   0.5   2.1   3.5
   a     b     c     d     e     f     g     h     i     j     k     l     m

  2.4   7.2   4.7   0     3.1   7.4  15.9   1.4   0.6   5.1   0     0.7   0
   n     o     p     q     r     s     t     u     v     w     x     y     z

Figure 5.2: Letter Frequencies – Initial Letters.

Summarizing, the most common individual letters are
1) Anywhere: etaoi, 4 vowels and t.
2) Beginning words: tasoic, with t easily the most common.
3) Ending words: edtsn (almost spells “endts”).
4) Doubles: lesot.
We put this summary into Figure 5.4.

  2.9   0.2   0.6  10.0  20.3   4.5   2.8   2.5   0.4   0     0.1   3.7   1.3
   a     b     c     d     e     f     g     h     i     j     k     l     m

  9.7   4.5   0.5   0     5.5  12.7   9.7   0.2   0.1   1.0   0.2   5.5   0
   n     o     p     q     r     s     t     u     v     w     x     y     z

Figure 5.3: Letter Frequencies – Final Letters.

a e i o t r n h s common x X x x x start x x x X x end X x x x x x doubles x x x x

Figure 5.4: Characteristics of etaoinshr.

We also list the most common short words in English in Figure 5.5 and one list of the 100 most common English words, with the number of times they appeared out of 1, 000, 000 words, in Figure 5.6.

1 letter words: a i
2 letter words: an at as he be in is it on or to of
3 letter words: the and (both really common,) was for his had
4 letter words: that with this from have

Figure 5.5: Most Common Short Words

Notice that the most common words by far, are non-context words: articles, prepositions, conjunctions and other auxiliary particles. In fact, in some lists the of and to a in make up nearly 20% of all words, and up to 70 of the 100 most used words are non-context words.

5.7 Aristocrats

We begin our breaking of general monoalphabetic ciphers with Aristocrats. These are short quotations enciphered with a monoalphabetic substitution. They appear in almost every newspaper, usually in the comics section. We start with them since they tend to be not terribly difficult, mainly because they keep word divisions and punctuation, and are frequently given with a hint. Decrypting aristocrats is mostly simple hard work, although some of our frequency information will be of service. Some suggested steps are

the    69971    or      4207    so      1984    like     1290
of     36411    have    3941    said    1961    our      1252
and    28852    an      3747    what    1908    over     1236
to     26149    i       3700    up      1895    man      1207
a      23237    they    3618    its     1858    me       1181
in     21341    which   3562    about   1815    even     1171
that   10595    one     3292    into    1791    most     1160
is     10099    you     3286    them    1789    made     1125
was     9816    were    3284    than    1789    after    1070
he      9543    her     3037    can     1772    also     1069
for     9489    all     3001    only    1747    did      1044
it      8756    she     2859    other   1702    many     1030
with    7289    there   2724    new     1635    before   1016
as      7250    would   2714    some    1617    must     1013
his     6997    their   2670    time    1599    through   969
on      6742    we      2653    could   1599    back      967
be      6377    him     2619    these   1573    where     938
at      5378    been    2472    two     1412    much      937
by      5305    has     2439    then    1377    your      923
this    5146    when    2331    do      1363    way       909
had     5133    who     2252    first   1360    well      897
not     4609    more    2216    any     1345    should    888
are     4393    no      2201    my      1319    because   883
but     4381    if      2199    now     1314    each      877
from    4369    out     2096    such    1303    just      872

Figure 5.6: The 100 Most Common Words in English

1. Do a frequency count. Identify which ciphertext letters are most likely etaoinshr. Be aware, however, that, especially in short messages, frequencies can be very strange.

2. Look at the initial and final letters of words. Use this to help identify which etaoinshr letters are which.

3. Study the short words. There frequently are words like I, a, the and and present.

4. Work hard. Mix effort with brain power. Remember that brilliant inductive realizations usually come only after some hard thought.

Example:

SDGHKHMP HP TDQJ ZBFXJQDEP KWBF LBQ, YDQ HF LBQ CDE BQJ DFGC UHGGJZ DFMJ. LHFPKDF MWEQMWHGG

Hint: G = l.

Substituting the given G = l gives the word UHllJZ = **ll**. So H and J must be vowels, and U and Z are probably consonants. The words HP and HF show P and F to be consonants, and H is perhaps i?

Turning to the frequency count

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 5 2 8 3 7 6 5 0 5 5 3 4 0 0 3 6 0 1 1 1 0 3 1 1 2

the most common letters are BDFGHJKQ. Of these, F, J and Q each occur three times as final letters. Let’s guess that one of these is e. It cannot be F, a consonant, nor Q, as then the word TDQJ would end in an e-vowel combination. So e must be J. Next, BQJ=BQe, so Q must be a consonant, and from LBQ, B is probably a vowel, hence a or o. Trying B=o doesn’t work. But when we try B=a, we quickly see Q=r, making YDQ = YDr = Yor = for, as o is the only vowel then left. At this point the cipher looks like

*oli*i*s is *ore *an*ero*s **an *ar, for in *ar *o* are onl* *ille* on*e. *ins*on ***r**ill.

From here the quotation should be pretty easy to complete.[5]

Clearly this example worked out a little too nicely, in part because we hid some false starts and in part because time spent thinking and scratching one’s head is hard to indicate in text. But it should still give an idea of how Aristocrats are attacked.
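The bookkeeping in this example (substituting each guess and marking the undetermined letters with *) is easy to mechanize. The following is only a minimal sketch: the function name apply_partial_key and the dictionary of guesses are illustrative choices of ours, not part of any standard tool.

```python
def apply_partial_key(ciphertext, key):
    """Replace known ciphertext letters by plaintext; show unknown letters as '*'."""
    out = []
    for ch in ciphertext:
        if ch.isalpha():
            out.append(key.get(ch, "*"))   # undetermined letters stay hidden
        else:
            out.append(ch)                 # keep spaces and punctuation
    return "".join(out)

# Guesses made in the example above (ciphertext letter -> plaintext letter).
guesses = {"G": "l", "J": "e", "B": "a", "Q": "r", "D": "o",
           "Y": "f", "H": "i", "P": "s", "F": "n"}

print(apply_partial_key("SDGHKHMP HP TDQJ ZBFXJQDEP KWBF LBQ", guesses))
# prints: *oli*i*s is *ore *an*ero*s **an *ar
```

Adding or changing a guess means editing one dictionary entry and printing again, which makes false starts cheap to undo.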

5.8 Summary

A monoalphabetic cipher is one in which each plaintext letter is replaced by the same ciphertext letter throughout the entire message. The Caesar or Shift Ciphers, Decimation Ciphers and Linear Ciphers are all monoalphabetic, as are the various types of Keyword Ciphers introduced in this chapter.

The Keyword Ciphers all use a keyword to develop an ordering of the ciphertext alphabet. The more important ones are the Keyword Mixed and Keyword Transposed Ciphers. Select a keyword and drop any repeated letters in it. Then write the remainder of the alphabet underneath, in as many columns as there are letters remaining in the keyword. For a Mixed Cipher the ciphertext alphabet is then the columns pulled off in order, left-to-right, while for a Transposed Cipher the columns are pulled off in alphabetical order of the top row.

A popular type of decryption puzzle is an Aristocrat, which is generally a short quote enciphered with a monoalphabetic cipher. Word divisions and punctuation are generally kept, and a hint is often given. The keys to decrypting these ciphers include the usual frequency count, some knowledge of initial and final letters of words, and very often the use of common short words.
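The column construction for Keyword Mixed and Keyword Transposed alphabets summarized above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not a definitive implementation: the function names and the keyword CIPHER are our own illustrative choices, and the keyword is assumed to contain letters only.

```python
from string import ascii_lowercase, ascii_uppercase

def keyword_columns(keyword):
    """Columns of the keyword rectangle, each headed by a keyword letter."""
    key = "".join(dict.fromkeys(keyword.upper()))        # drop repeated letters
    rest = [c for c in ascii_uppercase if c not in key]  # remainder of the alphabet
    width = len(key)
    # Column i holds its keyword letter, then every width-th remaining letter.
    return [top + "".join(rest[i::width]) for i, top in enumerate(key)]

def keyword_mixed_alphabet(keyword):
    """Pull the columns off in order, left to right."""
    return "".join(keyword_columns(keyword))

def keyword_transposed_alphabet(keyword):
    """Pull the columns off in alphabetical order of the top row."""
    return "".join(sorted(keyword_columns(keyword)))

def encipher(plaintext, cipher_alphabet):
    """Replace a..z by the corresponding letters of the ciphertext alphabet."""
    table = str.maketrans(ascii_lowercase, cipher_alphabet)
    return plaintext.lower().translate(table)

print(keyword_mixed_alphabet("CIPHER"))       # CAKSYIBLTZPDMUHFNVEGOWRJQX
print(keyword_transposed_alphabet("CIPHER"))  # CAKSYEGOWHFNVIBLTZPDMURJQX
print(encipher("attack at dawn", keyword_mixed_alphabet("CIPHER")))
```

Sorting the columns as strings works here because the top letters are all different, so the order is decided by the top row alone.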

[5] Politics is more dangerous than war, for in war you are only killed once. Winston Churchill.

5.9 Topics and Techniques

1. What is the chief characteristic of a Monoalphabetic Cipher?

2. What is a Keyword Cipher? How do we encipher and decipher with it?

3. What is a Keyword Mixed Cipher? How do we encipher and decipher with it?

4. Keyword Ciphers and Keyword Mixed Ciphers use their keyword differently. Explain the difference.

5. What is a Keyword Transposed Cipher? How do we encipher and decipher with it?

6. How do Keyword Mixed Ciphers and Keyword Transposed Ciphers differ?

7. What letters most commonly begin words in English?

8. What letters most commonly end words in English?

9. What are the most common one-letter words in English? Two-letter words? Three-letter words?

10. What are the most common words in English?

11. What is an Aristocrat?

12. What steps help decrypt an Aristocrat?

5.10 Exercises

1. Encipher or decipher the following words using a Keyword Cipher with the given keyword.

(a) democrat with REPUBLICAN.
(b) chocolate with HOT.
(c) JUIES OP with LETTERS.
(d) DSTLQ with TAILS.
(e) SQGSI LT with BASEBALL.

2. Encipher or decipher the following words using a Keyword Mixed Cipher with keyword COLORS.

(a) canary.
(b) eggplant.
(c) fuchsia.
(d) lavender.
(e) DXLAU EET.
(f) PEGB HM.
(g) WUEHF HIVLU.
(h) DCOIU QC.
(i) JHUII C.

3. Encipher or decipher the following words using a Keyword Mixed Cipher with keyword WEEKDAYS.

(a) Tuesday.
(b) Thursday.
(c) DOGTW J.
(d) CHKTW J.
(e) PYGTW J.

4. Encipher or decipher the following words. They were enciphered via a Keyword Transposed Cipher with the keyword VEGETABLES.

(a) turnip.
(b) cabbage.
(c) lentil.
(d) eggplant.
(e) parsnip.
(f) MDHIA MI.
(g) RDORG WJA.
(h) RHRHP IJM.
(i) GKDFM AIO.
(j) RAHFOUFKVJM.

5. Decipher the following words. They were enciphered via a Keyword Mixed Cipher with the keyword FRUITS.

(a) artichoke.
(b) guava.
(c) tapioca.
(d) mulberry.
(e) papaya.
(f) currant.
(g) DIKYB VFQFE Y.
(h) DYVTP KKIQ.
(i) EFKFV PQO.
(j) FWFHFOI.
(k) UMKLMFE.
(l) KFQBI.

6. Encipher or decipher the following words using a Keyword Transposed Cipher with the keyword AMERICA.

(a) Brazil.
(b) Columbia.
(c) Ecuador.
(d) French Guiana.
(e) BXNHL H.
(f) UYLYW XYAH.
(g) JXDML HFY.
(h) XDXBX HN.
(i) ZHDHB XHN.

7. Decipher the following words. They were enciphered via a Keyword Transposed Cipher with the keyword SPORTS.

(a) Rugby.
(b) Tennis.
(c) Luge.
(d) Biathlon.
(e) Croquet.
(f) Lacrosse.
(g) COMIL NUXN.
(h) ROQOR LNB.
(i) VOLOD OL.
(j) SWUON AZW.
(k) CXEDL NB.
(l) HFLHR WU.

8. Encipher or decipher the following words using an Interrupted Keyword Cipher with the given keyword.

(a) tea with keyword COFFEE.
(b) beer with keyword PRETZELS.
(c) LUAZC XE with keyword APPLES.
(d) STERI with keyword TELEVISION.
(e) RTJHI Z with keyword PAPER.

9. Decrypt the following Aristocrats. A hint has been given.

(a) ZJG REGFF ADLH DZRFG AFDRE, EDXG DKA ZXKYFGKDXVH BJKRXKQF RJ YGJO. CQR TEJKF BDVVH RDTFG JZZ. IJEKKL BDGHJK. Hint: R = t.

(b) KAHTH FN WJ XLW NJ EMHNNHC KALK NJXH RAJ NKLWC EP AFN CHLKAEHC RJW’K ALFM KAH JDDLNFJW RFKA CHMFIAK. XLTDSN LSTHMFSN. Hint: A = h.

(c) E OEB IZ JVEYU KCH KVIRTZ E ZTJVTR IB EBU HRCTV KEU RCEB HBT KCIJC KIWW JHBJTEW IR XVHO RCT FAWGEV. VHGTV DEJHB. Hint: V = r.

(d) D QZJSEY ZO D VEGSAX AB TYZGZMK D VEOODKE OA GSDG ZG QDMMAG IE YEDX IL DMLAME ZKMAYDMG AB GSE VEGSAX. BNEGQSEY JYDGG. Hint: X = d.

10. Decrypt the following Aristocrats.

(a) IOKVK MVK UKTKVMZ JGGF XVGIKLICGHU MJMCHUI IKYXIMICGH ENI IOK UNVKUI CU LGSMVFCLK YMVA ISMCH

(b) EKY HZMJECDZAX US EKY AZUSECHZAE CI JLWWNYS XCEEC CI EKY AXYZUHAR HZMJECDZAJK ASSCHUAEUCR

(c) CP TEF HALZKCTO ZM QCREFKV CT CV ZPNO TEF FGCVTFPQF ZM KFDJPDAPQO CP TEF ZKCXCPAN HFVVAXFV TEAT HAWFV A VZNJTCZP RZVVCINF

(d) MOY DYKP IOJIVJPFEPV ESDOKP LJ E JVFWOYK PFEJIV BGVJ BOFZLJA OJ E IONV LP LK JOP OQPVJ NOJV CM IOJKILOYK VQQOFP DLKK QFLVNFLIGK

11. During the 1670’s the Chevalier de Rohan was imprisoned under suspicion of treason. Although he was guilty, there was little evidence against him, and his life depended on the fate of his accomplice, who lay dying in the same prison. If his friend confessed, Rohan would be convicted and executed. If not, Rohan might be able to go free.

Shortly before his trial Rohan received a note hidden in a bundle of clothing: PVQ RDOWYFQD OW XQSX VQ WSOX FYPVOFC. Should Rohan confess and throw himself on the mercy of the court? Or should he stonewall and hope to be let free? (The message was actually in French. For those who read French, the ciphertext is MG EULHXCCLGU GHJ YXUJ LM CT ULGC ALJ.)

12. Decrypt the following monoalphabetic cipher. Kahn [page 770] says a similar method was used, albeit in Latin, to record this event.

J1629 B1762, 01ug3927 of W45541m B1762, cu71927 of 932 p17483, 48 b1p94820 J16u17y 1, 1645 46 C5219o7, Cumb275160, 26g5160

13. Ciphers in which pairs of numbers replace each letter are called biliteral or dinome ciphers. A simple way to construct such ciphers is to put the letters of the alphabet into a rectangle and number the rows and the columns. For example, if we use a five-by-five square, and squash i and j together, we have

      1   2   3   4   5
  1   a   b   c   d   e
  2   f   g   h   ij  k
  3   l   m   n   o   p
  4   q   r   s   t   u
  5   v   w   x   y   z

To encipher, replace the letter by its row–column number pair, that is, read from the side first. So box becomes 12 34 53. Deciphering is just translating back into letters. Use the given square to encipher or decipher the following.

(a) rectangle.
(b) square.
(c) 54 66 53 56. (Hint: 41 was added to every number before transmission.)
(d) 42 32 14 15 23 45 43. The alphabet was entered into the square using the keyword DINOME.
(e) Perhaps the first ever use of a password was when Giovanni Argenti, around 1589, used PIETRO and a rectangle with two rows and 10 columns (0 being the first). (The letters jkvwxy did not appear.) Using this method, 17 16 13 13 11 27 13 16 and 24 16 13 13 12 15 gives Argenti’s middle name and his nephew’s name. What are they?

14. A larger bipartite cipher was used by Brig. General Leslie R. Groves [Kahn, page 546]. It took the form

1 2 3 4 5 6 7 8 9 0 1 I P I O U O P N 2 W E U T E K L O 3 E U G N B T N S T 4 T A Z M D I O E 5 S V T J E Y H 6 N A O L N S U G O E 7 C B A F R S I R 8 I C W Y R U A M N 9 M V T H P D I X Q 0 L S R E T D E A H E

(a) Groves used this cipher for certain telephone conversations during the building of the atomic bomb. Encipher atomic bomb. (b) Decipher 01-53 72-29-01 96-22-25-70 45-74 77-47-28-92-42. He was Groves’ phone partner. (c) The bomb was developed at 28-92-66 62-01-08-19-15-39.

15. Construct a complete sentence of at least six words that uses no e’s and no t’s.

16. John Jay served as a delegate to the First and Second Continental Congresses, as well as in the Continental Congress, eventually being elected its president. He was only thirty-three when he was appointed Special Minister to Spain in 1779. This was an especially important post: Jay would have the responsibility for negotiating for the expected aid from Spain, perhaps even an alliance, as well as for the right for Americans to use the southern Mississippi for shipping (New Orleans then being in Spain’s control). Even before departing for Spain, Jay worried about the Spanish habit of reading diplomatic correspondence. He proposed the following to Robert R. Livingston, then delegate to the Continental Congress and Jay’s former law partner.

On Board the Confederacy near Reedy Island, 25 October 1779

Dear Robert ... To render [our correspondence] more useful and satisfactory a Cypher will be necessary. There are twenty six Letters in our alphabet. Take twenty six Numbers in Lieu of them thus.

 a  b  c  d  e  f  g  h  i  j  k  l  m
 5  6  7 11 13  8  9 10 12 14 16 19 22

 n  o  p  q  r  s  t  u  v  w  x  y  z
 1  2  3  4 15 23 25 26 24 20 21 18 17

Remember in writing in this Way to place a , after each number, and a ; or : or a - after each Word. This will prevent Confusion.” [Morris, pgs 656–666]

(a) A portion of the remainder of this letter has been enciphered using Jay’s method. Decipher it.

12, 25; 20, 12, 19, 19: 6, 13; 26, 1, 1, 13, 7, 13, 23, 23, 5, 15, 18- 25, 2; 20, 15, 12, 25, 13: 5: 20, 10, 2, 19, 13- Letter; 12, 1: Cypher- 23, 2; 22, 5, 1, 18; 20, 2, 15, 11, 23: 12, 1- Cypher- 5, 23; 20, 12, 19, 19: 6, 19, 12, 1, 11; 25, 10, 13- Sense: 20, 12, 19, 19; 6, 13- 23, 26, 8, 8, 12, 7, 12, 13, 1, 25; 5, 1, 11: 22, 2, 15, 13- 23, 5, 8, 13: 5, 23- 5: Discovery: 20, 12, 19, 19: 25, 10, 13, 15, 13, 6, 18- 6, 13; 15, 13, 1, 11, 13, 15, 13, 11- 22, 2, 15, 13: 11, 12, 8, 8, 12, 7, 26, 19, 25

(b) The message contains some of Jay’s thoughts on making an enciphered message more secure. Do you agree or disagree with Jay’s thoughts? Explain.

17. General Pierre G.T. Beauregard, departmental commander of South Car- olina, Georgia, and Florida forces of the Confederate Army sent Maj. Gen. Patton Anderson the following on 7 April 1864 [Gaddy]

“General: I inclose you herewith the following simple cipher for future use in important telegrams to these headquarters. For very important telegrams the diplomatic cipher should be used. Please inform me of its reception.

[Inclosure]

A by M    H by R    O by U    U by F
B by K    I by S    P by I    V by Q
C by O    J by V    Q by G    W by D
D by A    K by H    R by Y    X by T
E by N    L by X    S by E    Y by B
F by C    M by P    T by Z    Z by J
G by W    N by L

[end quote]

(a) Using this cipher, ORMYXNEZUL, EUFZR OMYUXSLM gives Beauregard’s location. Where was he? (b) Using this cipher, Anderson’s location was KMXADSL CXUYSAM. Where is this?

18. (a) Here is a cipher sent during the controversy about the Florida elec- toral returns following the 1876 Presidential election. [Glover, page E-48].

Jacksonville, 13. GEO. P. RANEY: 1:12 a.m., Nov. 14.

Yeeiemnsppaissitpinsititaashshyyp iimimnssspeenaaaimaennsyisnpinsimi mpeaaityyen. DANIEL.

As the New York Tribune put it, “It was evident, on a slight examination, that each letter in this cipher was not a substitute for another letter. ... Probably, then, each letter in the cipher alphabet consisted of two characters.” Why?

(b) The Tribune then continued with a second message, “which, being partly in plain English, seemed to promise a clew [sic].”

JACKSONVILLE, Nov. 22
S. PASCO, Tallahassee:

Gave ppaishsh charge of ityyitns he sent to mapinsimyypiit But not to the other. Brevard returns sent you to-day Emyypissainy Gone to Tallahassee Talla with him and let me know if I shall send trusty messenger. J. J. DANIEL.

The Tribune started its explanation by noting that

It appeared to be nearly certain that the first cipher word [in the second message] was the name of a person, and the second and third were names of counties. If we assume that each cipher [letter] consists of two letters, we must find as the equivalent of “ityyitns” a word of four letters, the first and third of which, “it,” are the same. “Dade” is the only name in the list of Florida counties which fulfils these conditions. The letters of “Dade” are repeated in the next word, and fit in with the obvious interpretation “Brevard.” ... The construction of the rest of the alphabet was now easy.

Please complete the construction of the alphabet, and so decrypt the messages.

Chapter 6

Decrypting Monoalphabetic Ciphers

We begin with the unsolved cipher from Section 5.5:

KNHHXKK QS PXTDQSB YQFJ NSISCYS HQEJXUK QK LXTKNUXP AO FJXKX RCNU FJQSBK QS FJX CUPXU STLXP EXUKXVXUTSHX HTUXRND LXFJCPK CR TSTDOKQK QSFNQFQCS DNHI FJX TAQDQFO TF DXTKF FC UXTP FJX DTSBNTBX CR FJX CUQBQSTD QK VXUO PXKQUTADX ANF SCF XKKXSFQTD

The ciphertext has as its frequency chart

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 4  5 10 10  2 17  0  6  2  9 17  3  0  9  4  7 18  4 16 16 12  2  0 27  2  0

We have previously determined that this is not a Caesar cipher (as none of the usual hill and valley patterns fit) so we are treating it as a monoalphabetic cipher of unknown type. The most common letters are C, D, F, J, K, Q, S, T and X, so these are probably the substitutes for etaoinshr. But which is which and how can we decide?

Just as ciphers with word spacing are generally easier to decrypt than those without, long ciphers are generally easier to decrypt than short ones. And for a simple reason: the longer the cipher is, the more accurate the information given by the frequency count is likely to be. If we are given a 100,000 letter message enciphered with a monoalphabetic substitution (without other tricks!), we can be quite confident that exactly one of the ciphertext letters will occur almost 13% of the time, and this letter stands for e, that the next most common letter will appear very near 9% of the time and will stand for t, etc. Very little thought will be needed to decrypt the cipher. It is the medium length ciphers of 25 to 100 letters, usually without proper word division, that give the beginning codebreaker some difficulty. We will concentrate on those in this chapter.


6.1 Letter Interactions

We have previously given the frequencies of letters, initial letters and final let- ters. Now we must look in more detail at the characteristics of the letters: how they behave and interact with each other. Approximately 35% of letters in English are the vowels aeio and about 35% of letters are the dominant consonants hnrst. So if we can determine which ciphertext letters represent these nine letters about 70% of the decrypting will be done. (And the rest will usually be fill in the blank.) Thus the main challenge when tackling a monoalphabetic cipher is telling the etaoinshr letters apart. First, these nine letters tend to group themselves into three sets of three by the likelihood they are initial or final letters, as can be seen in Figures 5.6 and 5.6.

Initial and Final Letters:
(1) t, o and s appear frequently as both initial and final letters.
(2) a, i, and h appear frequently as initial letters, but much less so as final letters.
(3) e, n and r appear frequently as final letters but much less so as initial letters.

While the doubling of a letter is fairly rare, in a long message there will almost certainly be some, and this helps us tell the letters apart.

Doublings:
(1) ee, tt, oo and ss are common doubles.
(2) aa, ii, hh, nn and rr are less common.

Next, the vowels.

Vowels:
(1) Vowels like to combine with consonants, but not with each other.
(2) Vowels are friendly, willing to combine with many different letters, including the low frequency ones.
(3) The only common pairs of vowels are ou, ea and io.
(4) e is the easiest to find: it is very common and ends many words.
(5) a frequently follows e but never precedes it.
(6) The pair ie and ei is the only common vowel-vowel reversal.
(7) e and o almost never touch each other and both will make doubles.

Finally, the consonants.

Consonants:
(1) Consonants like vowels, but, as there are so many of them and so few vowels, they often combine with one another as well.
(2) h can be the easiest consonant to identify. It precedes vowels, rarely follows them, and can be found most often in th, he and ha.
(3) t acts like a vowel in that it likes to mate with many other letters. It has a very strong desire to make th’s, and will make tt’s.
(4) r will appear next to any vowel, and likes the company of the other high-frequency letters.
(5) n and s behave in the reverse manner to h. They prefer to follow vowels and precede consonants. It can be hard to tell whether nst are vowels or consonants.

We summarize some of this information in Figure 6.1, using x and X to indicate strong and stronger behaviors.

a e i o t r n h s common x X x x x starting words x x x X x ending words X x x x x lots of mates x X x x x doubles x x x x

Figure 6.1: Some Basic Letter Behaviors

6.2 Decrypting Monoalphabetic Ciphers

Now it’s time we get decrypting. What do we do?

1) Make a frequency chart.

2) Does the frequency count suggest a Caesar cipher? Look for e and t, the aei and rst triples, the uvwxyz string, either forwards or backwards. If you find two or three of these patterns it may be a Caesar cipher and we can decrypt these. If not, continue with step 3.

3) Find the most common letters: these are probably etaoinshr.

4) Develop a digraph table for the probable etaoinshr letters. For each appearance of a common ciphertext letter list the letter that precedes it and the letter that follows it in the cipher. These tables are a bit time consuming to construct, but are very helpful, especially for ciphers without word breaks.

5) Use the letter behaviors table, Figure 6.1, to make initial guesses.

6) Work hard and be persistent.

Let us illustrate these steps with the example given earlier. We start with an example that has word breaks because these are a bit easier. To help us later, let’s also number the words:

(1)KNHHXKK (2)QS (3)PXTDQSB (4)YQFJ (5)NSISCYS (6)HQEJXUK (7)QK (8)LXTKNUXP
(9)AO (10)FJXKX (11)RCNU (12)FJQSBK (13)QS (14)FJX (15)CUPXU (16)STLXP
(17)EXUKXVXUTSHX (18)HTUXRND (19)LXFJCPK (20)CR (21)TSTDOKQK (22)QSFNQFQCS
(23)DNHI (24)FJX (25)TAQDQFO (26)TF (27)DXTKF (28)FC (29)UXTP (30)FJX
(31)DTSBNTBX (32)CR (33)FJX (34)CUQBQSTD (35)QK (36)VXUO (37)PXKQUTADX
(38)ANF (39)SCF (40)XKKXSFQTD

Step 1) Make a Frequency Chart:

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 4  5 10 10  2 17  0  6  2  9 17  3  0  9  4  7 18  4 16 16 12  2  0 27  2  0
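A frequency chart is tedious to tally by hand and simple to tally by machine. The sketch below is one possible way to do it in Python; the function names frequency_chart and most_common_letters are illustrative choices of ours, and ties between equally common letters are broken alphabetically, which is an arbitrary assumption.

```python
from collections import Counter
from string import ascii_uppercase

def frequency_chart(ciphertext):
    """Count each letter A..Z, ignoring spaces, digits and punctuation."""
    counts = Counter(ch for ch in ciphertext.upper() if ch in ascii_uppercase)
    return {letter: counts[letter] for letter in ascii_uppercase}

def most_common_letters(ciphertext, how_many=9):
    """The how_many most frequent letters, ties broken alphabetically."""
    chart = frequency_chart(ciphertext)
    ranked = sorted(chart, key=lambda letter: (-chart[letter], letter))
    return ranked[:how_many]

sample = "KNHHXKK QS PXTDQSB YQFJ"   # the first four words of the example
chart = frequency_chart(sample)
print(" ".join(f"{letter}{chart[letter]}" for letter in ascii_uppercase if chart[letter]))
print(most_common_letters(sample, 3))
```

Running most_common_letters on a whole ciphertext gives the candidates for etaoinshr that Step 3 asks for.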

Step 2) Is it a Caesar Cipher? This does not look like a Caesar cipher. X is probably e, but in a Caesar cipher T would then be a, S would be z, Q would be x, etc., which looks bad: S and Q are far too common to be z and x.
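One quick way to test the Caesar hypothesis is to assume the most common ciphertext letter stands for e, compute the shift that would imply, and undo that shift on a few words. The following Python sketch does this; the function names caesar_shift_if and undo_shift are our own, and the test assumes the usual Caesar convention in which every letter is moved the same number of places.

```python
from string import ascii_uppercase

def caesar_shift_if(cipher_letter, plain_letter="e"):
    """Shift implied if cipher_letter stands for plain_letter."""
    return (ord(cipher_letter.upper()) - ord(plain_letter.upper())) % 26

def undo_shift(ciphertext, shift):
    """Move every letter back by shift places; leave other characters alone."""
    out = []
    for ch in ciphertext.upper():
        if ch in ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") - shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

shift = caesar_shift_if("X")     # 19, if X stands for e
print(undo_shift("KNHHXKK QS PXTDQSB", shift))
# prints: ruooerr xz weakxzi
```

The output is not English, which supports the conclusion that a simple shift is not in use here.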

Step 3) Find the common letters: The most common letters are C, D, F, J, K, Q, S, T, and X. (We usually pick the seven to ten most common letters.)

Step 4) Make a Digraph Chart: For each letter from Step 3) we build a column that gives each appearance of that letter. Let’s work on C’s column. C’s first appearance is in word (5), where it is preceded by S and followed by Y. So we put an SY in the C column. Similarly, an RN will come from C’s next appearance, in word (11). We will use “.” to indicate a space, or no letter. So word (15) provides a .U, and for word (28) we put F. in the column. Doing this for each appearance of each common letter results in Figure 6.2.
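Since digraph columns are time consuming to build by hand, here is a minimal Python sketch of the procedure just described. The function name digraph_columns is our own illustrative choice; each word is padded with “.” on both sides so that word boundaries are recorded in the same way as in the text.

```python
from collections import defaultdict

CIPHERTEXT = """
KNHHXKK QS PXTDQSB YQFJ NSISCYS HQEJXUK QK LXTKNUXP AO FJXKX RCNU
FJQSBK QS FJX CUPXU STLXP EXUKXVXUTSHX HTUXRND LXFJCPK CR TSTDOKQK
QSFNQFQCS DNHI FJX TAQDQFO TF DXTKF FC UXTP FJX DTSBNTBX CR FJX
CUQBQSTD QK VXUO PXKQUTADX ANF SCF XKKXSFQTD
"""

def digraph_columns(words, letters):
    """For each chosen letter, list the (letter before, letter after) pairs."""
    columns = defaultdict(list)
    for word in words:
        padded = "." + word + "."            # "." marks a word boundary
        for i in range(1, len(padded) - 1):
            if padded[i] in letters:
                columns[padded[i]].append(padded[i - 1] + padded[i + 1])
    return columns

columns = digraph_columns(CIPHERTEXT.split(), "CDFJKQSTX")
print(columns["C"])
# prints ['SY', 'RN', '.U', 'JP', '.R', 'QS', 'F.', '.R', '.U', 'SF']
```

For a cipher without word breaks the same function can be applied to the whole message treated as a single word.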

Step 5) Make a Letter Behavior Table: The Digraph Table took some time, but makes it easy to build a Behavior Table. For example, there are four times .* appears in C’s column (here * just means “some letter”), so C must begin four words. Likewise one *. shows C ends one word.

          C   D   F   J   K   Q   S   T   X
count    10  10  17   9  17  18  16  16  27
start     4   3   7   1   1   5   2   3   1
end       1   3   3   1   7   0   4   0   8
mates     9   6   9   5  10  14  10  13  15
doubles   0   0   0   0   2   0   0   0   0
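The entries of a behavior table can also be tallied directly from the numbered words. The Python sketch below is one way to do so; the function name letter_behavior is our own, and “mates” is taken to mean the number of different letters that appear next to the given letter, which is our reading of how the table is built.

```python
CIPHERTEXT = """
KNHHXKK QS PXTDQSB YQFJ NSISCYS HQEJXUK QK LXTKNUXP AO FJXKX RCNU
FJQSBK QS FJX CUPXU STLXP EXUKXVXUTSHX HTUXRND LXFJCPK CR TSTDOKQK
QSFNQFQCS DNHI FJX TAQDQFO TF DXTKF FC UXTP FJX DTSBNTBX CR FJX
CUQBQSTD QK VXUO PXKQUTADX ANF SCF XKKXSFQTD
"""
WORDS = CIPHERTEXT.split()

def letter_behavior(words, letter):
    """Tally count, word starts, word ends, distinct mates and doubles."""
    mates = set()
    doubles = 0
    for word in words:
        padded = "." + word + "."                 # "." marks a word boundary
        for i in range(1, len(padded) - 1):
            if padded[i] != letter:
                continue
            before, after = padded[i - 1], padded[i + 1]
            mates.update(ch for ch in (before, after) if ch != ".")
            if after == letter:                   # the letter is doubled here
                doubles += 1
    return {
        "count": sum(word.count(letter) for word in words),
        "start": sum(word.startswith(letter) for word in words),
        "end": sum(word.endswith(letter) for word in words),
        "mates": len(mates),
        "doubles": doubles,
    }

print(letter_behavior(WORDS, "C"))
# prints {'count': 10, 'start': 4, 'end': 1, 'mates': 9, 'doubles': 0}
```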

C D F J K Q S T X SY TQ QJ F. .N .S Q. XD HK RN N. .J EX XK DS QB XK PT .U TO .J FX K. YF NI SL JU JP .N .J FQ U. HE IC US LT .R QQ XJ FX Q. .K Y. HU UP QS .X SN FC TN QB .S FK F. .T QQ FX XX .S Q. SD K. .R T. .J FX B. KK .T .A J. .U AX QO FX UX .S TH .F PU SF T. T. P. NF TT XK LP K. OQ FC QF XP EU .C Q. AD C. DS KV .J TF DF TB NB VU .J Q. UB QT SD H. N. XQ BS .C UA UR C. XK .K XF QD LF SQ KX KU J. FT DT UT J. B. J. VU PK D. .K

Figure 6.2: Digraph Table

Step 6) Make our first guesses:

(1) X=e: it is common, ends many words, starts few words.
(2) F=t: it is common, starts many words, appears a lot with the one letter J, which is probably h.
(3) J also mates a lot but starts and ends few words, also like h.

These three guesses I’d be pretty sure of, especially after noting the five times FJX = the appears.

(4) K ends words, does not start them, appears as a double. Maybe s or r?
(5) S ends words but does not start them. Maybe r or n?
(6) Q and T start words but do not end them. Maybe a or i?
(7) D doesn’t mate too much, but is common and both starts and ends some words. Maybe s?
(8) Finally, C more or less matches the description of o.

Next, let’s use the information we strongly believe to determine some other letters. Start with FJX=the. FJXKX must either be there or these, so K is r or s. From earlier we’d guessed r or n. So probably K=r. Since S was r or n, that makes S=n. Then the second word of the cipher QS=*n must be in, so Q=i and T=a. (It would be odd for the second word of a sentence to be an. Of course, if we are wrong, we can always come back and try this later.) Now FC=t* must be to, so C=o, as we’d guessed previously. Summarizing our current guesses:

ciphertext  C D F J K Q S T X
plaintext   o s t h r i n a e

Now let’s substitute these.

(Letters not yet determined are shown as *.)

KNHHXKK QS PXTDQSB YQFJ NSISCYS HQEJXUK
r***err in *easin* *ith *n*no*n *i*he*r

QK LXTKNUXP AO FJXKX RCNU FJQSBK QS FJX
ir *ear**e* ** there *o** thin*r in the

CUPXU STLXP EXUKXVXUTSHX HTUXRND LXFJCPK
o**e* na*e* *e*re*e*an*e *a*e**s *etho*r

CR TSTDOKQK QSFNQFQCS DNHI FJX TAQDQFO TF
o* anas*rir int*ition s*** the a*isit* at

DXTKF FC UXTP FJX DTSBNTBX CR FJX CUQBQSTD
seart to *ea* the san**a*e o* the o*i*inas

QK VXUO PXKQUTADX ANF SCF XKKXSFQTD
ir *e** *eri*a*se **t not errentias

We’ve made progress, with many words looking like real English. But we’ve also made a couple of wrong guesses. Words (1) and (7) are not words, so there is a mistake here. Perhaps QK=is, so K=s? Changing this makes the final word of the cipher XKKXSFQTD=essentia*, hence D=l. (l is the tenth most common letter in English, after etaoinshr, so it is not surprising that it is one of the most common letters in this text.) With these two changes we have

KNHHXKK QS PXTDQSB YQFJ NSISCYS HQEJXUK
s***ess in *ealin* *ith *n*no*n *i*he*s

QK LXTKNUXP AO FJXKX RCNU FJQSBK QS FJX
is *eas**e* ** these *o** thin*s in the

CUPXU STLXP EXUKXVXUTSHX HTUXRND LXFJCPK
o**e* na*e* *e*se*e*an*e *a*e**l *etho*s

CR TSTDOKQK QSFNQFQCS DNHI FJX TAQDQFO TF
o* anal*sis int*ition l*** the a*ilit* at

DXTKF FC UXTP FJX DTSBNTBX CR FJX CUQBQSTD
least to *ea* the lan**a*e o* the o*i*inal

QK VXUO PXKQUTADX ANF SCF XKKXSFQTD
is *e** *esi*a*le **t not essential

At this point it is nearly fill in the blanks. (This is the opening sentence of Colonel Parker Hitt’s 1916 Manual for the Solution of Military Ciphers, one of the first serious American books on cryptology, enciphered with a Keyword Mixed Cipher with keyword TRICKY.)

Not all examples will be as quick as this one. You will often need to swap a pair of common letters, like we did from K=r to K=s. However, work methodically, on a large piece of paper, with pencil and not ink, and after a little practice soon you will be decrypting ciphers with the best of them.

Example: Decrypt the following cipher without word breaks. ROXEY ZLOHE QXHUW ROXEY HUHKX TVTHO BTZPB YVPBT RHKKB WYNVU ZOOBR VEOZR HKTRV BURBT HUWVU GDURY VZUYQ BXVUW BBWPV OOZOZ UBHUZ YQBON QHYZU BWZBT YQBZY QBODU WZBTY QBVOU HYDOB TQZNB IBOWV GGBOG DUWHP BUYHK KXROX EYZLO HEQXV TYQBZ OBYVR HKHUW HMTYO HRYRO XEYHU HKXTV TVTBP EVOVR HKHUW RZURO BYBWH IVWFH QU

Step 1) Frequency Chart:

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 0 28  0  4  8  1  4 24  2  0  9  2  1  3 23  5 11 15  0 14 21 18 13 10 20 16

Step 2) Certainly this is not any sort of Shift Cipher.

Step 3) The most common letters are B, H, O, R, T, U, V, Y, and Z.

Step 4) See Figure 6.3.

Step 5) See Figure 6.4.

Step 6) The lack of information about initial and final letters makes this cipher considerably more difficult to decrypt than the previous one. But we must start somewhere. B is the most common letter, has lots of mates, and is doubled, so maybe B=e. If this is so, the most common mates with B are OTUWYZ, so these may be consonants and HRV vowels. Looking at HRV, HV both have many low frequency mates, whereas R combines with mostly high frequency letters. So HV must indeed be vowels, but R is actually probably a consonant. Now V appears both before and after B=e, whereas H appears only after. Let’s guess V=i. The pair eo is very rare, so if H is to be a vowel it must be a.

B H O R T U V Y Z OT OE RX .O XV HW TT EZ YL XU LH WO VH HH YP EH TP PT YU RX TH BZ VZ NU BV UO KW UK HB BV BR BR RE WN OR OR TO ZO ZH KR HW RB RV VU VU RK OB TV BH VG WU UQ OO RT RK EZ UB BY DR YZ ZQ OU QX TU VO UY BY ZY XU HZ UY WB BU OZ XO BQ VW PO TQ YU BW QY ZZ VH VY ZB BO ZQ WB UH UY BN HY MY HZ WG TQ BY QO WP BD YO XV ZB XT HD WB UW YK VU VH VV DW YR UH QN ZT OE DB WZ VB OH TT EZ YL QZ RK BW UO DW TT TQ BO QO KU BG BY EO BV RU ZT WM RX HW OR TO QV OR LH HH IW RR OT YU ZB HW EH NI UK YH ZR BB IO RK RX Q. GO KU VV PU WI RB QZ FQ OY TP OY YW

Figure 6.3: Digraph Table

            B         H      O        R    T    U     V    Y       Z
count       28        24     23       15   14   21    18   20      16
mates       17        15     15       11   10   12    14   13      12
doubles     1                2
favorites   OPQTUWYZ  KORUY  BHORVXZ  HOV  BVY  BHWZ  ORT  BEHQTZ  BOUY

Figure 6.4: Letter Behaviors

Next, Q follows Y a remarkable six times, with B right behind each time. We must have YQB=the. Turning to our other high frequency letters, ORTUWZ, which is o? Of these letters, Z likes to combine with our vowels BHV the least. Is it possible Z=o? The outstanding undetermined letter is O. It appears with all our vowels, both precedes and follows each, and of its common mates almost all are high frequency letters. Of nrs this sounds most like r. So we now have the guesses B=e, H=a, O=r, Q=h, V=i, Y=t, Z=o. Let’s substitute them and see how they look.

ROXEYZLOHEQXHUWROXEYHUHKXTVTHOBTZP
*r**to*ra*h*a***r**ta*a***i*are*o*

BYVPBTRHKKBWYNVUZOOBRVEOZRHKTRVBUR
eti*e**a**e*t*i*orre*i*ro*a***ie**

BTHUWVUGDURYVZUYQBXVUWBBWPVOOZOZUB
e*a**i*****tio*the*i**ee**irroro*e

HUZYQBONQHYZUBWZBTYQBZYQBODUWZBTYQ
a*other*hato*e*oe*theother***oe*th

BVOUHYDOBTQZNBIBOWVGGBOGDUWHPBUYHK
eir*at*re*ho*e*er*i**er****a*e*ta*

KXROXEYZLOHEQXVTYQBZOBYVRHKHUWHMTY
***r**to*ra*h*i*theoreti*a*a**a**t

OHRYROXEYHUHKXTVTVTBPEVOVRHKHUWRZU
ra*t*r**ta*a***i*i*e**iri*a*a***o*

ROBYBWHIVWFHQU
*rete*a*i**ah*
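In a cipher without word breaks, repeated groups such as YQB are among the best clues available, and they can be found mechanically. The sketch below counts all three-letter groups in the ciphertext of this example; the function name ngram_counts is an illustrative choice of ours.

```python
from collections import Counter

CIPHERTEXT = """
ROXEY ZLOHE QXHUW ROXEY HUHKX TVTHO BTZPB YVPBT RHKKB WYNVU
ZOOBR VEOZR HKTRV BURBT HUWVU GDURY VZUYQ BXVUW BBWPV OOZOZ
UBHUZ YQBON QHYZU BWZBT YQBZY QBODU WZBTY QBVOU HYDOB TQZNB
IBOWV GGBOG DUWHP BUYHK KXROX EYZLO HEQXV TYQBZ OBYVR HKHUW
HMTYO HRYRO XEYHU HKXTV TVTBP EVOVR HKHUW RZURO BYBWH IVWFH QU
"""

def ngram_counts(text, n):
    """Count every run of n consecutive letters, ignoring spaces and line breaks."""
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

trigrams = ngram_counts(CIPHERTEXT, 3)
print(trigrams["YQB"])          # 6
print(trigrams.most_common(5))  # the most frequent three-letter groups
```

Calling ngram_counts with n set to 2 gives the raw material for a digraph table in the same way.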

While there is always a lot of hard work in decrypting, there is also usually a point at which the discoveries start to come at you very rapidly. We are now at that stage. Staring at the message, there are now many places where letters jump out at us. The “tio the” in the middle of the third line says “n”, for example. So we know we are both on the right track, and very close to the end.[1]

[1] “Cryptography and cryptanalysis are sometimes called twin or reciprocal sciences, and in function they indeed mirror one another. What one does the other undoes. Their natures, however, differ fundamentally. Cryptography is theoretical and abstract. Cryptanalysis is empirical and concrete.” David Kahn.

6.3 Sukhotin’s Method for Finding Vowels

In a field like cryptography, there are always new tricks and shortcuts being proposed to simplify one or another part of the enciphering, deciphering or decrypting process. Areas of frequent interest are methods for decrypting monoalphabetic ciphers more quickly. To give a flavor of these, we present a slight variation on a method for recognizing the vowels in a short ciphertext due to B. V. Sukhotin [Gray]. The idea behind this method is that vowels tend to have a wide variety of partners, with the vast majority being consonants. We should be able to make use of this to weed out the vowels from the consonants. The easiest way to explain the method is to use it. Let’s use our first example from Section 6.2 as the ciphertext.

Step 1: Use the digraph chart to count the number of times each proposed etaoinshr letter is in contact with another of these letters, ignoring any times they contact themselves, and total the results.

     C   D   F   J   K   Q   S   T   X   total
C    -   0   2   1   0   1   3   0   0     7
D    0   -   0   0   0   3   0   5   2    10
F    2   0   -   8   1   5   2   1   1    20
J    1   0   8   -   0   1   0   0   6    16
K    0   0   1   0   -   5   0   2   7    15
Q    1   3   5   1   5   -   6   1   0    22
S    3   0   2   0   0   6   -   6   1    18
T    0   5   1   0   2   1   6   -   4    19
X    0   2   1   6   7   0   1   4   -    20

Step 2: The letter with the largest total is a vowel. Mark it so. Any letter contacting a vowel is probably not one. So we go through the remaining rows, and penalize each row by twice the number of times this vowel appears in their list.

     C   D   F   J   K   Q   S   T   X   total  total2
C    -   0   2   1   0   1   3   0   0     7       5
D    0   -   0   0   0   3   0   5   2    10       4
F    2   0   -   8   1   5   2   1   1    20      10
J    1   0   8   -   0   1   0   0   6    16      14
K    0   0   1   0   -   5   0   2   7    15       5
Q    1   3   5   1   5   -   6   1   0    22    vowel
S    3   0   2   0   0   6   -   6   1    18       6
T    0   5   1   0   2   1   6   -   4    19      17
X    0   2   1   6   7   0   1   4   -    20      20

Step 3: Again, the letter with the largest remaining count is a vowel. Mark it and penalize those that touch it.

     C   D   F   J   K   Q   S   T   X   total  total2  total3
C    -   0   2   1   0   1   3   0   0     7       5       5
D    0   -   0   0   0   3   0   5   2    10       4       0
F    2   0   -   8   1   5   2   1   1    20      10       8
J    1   0   8   -   0   1   0   0   6    16      14       2
K    0   0   1   0   -   5   0   2   7    15       5      -9
Q    1   3   5   1   5   -   6   1   0    22    vowel
S    3   0   2   0   0   6   -   6   1    18       6       4
T    0   5   1   0   2   1   6   -   4    19      17       9
X    0   2   1   6   7   0   1   4   -    20      20   vowel

Step 4: Repeat twice more to find the other two aeio vowels.

     C   D   F   J   K   Q   S   T   X   total  total2  total3  total4  total5
C    -   0   2   1   0   1   3   0   0     7       5       5       5       1
D    0   -   0   0   0   3   0   5   2    10       4       0     -10     -10
F    2   0   -   8   1   5   2   1   1    20      10       8       6   vowel
J    1   0   8   -   0   1   0   0   6    16      14       2       0     -16
K    0   0   1   0   -   5   0   2   7    15       5      -9     -13     -15
Q    1   3   5   1   5   -   6   1   0    22    vowel
S    3   0   2   0   0   6   -   6   1    18       6       4      -8     -12
T    0   5   1   0   2   1   6   -   4    19      17       9   vowel
X    0   2   1   6   7   0   1   4   -    20      20   vowel

So, according to this method, Q, X, T and F are the vowels.

Now no method is foolproof. This one depends heavily on the letter that contacts the common letters the most being a vowel, and if the first “vowel” designation is wrong, this method will quickly lead into a morass. Here, Q has the largest initial total, but only narrowly. Both F and S have high totals, even though they are consonants. If either had occurred a couple more times, we’d have had trouble.

Similarly, no method should be applied without thought. At the end of Step 4, C had a total of 5 and F a total of 6. Which is more likely to be a vowel? Vowels tend to dislike one another, and C touches Q, T and X a total of once, while F touches them a total of 7 times (which is what cost it 14 points in penalties). So, even though its total is a bit smaller, it is more likely that C is a vowel.
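For readers who like to check such bookkeeping by machine, here is a minimal sketch of the four steps in Python. The contact counts are transcribed from the Step 1 chart; the names LETTERS, CONTACTS and sukhotin_vowels are ours, chosen only for illustration, and the routine follows the penalty rule exactly as stated above rather than any other published version of Sukhotin's method.

```python
LETTERS = "CDFJKQSTX"

# CONTACTS[i][j] = number of times LETTERS[i] touches LETTERS[j].
# The diagonal is 0 because self-contacts are ignored.
CONTACTS = [
    [0, 0, 2, 1, 0, 1, 3, 0, 0],  # C
    [0, 0, 0, 0, 0, 3, 0, 5, 2],  # D
    [2, 0, 0, 8, 1, 5, 2, 1, 1],  # F
    [1, 0, 8, 0, 0, 1, 0, 0, 6],  # J
    [0, 0, 1, 0, 0, 5, 0, 2, 7],  # K
    [1, 3, 5, 1, 5, 0, 6, 1, 0],  # Q
    [3, 0, 2, 0, 0, 6, 0, 6, 1],  # S
    [0, 5, 1, 0, 2, 1, 6, 0, 4],  # T
    [0, 2, 1, 6, 7, 0, 1, 4, 0],  # X
]


def sukhotin_vowels(letters, contacts, how_many):
    """Return the first `how_many` letters the method marks as vowels."""
    index = {letter: i for i, letter in enumerate(letters)}
    # Step 1: each letter's total is the sum of its row.
    totals = {letter: sum(row) for letter, row in zip(letters, contacts)}
    vowels = []
    for _ in range(how_many):
        # Steps 2-4: the largest remaining total is declared a vowel ...
        best = max(totals, key=totals.get)
        vowels.append(best)
        del totals[best]
        # ... and every other row is penalized by twice its contacts with it.
        for letter in totals:
            totals[letter] -= 2 * contacts[index[letter]][index[best]]
    return vowels


print(sukhotin_vowels(LETTERS, CONTACTS, 4))
```

The sketch only automates the arithmetic. The judgment call discussed above (whether C or F is the likelier vowel) still has to be made by a person.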

6.4 Final Monoalphabetic Tricks

The examples in this chapter hopefully have convinced you that even a good monoalphabetic substitution is not a very good cipher method. How, then, should we make a better cipher? The name mono gives us a hint: we must change the one-for-one letter replacement and cause multiple letters to represent the same letter. That is the goal of our next chapter. We end this chapter with a few ways to make monoalphabetic ciphers harder to decrypt. (These tricks can actually be used with most of the ciphers we will see.)

Homophones. When a substitution alphabet has multiple substitutions for a given letter these substitutes are called homophones. For instance, we may decide that every other e will be replaced by a z before the message is enciphered. Since z is so uncommon, our partner should be able to figure out we’ve replaced some e’s by z’s without prior warning. But to the enemy who intercepts the ciphertext, the tall 12% peak of e’s will be split into two far less visible 6% bumps. This would make the frequency analysis of our adversary a bit harder. Homophones are more commonly used when one is sending the ciphertext in numerical form. For example, consider the replacement list in a cipher of Henri IV of Navarre of France, c. 1600. [Pratt, page 64.]

plaintext     a b c d e f g h i l n o p r s t u w x y z
ciphertext A  31 26 27 28 31 29 3 33 12 14 44 15 16 17 9 20 21 22 23 24 25
ciphertext B  34 35 36 37 38 39 30 41 42 43 67 18 46 47 19 50 51 52 76 54 55
ciphertext C  37 59 60 61 62 40 64 65 66 85 45 69 70 49 73 74 75 77 78
ciphertext D  80 81 82 68 83 72 84 86 87

(Most of the unassigned numbers stood for common words: 10 = le, 39 = mon.) To encipher a letter, choose one of the numbers beneath it. So tres might be enciphered as 20-17-31-9 or as 50-70-38-49. These multiple substitutes flatten the frequency chart, making it much harder for our adversary to decide which number(s) represent which letter.

To make this effective we would probably want several homophones for each letter, and then somehow force ourselves to pick the homophones at random. We might have six homophones for each letter; to encipher a letter we would roll a die and, if it comes up 4, pick the fourth homophone as the cipherletter.

Unfortunately, homophones make enciphering and deciphering much slower. Further, if the message is thousands of letters long, frequency analysis will still win out. (A handwritten page has approximately 500 characters on it, about 25 characters per line and 20 lines per page, and a typewritten page can easily hold 3000 characters, about 70 characters per line and 45 lines per page.) For our die-homophone example, the cipher-numerals standing for uvwxyz would still very seldom be used, while those standing for e would be popular, and would often occur paired with those standing for t and h. So while homophones are helpful, they cannot make monoalphabetic ciphers secure.
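The die-rolling idea is easy to try out. The following is a rough sketch in Python, not a historical table: the assignment of the numbers 100 through 255 to the letters (six per letter) is invented purely for illustration, as are the names HOMOPHONES, REVERSE, encipher and decipher.

```python
import random
import string

ALPHABET = string.ascii_lowercase

# Hypothetical table: letter number i gets the six numbers 100+6i ... 105+6i.
HOMOPHONES = {
    ch: [100 + 6 * i + k for k in range(6)] for i, ch in enumerate(ALPHABET)
}
# Every number belongs to exactly one letter, so deciphering is a lookup.
REVERSE = {n: ch for ch, nums in HOMOPHONES.items() for n in nums}


def encipher(plaintext, rng=random):
    """Replace each letter by one of its six homophones, chosen by a die roll."""
    numbers = []
    for ch in plaintext.lower():
        if ch in HOMOPHONES:
            roll = rng.randint(1, 6)          # the die
            numbers.append(HOMOPHONES[ch][roll - 1])
    return numbers


def decipher(numbers):
    """Map each number back to the single letter it stands for."""
    return "".join(REVERSE[n] for n in numbers)


secret = encipher("tres")
print(secret, decipher(secret))
```

Running encipher twice on the same word will usually give different numbers, which is the whole point, while decipher always recovers the same plaintext.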

Nulls. Extra meaningless symbols that are added to a text only to confuse the enemy analysts are known as nulls. One could spread a number of unpopular letters randomly throughout the plaintext. Itz yisx xnot jtooquk qdifficult wtowq jread ax meksskage cbontgainuing nzullys, but it is harder to cryptanalyze it. However, it is a bother to add (and later, remove) such letters, and they make our ciphertext much longer than it would otherwise have been. Nonetheless, all modern cipher techniques involve the use of nulls.

Salting the message. Many messages begin and end with similar, repeated information. It is standard practice to begin the message with whom it is for, and who is sending it: “From: Capt. Thomas, USS Lexington. To: Admiral Nelson, Enterprise Carrier Group, US South Pacific Force.” To prevent an adversary from guessing such a stereotyped beginning of a message, one salts the message by adding meaningless strings to the beginning and/or end of the plaintext before it is enciphered. A different method with the same purpose is Russian Copulation. Cut the message approximately in half and then swap the two halves. This hides the beginning and ending somewhere in the middle of the message. Hopefully, this is of no bother to the person who will decipher the message, but will prevent the enemy from using information about the stereotyped beginning and ending of the message.

Unfortunately, it is a very small leap from guessing that “Enterprise” or “Naval Task Force” appears at the beginning and/or end of a message to using such words and phrases as cribs for the entire message. Cribs are words or phrases thought to be part of the message. If, say, “carrier” is thought to appear in a monoalphabetic cipher, a quick scan for its distinctive letter pattern **XX**X, where X represents any particular letter, should be able to determine this. And if it is found, the codebreaker has an immediate decryption of 5 letters of the cipher! We will not study them further, but clearly cribs are a very powerful tool to have when decrypting ciphers.
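The pattern scan for a crib is mechanical enough to sketch in a few lines of Python. The function names letter_pattern and find_crib are ours. Note one assumption: the test below is slightly stricter than **XX**X, since it also insists that letters which differ in the crib differ in the ciphertext, which is always true for a monoalphabetic cipher.

```python
def letter_pattern(text):
    """Number each symbol by order of first appearance: carrier -> 0122342 style."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in text)


def find_crib(ciphertext, crib):
    """Return every position where the crib's letter pattern fits the ciphertext."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    target = letter_pattern(crib.lower())
    n = len(crib)
    return [
        i
        for i in range(len(letters) - n + 1)
        if letter_pattern(letters[i:i + n]) == target
    ]


# "the carrier is near" under a Caesar shift of 3, used here only as test data.
print(find_crib("WKHFDUULHULVQHDU", "carrier"))
```

A long ciphertext may give several candidate positions, or a false one. Each still has to be tested by substituting the proposed letters and seeing whether the rest of the text starts to make sense.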

6.5 Summary

Decrypting shift ciphers involved only frequency analysis, which is really just counting the letters. For more general monoalphabetic ciphers we need additional information about how the letters relate to one another. For example, while e and t both are very common letters, and both are among the most frequent final letters of words, e seldom starts words whereas t very frequently does. When the ciphertext is presented with word breaks, this information alone usually allows the identification of e and t, with h quickly following. Even without word breaks, vowels are much happier mating with consonants than with other vowels. We can thus use e and t as wedges to separate the vowels of aonirsh from the consonants. Using the differences in letter behavior should then give enough guesses at substitutions that we can then try out our guesses, fixing and revising as we make progress. Despite tricks like homophones, nulls and salting, monoalphabetic ciphers simply are not very secure, and have not been for a very long time.

6.6 Topics and Techniques

1. Of the etaoinshr letters

(a) Which three appear frequently both as initial and final letters?
(b) Which three appear frequently as final letters, but less so as initial letters?
(c) Which three appear frequently as initial letters, but less so as final letters?
(d) Which four are most likely to make doubles?

2. Which are the most common vowel-vowel combinations? Are they actually very common?

3. Which letters do vowels prefer to associate with? What about consonants?

4. Which (type of) letters are the friendliest, associating with almost all others? Which are most discriminating, preferring the company of only a few other letters?

5. Are ciphertexts that retain their word breaks easier or harder to decrypt? Explain why.

6. What is a digraph table? How do you construct one? What does it tell us?

7. What is a homophone?

8. Why can ciphers that involve homophones be more secure than those without?

9. What is a null? How do nulls make the cryptanalyst’s life more difficult?

6.7 Exercises

1. Decrypt the following cipher.

UN MGHHYV LNS VYETEHGUH HLY KVQFHNCVGM GDD HLGH TE VYGDDQ UYYRYR TE GU YUHVQ HLY TRYUHTATKGHTNU NA NUY SNVR NV NA HLVYY NV ANPV DYHHYVE LYDYU ANPKLY CGTUYE

Frequency count:
  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
  5  0  2  6  6  1  9 15  0  0  3  7  2 12  0  2  3  4  2  8 10 12  0  0 19  0

2. Decrypt the following cipher.

GNLTA DTPUH IOHJQ BTCNJ QHCVN DQPIQ ZNDCN ZJTGD TQHIQ TWWHO TIGTH IQBTF NDWLQ NLPSH QVDNL XGTJC XGBCN DTPIL CXGBC NDTQD XJQFN DQBSH IZNDC PQHNI QBPIJ VHTJ

Frequency count:
  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
  1  6  8 11  0  2  7 10  9  7  0  4  0 13  2  6 14  0  2 13  1  2  3  4  0  3

3. Decrypt the following cipher.

PYJQG KYUQR CQEQR LOTGQ RQTRT RYTAH KYLTE HAWEJ QRWHQ RCTAW UHEWR PQAYW UQKWV YDWEH YPNGT RQHLT UYHQL YHKWR QHPYE YUVYE JKWUF YEBWB BWCY

Frequency count:
  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
  4  3  3  1  8  1  3  9  0  3  5  4  0  1  1  4 13 10  0  8  6  2 11  0 14  0

4. Fletcher Pratt used a Keyword Cipher to encipher the following message. Can you decrypt it, and recover the keyword?

SZPQP ERHKQ PCRKJ VZXPU PJSZP GKRSC GCSPT QIQXL SKNQC LZPQR ZXTFN ZPRES CSPFK JNKUP QCRDG LFPRT HRSES TSEKJ IELZP Q

5. Decrypt the following cipher.

AOPBO GDGPG DUKLD NWCTD OALTQ NGUMT AEDKR WDLWG WCUTD MDKAZ ZCGGC TOUFG WCNZA DKGCH GGASC KCDGW CTODK MZQUT DKMTU PNOUF LUKOG AKGZC KMGWW AXCBC CKTCN ZALCV BQUGW CTZCG GCTOF DMPTC OODMK OUTLU EBDKA GDUKO UFGWC EDKAL LUTVA KLCRD GWAVC FDKDG COQOG CEAKV SCQ.

Frequency count:
  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 14  4 25 20  4  5 23  1  0  0 18  9  7  5 14  4  5  2  2 14 14  4 11  1  0  7

6. Among the techniques the British used during the Revolutionary War was a relatively simple monoalphabetic cipher. It was apparently used by Sir Henry Clinton, the commander-in-chief of His Majesty’s Forces in North America from 1778 to 1782. Decrypt the following fragment of a letter sent in this cipher.

75 62 55 67 77 68 74 69 71 68 69 68 72 55 54 73 62 55 66 61 73 73 55 71 73 68 66 55 63 73 62 68 74 61 62 73 63 62 51 54 60 74 65 65 77 55 76 89 65 51 63 67 55 54 66 77 72 55 65 60

Hint: start with a frequency count of the pairs of numbers.

7. A technique for making a monoalphabetic cipher with homophones is to take a long word or phrase, number the letters in order of appearance, and then use those numbers as substitutes for the letters. Any letters that do not appear are numbered consecutively at the end. (This method apparently originated with the Argentis [Kahn, 113].) In a letter on 9 Oct 1863 from 1st Lt. Stephen M. Routh, Chief Signal Officer of the District of West Louisiana, Confederate Army, to Maj. Gen. Stephen D. Lee, it was suggested that “WILHELMINUSVOLKSVEST” be used in this manner. The numbering is

  W  I  L  H  E  L  M  I  N  U  S  V  O  L  K  S  V  E  S  T
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

As H appears only once, there is only one replacement for it. On the other hand, L appears three times so it has three possible replacements. The letters that don’t appear in the keyword are numbered at the end, so A is 21, B is 22, and so on. When used with a long keyphrase the number of replacements for each letter will mimic how frequently they appear in normal language. If the substitute for a letter is always randomly chosen from the set of replacements, then this leads to a very flat frequency count. Hence this method can produce quite secure ciphertexts. Unfortunately, people tend to get lazy and generally reuse the same couple of replacements, destroying the strength of the system.

(a) Routh’s word was later taken up by General Robert E. Lee. He used it to encipher a message he sent on 23 Nov 1863 [Gaddy]. The following contains Lee’s location and the name and location of the intended recipient. Decipher the message.

25 30 13 7 4 29 23 21 12 21 3 30 32 2 9 7 8 11 16 2 19 11 8 28 28 2 26 30 21 9 21 24 21 20 13 23 13 6 22 5 9 27 21 7 8 9 16 1 18 1 14 3 21 21 26 7 5 30 2 24 8 21 9

(b) In June 1864 a Confederate major of Gen. E. Kirby Smith’s trans-Mississippi command deserted and disclosed a cipher similar to the one above, based on the word “impersonificationaly” [Gaddy]. Decipher the following silly phrase (designed to take advantage of the homophones).

1 5 18 17 18 31 18 20 13 8 22 1 2 24 15 22 11 8 23 9 17 2 7 17 14 18 8 13.

8. The most famous cipher story in history is Edgar Allan Poe’s “The Gold Bug”. First published in 1843, it caused a sensation. By far Poe’s most popular story, it gave Poe an international reputation as a great cryptographer. In the story, William Legrand is living on an island near South Carolina. By legend, this island was once the home of the pirate Captain Kidd. Legrand has found a new species of beetle, a gold-colored one. He draws a picture of this bug to show to a friend. It happens that the paper he uses contains a secret message written in invisible ink that just so happens to become visible when the friend just so happens to hold the paper close, but not too close, to a fire. Here is the message.

53++!305))6*;4826)4+.)4+);806*;48!8‘60))85;]8*:+*8! 83(88)5*!;46(;88*96*?;8)*+(;485);5*!2:*+(;4956*2(5* -4)8‘8*; 4069285);)6!8)4++;1(+9;48081;8:8+1;48!85;4 )485!528806*81(+9;48;(88;4(+?34;48)4+;161;:188;+?;

Legrand’s friend (the narrator of the story) is quickly able to decrypt this cipher and find the hidden treasure. Can you? Hint: Do this like any monoalphabetic cipher.

9. Some folks get very emotional about cryptology. The great cryptologist Blaise de Vigenère (Traicté des Chiffres, 1585) certainly did.

02-17-17 15-2-9-8-11-24 20-10 16-24-11-24-17-4 20-20-13-21-24-11 2-15-25 2 10-24-0-11-24-9 06-11-20-9-20-15-22 9-21-24 22-11-24-2-9 15-2-16-24 2-15-25 24-10-10-24-15-0-24 14-23 22-14-25 2-15-25 21-20-10 6-14-15-25-24-11-10 9-21-24 7-24-11-4 25-24-24-25-10 13-11-14-19-24-0-9-10 6-14-11-25-10 2-0-9-20-14-15-10 2-15-25 25-24-16-24-2-15-14-11 14-23 16-2-15-18-20-15-25 6-21-2-9 2-11-24 9-21-24-4 23-14-11 9-21-24 16-14-10-9 13-2-11-9 1-8-9 0-2 0-20-13-21-24-11

10. E. A. Poe was aware of the use of homophones to help secure a monoalphabetic cipher. At one point he used the phrase le gouvernement provisoire as a substitution alphabet:

plain   a b c d e f g h i j l m n o p q r s t u v x y z
cipher  L E G O U V E R N E M E N T P R O V I S O I R E

(a) Encipher Monday. (b) Decipher ETONNNE. (c) William Friedman [Friedman, page 168] was not impressed with this method. Why?

11. Albert Myer used the following homophonic system [SSA2, page 12 or Myers pages 101-111, 264-5]. Set up the alphabet in columns:

element 1: A F K P U V
element 2: B G L Q W
element 3: C H M R X
element 4: D I N S Y
element 5: E J O T Z

To encipher L, note that it is in the 2nd row and is the 3rd letter in that row, so L = 23. To give homophones, now pick any letter from row 2, say W, and any from row 3, say R. So L becomes WR. Similarly f becomes FB or KW. Decipher

UV CR PA FK KJ OF BH TC FG DS LL SR KU BX YI.

12. Frequency analysis was probably first invented by the Arabs. Decrypt the following message. The ideas are due to Ibn ad-Duraihim as paraphrased by Qalqashandi in 1412’s Subh al-a'sha.

EJJLF GZPGE BAABO HJPPT OXJHA UOBVT BQAER ZQAGB XBHHD NBDQI PJXCD OBAGZ HFZAG AGBCD AABOQ JUEBA ABOUO BVTBQ PRCOB MZJTH ERXBQ AZJQB IFGBQ RJTHB BAGDA JQBEB AABOJ PPTOH ZQAGB XBHHD NBXJO BJUAB OQAGD QAGBO BHAAG BQDHH TXBAG DAZAZ HDEZU AGBQD HHTXB AGDAA GBQBY AXJHA UOBVT BQAZH EDXAG BDPPT ODPRJ URJTO PJQSB PATOB HGJTE IWBPJ QUZOX BIWRA GBUDP AZQDX DSJOZ ARJUP JQABY AHEDX UJEEJ FHDEZ U

Frequency count:
  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
 40 45  3 20 13  4 19 21  4 24  0  1  1  2 19 15 20  8  2 13 12  3  2 13  2 15

13. The aftermath of the election of 1876 provides us with a number of interesting ciphertexts [Elements and Secret 1876].

Jacksonville, Nov. 16 (1876) Geo. F. Raney, Tallahassee:

pp yy em ny yy pi ma sh ns yy ss it ep aa en sh ns pe ns sh ns mm pi yy sn pp ye aa pi ei ss ye sh ai ns ss pe ei yy sh ny ns ss ye pi aa ny it ns sh yy sp yy pi ns yy ss it em ei pi mm ei ss ei yy ei ss it ei ep yy pe ei aa ss im aa ye sp ns yy ia ns ss ei ss mm pp ns pi ns sn pi ns im im yy it em yy ss pe yy mm ns yy ss it sp yy pe ep pp ma aa yy pi it

L’Engle goes up to-morrow. (signed) Daniel

In this cipher pairs of letters were used as substitutes for individual letters. The quote above highlights this by separating the pairs. (In the original letter this was not the case, and the first progress towards deciphering the letter was realizing the double-letter nature of the cipher.) Given this, and the hint that neither e nor t is the most common letter, can you decrypt it?

14. Consider the following telegram [Hassard and Secret 1876].

TALLAHASSEE, Nov. 19, 1876 J. J. DANIEL, Jacksonville, Fla.:

84 55 89 31 93 27 66 89 27 20 42 66 34 55 33 93 20 34 84 55 55 39 93 42 55 33 93 48 44 55 52 27 66 33 20 20 55 31 31 66 42 27 82 96 96 93 20 82 66 48 93 52 27 93 44 93 34 82 31 31 27

93 93 82 48 39 66 82 20 34 42 82 48 93 44 82 96 39 66 42 48 82 84 52 31 66 42 27 66 75 55 52 48 39 66 82 33 93 20 93 39 55 27 82 48 66 52 48 44 55 42 82 48 89 84 55 96 96 52 33 82 84 66 48 93 20 89 93 27 48 93 42 20 66 89 27 31 93 48 48 93 42 96 55 20 82 68 82 93 20 66 27 66 75 55 87 93 82 33 99 52 33 84 48 82 55 33 66 77 66 82 33 27 48 77 55 87 93 42 33 55 42 84 66 33 87 66 27 27 82 33 77 93 31 93 84 48 55 42 66 31 87 55 48 93 66 33 10 96 66 33 20 66 96 52 27 48 55 96 66 25 93 96 84 31 82 33 66 33 20 84 55 34 77 82 33 66 84 48 82 96 96 93 20 82 66 48 93 31 89 34 82 31 31 75 93 27 55 52 77 44 48 48 55 96 55 42 42 55 34 L’ENGLE.

As the New York Tribune reported in its exposé on the topic:

It was natural to suppose that this dispatch referred to the chief topic of the day, and if so the word “canvass” must be in it somewhere. The problem was, therefore, to find a combination of seven [pairs of] numbers, of which the second and fifth, standing for A, should be the same, and the sixth and seventh (SS) also the same. The translator began at the beginning and tried every sequence of ciphers until at the end of the twelfth line one was found which fulfilled the desired conditions, namely, ...

what? Find the set of 7 pairs of numbers that fit the letter pattern of “canvass.” Then, with some frequency work, you should be able to decrypt this cipher. Do so.

Chapter 7

Vigenère Ciphers

It was the amateurs of cryptology who created the species. The professionals, who almost certainly surpassed them in cryptanalytic expertise, concentrated on the down-to-earth problems of the systems that were then in use but are now outdated. The amateurs, unfettered to these realities, soared into the empyrean of theory.
David Kahn, The Codebreakers

We’ve seen four types of monoalphabetic ciphers:

Caesar Ciphers: shift the letters of the alphabet by some fixed amount.
Decimation Ciphers: multiply by some fixed amount.
Linear Ciphers: multiply and shift.
Keyword Ciphers: use a keyword and columns to make a new alphabet.

In these ciphers each plaintext letter is replaced throughout the entire message by the same ciphertext letter. Because of this one-to-one correspondence, frequency analysis allows anyone to decide which letter is posing as which, and hence to decrypt the message.

How, then, should we make a better cipher? We must change this one-for-one replacement and find a way to cause multiple letters to represent the same letter. We must (re)invent polyalphabetic ciphers.

Apparently, the first monoalphabetic cipher was thought up by one person in one sudden intellectual burst. Not so for the family of polyalphabetic ciphers. To fully develop this idea took four people: an architect, a cleric, a courtier and a scientist, none of whom were cryptologists by profession.


7.1 Alberti, the Father of Western Cryptology

Leon Battista Alberti (1404-1472) was born in Florence, Italy, the illegitimate but favored son of a rich merchant. Although not well-known today, in the Renaissance he was generally considered to be second only to Leonardo da Vinci in terms of all-around talents. For example, as an architect Alberti¹ designed the first Fountain of Trevi in Rome, and his De Re Aedificatoria was the first printed book on architecture and the “theoretical cornerstone of the architecture of the Renaissance” (Kahn). He was one of the best organists of his day, as well as a painter. His writings include poems, fables, comedies, law, and the first scientific study of perspective. He was also an excellent athlete. [Kahn]

In 1466, at age 62 or 63, he wrote for his friend Leonardo Dato, the papal secretary, what is the West’s oldest book on cryptology, De cifris. It was only 25 pages long, but included

1. Frequency analysis. Probably not invented by him, but presented in a relatively developed form.

2. The use of code letters for common words and phrases. For example, ZZ = rome, XZ = pope, etc. Further, he changed these regularly, so later Pope might be ZAZ and later still XXY. These letters are enciphered along with the rest of the text, in a method called enciphered code. So Pope would be first replaced by XZ and then XZ would be enciphered like any other plaintext word. This method was so far ahead of its time that it was almost 400 years until it was generally used.

3. The first mechanical cipher device. Known as the Alberti disc, it is very similar in function to the Saint Cyr slide but has the alphabets arranged in circles, the plaintext inside and ciphertext outside.

4. Primitive polyalphabetic ciphers. His idea is both very clever and very simple: use a different Caesar cipher with each word!²

Examples: Use Alberti’s methods on the following.

(1) Encipher Vatican City and Rome by using the first letter of each word as the key. Vatican we encipher with key V, giving QVODXVI. For City we use key C, giving EKVA. Similarly with key A, and = AND (A is a foolish key) and, finally, Rome = IFDV. So the ciphertext is QVODXVI EKVA AND IFDV.

(2) Encipher Fountain of Trevi using the second letter of the word as the key.

¹ Who shouldn’t be confused with Giovanni Battista Argenti of 5.1, despite the similar names.
² Alberti actually only changed the key every couple of words.

(3) Decipher JCQNGI RRIIZMVJ MUZLLAZ BUIVSTEBZ. Hint: The first letter of each word is the key.

(4) Decipher LZWS XYXY RDGGV SBHSFJWOO CQNBRMNJ KVVYH. Hint: the last letter of the word is the key.³



Look again at part (1) of this example. The i in Vatican became D but the i in City became K. Similarly, the t once became O and once V. A single plaintext letter was enciphered to different ciphertext letters! This is the trademark of a polyalphabetic cipher. A monoalphabetic (mono = one) cipher uses only one ciphertext alphabet throughout the message, so each plaintext letter can become only one ciphertext letter. A polyalphabetic (poly = many) cipher provides for several ciphertext alphabets, which allows each plaintext letter to become several different ciphertext letters. As we will see, the fact that the same plaintext letter can become different ciphertext letters provides for a very strong cipher.

The weakness of Alberti’s method is a bit more subtle. Can you find it? (Try to decipher parts (1) or (2) from the example.)⁴
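The word-by-word scheme used in the examples can be sketched in Python as follows. The function names caesar and alberti_words, and the key_index parameter that says which letter of each word serves as the key, are our own inventions for illustration. The sketch assumes words made only of the 26 unaccented letters.

```python
def caesar(text, key_letter):
    """Shift each letter of text forward by the key letter's position (A = 0)."""
    shift = ord(key_letter.upper()) - ord("A")
    return "".join(
        chr((ord(ch) - ord("a") + shift) % 26 + ord("A"))
        for ch in text.lower()
        if ch.isalpha()
    )


def alberti_words(plaintext, key_index=0):
    """Encipher each word with a Caesar cipher keyed by one of its own letters.

    key_index=0 uses the first letter, 1 the second, -1 the last.
    """
    return " ".join(
        caesar(word, word[key_index]) for word in plaintext.split()
    )


print(alberti_words("Vatican City and Rome"))        # first letter as key
print(alberti_words("Fountain of Trevi", key_index=1))  # second letter as key
```

Notice that the function needs the plaintext word in order to pick the key. That observation is closely related to the weakness asked about above.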

7.2 Trithemius, the Father of Bibliography

Johannes Trithemius (February 2, 1462 – December 15, 1516) is the second of our four developers. By the age of 22, Trithemius was the abbot of the Benedictine abbey of Saint Martin. His most important work, Liber de scriptoribus ecclesiasticis, was a chronological list of about 7000 theological works; it was published in 1494 and led to his title.

In 1499 he wrote Steganographia, meaning “covered writing”. In it are some examples of simple substitutions (swapping all i’s and t’s with each other, for instance) and some null ciphers. (These ciphers appear like normal correspondence but hide messages within them, for example, by having only certain letters be meaningful, such as the initial letters in “Can’t order donuts every Sunday”.) The third portion was a study of what we’d now call magic and the occult. This portion brought him fame and notoriety and in 1609 the book was placed on the Catholic Church’s Index of Prohibited Books.

Starting in 1508 Trithemius published the six-part Polygraphia (“many ways of writing”). In Book V appears the world’s first known tableau, or table. The simplest tableau is known as Trithemius’ tabula recta, and appears as Figure 7.1. (Trithemius’ recta came from the Latin alphabet, and so was 24 × 24, rather than 26 × 26.)

³ (2) TCIBHOWB TK KIVMZ, (3) The XZ=pope arrives in ZZ=Rome on thursday, (4) The ZAZ=pope will enter the side door.
⁴ We must include the keyletter(s) with the message. If the method is simply “use the first letter of the words as the key”, then that letter is enciphered and is hard to determine!

abcdefghijklmnopqrstuvwxyz AABCDEFGHIJKLMNOPQRSTUVWXYZ BBCDEFGHIJKLMNOPQRSTUVWXYZA CCDEFGHIJKLMNOPQRSTUVWXYZAB DDEFGHIJKLMNOPQRSTUVWXYZABC EEFGHIJKLMNOPQRSTUVWXYZABCD FFGHIJKLMNOPQRSTUVWXYZABCDE GGHIJKLMNOPQRSTUVWXYZABCDEF HHIJKLMNOPQRSTUVWXYZABCDEFG IIJKLMNOPQRSTUVWXYZABCDEFGH JJKLMNOPQRSTUVWXYZABCDEFGHI KKLMNOPQRSTUVWXYZABCDEFGHIJ LLMNOPQRSTUVWXYZABCDEFGHIJK MMNOPQRSTUVWXYZABCDEFGHIJKL NNOPQRSTUVWXYZABCDEFGHIJKLM OOPQRSTUVWXYZABCDEFGHIJKLMN PPQRSTUVWXYZABCDEFGHIJKLMNO QQRSTUVWXYZABCDEFGHIJKLMNOP RRSTUVWXYZABCDEFGHIJKLMNOPQ SSTUVWXYZABCDEFGHIJKLMNOPQR TTUVWXYZABCDEFGHIJKLMNOPQRS UUVWXYZABCDEFGHIJKLMNOPQRST VVWXYZABCDEFGHIJKLMNOPQRSTU WWXYZABCDEFGHIJKLMNOPQRSTUV XXYZABCDEFGHIJKLMNOPQRSTUVW YYZABCDEFGHIJKLMNOPQRSTUVWX ZZABCDEFGHIJKLMNOPQRSTUVWXY

Figure 7.1: Trithemius’ tabula recta

To use the tabula recta, find your plaintext letters in the outside top row. Use the first row as the ciphertext alphabet for the first letter, the second row as the ciphertext alphabet for the second letter, the third row as the ciphertext alphabet for the third letter etc., repeating after 26 letters. In the language of our Caesar ciphers, we encipher each letter using a Caesar cipher but change the key with each letter. We use A as the key for the first letter of the message, B as the key for the second letter, C for the third, etc. This gives a key progression (or a progressive key), since the key progresses with each new letter.

Examples:

(1) Encipher table. To use Trithemius’ tabula, find the first letter of the message, t, and read down to the A row, a T. Then find the next letter, a, and read down to the B row, a B. Find b and read down to the C row, a D. Find l and read down to the D row, an O. Finally e in the E row is an I. The answer is TBDOI. Using a Saint Cyr slide one enciphers as usual, simply remembering that each letter gets its own key. With the slide, move the key one letter forward each time you encipher a letter.

(2) Encipher rectangle.

(3) Decipher DJCJS SGS.

(4) Decipher FFFHVFR.⁵
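Trithemius' progressive key is simple to sketch in Python: the n-th letter of the message is shifted by n − 1 places, wrapping around after 26 letters. The function name trithemius and its decipher flag are ours, chosen for illustration, and the sketch assumes the modern 26-letter alphabet rather than Trithemius' own 24-letter one.

```python
def trithemius(text, decipher=False):
    """Encipher (or decipher) with the key A, B, C, ... advancing each letter.

    Non-letters are dropped and do not advance the key.
    Ciphertext is returned in capitals, plaintext in lower case.
    """
    out = []
    step = 0                                  # key for the first letter is A = 0
    for ch in text:
        if not ch.isalpha():
            continue
        shift = -step if decipher else step
        base = ord("a") if decipher else ord("A")
        out.append(chr((ord(ch.lower()) - ord("a") + shift) % 26 + base))
        step += 1                             # move to the next row of the tabula
    return "".join(out)


print(trithemius("table"))
print(trithemius("FFFHVFR", decipher=True))
```

Because there is no secret key at all, anyone who knows the method can run the same function. This is worth keeping in mind when weighing the system's strengths and weaknesses.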



The influence of this system was great, in part because of Trithemius’ notoriety and in part because his was the first published book on cryptology. Letter-by-letter enciphering quickly became common and the basis for many cipher systems. In some ways this system is better than Alberti’s because a new alphabet is used with each letter. Even very recognizable words like Mississippi will become well-enciphered. (Alberti’s method with key F gives FRNXXNXXNUUN, putting the keyletter first, while Trithemius’ method gives MJUVMXYPXYS.)

It also has clear weaknesses. For example, if the plaintext contains alphabetically anti-consecutive letters, as in pon of pontoon or fed of federal, then the ciphertext will contain consecutive copies of the same letter.⁶ While not terribly common, this allows easy entries to decrypting the cipher. (This last is Porta’s idea.) Nonetheless, Trithemius contributed the idea of changing the key with each letter.

7.3 Belaso, the Unknown and Porta, the Great

Giovan Batista Belaso, unlike the others appearing in this chapter, is someone we know little about. He was from Brescia, Italy and lived during the 1500’s. What we do know is that in 1553 he published La cifra del. Sig. Giovan Batista Belaso in which he described a cipher system with an easily remembered and easily changed key. He called the key a countersign, although we will just call it the keyword. His cipher became known as the Vigenère cipher and is the most important and most used cipher in history. (Vigenère will appear in Chapter 8.)

Belaso’s idea is an extension of Trithemius’s: to encipher, pick a keyword and use its letters cyclically as the key in a Caesar cipher to encipher the text.

Example: Encipher eek, eek, I saw a mouse near that computer using the keyword TYPE.⁷

plaintext   eekeekisawamousenearthatcomputer
key         TYPETYPETYPETYPETYPETYPETYPETYPE
ciphertext  XCZIXIXWTUPQHSHIGCPVMFPXVMBTNRTV

Answer: XCZIX IXWTU PQHSH IGCPV MFPXV MBTNR TV.
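Belaso's keyword idea can be sketched in Python as below. The function name vigenere and its decipher flag are ours, for illustration only. Each keyword letter is treated as a Caesar key (A = 0, B = 1, and so on), and the keyword is reused cyclically.

```python
def vigenere(text, keyword, decipher=False):
    """Encipher or decipher text with the keyword used cyclically.

    Non-letters are dropped. Ciphertext comes back in capitals,
    plaintext in lower case, matching the convention used in the examples.
    """
    shifts = [ord(k) - ord("A") for k in keyword.upper() if k.isalpha()]
    letters = [ch for ch in text.lower() if ch.isalpha()]
    base = ord("a") if decipher else ord("A")
    out = []
    for i, ch in enumerate(letters):
        shift = shifts[i % len(shifts)]       # cycle through the keyword
        if decipher:
            shift = -shift
        out.append(chr((ord(ch) - ord("a") + shift) % 26 + base))
    return "".join(out)


cipher = vigenere("eek, eek, I saw a mouse near that computer", "TYPE")
print(cipher)
print(vigenere(cipher, "TYPE", decipher=True))
```

Deciphering is the same walk through the keyword with every shift reversed, which is why a single function can serve both directions.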

⁵ (2) RFEWESMSM, (3) diagonal, (4) federal.
⁶ pontoon is enciphered as PPPWSTT.
⁷ Belaso was careful to use longer keywords, or better, phrases, like OPTARE MELIORA and VIRTUTI OMNIA PARENT.

Giovanni Battista Porta (1535–1615) was probably the outstanding cryptographer of the Renaissance. In addition, he organized the first association of scientists, the “Academia Secretorum Naturae”. He was also a prolific writer: between 1586 and 1609 he published books on human physiognomy, meteorology, the design of villas, and astronomy, as well as 14 prose comedies.

In 1563⁸ he published De Furtivis Literarum Notis. Its four books dealt with, respectively, ancient ciphers, modern ciphers, cryptanalysis, and linguistic peculiarities, and encompassed all the cryptologic knowledge of the time. According to Kahn, it is one of the few books of the period that is still readable. (In it, it appears that Porta nearly figured out how to break the Vigenère ciphers. If he had, they probably would not have become popular, in which case this chapter wouldn’t exist. But he didn’t quite, and so here we are.)

Porta’s contribution was two-fold. First, his book was very influential and made the polyalphabetic ciphers popular. Second, he combined the letter-by-letter encipherments of Trithemius, the easily changed key of Belaso, and the mixed alphabet suggested by Alberti. Rather than simply using the 26 regular alphabets (from Trithemius’ table) as the ciphertext alphabets, his cipher allowed for mixed alphabets.

7.4 Vigenère Ciphers

The Vigenère cipher is one of the most influential ciphers in history. It is simple to use and easy to remember and, because it is polyalphabetic, is much more secure than the ciphers we have previously studied.

Examples:

(1) Encipher Meet me at the Met at the time ten twenty using the keyword CODE.

plaintext   meetmeatthemetatthetimetentwenty
key         CODECODECODECODECODECODECODECODE
ciphertext  OSHXOSDXVVHQGHDXVVHXKAHXGBWAGBWC

Yes, this is a silly message. But notice how many times the e’s and t’s and m’s go to different letters. There are 9 e’s and 10 t’s in the plaintext, but in the ciphertext no letter appears more than 5 times.

(2) As his example of Vigenère, Parker Hitt used the key GRANT to encipher All radio messages must hereafter be put in cipher. What is the ciphertext?

(3) Charles Dodgson (Lewis Carroll) reinvented the Vigenère cipher in 1868, calling it the Alphabet Cipher. In his diary he used the keyword VIGILANCE to encipher Meet me on tuesday eve [Abeles, p. 326]. What did he find as the ciphertext?

(4) Decipher TMOQE JKCNF SJDOE ESF using the keyword HOLMES. Deciphering is set up as you probably think:

ciphertext  TMOQEJKCNFSJDOEESF
key         HOLMESHOLMESHOLMES

(5) Decipher WYONT REJOL BXNUQ IZHS using the keyword AROUND.

(6) During World War I the German Army used a cipher that is equivalent to a Vigenère cipher with keyword ABC. To find out in which year, decipher NJPEUGEO HOVTTFGN.

(7) Joseph Willard Brown claims that IN GOD WE TRUST was used as a key to encipher the message Longstreet is marching on Fisher’s Hill [Brown]. If so, what was the ciphertext?⁹

⁸ Why did the 1500’s have this sudden explosion of interest in cryptology? “The growth of cryptology [in the west in the 1500’s] resulted directly from the flowering of modern diplomacy” [Kahn, pg 108] since permanent ambassadors needed to send home regular reports.



How does being polyalphabetic influence the frequency counts? Here is an enciphered message.

QNTQH PIECN SCPOE QXPZC FFILF ACNMY FRRPW XUTKK RROLV CVEQX
KASFS IMGSY XYRRM POEFH MYROI OEEUO EVSET EVPIX KAEFC ZGKEQ
AHTAK KXTEI EGRNA DZPKM MDCAC ZGPVT QTFHZ OXZDE RFEWX YFWVN
ARIGW AANUG RNQAE FXSMT FHCZG RWTMP ZDJII YQRRN QLDCV NKTHI
VTKP

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
11  0 10  5 16 12  7  6 10  1 10  3  8  9  7 10  9 14  6 11  3  8  5  9  6  8
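A frequency count like the one above is tedious by hand but easy to tally mechanically. The following is a minimal sketch, assuming the ciphertext uses only the letters A to Z (anything else is ignored); the name `frequency_count` is invented for this illustration.

```python
from collections import Counter
from string import ascii_uppercase

def frequency_count(ciphertext: str) -> dict[str, int]:
    """Count how often each letter A..Z occurs; other characters are ignored."""
    counts = Counter(c for c in ciphertext.upper() if c in ascii_uppercase)
    # Report every letter, including those that never occur.
    return {letter: counts[letter] for letter in ascii_uppercase}

if __name__ == "__main__":
    sample = "QNTQH PIECN SCPOE QXPZC"  # the first 20 letters of the message above
    for letter, n in frequency_count(sample).items():
        print(letter, n)
```

Running it on the whole message should reproduce the table, which is a quick way to double-check a hand count.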

Could this be from a monoalphabetic cipher? Almost all the letters appear, and most appear between 5 and 10 times each. This frequency chart is much flatter than any we’ve seen in the past. So, no, this ciphertext cannot be from a monoalphabetic cipher.

Perhaps we could break this cipher by exhaustion, by trying each possible keyword? Well, there are 26 one-letter keywords, 26^2 = 676 two-letter keywords, 26^3 = 17,576 three-letter ones, etc. Here I used a keyword of more than four letters, and there are already 26^5 = 11,881,376 possible five-letter keywords! It takes very little imagination to realize that trying all possible keywords is impossible.

The Vigenère cipher, for much of its lifetime, had two very strong advantages over the ciphers we have previously seen:

1. It was apparently unbreakable; in fact, it was basically unbroken until the 19th century.

2. Each ambassador, messenger and spy could have their own key that could be changed easily if stolen or lost. (Generally the keywords were phrases, e.g., “God save the Queen”.)

Footnote 9: (2) GCLET JZOZX YJATX YDUFM NVRRT LKEEU KGUGB TTICA KI, (3) HMKBX EBPXP MYLLY RXI, (4) my dear doctor watson, (5) what goes around comes, (6) nineteen fourteen, (7) TBTUV PVXVN ALUNX QKERZ FHXBA UKFVD MEC

It did, however, have the reputation for being cumbersome and prone to error, which meant it was used less than one would otherwise expect. Even so, it served as a prototype for many ciphers used by professionals.

7.5 Variants and Beaufort

In 1857 Admiral Sir Francis Beaufort of the Royal Navy (known for his scale of wind speed) and his brother published what they thought was a new cipher (footnote 10). They put their Beaufort Tableaux (Figure 7.2) on 4×5-inch cards and sold these as a new way of secret writing “adapted for telegrams and postcards” (postcards having recently been invented). To use the card, they wrote:

Let the key for the foregoing [message] be a line of poetry or the name of some memorable person or place, which cannot easily be forgotten ... Now look in the side column for the first letter of the text (t) and run the eye across the table until it comes to the first letter of the key (v), then at the top of the column in which v stands will be found the letter c. ([Kahn], page 202.)

So the plainletter t is enciphered by V to give the cipherletter C. (Actually, one can look either up or down to find the ciphertext.) Clearly this cipher method is very closely related to the Vigenère cipher. It does have one small advantage: deciphering is exactly the same process as enciphering.

ABCDEFGHIJKLMNOPQRSTUVWXYZ BCDEFGHIJKLMNOPQRSTUVWXYZA CDEFGHIJKLMNOPQRSTUVWXYZAB DEFGHIJKLMNOPQRSTUVWXYZABC EFGHIJKLMNOPQRSTUVWXYZABCD FGHIJKLMNOPQRSTUVWXYZABCDE GHIJKLMNOPQRSTUVWXYZABCDEF HIJKLMNOPQRSTUVWXYZABCDEFG IJKLMNOPQRSTUVWXYZABCDEFGH JKLMNOPQRSTUVWXYZABCDEFGHI KLMNOPQRSTUVWXYZABCDEFGHIJ LMNOPQRSTUVWXYZABCDEFGHIJK MNOPQRSTUVWXYZABCDEFGHIJKL NOPQRSTUVWXYZABCDEFGHIJKLM OPQRSTUVWXYZABCDEFGHIJKLMN PQRSTUVWXYZABCDEFGHIJKLMNO QRSTUVWXYZABCDEFGHIJKLMNOP RSTUVWXYZABCDEFGHIJKLMNOPQ STUVWXYZABCDEFGHIJKLMNOPQR TUVWXYZABCDEFGHIJKLMNOPQRS UVWXYZABCDEFGHIJKLMNOPQRST VWXYZABCDEFGHIJKLMNOPQRSTU WXYZABCDEFGHIJKLMNOPQRSTUV XYZABCDEFGHIJKLMNOPQRSTUVW YZABCDEFGHIJKLMNOPQRSTUVWX ZABCDEFGHIJKLMNOPQRSTUVWXY

Figure 7.2: Beaufort’s Tableaux

Footnote 10: It had already been studied by Giovanni Sestri in 1710.

The Beaufort cipher may also be performed using a Saint Cyr slide: simply replace the ciphertext slide with one on which the alphabet is reversed.

Examples:

(1) Encipher send supplies using keyword COMET.

plaintext   sendsupplies
key         COMETCOMETCO
ciphertext  KKZBBIZXTLYW

The ciphertext is KKZBB IZXTL YW.

(2) Encipher Admiral using the keyword NAVY.

(3) Decipher GWFDY XGTPW U using keyword CASH. (Answers are in footnote 11.)



To use the Variant Beaufort (confusingly also known as the Variant Vigenère), instead of starting with the plaintext letter and moving inwards to the keyletter, we start with the keyletter and trace inwards to the plaintext letter.

Examples: Using the Variant Beaufort:

(1) Encipher send supplies using keyword COMET.

(2) Decipher WNK OOTA FVKD using keyword EAT. (Answers are in footnote 12.)
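Both tableau lookups can be restated as arithmetic on letter numbers (a = 0, ..., z = 25), which makes the difference between the two ciphers easy to see. The sketch below is an interpretation under that assumption, not the Beauforts' own description: the Beaufort cipherletter is the keyletter minus the plainletter, and the Variant Beaufort cipherletter is the plainletter minus the keyletter. The function names are invented here.

```python
from itertools import cycle

A = ord("A")

def _combine(text, keyword, rule):
    """Apply rule(letter, keyletter) mod 26 along a repeating keyword."""
    letters = [ord(c) - A for c in text.upper() if "A" <= c <= "Z"]
    key = [ord(k) - A for k in keyword.upper() if "A" <= k <= "Z"]
    return "".join(chr(rule(x, k) % 26 + A) for x, k in zip(letters, cycle(key)))

def beaufort(text, keyword):
    """Beaufort: key minus text. The same call enciphers and deciphers."""
    return _combine(text, keyword, lambda x, k: k - x)

def variant_beaufort(text, keyword, decipher=False):
    """Variant Beaufort: text minus key to encipher, text plus key to decipher."""
    if decipher:
        return _combine(text, keyword, lambda x, k: x + k)
    return _combine(text, keyword, lambda x, k: x - k)

if __name__ == "__main__":
    print(beaufort("send supplies", "COMET"))          # Beaufort example (1)
    print(beaufort("KKZBB IZXTL YW", "COMET").lower())  # the same call undoes it
    print(variant_beaufort("send supplies", "COMET"))
```

Because the Beaufort rule is its own inverse, one function serves both directions, which is the "small advantage" mentioned above; the Variant needs a separate deciphering rule.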



7.6 How to Break Vigenère Ciphers

“The method used for the preparation and reading of code messages is simple in the extreme and at the same time impossible of translation unless the key word is known.” From “A New Cipher Code”, Scientific American Supplement, Jan 27, 1917, The Proceedings of the Engineer’s Club of Philadelphia.

In 1550 Belaso invents the Vigenère cipher. In 1563 Porta almost figures out how to break it, but not quite. And then for the next 300 years the Vigenère is unbreakable. So how finally was the Vigenère cipher broken? By turning the power of the keyword against the cipher.

Let’s start by backing up. What enabled us to decrypt a Caesar cipher? We knew that every letter in the message was enciphered using the same key letter. And we could detect this key letter from the frequency habits of normal English. Why can’t we do this with a Vigenère cipher? Because not all of the letters are enciphered using the same key letter. In other words, a Caesar cipher is monoalphabetic, a Vigenère cipher is polyalphabetic.

Now suppose we did know which letters were enciphered by the same letter of the keyword. Then we could split up the ciphertext into groups in such a way that all the letters in each group would be enciphered with the same keyletter. Each group of letters would be like a (different) Caesar cipher. And we know how to break Caesar ciphers. Since we could decrypt each of these groups of letters, we could decrypt all of the letters in the ciphertext, thus breaking the message.

So (perhaps) we don’t actually need to know the keyword, but we do need to determine which ciphertext letters have been enciphered with the same letter of the keyword. How can we do this? Imagine rewriting the ciphertext downward into columns, with as many letters per column as there are letters in the keyword. Of course, we’d need to know how many letters there are in the keyword, but pretend for a moment we know that. Then the letters in the first row of our re-written ciphertext would all have been enciphered using the first letter of the keyword. And the letters in the second row of the re-written ciphertext would all have been enciphered using the second letter of the keyword. Similarly for the other rows. We would be left with several Caesar ciphers to break.

Footnote 11: (2) NXJQW AK, (3) we need money.
Footnote 12: (1) QQBZZ SBDHP CE, (2) And some food.
To make this a bit less abstract, let’s suppose the keyword contained 5 letters, and we number them k1 k2 k3 k4 k5, and similarly number the ciphertext letters ct1 ct2 ct3 ... . Then the complex of rows and columns from the previous paragraph takes the form

k1   ct1   ct6    ct11   ct16   ...
k2   ct2   ct7    ct12   ct17   ...
k3   ct3   ct8    ct13   ct18   ...
k4   ct4   ct9    ct14   ct19   ...
k5   ct5   ct10   ct15   ct20   ...

The letters ct1, ct6, ct11, ct16, ... all have been enciphered by the key k1. So we can treat ct1 ct6 ct11 ct16 ... as a Caesar cipher with unknown key k1. Likewise, ct2, ct7, ct12, ct17, ... all have been enciphered by k2, so ct2 ct7 ct12 ct17 ... can be treated as a Caesar cipher.

So to decrypt a Vigenère-enciphered message, assuming we know the length of the keyword, first rewrite the ciphertext in columns with as many letters per column as letters in the keyword. Then the letters of each row will constitute a Caesar cipher that can be broken with our techniques from Chapter 1.

To summarize, from what does the Vigenère cipher get its security? Obviously from its keyword. But, surprisingly, the important thing about the keyword, at least from a security standpoint, is not which letters are in it, but rather how many!

Let’s illustrate the method with the example we considered earlier.

Example:

QNTQH PIECN SCPOE QXPZC FFILF ACNMY FRRPW XUTKK RROLV CVEQX
KASFS IMGSY XYRRM POEFH MYROI OEEUO EVSET EVPIX KAEFC ZGKEQ
AHTAK KXTEI EGRNA DZPKM MDCAC ZGPVT QTFHZ OXZDE RFEWX YFWVN
ARIGW AANUG RNQAE FXSMT FHCZG RWTMP ZDJII YQRRN QLDCV NKTHI
VTKP

Now suppose somehow, by hook or by crook, we learned that this is indeed a Vigenère cipher with a keyword consisting of 6 letters. This would mean that every sixth letter of the message is enciphered by the same letter of the keyword. So the 1st, 7th, 13th, ... letters of the message are enciphered, as a Caesar cipher, using the same keyletter. Similarly, the 2nd, 8th, 14th, ... letters are enciphered using the same keyletter. Likewise for the other four groups. (The five-letter blocks in which the ciphertext is printed are only for readability; they have nothing to do with the keylength.) So if we group together the letters that are enciphered by the same keyletter, then we can decipher each of these groups as a Caesar cipher. First, we regroup and make new frequency counts.

First letters: QIPZFFUOQSXORUTKGTEAMGFDXAAQMGZQDH
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
3 0 0 2 1 3 3 1 1 0 1 0 2 0 2 1 4 1 1 2 2 0 0 2 0 2

Second letters: NEOCARTLXIYEOOEAKAIDDPHEYRNATRDRCI
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
4 0 2 3 4 0 0 1 3 0 1 1 0 2 3 1 0 4 0 2 0 0 0 1 2 0

Third letters: TCEFCRKVKMRFIEVEEKEZCVZRFIUEFWJRVV
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 0 3 0 6 4 0 0 2 1 3 0 1 0 0 0 0 4 0 1 1 5 1 0 0 2

Fourth letters: QNQFNPKCAGRHOVPFQKGPATOFWGGFHTINNT
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
2 0 1 0 0 4 4 2 1 0 2 0 0 4 2 3 3 1 0 3 0 1 1 0 0 0

Fifth letters: HSXIMWRVSSMMESICAXRKCQXEVWRXCMIQKK
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 0 3 0 2 0 0 1 3 0 3 0 4 0 0 0 2 3 4 0 0 2 2 4 0 0

Sixth letters: PCPLYXREFYPYEEXZHTNMZTZWNANSZPYLTP
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
1 0 1 0 3 1 0 1 0 0 0 2 1 3 0 5 0 1 1 3 0 0 1 2 4 4

Our eyes should be sufficiently trained by this point to detect the six Caesar ciphers here. For example, the first two have the following fit:

First letters:
plain:  o p q r s t u v w x y z a b c d e f g h i j k l m n
count:  3 0 0 2 1 3 3 1 1 0 1 0 2 0 2 1 4 1 1 2 2 0 0 2 0 2
cipher: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Key is M.

Second letters:
plain:  a b c d e f g h i j k l m n o p q r s t u v w x y z
count:  4 0 2 3 4 0 0 1 3 0 1 1 0 2 3 1 0 4 0 2 0 0 0 1 2 0
cipher: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Key is A.

The other four letters of the keyword are quickly determined. (Do it! The answer is in footnote 13.)

This example shows that if we can determine the length of the keyword of a Vigenère cipher, the difficult work will be done, because breaking the Vigenère cipher will be the same as breaking several Caesar ciphers. In other words, a bit time-consuming, but pretty easy.

This is actually where Linquist’s method becomes useful. If one has a complete Caesar cipher, running down the alphabet is a quick and simple-minded way of decrypting it. But when working on a Vigenère cipher we do not have consecutive groups of letters, and so running down the alphabet looking for words will not work. Further, there will be cases in which each group of similarly-encrypted letters is quite small, making traditional frequency analysis hard. Linquist’s method, however, is designed with small groups of letters in mind, and will sometimes be helpful.
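Once the keylength is known, the regrouping and the search for each keyletter can be sketched as follows. This is a rough illustration rather than the method of Chapter 1: instead of fitting the frequency chart by eye, it scores each of the 26 possible Caesar keys against approximate English letter percentages (the `ENGLISH` table is an assumed, rounded set of values) and keeps the best-scoring key. All names are invented for this sketch.

```python
from string import ascii_uppercase

# Approximate percentages of a..z in English text (assumed, rounded values).
ENGLISH = [8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.2, 0.8, 4.0, 2.4,
           6.7, 7.5, 1.9, 0.1, 6.0, 6.3, 9.1, 2.8, 1.0, 2.4, 0.2, 2.0, 0.1]

def split_by_keyletter(ciphertext: str, keylength: int) -> list[str]:
    """Group i holds the letters in positions i, i + keylength, i + 2*keylength, ..."""
    letters = [c for c in ciphertext.upper() if c in ascii_uppercase]
    return ["".join(letters[i::keylength]) for i in range(keylength)]

def likely_keyletter(group: str) -> str:
    """Return the Caesar key whose decipherment looks most like English."""
    def score(shift: int) -> float:
        return sum(ENGLISH[(ord(c) - 65 - shift) % 26] for c in group)
    return chr(max(range(26), key=score) + 65)

def guess_keyword(ciphertext: str, keylength: int) -> str:
    """Put the most likely keyletter of each group together."""
    return "".join(likely_keyletter(g) for g in split_by_keyletter(ciphertext, keylength))

if __name__ == "__main__":
    print(split_by_keyletter("KIOVIEEIGKIOVNURNVJNUVKHVMGZIA", 3))
```

With groups as short as those that arise in practice the best-scoring key can be wrong for one or two positions, so the guessed keyword is a starting point to be corrected by reading the partial plaintext, not a guaranteed answer.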

7.7 The Kasiski Test

Friedrich W. Kasiski (1805–81) was born in what is now Poland, entered East Prussia’s Infantry in 1823 and retired in 1852. In 1863 he published Die Geheimschriften und die Dechiffrir-kunst, a book of only 95 pages. After his book made no apparent impact, he quit cryptography and became an anthropologist of some local fame. His book contained a method for determining the number of letters in a Vigenère keyword, and by the beginning of the 20th century experts considered polyalphabetic ciphers to be vulnerable.

Footnote 13: The key is MARCEL and the plaintext is “Encode well or do not encode at all. In transmitting cleartext, you give only a piece of information to the enemy, and you know what it is; in encoding badly, you permit him to read all your correspondence and that of your friends.” General Marcel Givierge, head of French cryptology during WWI, author of Cours de cryptographie, 1925. [Kahn, pg 349]

Kasiski’s idea was to not look directly for the length of the keyword, but to instead look for multiples of that length. Suppose a sequence of letters appears more than once in the plaintext. And suppose the plaintext and keyword are such that the same letters of the keyword encipher the same letters of these repeated bits. Then the resulting ciphertext would be the same. Now turn this around. If there is a repetition of some group of letters in the ciphertext, perhaps this was caused by the same plaintext letters being enciphered by the same keyword letters. If so, then the keyword would have had to repeat just the right number of times to fit both the first and second appearances of the plaintext letters. The distance these appearances are apart would then be a multiple of the keyword length. If we can find several such repetitions, so several such distances, then we ought to be able to focus in on the length of the keyword. To give a short example, consider the following cipher, with every fifth letter numbered ([Kahn], page 208):

    5   10   15   20   25   30
KIOVIEEIGKIOVNURNVJNUVKHVMGZIA

There are two repetitions here, one of KIOV and one of NU. The second KIOV starts at letter 10, the first at letter 1, so they occur a distance 9 apart. The second NU starts at letter 20, the first at letter 14, a distance of 6. Let’s put this information in a chart:

Repetition   Start Positions   Distance       Factors
KIOV         1, 10             10 − 1 = 9     3 × 3
NU           14, 20            20 − 14 = 6    2 × 3

What number(s) are both 2 × 3 and 3 × 3 a multiple of? 3. So, Kasiski would conclude, the keyword must be 3 letters long. In fact, the plaintext is To be or not to be. That is the question. and by putting the repeating keyword underneath we see that KIOV came from the combination of tobe and RUNR, and NU from th and UN:

plaintext   tobeornottobethatisthequestion
key         RUNRUNRUNRUNRUNRUNRUNRUNRUNRUN
ciphertext  KIOVIEEIGKIOVNURNVJNUVKHVMGZIA

Summarizing gives Kasiski’s Test.

Kasiski’s Test:

1. Look for repetitions in the ciphertext of two, three, or more letters. Determine the distances (= position of first letter in second appearance minus position of first letter in first appearance) between the beginnings of these repetitions. This is doing a Kasiski Examination.

2. Find the largest number that divides most of your distances in step 1 by first factoring those distances. This number should give a good idea of the length of the keyword. We call this length the keylength.

Be aware that longer repetitions are more significant than shorter ones, so repeated triples are more meaningful than repeated pairs, and 4 letter strings are even more significant. Also, repetitions can arise that do not come from a repeat in the text. So the keylength will not necessarily divide all of the repetition distances.

3. Write the ciphertext in columns, with the number of letters per column being equal to the keylength. This will result in a keylength number of rows, with each row being simply a Caesar cipher. This is building up a depth.

4. Use frequency analysis on each row to determine the keyletter for each, and put the keyletters together to find the keyword. Then decipher the message.
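Steps 1 and 2 of the test lend themselves to a short program. The sketch below is one possible reading of those steps, not Kasiski's own procedure: it looks only for repeated strings of one chosen length, measures each later appearance from the first appearance (positions counted from 1, as in the charts of this section), and then tallies how many distances each candidate keylength divides. The function names and the cutoff `max_keylength` are assumptions made for the illustration.

```python
from collections import Counter, defaultdict

def repeated_strings(ciphertext: str, n: int) -> dict[str, list[int]]:
    """Map each n-letter string occurring more than once to its start positions (from 1)."""
    text = "".join(c for c in ciphertext.upper() if "A" <= c <= "Z")
    positions = defaultdict(list)
    for i in range(len(text) - n + 1):
        positions[text[i:i + n]].append(i + 1)
    return {s: p for s, p in positions.items() if len(p) > 1}

def kasiski_distances(ciphertext: str, n: int) -> list[int]:
    """Distances from the first appearance of each repeated string to its later ones."""
    distances = []
    for starts in repeated_strings(ciphertext, n).values():
        distances.extend(later - starts[0] for later in starts[1:])
    return distances

def divisor_tally(distances: list[int], max_keylength: int = 20) -> Counter:
    """For each candidate keylength, count how many of the distances it divides."""
    tally = Counter()
    for d in distances:
        for k in range(2, max_keylength + 1):
            if d % k == 0:
                tally[k] += 1
    return tally

if __name__ == "__main__":
    short = "KIOVIEEIGKIOVNURNVJNUVKHVMGZIA"
    print(repeated_strings(short, 4))   # the KIOV repetition
    print(divisor_tally([9, 6]))        # distances from the chart above
```

The tally only suggests candidates. As the note in step 2 warns, accidental repetitions occur, so a keylength that divides most (not all) of the distances, and especially those of the longest repetitions, is the one to try first.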

Example: Determine the keylength of the following ciphertext using Kasiski’s method. To ease the counting, there are 50 letters to a line. (This example is stolen from [Kahn], page 209.)

ANYVG YSTYN RPLWH RDTKX RNYPV QTGHP HZKFE YUMUS AYWVK ZYEZM
EZUDL JKTUL JLKQB JUQVU ECKBN RCTHP KESXM AZOEN SXGOL PGNLE
EBMMT GCSSV MRSEZ MXHLP KJEJH TUPZU EDWKN NNRWA GEEXS LKZUD
LJKFI XHTKP IAZMX FACWC TQIDU WBRRL TTKVN AJWVB REAWT NSEZM
OECSS VMRSL JMLEE BMMTG AYVIY GHPEM YFARW AOAEL UPIUA YYMGE
EMJQK SFCGU GYBPJ BPZYP JASNN FSTUS STYVG YS

We start by listing the repetitions, their starting positions, the distance between the starting positions, and the factorization of that distance in Figure 7.3.

What factors appear in (almost) every distance? All the distances are divisible by 2, most are divisible by 2 × 3, and many by 2 × 2. The keylength is probably either 6 or 4. (The factors 19 and 137 appear only once each, and 5, 7, and 11 only twice each, and can be ignored.) If the keylength is 4 then the long repetition LEEBMMTG would have to be ignored, while if the keylength is 6 we need ignore only YVGYS and the short repeat STY. The keylength is most likely 6 (footnote 14).

Repetition   Start Positions   Distance   Factors
YVGYS        3, 283            280        2 × 2 × 2 × 5 × 7
STY          7, 281            274        2 × 137
GHP          28, 226           198        2 × 3 × 3 × 11
EZM          48, 114           66         2 × 3 × 11
EZM          48, 198           150        2 × 3 × 5 × 5
ZUDLJK       52, 148           96         2 × 2 × 2 × 2 × 2 × 3
LJK          55, 151           96         2 × 2 × 2 × 2 × 2 × 3
LEEBMMTG     99, 213           114        2 × 3 × 19
CSSVMRS      107, 203          96         2 × 2 × 2 × 2 × 2 × 3
SEZM         113, 197          84         2 × 2 × 3 × 7
ZMX          115, 163          48         2 × 2 × 2 × 2 × 3
RWA          138, 234          96         2 × 2 × 2 × 2 × 2 × 3
GEE          141, 249          108        2 × 2 × 3 × 3 × 3

Figure 7.3: A Kasiski Table

Footnote 14: key = SIGNAL, “If signals are to be displayed in the presence of an enemy, they must be guarded by ciphers. The ciphers must be capable of frequent changes. The rules by which these changes are made must be simple. Ciphers are undiscoverable in proportion as their changes are frequent and as the messages in each change are brief.” From Albert J. Myer’s Manual of Signals.

7.8 Summary

The Vigenère cipher is the most successful cipher in history. It is really nothing but a Caesar cipher with the key being changed in a pattern provided by a keyword or keyphrase, but produces a polyalphabetic cipher. One continually enciphers by using the next letter of the keyphrase to encipher the next letter of the message, repeating the keyphrase when necessary. When the key consists of many letters the frequency counts produced are quite flat, successfully hiding the pattern of the repetition of the key.

Misnamed though it was, when used correctly the Vigenère cipher was all but unbreakable for 300 years and remained in use even after techniques for routinely decrypting its messages were known. A sign of the value of an idea is the number of times it is rediscovered, and the numerous times the Vigenère was reinvented, sometimes in its standard form, sometimes in the form that the Beaufort and the Variant Vigenère take, show the idea behind this polyalphabetic cipher to be a very good one indeed.

It wasn’t until Kasiski’s test that a general method for decrypting a carefully enciphered Vigenère cipher was known. To perform this test, find the distances between repetitions in the ciphertext. The largest number that divides most of these distances is likely the length of the keyword (or a multiple of it). From here, divide the ciphertext into groups of letters that were enciphered by the same keyletter and decrypt these individual groups as simple Caesar ciphers.

As we will see, the next 70 years of cipher history would be dominated by attempts to fix the Vigenère cipher and make it once again secure.

7.9 Topics and Techniques

1. What is a polyalphabetic cipher?

2. What is the difference between a polyalphabetic and a monoalphabetic cipher?

3. Who was Alberti? Give two of his contributions to the development of cryptography.

4. Who was Trithemius? Give two of his contributions to the development of cryptography.

5. Who was Belaso? What did he contribute to the development of cryptography?

6. What is a countersign? How do we use it?

7. What is a Vigenère cipher? How do we encipher using a Vigenère cipher?

8. How do we decipher a Vigenère cipher?

9. Who invented the Vigenère cipher?

10. Suppose we are given the frequency count for an unknown cipher. What will it look like if the cipher was a monoalphabetic one? What will it look like if the cipher was a polyalphabetic one?

11. Who was Beaufort? What is his cipher?

12. How do we go about breaking a Vigenère cipher if we know the length of the keyword used in enciphering?

13. Who was Kasiski? What is his contribution to the development of cryptography?

14. What is the Kasiski test? How do we perform it? Why does it work? What does it tell us?

7.10 Exercises

For Problems 1 through 4, encipher the first line of the following Mother Goose rhymes using the given keyword: (a) using a Vigenère cipher, (b) using a Beaufort cipher, and (c) using a Variant Beaufort cipher.

1. This little pig went to market. Keyword TOES.

2. Little Miss Muffet sat on a tuffet. Keyword CURDS.

3. Humpty Dumpty sat on a wall. Keyword HORSES.

4. Old Mother Hubbard went to her cupboard. Keyword DOG.

5. Decipher the first line of the Mother Goose rhymes using a Vigenère cipher with the given keyword.

(a) FAVOR JDCMC HWXRK QPMLV DIEP. Keyword WATER.
(b) OWELE ZYISX GJAEK KFREL KR. Keyword GREAT.
(c) JOJPL RAMVO PWKSV WESDA OBWSP. Keyword CANDLE.
(d) BWQOH XFEAX HMKHO LUMAT TTOI. Keyword TAMBOURINE.

6. Decipher the first line of the Mother Goose rhymes using a Vigenère cipher with the given keyword.

(a) HWPQE LDBIY JLXLR ORVT. Keyword POCKET.
(b) JRSII SWOJE QYFCP FGCJ. Keyword BE ONE.
(c) WLWHI ONREG TNEDN ELIBB WSXNZ N. Keyword RAIN.
(d) KYAGO WALVF YYZKJ VVGCI RVTKL V. Keyword ORANGE HER.

7. Decipher the first line of the Mother Goose rhymes using a Beaufort cipher with the given keyword.

(a) ZMKWA WTWNP KNVGE ZBOUE HFBHD OBGKA SCUFR OQZTS D. Keyword STONE.
(b) WWEYQ YISKG EXAAV ESNAM MRX. Keyword ERIE.
(c) RUBNY MQZXH PBOTN ZBZPH LQSIU AMZXQ KNLPH YMFNU M. Keyword DUSTY.
(d) BZXFK QNCCK IJXOL CDUAU ZOBKP VABCU LQOIK RWZAW MSPEW MURO. Keyword CHOICES.

8. Decipher the first line of the Mother Goose rhymes using a Variant cipher with the given keyword.

(a) QNAVF TCWXS ZDDGJ SQNAR XRERL IWI. Keyword DUEL.
(b) ERZGM GNHGG WAVTZ ZVHPK VXZEO AF. Keyword SHOE.
(c) KNSXL WJBKU GDPAJ IJOSU TLDMP JWC. Keyword JUMBLIES.
(d) OKEZQ CBNPH WHNUC ZHMLG YEQLS KC. Keyword SUPPER.

9. The Confederate Army made frequent use of Vigenère ciphers during the Civil War, calling them “court” or “diplomatic” ciphers. In 1863 the Union forces under General Ulysses S. Grant were engaged in a siege of Vicksburg and captured a ciphertext. Here is his report on this [Bates].

Headquarters Department of the Tennessee,
Near Vicksburg, May 25, 1863.
Col. J.C. Kelton, Assistant Adjutant-General, Washington, D.C.

COLONEL: Eight men, with 200,000 percussion caps, were arrested whilst attempting to get through our lines into Vicksburg. The inclosed [sic] cipher was found upon them. Having no one with me who has the ingenuity to translate it, I send it to Washington, hoping that some one there may be able to make it out. Should the meaning of this cipher be made out, I request a copy be sent to me.

Very respectfully,
U.S. Grant, Major-General.

The enclosed message was

[General Joe] Jackson (miss), May 25, 1863
Lieutenant General Pemberton:
My XAFV. USLX was VVUFLSJP by the BRCYAJ. 200000 VEGT. SUAJ. NERP. ZIFM. It will be GFOECSZOD as they NTYMNX. Bragg MJTPHIHZG a QRCMKBSE. When it DZGJS. I will YOIG. AS. QHY. NITWM do you YTIAM the IIKM. VFVEY. How and where is the JSQMLGUGSFTVE. HBFY is your ROEEL.
J. E. Johnston

Decipher the enciphered words. The keyphrase is Manchester Bluff. (Note: there are several errors. Can you fix them?)

10. After Vicksburg fell to the Union, soldiers found the following cipher message:

Vicksburg, Dec. 26, 1862 Gen J.E. Johnston, Jackson: I prefer O A A V V R. It has reference to X H V K H Q C H F F I B P Z E L R E Q P Z Q N Y K to prevent P N U Z E Y X S W S T P J W at that point. R O E E L P S G H V E L V T Z F I U T LILASLTLHIFNOIGTSMMLFGCCAJD J.C. Pemberton

Somebody (Pemberton’s clerk?) failed to destroy the enciphered message after translating it. Military telegraph clerks in Washington broke the message and recovered the key Manchester Bluff. They were very surprised when they subsequently found that this key worked for many messages. Eventually the Confederates, suspecting the cipher was broken, switched to a new key. Decipher the message.

11. Captain William Plum [Antonucci] was in charge of Union communications at New Orleans. He received an intercepted message addressed to Gen. E. K. Smith, the commander of the Confederacy’s Trans-Mississippi Department. In the fall of 1864, the southwest campaign was not going well for the Union. Smith’s forces’ next target (north into Missouri? Raid the west? Advance south to New Orleans? Fight to regain a foothold across the Mississippi river?) was unknown. Perhaps the cipher gave this target!

September 30
To Genl. E. K. Smith:
What are you doing to execute the instructions sent you to HCDLLVW XMWQIG KM GOEI DMWI JN VAS DGUGUHDMITD. If success will be more certain you can substitute EJTFKMPG OPGEEVT KQFARLF TAG HEEPZZU BBWYPHDN OMOMNQQG. By which you may effect O TPQGEXYK above that part HJ OPG KWMCT patrolled by the ZMGRIK GGIUL CW EWBNDLZL.
Jeffn. Davis

The last part of the message mentioned patrolling. Perhaps it referred to gunboat patrols on the river, the only patrols likely to interest Confederate high command. So Plum guessed that “that part HJ OPG KWMCT patrolled” stood for “that part of the river patrolled”. Plum wrote “this meaning occurred to the author, at first sight, and doubtless would to any one familiar with military affairs in that section.” He then turned to “By which you may effect O TPQGEXYK above that part of the river ...” Perhaps “a crossing”? He soon had decrypted the message. Can you? (Hint: use Plum’s guesses to help find the keyword.)

(So word division led to complete solution. Why leave in word division? [footnote 15] During the battle of Vicksburg Grant drove between the forces of Pemberton and Joe Johnston, forcing Pemberton into the city, which Grant then besieged. Johnston telegraphed for reinforcements. Unfortunately, the cipherer made mistakes, and the telegrapher added his own (confusing R (· ··) with S (···), and I (··) with a pair of E’s (·)). After Kirby Smith spent 12 fruitless hours trying to read this message, he finally sent his chief of staff, Major Cunningham, on horseback around the flank of the Union armies to retrieve the message directly. By the time Cunningham reached Johnston, Johnston’s army was completely cut off from Smith. After this, the Confederates retained word division [Pratt 186–7].

This is not to imply that the North was cryptologically far more advanced than the South. General Albert Myer, in his A Manual of Signals: for the use of Signal Officers in the Field, seemed to imply that in Vigenère-type systems the keys should be chosen at will by the sender and sent at the start of the transmission! That is, send both the key and ciphertext!

It is true that the Confederates’ signal service was organized by Captain (later General) E. Porter Alexander, who had known Myer in the old U.S. Army while Myer was organizing his signals. So it could well be believed that the Confederates knew the system being used. But Myer’s confidence in his system shows how little cryptanalysis was understood at the time. [SSA2])

Footnote 15: “In 1862 South adopted the centuries-old Vigenère as its principal official cipher, then proceeded to violate its inherent strengths for the time by such practices as retaining plaintext word length, interspersing plain text and cipher, etc.” Ralph E. Weber, [Weber]

12. By 1865 the Civil War was going very badly for the Confederates. The following message from General Robert E. Lee to John Breckenridge, the Secretary of War for the Confederacy was intercepted [Plum].

Confederate States of America, Military Telegraph. Dated Headquarters, February 25, 1865. Received at Richmond, Va., 12:25 minutes, A.M.
To Hon. J.C. Breckenridge, Sec’y of War :-
I recommend that the tsysmee fn qoutwp rfatvvmp ubwaqbqtm exfvxj and iswaqjru ktmtl are not of immediate necessity, uv kpgfmbpgr mpc thnlfl should be lmqhtsp.
(signed) R.E. Lee

It was known that the key was COME RETRIBUTION. Decipher the mes- sage.

13. In his Manual, Parker Hitt uses a ciphertext given in the 1914 Signal Book, enciphered using a cipher disk, to introduce . The ciphertext is DRZCS XOTFG EYRIF HZRWC SXETA EBKSX MQQQW CKBPT DMF. What keylength did he determine?

14. Use Kasiski’s method to find the length of the keyword in the following Vigenère cipher. Then decrypt the cipher. The quote is from Charles Babbage’s autobiography Passages from the Life of a Philosopher. OOPSF USIMP DXSJY KUMLV CILVA DEIRJ DXIDD SFUSI ASESF EPGIQ SIRJY KITEL ETEVO ORGOO GMCUT SNQZW SFDWE EMCEW PVYQP VSPYI VFYQO EPVAU PPYBN UUBTR TFOAI USMTU SETIP MSBMP EUZGO ODXRV NXADT THFCA HJNLN PMSDZ PPSFN ENEPG IQSIR JSEVF LPSPZ FSFCZ EEELA UELED WIVFC IRUSI PFCWO OELEN ZVEJY XINLX EJDLI TNSNW TGTJZ R

15. Use Kasiski’s method to decrypt the following Vigen`erecipher. (Find the keylength and then decrypt the cipher.) The quote is from Belaso’s book. PVZVU KIEWW NGZJF IOPFG JGZVL KTJRE AKFUV OWELL WZZDF KFCDL EBFUS JMFWZ AFCDF CIRJW WBUWZ AKFUV OARBT ATVZG NARQQ WGNDF PSUWZ ABNHL WYVWZ AKFUV OKVZA OVKRO NWKHS JRGXL PVVPG JDRSW NKILL EBXWZ AAERL PCFFD KGVWG CSKKW NHYHF KJVUW WQYRX PVVOW PHVUK SSGOS YSROW PHVUG BCLUU KIEWW NGZJF

16. This cipher contains Kasiski’s instructions for computing the keylength:

MEJMY JKXCD LCNMQ DELMI QOTCB ERSRE DLCBI NOXGD MMXWD BSKYR CKRMD LCBEL NILNI YFSPD SZBIY UYNDL GCRSW FCBML DSGDW DKGRY VQDLC PEADS PWSQD JPOUS ORRVC DYYLN MLNMA KXCCX FORSW FCBSD VIRDI PCMLD LCUIW

Use Kasiski’s method to decrypt it [Kasiski].

17. Use Kasiski’s method to decrypt this message [Antonucci].

Head-Quarters, C.S. Armies, March 24, 1865.
Gen. E. Kirby Smith, comdg. Trans-Miss. Dept., Gen.:-
VVQ ECILMYMPM RVCOG UI LHOMNIDES KFCH KDF WASPTF US TFCFSTO ABXCBJX AZJKHMGJSIIMIVBCEQ QB NDEL UEISU HT KFG AUHD EGH OPCM MFSUVAJWH XRYMCOCI YU DDDXTMPT IU ICJQKPXT ES VVJAU MVRR TWHTC ABXCIU EOIEG O RDCGX EN UCR PV NTIPTYXEC RQVARIYYB RGZQ RSPZ RKSJCPH PTAXRSP EKEZ RAECDSTRZPT MZMSEB ACGG NSFQVVF MC KFG SMHE FTRF WHMVV KKGE PYH FEFM CKFRLISYTYXL XJ JTBBX RQ HTXD WBHZ AWVV FD ACGGAVXWZVV YCIAG OE NZY FET LGXA SCUH
(Signed) R.E.Lee

18. Joseph Willard Brown [Brown] describes the Confederate cipher wheel. “To facilitate reading the cipher messages, Capt. William N. Barker, of the Confederate Signal Corps, invented a simple but convenient apparatus. The alphabetical square was pasted on a cylinder and revolved under a bar on which was a sliding pointer. The pointer was brought to the letter in the key on the bar, and the letter in the word to be converted was rolled up under the bar and the pointer rested on the required substitute letter.” Make a Confederate cipher wheel.

19. If we know the plaintext and the ciphertext, we can, of course, find the key. The Zig-Zag attack on polyalphabetic ciphers is based on this. Start with a probable word (also called a crib), a word that is probably in the plaintext. Assuming it starts at the first letter of the ciphertext, use it to find the beginning of the key. If the key appears to be nonsensical, try again starting with the second letter of the ciphertext, and then third, fourth, and so on, until part of the key appears. Then zig-zag back and forth from the key to the ciphertext and back again, gradually building both out.

(a) As a very simple example, was is the plaintext of the last three letters of the ciphertext OSQSW. Find the three associated letters of the key, and then see if you cannot guess the rest of the key and use it to find the rest of the plaintext.
(b) The very common phrase and the appears in the plaintext of the ciphertext PHWWZ RYBRR JTSUL GNXTV NSLSI QE. Find where, and use it to recover the key and decrypt the text.
(c) At least one of the common endings -tion or -ing appears in the plaintext of this text: WEQCN IVEDO PHWWK OQHCQ KXLYL LGWGL OAFHX MLP. Use this information to decrypt it.

20. A common reinvention of the Vigenère Cipher reduces it to a progressive key cipher. From [Pall] we have “The Pass-Word Cipher.” To use it, select a keyword, say PRUDENTIA, and then encipher using the table produced, using a new line for each new word.

  ABCDEFGHIJKLMNOPQRSTUVWXYZ
P qrstuvwxyzabcdefghijklmnop
R stuvwxyzabcdefghijklmnopqr
U vwxyzabcdefghijklmnopqrstu
D efghijklmnopqrstuvwxyzabcd
E fghijklmnopqrstuvwxyzabcde
N opqrstuvwxyzabcdefghijklmn
T uvwxyzabcdefghijklmnopqrst
I jklmnopqrstuvwxyzabcdefghi
A bcdefghijklmnopqrstuvwxyza

The example given is JXU KGDVAWJK HPODIT JSV BFSY CT PCWNOUFMA BDYYUH VT EH LZWQ RDGG VIZSPX YT HVS YHYGS. Decipher it.

21. Graf Gronsfeld suggested that the Vigenère keyword be replaced by a short series of numbers. For example, come here enciphered with keynumber 1403 is DSMH IIRH. Now known as the Gronsfeld Cipher, in effect it is a Vigenère Cipher as a composite of Shift Ciphers, rather than the usual composite of Caesar Ciphers.

(a) Encipher Gronsfeld with keynumber 296.
(b) Decipher JVIMZ HTXLL LR. The keynumber was 34870.
(c) Decipher EMSLH MCOF. The keynumber was 2625.
(d) Considering the example given in the question, what makes this a good cipher?
(e) For a sizable ciphertext, we can use the techniques for decrypting a Vigenère Cipher to decrypt a Gronsfeld Cipher. But because of the restricted choice of keys (really, only words made up from the first ten letters of the alphabet), the Gronsfeld is considerably less secure than the Vigenère. The following ciphertext has been enciphered using a Gronsfeld Cipher in which the keynumber uses only (some of) 0’s, 1’s, 2’s and 3’s. By trial-and-error, can you decrypt the message? (Hint: each ciphertext letter could come from at most four plaintext letters: D from a, b, c, or d. Neatly list them, and see if you cannot build a message.) LHEBE SQGLL OUSWB KRZBA TRIGA YFP.
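For readers who want to check their hand work, here is a minimal Python sketch of Gronsfeld enciphering. The function name gronsfeld_encipher is our own, and we assume (as the come here example suggests) that spaces are kept and do not use up key digits.

```python
def gronsfeld_encipher(plaintext: str, keynumber: str) -> str:
    """Shift each letter forward by the matching key digit.

    Assumption: non-letters (such as spaces) are copied unchanged and
    do not consume a digit of the keynumber.
    """
    out = []
    i = 0  # counts letters only
    for ch in plaintext:
        if ch.isascii() and ch.isalpha():
            shift = int(keynumber[i % len(keynumber)])
            out.append(chr((ord(ch.lower()) - ord("a") + shift) % 26 + ord("A")))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


# The example from the exercise statement.
print(gronsfeld_encipher("come here", "1403"))  # DSMH IIRH
```

Deciphering is the same loop with the shift subtracted instead of added.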

22. (a) Use Kasiski’s method to decrypt the message: CT OSB UHGI TP IPEWF H CEWIL NSTTLE FJNVX XTYLS FWKKHI BJLSI SQ VOI BKSM XMKUL SK NVPONPN GSW OL IEAG NPSI HYJISFZ CYY NPUXQG TPRJA VXMSI AP EHVPPR TH WPPNEL UVZUA MMYVSF KNTS ZSZ UAJPQ DLMMJXL JR RA PORTELOGJ CSULTWNI XMKUHW XGLN ELCPOWY OL ULJTL BVJ TLBWTPZ XLD K ZISZNK OSY DL RYJUAJSJF ZIEQN ASC YQ LNFFTR CIKQYF XMG TBWY KU TSRG VVXBCYE FTWUE Z JUZFP HTLXW BKSM RTV IF WHTBUMSKGFH XQ ZIEWRSZ EX RHTWJPNFVX VOFVJ CYF XMQZF AMQ DJPQ NLU CTW ROSB OF NSAGTFRYU MPV YJL QSQKJF QFA ABQUGY XM YJ FPYW EVSVJUWPRIGUDI GWA MEYGY PR BJLO LZG HOH HTF IEAG KJII FVXR BKSM PJV FPY PPVX EQN

(b) This message appears in Parker Hitt’s book [Hitt, page 59]. He claims it appeared in the “personals” column of a London newspaper, and after breaking it he looked back at the column from the day before and found the message “M.B. Will deposit £27 14s 5d tomorrow.” How are the messages related?

23. Trithemius’ tabula recta is used in three different ways to produce the Vigenère Cipher, Beaufort Cipher and their Variant Cipher. In the chart below we encipher money with SEND using a Vigenère cipher as usual. Next to it, the letters are converted to their equivalent numbers.

plaintext         m     o     n     e     y
plain numbers    13    15    14     5    25
key               S     E     N     D     S
key numbers      18     4    13     3    18
ciphertext        E     S     A     H     Q
cipher numbers  31≡5   19   27≡1    8   43≡17

(a) Using P, C and K to represent the plaintext, ciphertext and key, respectively, how, mathematically, do P and K determine C? That is, give an equation C = ??, where ?? is some combination of P and K. This gives a mathematical formulation of enciphering with Vigenère.
(b) Find the similar formula for deciphering a Vigenère cipher.
(c) Create similar charts, using the same plaintext and key, but this time using a Beaufort Cipher. Use your charts to find the mathematical formulation of enciphering with the Beaufort.
(d) Find the similar formula for deciphering a Beaufort cipher.
(e) Repeat parts (c) and (d) but this time using a Variant Cipher.
(f) What does this problem tell us about the possibility for other polyalphabetic ciphers?

24. (Continuation of 23.) In I, Claudius by Robert Graves, the character Claudius writes

The key of the ... cipher ... was provided by the first hundred lines of the first book of the Iliad, which had to be read concurrently with the writing of the cipher, each letter in the writing being represented by the number of letters of the alphabet intervening between it and the corresponding letter in Homer. Thus the first letter of the first word of the first line of the first book of the Iliad is Mu. Suppose the first letter of the first word of an entry in the dossier to be Upsilon. There are seven letters in the Greek alphabet intervening between Mu and Upsilon so Upsilon would be written as 7. In this plan the alphabet would be thought of as circular, Omega, the last letter, following Alpha, the first, so that the distance between Upsilon and Alpha would be 4, but the distance between Alpha and Upsilon would be 18. It was Augustus’s invention and must have taken rather a long time to write and decode.

Although Graves describes a cipher using Greek, it should not be Greek to you. (Sorry – couldn’t resist.) This is clearly a polyalphabetic cipher. What kind – Vigenère, Beaufort or Variant? If it helps, the Greek alphabet is αβγδεζηθικλμνξοπρστυφχψω

25. Admiral Beaufort was neither the first nor the last person to invent the cipher named for him. In his diary on April 22, 1868 Charles Dodgson wrote that while “Sitting up at night I invented a new cipher, which I think of calling the Telegraph Cipher.” He told George Ward Hunt, First Lord of the Admiralty about his invention. No response is known [Abeles2]. His method involves two alphabets:

Key Alphabet:     ABCDEFGHIJKLMNOPQRSTUVWXYZ
Message Alphabet: abcdefghijklmnopqrstuvwxyza

“To translate a message into cipher, write the key-word, letter for letter, over the message, repeating it as often as may be necessary: slide the message-alphabet along under the other, so as to bring the first letter of the message under the first letter of the key-word, and copy the letter that stands over ’a’: then do the same with the second letter of the message and the second letter of the key-word, and so on. Translate back into English by the same process. For example, if the key-word be WAR, and the message meet me at six, we write it thus:–

warw ar wa rwa
meet me at six
kwnd on wh zod

(a) Construct a device on which to do the Telegraph Cipher.
(b) Encipher Through the Looking Glass with key ALICE.
(c) Decipher KHDKD WLDGG R using key HOLE.
(d) The Saint Cyr slide uses the alphabet three times, whereas Dodgson’s device needs only two. Why?
(e) Explain why the Telegraph Cipher is the same as the Beaufort Cipher. (Dodgson was likely aware of Beaufort’s 1857 cipher. That they were equivalent was not pointed out until 1883 by Auguste Kerckhoffs in his La Cryptographie Militaire.)

26. Charles Dodgson also (see Exercise 25) invented two ciphers on February 23rd and 26th of 1858. From his diary entry on the 26th [Abeles3]:

Invented another cypher, far better than the last: it has these advantages.
(a) The system is easily carried in the head.
(b) The key-word is the only thing necessarily kept secret.
(c) Even one knowing the system cannot possibly read the cipher without knowing the key-word.
(d) Even with the English to the cipher given, it is impossible to discover the key-word.

To use this cipher, Dodgson writes the (Latin) alphabet in the form of a square

AFLQW
BGMRX
CHNSY
DIOTZ
EKPV*

(so I=J, U=V, and * just fills the square.) With keyword of GROUND he enciphers send as follows

Measuring from G to S we find it to be “2nd column 1st line,” and write 21. In re-translating [deciphering] we begin at G, & go “2 columns to the right, & 1 line further down,” and this gives us S again. Measuring from R to E gives 23. from O to N - 04. from U to D - 24 .... we write 21.23.04.24.

So we always read to the right and then down. If we need to move off the end of the square, simply re-enter on the top or left, as appropriate. Dodgson added some complications:
i) Putting pairs in parentheses, such as (2.5), would mean to restart the keyword at its fifth letter N. Or (2.5)(1.2) says restart with the letter that is 1.2 from the 5th letter: N + (1.2) = V.

ii) Putting a pair with a letter says to use the keyletter corresponding to the second number of the pair to encipher this letter. The first and last number of the result tells how many nulls are at the front and rear of the message. So (1.2)Q says encipher Q with R - gives 04, says no nulls at start, 4 at rear.
iii) Deliberately misspelling words, by leaving out letters or adding extra letters, would increase safety.
(a) Dodgson gives as an example (2.3)(V)10.14.20.00.00.01.33.40.42.40.01.20.23.02. Decipher it.
(b) Dodgson added that an “improvement on this again is instead of 03 to write D ... and so on.” Explain why this makes his system a Variant Beaufort system.

27. As Vigenère said, “the longer the key is, the more difficult it is to solve the cipher.” Does using words that have some repetition in them, like Banana, Concoct, Tomorrow or Rococo, change this conclusion?

28. As the end of World War II drew near, plans were drawn up for the future of the SSA. Included in this were studies of the history of U.S. cryptography prepared under the Direction of the Chief Signal Officer. The following quote is from the study Codes and Ciphers during the Civil War dated 20 April 1945.

Careful use of Vigenère requires [the] cipher clerk to first underline any repetitions in the body of the plaintext. Then copy the text into columns under the keyword/phrase and make sure that none of these repetitions appear in the same column, and if any do, insert nulls to throw the repetition out of phase.

Explain what this quote means. That is, explain what the underlining is trying to accomplish.

29. In his column in Philadelphia’s Alexander’s Weekly Messenger Edgar Allan Poe challenged readers that he could break any monoalphabetically enciphered text. A Mr. G. W. Kulp sent the ciphertext GEIEI ASGDX VZIJQ LMWLA AMXZY ZMLWD YXRTV JCIML LHJXA MXZYF IFIWA FEPML BGPXW DLNRW EQWBC KMHJT NWSLB RZLEW MKDTC HUCMK WZZXN TGUIE LBRJL HTAIV UGMBX LKIUU PAMUM WXKJX EWEQM CZXZL GNSBW LBRNT YOLPI MLQIH WKXKW IOLXE UFBXV V that was published February 26th, 1840. Poe showed in a later issue that the ciphertext was not a monoalphabetic cipher, and called it “A jargon of random characters having no meaning whatsoever.” Indeed, it is a Vigenère cipher. Even knowing this does not necessarily make this an easy text to break. Can you do it?

Chapter 8

Polyalphabetic Ciphers

The key cipher is the noblest and the greatest in the world, the most secure and faithful that never was there man who could find it out. Matteo Argenti

Following our work in Chapter 7 we have a way to decrypt Vigenère ciphers: the Kasiski Examination followed by frequency analysis. The frequency analysis is relatively easy once we know the length of the keyword. However the Kasiski Examination, although it works fine, is quite time consuming since we must go through the message letter by letter looking for repetitions. It would be fifty years before an improvement was found.

8.1 Coincidences

On what does the Kasiski test depend? On the fact that repetitions of parts of the ciphertext are meaningful. To recall our original example:

plaintext   t o b e o r n o t t o b e t h a t i s t h e q u e s t i o n
key         R U N R U N R U N R U N R U N R U N R U N R U N R U N R U N
ciphertext  K I O V I E E I G K I O V N U R N V J N U V K H V M G Z I A

Kasiski’s idea is that the repeated KIOV and NU hint at the length of the keyword. To have such repetitions we (the cryptanalysts) need some luck: the same plaintext part must lie under the same key part. Before Kasiski people presumably thought of such repetitions as mere coincidences – unrelated events that are unlikely to occur together but just happen to do so. Kasiski’s test showed that sometimes these coincidences are, in fact, meaningful.

Can we look for these meaningful repetitions, these meaningful coincidences, in another way? A (relatively) quick way to see when individual letters are repeated is to write the ciphertext on two slips of paper, and hold one under the other, offset by various amounts. Let’s do this for the “KIOV” ciphertext, using the offsets or shifts of 1 through 6. Whenever there is a coincidence, we mark it with an asterisk.

1: KIOVIEEIGKIOVNURNVJNUVKHVMGZIA KIOVIEEIGKIOVNURNVJNUVKHVMGZIA *

2: KIOVIEEIGKIOVNURNVJNUVKHVMGZIA KIOVIEEIGKIOVNURNVJNUVKHVMGZIA

3: KIOVIEEIGKIOVNURNVJNUVKHVMGZIA KIOVIEEIGKIOVNURNVJNUVKHVMGZIA ******

4: KIOVIEEIGKIOVNURNVJNUVKHVMGZIA KIOVIEEIGKIOVNURNVJNUVKHVMGZIA *

5: KIOVIEEIGKIOVNURNVJNUVKHVMGZIA KIOVIEEIGKIOVNURNVJNUVKHVMGZIA *

6: KIOVIEEIGKIOVNURNVJNUVKHVMGZIA KIOVIEEIGKIOVNURNVJNUVKHVMGZIA ****
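The slip-of-paper comparison is easy to mimic on a computer. The following is only a sketch (the function name coincidences is our own choice): it lines the ciphertext up against a copy of itself moved over by the given shift and counts the matching positions.

```python
def coincidences(text: str, shift: int) -> int:
    """Count positions where text agrees with a copy of itself offset by `shift`."""
    # zip stops at the shorter string, so only overlapping positions are compared
    return sum(1 for a, b in zip(text, text[shift:]) if a == b)


ciphertext = "KIOVIEEIGKIOVNURNVJNUVKHVMGZIA"
for shift in range(1, 7):
    print(shift, coincidences(ciphertext, shift))
```

For this ciphertext the counts for shifts 1 through 6 are 1, 0, 6, 1, 1 and 4, one for each asterisk marked by hand.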

Obviously this is a very short ciphertext, so the number of coincidences is low, no matter the shift. But it is striking that the shifts of 3 (the keylength) and 6 (twice the keylength) have many more coincidences than the other shift amounts.

For a longer ciphertext doing this “shift examination” would be quite time consuming. So let’s think more carefully about these coincidences. A coincidence will occur when the same letter occurs twice in the ciphertext. How likely are coincidences? Or, to be more precise, how likely is it that two randomly chosen letters from the ciphertext are the same? This brings us near the field of Probability. One of the first things anyone learns in probability is that the likelihood of something particular happening is the number of ways that thing can occur divided by the number of total things that can occur:

\[ \text{Probability that } A \text{ occurs} = \frac{\text{Number of ways } A \text{ can occur}}{\text{Number of ways anything can occur}} \]

For example, my birthday is February 25th. How likely is it that I was born on a weekend? Since I didn’t tell you what year I was born, the best guess you can make is 2/7-ths. There are two ways I could have been born on a weekend (being born on Saturday or being born on Sunday) and a total of seven days of the week that I could have been born on.

How does this work for us? To make the explanation clear, let’s write #A to mean the number of A’s in the ciphertext. So #B represents the number of B’s, #C is the number of C’s, and so on. How many ways are there to choose two different A’s from the ciphertext? There are #A ways to pick one A, and then #A−1 ways to pick a different A. (Minus 1 because we’ve already picked one, so there is one fewer to choose from.) So there are #A(#A−1) ways to pick two A’s. Likewise #B(#B−1) ways to pick two B’s, and #C(#C−1) ways to pick two C’s. Doing this for all the letters in the ciphertext gives us

\[ \#A(\#A - 1) + \#B(\#B - 1) + \cdots + \#Z(\#Z - 1) \]

ways of having a coincidence. To see how likely a coincidence is we must divide by the number of ways of choosing any two letters. If the total number of letters in our ciphertext is N, then this number is N(N − 1).¹

Putting the pieces together, we² have reinvented Friedman’s famed Index of Coincidence. Designated Φ (“phi”), this is the likelihood that two letters, picked randomly from a ciphertext, are the same. As we’ve just determined, the formula for Φ is

\[ \Phi = \frac{\#A(\#A - 1) + \#B(\#B - 1) + \cdots + \#Z(\#Z - 1)}{N(N - 1)}, \tag{8.1} \]

where #A is the number of A’s in the ciphertext, #B is the number of B’s in the ciphertext, etc., and N = #A + #B + ··· + #Z is the total number of letters in the ciphertext.

Friedman was, justifiably, quite proud of his Index. In the introduction to The Index of Coincidence and Its Applications in Cryptography (Riverbank Publications No. 22, 1920) he wrote “when such a treatment is possible, it is one of the most useful and trustworthy methods in cryptography.”³ However, from our development, it is not quite clear what Φ tells us, or how to use it. Clearly Φ measures, somehow, the frequency of coincidences in a polyalphabetic cipher. But what does Φ = 0.045 mean? To find out, we need to think about the frequency counts in a different way.
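As a concrete illustration of the formula for Φ, here is a short Python sketch. The function name index_of_coincidence is our own, and we assume that only the letters A through Z should be counted, with case ignored.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Chance that two letters picked at random from text are the same."""
    # Assumption: spaces, digits and punctuation are ignored.
    counts = Counter(ch for ch in text.upper() if "A" <= ch <= "Z")
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two letters")
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


# The short "KIOV" ciphertext: 30 letters, numerator 60, so 60/870.
print(index_of_coincidence("KIOVIEEIGKIOVNURNVJNUVKHVMGZIA"))
```

A text this short gives a value that is not very reliable, which is one reason the discussion turns to longer ciphertexts.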

¹ (Actually, from n objects there are n(n − 1)/2 ways of choosing two of them: we must divide by 2 because it doesn’t matter which one is chosen first and which is chosen second. So the denominator and each term in the numerator should have included a “/2”. Fortunately, all the two’s cancel out.)

² Our presentation of these ideas borrows liberally from that of Abraham Sinkov’s book Elementary Cryptanalysis. Sinkov (1907–1998) was one of the first three people hired by Friedman to work in the Army’s Signal Intelligence Service. He headed the Communications Intelligence Organization during World War II, the group largely in charge of intercepting and breaking Japanese messages. His book was published in 1966 and is quite influential.

³ Kahn [Kahn, pg 376] tells the story that General Cartier of the French Cryptographic section saw Riverbank No. 22 and “thought so highly of it that he had it translated and published forthwith – false-dating in “1921” to make it appear as if the French work had come first!”

8.2 The Measure of Roughness

It is not necessarily clear how the Index of Coincidence is connected to polyalphabetic ciphers. To glimpse this connection we must think back to the differences between the frequency counts of monoalphabetic and polyalphabetic ciphers. We have learned to recognize a Caesar Cipher from the intense highs and lows of its frequency count: there are many letters that occur quite often (the ciphertext versions of the etaoinshr letters) and many that seldom occur (the uvwxyz-types). However, in polyalphabetic ciphertexts the frequencies are less sharp; the highs are lower and the lows are higher. We might say that a Caesar Cipher has a frequency count that is much rougher than the frequency count of a Vigenère Cipher. Further, the longer the keyword of a Vigenère Cipher is, the smoother the frequency count is.

To illustrate with an example, I’ve enciphered the quote of General Givierge from Section 7.6 using keys of various lengths. (I used the alphabet as the key, so the five letter key was ABCDE.) The frequency counts of the resulting ciphertexts appear in Figure 8.1.

keylength  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
one       13  1  7 10 20  4  3  4 14  0  1  9  5 19 21  3  0 11  4 16  5  1  3  1  8  0
three      3  6  6  7 15 10  8  4  6  7  4  5  3 10 13 17  9  4  6  9 11  6  2  2  5  5
five       5  4  5  8 10  5 12  7  7  7  3  5 10  7 12 10  7 12  8  4  8  9  7  6  3  2
ten       10  5  7  5  7  5  6  4  6  5  4 10  6 11 13  4  6 13  6  7  9  6 10  7  3  8
twenty    10 11  9 10 11  7  8  5  7  8  5  8  6  6  6  5  5  6  4  3  8  6  7  9  7  6

Figure 8.1: Frequency Counts: Same quote, different keylengths.

Notice that as the keys get longer, there are fewer numbers that are much larger or much smaller than “average” and more that are in the middle. In general, the longer a keyword is, the smoother the frequencies are, and the shorter the keyword, the rougher the frequencies, which makes sense. After all, the point of polyalphabetic ciphers was to have each plaintext letter become many different cipherletters, causing the individual frequency counts to become more and more similar.

Does the converse hold? Can we somehow measure the “roughness” of a frequency count, and then use the measurement to estimate the keylength? Let’s start by thinking about measuring roughness.

Example: Consider the three sets of numbers {3, 3, 3, 3}, {4, 0, 4, 4} and {1, 4, 1, 6}. We can probably all agree that these sets move from “smoothest” to “roughest.” Why do we feel so? The latter two sets are rougher because their numbers are more spread out, are farther away from each other. What mathematical device can measure this?

Interestingly, we all know the device that measures the opposite: if numbers are not spread out, they are all close to average. In mathematics we tend to use mean and average interchangeably, and, of course,

\[ \text{average} = \text{mean} = \frac{\text{sum of the numbers}}{\text{number of numbers}}. \]

Since each of the sets in our example contains 4 numbers that sum to 12, each has a mean or average of 3. This gives us an idea of how to measure roughness: a set of numbers is rougher when it contains many numbers that are far from the mean. So we might try to define roughness by “sum how far the numbers are from their mean.” How far implies distance which implies subtraction. But it also implies that 5 and 1 are both a distance of 2 from 3, no matter that one is larger than 3 and one is smaller. So we cannot use simple subtraction to measure distance (5 − 3 = +2 but 1 − 3 = −2) but would have to use absolute value (|5 − 3| = +2 and |1 − 3| = +2). Unfortunately, as a mathematical device, absolute value is somewhat of a pain to work with. What is traditionally chosen instead is to square the result (as the square of a nonzero number is positive). This gives us our second attempt at a definition for roughness: “the sum of the squares of the distances from the mean.” For example, the middle set of numbers then would have roughness

\[ (4 - 3)^2 + (0 - 3)^2 + (4 - 3)^2 + (4 - 3)^2 = 1^2 + 3^2 + 1^2 + 1^2 = 1 + 9 + 1 + 1 = 12, \]

while the final set would have roughness 18. (Check this!) While this agrees with our intuition that the final set is rougher than the middle one, the definition is not quite right yet. The sets {103, 103, 103, 103}, {104, 100, 104, 104} and {101, 104, 101, 106} created by adding 100 to our sets have 103 as their means and also would have roughness 0, 12 and 18. But it ought to be that an error of 2 is more forgivable when the target, the mean, is 103 than when it is 3. So we must adjust for the sum. Since we are squaring the numbers, it makes sense to take the total sum into account by dividing by its square.

We thus define what Sinkov calls the Measure of Roughness or M.R.⁴ to be

\[ \text{M.R.} = \frac{\text{sum of the squares of the distances to the mean}}{\text{square of the sum of the numbers}}. \]

The first set has roughness 0, since all of its numbers, and so its mean, are 3’s. For the second we compute:

\[ \text{M.R. of } \{4, 0, 4, 4\} = \frac{(4 - 3)^2 + (0 - 3)^2 + (4 - 3)^2 + (4 - 3)^2}{12^2} = \frac{1 + 9 + 1 + 1}{144} = \frac{12}{144} \approx .083. \]

Check that you understand this by computing M.R. for the third set.⁵

⁴ For those with some statistics background, M.R. is very closely related to variance.

⁵ .125.
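The definition of M.R. translates directly into a few lines of Python. This is a sketch only; the function name measure_of_roughness is our own.

```python
def measure_of_roughness(numbers):
    """Sum of squared distances to the mean, divided by the square of the sum."""
    total = sum(numbers)
    mean = total / len(numbers)
    return sum((x - mean) ** 2 for x in numbers) / total ** 2


# The three sets from the example, smoothest to roughest.
for s in ([3, 3, 3, 3], [4, 0, 4, 4], [1, 4, 1, 6]):
    print(s, measure_of_roughness(s))
```

Adding 100 to every number leaves the numerator unchanged but makes the denominator much larger, which is exactly the adjustment the definition was designed to make.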

Examples: Compute the mean and M.R. for each of the following sets.

(1) {5, 9, 2, 10, 8}. The sum of the five numbers is 5 + 9 + 2 + 10 + 8 = 34, so their mean is 34/5 = 6.8. The M.R. is

\[ \frac{(5 - 6.8)^2 + (9 - 6.8)^2 + (2 - 6.8)^2 + (10 - 6.8)^2 + (8 - 6.8)^2}{34^2} = \frac{42.8}{1156} \]

or .037. (Our formulation of roughness will generally lead to small numbers. That’s ok: we are currently worried about relative degrees of roughness, rather than the meaning of the roughness measurement.)

(2) {4, 11, 13, 8, 5, 7}.⁶

With roughness now understood, let us return to our goal: understanding frequency counts. What is the roughness of a frequency count? Using the same notation as before, we have

\[ \text{M.R.} = \frac{(\#A - \bar{x})^2 + (\#B - \bar{x})^2 + \cdots + (\#Z - \bar{x})^2}{N^2}, \]

where

\[ \bar{x} = \frac{\#A + \#B + \cdots + \#Z}{26} = \frac{N}{26} \]

is the mean (x̄ being the usual mathematical symbol for mean). Notice that if we multiply out the term contributed by A, we have

\[ (\#A - \bar{x})^2 = \#A^2 - 2\,\#A\,\bar{x} + \bar{x}^2. \]

Concentrating on the numerator of roughness, we have 26 similar terms, each with three parts. Summing the first parts gives #A² + #B² + ··· + #Z². Summing the last part gives 26 copies of x̄². But x̄ = N/26, so these terms contribute

\[ 26\bar{x}^2 = 26\left(\frac{N}{26}\right)^2 = \frac{N^2}{26} \]

to the numerator. The middle terms are a bit more complicated, so we saved them for last:

\begin{align*}
-2\,\#A\,\bar{x} - 2\,\#B\,\bar{x} - \cdots - 2\,\#Z\,\bar{x} &= -2\bar{x}(\#A + \#B + \cdots + \#Z) \\
&= -2\bar{x}(N) \\
&= -2\,\frac{N}{26}\,N \\
&= -2\,\frac{N^2}{26}
\end{align*}

⁶ The mean is 8 and M.R. = .026.

Putting everything together,

\begin{align*}
\text{M.R.} &= \frac{(\#A - \bar{x})^2 + (\#B - \bar{x})^2 + \cdots + (\#Z - \bar{x})^2}{N^2} \\
&= \frac{\#A^2 + \#B^2 + \cdots + \#Z^2}{N^2} - 2\,\frac{N^2/26}{N^2} + \frac{N^2/26}{N^2} \\
&= \frac{\#A^2 + \#B^2 + \cdots + \#Z^2}{N^2} - \frac{2}{26} + \frac{1}{26} \\
&= \frac{\#A^2 + \#B^2 + \cdots + \#Z^2}{N^2} - \frac{1}{26}.
\end{align*}

This looks a lot like the index of coincidence. Are they related? We start with the formula for Φ:

\begin{align*}
\Phi &= \frac{\#A(\#A - 1) + \#B(\#B - 1) + \cdots + \#Z(\#Z - 1)}{N(N - 1)} \\
&= \frac{\#A^2 - \#A + \#B^2 - \#B + \cdots + \#Z^2 - \#Z}{N(N - 1)} && \text{(multiplying out)} \\
&= \frac{\#A^2 + \#B^2 + \cdots + \#Z^2 - (\#A + \#B + \cdots + \#Z)}{N(N - 1)} && \text{(regrouping)} \\
&= \frac{\#A^2 + \#B^2 + \cdots + \#Z^2 - N}{N(N - 1)} && \text{(the sum of the letters is } N\text{)} \\
&= \frac{\#A^2 + \#B^2 + \cdots + \#Z^2}{N(N - 1)} - \frac{N}{N(N - 1)} && \text{(separating the fractions)} \\
&= \frac{N}{N - 1} \times \left( \frac{\#A^2 + \#B^2 + \cdots + \#Z^2}{N^2} - \frac{1}{N} \right) && \text{(factoring out } \tfrac{N}{N - 1}\text{)}
\end{align*}

For a long ciphertext, N/(N − 1) will be very close to 1, and, likewise, 1/N will be very close to 0. So

\[ \Phi \approx \frac{\#A^2 + \#B^2 + \cdots + \#Z^2}{N^2}. \tag{8.2} \]

Hopefully this last fraction looks familiar: it is the final form for M.R. except for a 1/26. Taking care of that, we (finally!) conclude

\[ \Phi \approx \text{M.R.} + \frac{1}{26}. \tag{8.3} \]

The Index of Coincidence is basically a measure of roughness of the frequency table! This is how Φ is connected to polyalphabetic ciphers.

William Friedman wrote any number of books and pamphlets on cryptography. Of particular interest to people like us, trying to break polyalphabetic ciphers, is The Index of Coincidence and Its Applications in Cryptography, Riverbank Publications No. 22, 1920. This, according to David Kahn, “must be regarded as the most important single publication in cryptology.”

“Before Friedman, cryptology eked out an existence as a study unto itself, as an isolated phenomenon, neither borrowing from nor contributing to other bodies of knowledge. ... It dwelt a recluse in the world of science. Friedman let cryptology out of this lonely wilderness.” David Kahn [Kahn, pg 383]
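The relationship between Φ and M.R. can be checked numerically. The sketch below uses the keylength-one frequency count from Figure 8.1 (183 letters); the variable names are our own. It compares Φ computed from Formula 8.1 with M.R. + 1/26, and also with the exact form of the derivation, in which the factor N/(N − 1) and the 1/N term are kept rather than approximated away.

```python
# Keylength-one row of Figure 8.1, letters A through Z.
counts = [13, 1, 7, 10, 20, 4, 3, 4, 14, 0, 1, 9, 5,
          19, 21, 3, 0, 11, 4, 16, 5, 1, 3, 1, 8, 0]

n = sum(counts)                      # total number of letters
mean = n / 26

phi = sum(c * (c - 1) for c in counts) / (n * (n - 1))   # Formula 8.1
mr = sum((c - mean) ** 2 for c in counts) / n ** 2       # Measure of Roughness

approx = mr + 1 / 26                          # Formula 8.3
exact = n / (n - 1) * (mr + 1 / 26 - 1 / n)   # before dropping N/(N-1) and 1/N

print(f"phi         = {phi:.7f}")
print(f"M.R. + 1/26 = {approx:.7f}")
print(f"exact form  = {exact:.7f}")
```

For a text of this size the approximation differs from Φ by a bit less than 1/N; the gap shrinks as the ciphertext grows.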

8.3 The Friedman Test

We have finished the preliminary work and can now start to exploit our formulas to understand Φ. First, how large and how small can the values of Φ be? From Formula 8.3, Φ is large when M.R. is large. And M.R. will be large when the frequency count is rougher. And the frequency count is roughest for monoalphabetic ciphers. Our standard monoalphabetic frequency counts come from Figure 1.3. Using these numbers and Formula 8.2 (with N = 100 since the numbers are percentages) gives Φ = 0.065601 for monoalphabetically enciphered ciphertexts. (See exercise 8.18.)

On the other hand, M.R. is never negative, and its smallest value is 0, when all the numbers are the same. So from Formula 8.3, the smallest value of Φ is 0 + 1/26 ≈ .03846.

Do these values agree with our experimental data? Adding Φ to Figure 8.1, we have

length     A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  Φ
one       13  1  7 10 20  4  3  4 14  0  1  9  5 19 21  3  0 11  4 16  5  1  3  1  8  0  0.0655738
three      3  6  6  7 15 10  8  4  6  7  4  5  3 10 13 17  9  4  6  9 11  6  2  2  5  5  0.0442563
five       5  4  5  8 10  5 12  7  7  7  3  5 10  7 12 10  7 12  8  4  8  9  7  6  3  2  0.0392122
ten       10  5  7  5  7  5  6  4  6  5  4 10  6 11 13  4  6 13  6  7  9  6 10  7  3  8  0.0387318
twenty    10 11  9 10 11  7  8  5  7  8  5  8  6  6  6  5  5  6  4  3  8  6  7  9  7  6  0.0364499

A keylength of 1 has a Φ value of near .065, and as the keys get longer, the value of Φ rapidly decreases to about 0.038. Friedman did not stop here, however, but continued until he found a direct relationship between the keylength and Φ. We won’t do that here (see exercise 8.19), but Friedman was able to show that

\[ \Phi \approx \frac{.065(N - k) + .038N(k - 1)}{k(N - 1)}, \]

where k is the keylength and the values .065 and .038 are those we found above. Friedman then solved directly for k, giving us Friedman’s test.

The Friedman Test: Given a ciphertext, perhaps from a polyalphabetic cipher, compute

\[ \Phi = \frac{\#A(\#A - 1) + \#B(\#B - 1) + \cdots + \#Z(\#Z - 1)}{N(N - 1)}. \]

Then

1. If Φ is near .065, the cipher is likely to be a monoalphabetic one. If Φ is closer to .038, the cipher is likely to be polyalphabetic.

2. The value of Φ to be expected from a polyalphabetic cipher of given keylength is

Keylength   Expected Φ
1           .065601
2           .052036
3           .047515
4           .045254
5           .043898
6           .042994
10          .041186
large       .038461
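Friedman’s relationship between Φ, N and k can also be rearranged to estimate k directly. Multiplying the approximation through by k(N − 1) and collecting the terms containing k gives k ≈ .027N / ((N − 1)Φ − .038N + .065). That rearrangement is our own algebra on the formula stated above, and the function names below are likewise our own; treat this as a sketch rather than Friedman’s exact presentation.

```python
def expected_phi(k: float, n: int, mono: float = 0.065, flat: float = 0.038) -> float:
    """Phi predicted for keylength k and a ciphertext of n letters."""
    return (mono * (n - k) + flat * n * (k - 1)) / (k * (n - 1))


def estimate_keylength(phi: float, n: int, mono: float = 0.065, flat: float = 0.038) -> float:
    """Solve the expected_phi relationship for k.

    The denominator approaches 0 as phi approaches the flat value, so the
    estimate becomes very unstable for long keys.
    """
    return (mono - flat) * n / ((n - 1) * phi - flat * n + mono)


# Round trip: a predicted phi should give back the keylength it came from.
for k in (1, 2, 3, 5, 10):
    print(k, round(estimate_keylength(expected_phi(k, 1000), 1000), 6))
```

The result is rarely a whole number, so it is best read as a hint about which keylengths to try first.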

Suppose, for example, we compute that a ciphertext has Φ = 0.047. Then Friedman’s Test⁷ would suggest that the keyword is probably three or four letters long. Notice from the wiggle words (“suggest” and “probably”) that this method gives only an approximation and there will frequently be cases in which Φ = .043 but the length of the keyword is actually 3. To double check that the correct keylength was found one can either

1. Do a brief Kasiski Examination of the text to see if you can find 3 or 4 sequences of letters that support the Friedman computation. Or

2. Use the keylength suggested by the Friedman test to divide the ciphertext into that number of rows. Recompute Φ for each row. These Φ’s will be between .03 and .07. If most of them are closer to .07 than .03 you have probably discovered the correct keylength. (It might seem time consuming to recompute several mini-Φ’s, but since we will need to do a frequency count for each of the rows anyway there is relatively little extra computation.)

⁷ Calling this “The Friedman Test” is wildly unfair to Friedman. The Index of Coincidence has many applications, only one of which is estimating the keylength of a polyalphabetic cipher. Another application (see Exercise 8.20) is the determination of a ciphertext’s original language. Another, more important, one is to solve the superimposition problem: given several, perhaps partial, polyalphabetic ciphertexts with the same key, how should the texts be lined up under one another (“superimposed”) so that the letters in each column have the same keyletter? This is especially valuable when breaking machine ciphers of the type used in the 1930’s and 1940’s.

Example: Compute Φ for the following message, and then decrypt it. EFBTL QUUEH JMDRV EFBTL CBDPQ UWVEJ GGRQW VWPIK EOOPW FHZMF QFUIU KDUSU CZVGA GBFIK CBGMF HIQGL KCQXZ GMLRV GSGQA TFRVG PSDRG VVHVO JOWSF GRRIK VVHSL JSUYF FCHWL JSLVF CHXVW UVRAW XSUHA HTHVX WBGEE GBWED NMFVQ RHRKJ CDKCA UHKIG TSWMU CZDRV CPVXJ CQWGJ ADWEF CZBWA UWVIE RWUMU CZDRV ECQGJ GHHHS XWGOS JB The frequency count is

  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  total
  7  9 14  9 11 14 19 15  7 10  8  7  7  1  5  5 10 13 11  5 15 20 17  6  1  6    252

Computing, Φ = 0.0446152, which suggests a keyword length of 5. We next use Kasiski’s idea of building up a depth based on a keylength of 5:

First letters: EQJECUGVEFQKCGCHKGGTPVJGVJFJCUXHWGNRCUTCCCACURCEGXJ

  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  total
  1  0 10  0  4  2  7  2  0  5  2  0  0  1  0  1  2  2  0  2  4  3  1  2  0  0     51

Φ1 = 0.0768627. (Here Φ1 represents the computation for the letters enciphered by the 1st letter of the keyword.)

Second letters: FUMFBWGWOHFDZBBICMSFSVORVSCSHVSTBBMHDHSZPQDZWWZCHWB

  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  total
  0  6  3  3  0  4  1  5  1  0  0  0  3  0  2  1  1  1  6  1  1  3  5  0  0  4     51

Φ2 = 0.0588235.

Third letters: BUDBDVRPOZUUVFGQQLGRDHWRHUHLXRUHGWFRKKWDVWWBVUDQHG

  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  total
  0  3  0  5  0  2  4  5  0  0  2  2  0  0  1  1  3  5  0  0  6  4  5  1  0  1     50

Φ3 = 0.0620408.

Fourth letters: TERTPEQIPMISGIMGXRQVRVSISYWVVAHVEEVKCIMRXGEWIMRGHO

  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  total
  1  0  1  0  5  0  4  2  6  0  1  0  4  0  1  2  2  5  3  2  0  6  2  2  1  0     50

Φ4 = 0.0579592.

Fifth letters: LHVLQJWKWFUUAKFLZVAGGOFKLFLFWWAXEDQJAGUVJJFAEUVJSS

  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  total
  5  0  0  1  2  6  3  1  0  5  3  5  0  0  1  0  2  0  2  0  4  4  4  1  0  1     50

Φ5 = 0.0587755.

The values of Φ for each of the depths are larger than 0.052, so within the range of values we’d expect from the Friedman test for a monoalphabetic cipher.

Being confident of our keyword length, we now have five simple Caesar ciphers to break and soon the message is decrypted.⁸ To further emphasize the power of Friedman’s Index, we provide the values of Φ that are produced assuming various keylengths.

Keylength  Φ1    Φ2    Φ3    Φ4    Φ5    Φ6    Φ7    Average
One        .045                                      .045
Two        .046  .042                                .044
Three      .047  .043  .045                          .045
Four       .048  .044  .052  .036                    .045
Five       .077  .059  .062  .058  .059              .063
Six        .048  .045  .046  .041  .038  .043        .043
Seven      .044  .052  .044  .043  .048  .048  .040  .046

The values in the Keylength Five row are clearly the largest. In particular, the average of .063 stands out. Five must indeed be the keylength. Especially when coupled with a computer program, Friedman’s Test provides for the almost routine determination of keylength.
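The kind of computer program mentioned here can be quite short. The sketch below (function and variable names are our own) splits the example ciphertext into columns for each candidate keylength from one to seven and prints Φ for every column along with the average, which is how a table like the one above can be produced.

```python
from collections import Counter

CIPHERTEXT = """
EFBTL QUUEH JMDRV EFBTL CBDPQ UWVEJ GGRQW VWPIK EOOPW FHZMF
QFUIU KDUSU CZVGA GBFIK CBGMF HIQGL KCQXZ GMLRV GSGQA TFRVG
PSDRG VVHVO JOWSF GRRIK VVHSL JSUYF FCHWL JSLVF CHXVW UVRAW
XSUHA HTHVX WBGEE GBWED NMFVQ RHRKJ CDKCA UHKIG TSWMU CZDRV
CPVXJ CQWGJ ADWEF CZBWA UWVIE RWUMU CZDRV ECQGJ GHHHS XWGOS JB
"""


def index_of_coincidence(text: str) -> float:
    """Chance that two letters picked at random from text are the same."""
    counts = Counter(text)
    n = len(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def columns(text: str, k: int) -> list:
    """Column i holds the letters enciphered by the i-th key letter, assuming keylength k."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return ["".join(letters[i::k]) for i in range(k)]


for k in range(1, 8):
    phis = [index_of_coincidence(col) for col in columns(CIPHERTEXT, k)]
    row = " ".join(f"{p:.3f}" for p in phis)
    print(f"{k}: {row}  average {sum(phis) / k:.3f}")
```

With keylength five, the first column is the “First letters” string shown earlier, so its value can be checked by hand against the frequency count: 196/2550 ≈ .0769.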

As a method for computing the keylength for a ciphertext of only a couple hundred letters the Friedman Test is too highly influenced by quirks of the individual ciphertext and keyword. However, as the ciphertext grows, the Friedman Test homes in on the keylength with amazing accuracy.
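The table of averages above is the kind of thing a short program produces. The sketch below is one possible way to do it, not the author's program: `depths`, `average_phi` and `rank_keylengths` are illustrative names, and the toy ciphertext at the end is invented only to show the mechanics.

```python
from collections import Counter

def phi(text):
    """Index of coincidence of the letters in text."""
    letters = [c for c in text.upper() if c.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

def depths(ciphertext, keylength):
    """Split the ciphertext into the columns enciphered by each key letter."""
    letters = [c for c in ciphertext.upper() if c.isalpha()]
    return ["".join(letters[i::keylength]) for i in range(keylength)]

def average_phi(ciphertext, keylength):
    """Average of phi over the depths for one trial keylength."""
    cols = depths(ciphertext, keylength)
    return sum(phi(col) for col in cols) / keylength

def rank_keylengths(ciphertext, max_len=7):
    """Trial keylengths ordered from largest average phi to smallest."""
    return sorted(range(1, max_len + 1),
                  key=lambda k: average_phi(ciphertext, k),
                  reverse=True)

# Toy input: two interleaved constant alphabets, so keylength 2 stands out.
print(rank_keylengths("ABABABAB", max_len=3))  # [2, 1, 3]
```

Multiples of the true keylength also score well (each of their depths is still monoalphabetic), so in practice one looks for the smallest keylength whose average stands out.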

8.4 Multiple Encipherings

Kasiski’s and Friedman’s tests are a very powerful one-two combination to use on polyalphabetic ciphers, especially when used on a computer, enabling the almost automatic detection of a polyalphabetic cipher’s keylength. Is there anything about the polyalphabetic ciphers that might be saved?

Ever since Section 7.6 we have been working under the (often implicit) assumption that determining the keylength of a polyalphabetic ciphertext was hard, but that once we knew the keylength breaking the cipher was easy. This is because the polyalphabetic ciphers are really just several Caesar ciphers twisted together – once we could pull the strands apart we were back to Chapter 1 material. Since it appears relatively easy to pull the strands apart, is there a way to make the strands shorter? That is, to make the number of letters per Caesar alphabet so small that decryption of the individual Kasiski depths is difficult? For example, if our message has 1000 letters in it, but we could use a 100 letter keyword, there would be only 10 letters per alphabet! Even Linquist’s method fails with so few letters.9

How can we make such long keywords? One way would be to use phrases, or paragraphs from agreed upon books. This works well, but becomes harder to remember. Apparently, when people write to the National Security Agency (NSA), the cryptographic arm of the US Government, saying “I’ve discovered a great new and unbreakable code for you to use”, this is most frequently the method they describe.10

Another method we will call the Double Vigenère. We carefully pick two keywords and encipher the message twice, using each keyword once.

8Keyword is CODES. “Cryptography and cryptanalysis are sometimes called twin or reciprocal sciences, and in function they indeed mirror one another. What one does the other undoes. Their natures, however, differ fundamentally. Cryptography is theoretical and abstract. Cryptanalysis is empirical and concrete.” David Kahn

Examples:

(1) Encipher aaaaaaaaaaaaaa. First use the keyword IT and then re-encipher with the keyword WAS.

plaintext           aaaaaaaaaaaaaaaaaaaa
key 1               ITITITITITITITITITIT
partial ciphertext  ITITITITITITITITITIT
key 2               WASWASWASWASWASWASWA
ciphertext          ETAPILETAPILETAPILET

IT and WAS act like a 6-letter keyword!

(2) Encipher banana banana banana banana using LAKE and then OCEANS. How long is the combination keyword? 11



What is the pattern? How long a “keyword” does the use of two keywords produce? The combination keyword will start to repeat whenever each individual keyword starts to repeat. The last letter enciphered before this repetition starts must be enciphered with the last letter of each keyword. So each keyword has repeated some whole number of times up to this point. The number of letters enciphered so far must then be a multiple of each keylength, and the first time this happens is at the smallest number that is a multiple of both.

The smallest multiple of each has a formal name, the least common multiple or lcm. These Double Vigenère ciphers act like a single Vigenère cipher whose keyword is as long as the least common multiple of the lengths of the original keywords.
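The two examples above can be reproduced with a short sketch. It assumes the usual a = 0 Vigenère tableau used throughout this chapter; the helper names `vigenere` and `combined_key` are illustrative only.

```python
from math import gcd

A = ord("A")

def vigenere(plaintext, keyword):
    """Encipher with a repeating keyword (a = 0, so key letter A changes nothing)."""
    letters = [c for c in plaintext.upper() if c.isalpha()]
    key = keyword.upper()
    return "".join(
        chr((ord(p) - A + ord(key[i % len(key)]) - A) % 26 + A)
        for i, p in enumerate(letters)
    )

def combined_key(key1, key2):
    """Single keyword equivalent to enciphering with key1 and then key2."""
    n = len(key1) * len(key2) // gcd(len(key1), len(key2))  # lcm of the lengths
    return vigenere(key1 * (n // len(key1)), key2)

twice = vigenere(vigenere("a" * 20, "IT"), "WAS")
print(twice)                       # ETAPILETAPILETAPILET
print(combined_key("IT", "WAS"))   # ETAPIL
```

Enciphering once with `combined_key("IT", "WAS")` gives the same ciphertext as enciphering twice, which is the sense in which the two keywords act like one 6-letter keyword.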

9Another way to strengthen the polyalphabetic ciphers would be to avoid Caesar ciphers as the components but instead use better mixed alphabets. We will look at this in Exercise 8.10.

10This method was first discovered in the 1500’s by Girolamo Cardano. He, however, started the key over for each word in the message. Cardano is best remembered in mathematics for discovering – or stealing, depending on whose version of history you believe – the cubic formula (like the quadratic formula but for equations of degree three).

11They act like a 12 letter keyword. ACBEL SZGCA KWACB ELSZG CAKW.

Just as the gcd is the largest number that divides both, the lcm is the smallest number that both divide. (Obviously word order matters!) Are they connected in any way besides word order? Consider the following chart:

 m    n    mn   gcd(m, n)   lcm(m, n)
 4    8    32       4           8
 4    6    24       2          12
 3    8    24       1          24
 8   10    80       2          40
 5   12    60       1          60
 8    9    72       1          72

From the examples it appears that gcd(m, n) × lcm(m, n) = m × n, and this is correct. Since we have a method for computing gcd’s, it is perhaps better to write

    lcm(m, n) = mn / gcd(m, n).

In particular, if m and n are relatively prime then lcm(m, n) is just the product mn.
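The chart and the formula can be checked mechanically. The sketch below uses Python's built-in `math.gcd` (which implements the Euclidean algorithm of Chapter 4); `lcm` and `lcm_of_all` are names chosen here for illustration.

```python
from functools import reduce
from math import gcd

def lcm(m, n):
    """Least common multiple via lcm(m, n) = mn / gcd(m, n)."""
    return m * n // gcd(m, n)

def lcm_of_all(lengths):
    """lcm of several keylengths, folding the two-number formula pairwise."""
    return reduce(lcm, lengths)

# Reproduce the chart: m, n, mn, gcd, lcm.
for m, n in [(4, 8), (4, 6), (3, 8), (8, 10), (5, 12), (8, 9)]:
    print(m, n, m * n, gcd(m, n), lcm(m, n))

# Keywords hamburger (9 letters) and FrenchFries (11 letters).
print(lcm(len("hamburger"), len("FrenchFries")))  # 99
```

For three or more keywords the two-number formula is applied one pair at a time, as `lcm_of_all` does; dividing the full product by a single gcd is not correct in general.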

Examples: What keylength do the following keys produce?

(1) hamburger and FrenchFries give a 9 × 11 = 99 letter keyword.

(2) Francisco and California and Oakland give a ____ letter keyword.

(3) SanFrancisco and California and Oakland give a (12 × 10 × 7)/2 = 420 letter keyword.

(4) SanFrancisco and California and Oakland and FrenchFries.12



Using multiple keywords thus produces the equivalents of very long keywords relatively easily. How helpful is this? Suppose our goal is to have no more than 20 letters of depth per keyword letter in a Vigenère cipher. There are about 70 characters per line of typed text and about 50 lines of text per page, so about 3500 characters per page. Since 3500/20 = 175, for each page of message we would need roughly 180 letters worth of keyword. So Francisco and California and Oakland would safely encipher no more than (9 × 10 × 7 = 630)/180 = 3.5 pages of message. Suppose we must send 100 pages of messages a day? We must have a keyword of about 100 × 180 = 18,000 letters (about 5 pages), or a clever selection of 4 words, with lengths 7, 9, 13, and 17. And these words must be changed every day.

Modern life is making this problem even more severe: speaking at 200 words per minute on a cell phone means to have a secure conversation one needs in excess of 11 keyletters per minute of conversation. Or a modern business sending 2000 pages every day needs more than 120 new pages of keywords every day. This is our first encounter with the key management problem. A cipher system isn’t much good if every few days we must hand-deliver a brand-new book full of never-been-used keys to everyone we wish to communicate with. Even if we could develop such “books” our business would go bankrupt from the printing and delivery costs!

The German Enigma machines of World War II fame, as well as some of the cipher machines used by the Americans like the M-94, were, in many ways, simply mechanical versions of multiple Vigenère ciphers. These machines used a number of linked wheels, or “rotors,” to perform the enciphering and deciphering. Each rotor can be thought of as giving a monoalphabetic cipher. When coupled, several such rotors still combine to give only a monoalphabetic cipher. But when one rotor rotates even one step, a new monoalphabetic cipher is created. As long as the rotors continue to rotate, in whatever pattern, and do not come back to their original position, an ever-longer keyword is produced. With the 5 rotors of the later Enigma machines, the “keyword” had functional length 26^5 = 11,881,376, longer than the Iliad and Odyssey combined!13 (The rotor machines have their own weaknesses. The rotors get stolen or captured. The setting of the rotors – how exactly they are positioned at the beginning of a message – must be sent. And each particular type of machine has its own peculiarities. For example, the Enigmas were unable to encipher a letter to itself. We mustn’t overstate the power of our techniques; the weaknesses of Enigma contributed a great deal to the Allies breaking it. But advanced versions of the techniques we’ve developed in this chapter played an important role.)

Although the Vigenère-based ciphers were great in their time, the development of Friedman’s and other more sophisticated tests, coupled with the modern need for vast quantities of keys, makes such ciphers relatively insecure.

12(2) 9 × 10 × 7 = 630, (4) (12 × 10 × 7 × 11)/2 = 4620.

13A keylength of eleven million sounds large, but numbers get big fast. For a “small” example, consider The War of the Rebellion: a compilation of the official records of the Union and Confederate armies. Pub. under the direction of the ... Secretary of War, the compilation of the formal reports, correspondence, orders, and returns of the Union and Confederate Armies. In four Series, it consists of nearly 70 volumes, many in several parts, with most of the parts between 800 and 1200 pages. For example, Volume XXIV, published in 1889, comes in three parts, and covers “Operations in Mississippi and West Tennessee, including those in Arkansas and Louisiana connected with the Siege of Vicksburg. January 20 - August 10, 1863.” Part III includes the Correspondences and is 1196 pages long. For an estimate, if we assume each volume has 2 parts, each of 1000 pages, and there are 3000 characters per page, the volumes contain 420,000,000 characters in total. This is over 200,000 characters for each day of the war! Now imagine the number of characters sent during WWII. Or during either of the Persian Gulf Wars!

8.5 Vigenère’s Auto Key Cipher

In a conversation on that subject [unbreakable ciphers] which I had with the late Mr. Davies Gilbert, President of the Royal Society, [he and I] each maintained that he possessed a cipher which was absolutely inscrutable. On comparison, it appeared that we had both imagined the same law. Charles Babbage, discussing Auto-key ciphers Passages from the Life of a Philosopher

So we hope to produce and use really long keywords. But producing good ones is hard, and exchanging them even harder. What can we do? Cardano had among the first ideas along this line. In modern terminology his idea is to encipher each word with a progressive key starting with the first letter of the previous word. Of course, there must be some way of enciphering the first word, and Belaso suggested picking a key letter.

Example: Encipher Lets attack them Friday using the keyletter J. So Lets is enciphered with JKLM (recall that a progressive key moves one letter forward each step), attack with LMNOPQ, and so on.

plaintext   l e t s a t t a c k t h e m f r i d a y
key         J K L M L M N O P Q A B C D T U V W X Y
ciphertext  U O E E L F G O R A T I G P Y L D Z X W

So the ciphertext is UOEEL FGORA TIGPY LDZXW. 
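Cardano's word-by-word progressive key is easy to mechanize. The following is a rough sketch under the assumptions of the example above (a = 0 tableau, words separated by spaces, letters only); `cardano_encipher` is a hypothetical name.

```python
A = ord("A")

def cardano_encipher(message, first_keyletter):
    """Encipher each word with a progressive key that starts at the first
    letter of the previous word; the first word starts at first_keyletter."""
    out = []
    start = ord(first_keyletter.upper()) - A
    for word in message.upper().split():
        for offset, ch in enumerate(word):
            # The key advances one letter for each letter of the word.
            out.append(chr((ord(ch) - A + start + offset) % 26 + A))
        start = ord(word[0]) - A  # the next word's key begins at this word's first letter
    return "".join(out)

print(cardano_encipher("Lets attack them Friday", "J"))  # UOEELFGORATIGPYLDZXW
```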

What is the security of this method? It seems better than the Vigenère method since the keyword is not repeated. One possible problem is that since t starts so many words, much of the plaintext will be enciphered with key TUVWXY.... If word length is preserved in the ciphertext then we may simply decipher each word using the key TUVWXY... until we get lucky and find one for which this works. Since almost 16% of all words begin with T, we’ll likely only have to try 3 or 4 until we get lucky. And once we have one word we are done, for it will tell us how to decipher the next word. So this is a good idea, but we need a way to make it less predictable.

At this point Blaise de Vigenère (1523–1596) finally enters the picture. As a youth Vigenère received an excellent education, attending the Diet of Worms as a very young secretary and traveling throughout Europe on diplomatic missions. In 1549 he was sent to Rome, where he apparently read Trithemius, Belaso, Cardano and Porta, among others, and studied with the experts of the papal curia. After a career in diplomacy (for the Duke of Nevers and King Charles IX), in 1570 he married a much younger wife, quit court life, gave his annuity to the poor, and devoted himself to writing. And he wrote about everything, everything under the sun, including Traicté des Comètes, which helped to destroy the superstition that comets come from an angry God trying to warn a wicked world to stop sinning. In 1586 he wrote the 600 page Traicté des Chiffres which, beyond discussions of magic and the mysteries of the universe, was a comprehensive overview of cryptography.

Of immediate interest to us, he wanted an easily remembered but really long and non-repeating key. When enciphering a message, what is the nearest long possible key? The message itself!

Auto Key Algorithm: Pick a priming key of one or more letters. Encipher the plaintext as a Vigenère cipher in which the actual key is the priming key followed by the message.

Examples:

(1) Encipher My vengeance is at hand, key D.

plaintext   myvengeanceisathand
key         DMYVENGEANCEISATHAN
ciphertext  PKTZRTKENPGMASTAHNQ

The ciphertext is PKTZR TKENP GMAST AHNQ.

(2) Encipher Beware the hand of fate with key = TO.

plaintext   b e w a r e t h e h a n d o f f a t e
key         T O B E W A R E T H E H A N D O F F A
ciphertext  U S X E N E K L X O E U D B I T F Y E

The ciphertext is USXEN EKLXO EUDBI TFYE.

(3) Decipher XZCYY INTCQ AQX, key = M. To decipher we must creep along, remembering that the next keyletter is the previous plaintext letter.

ciphertext  X Z C Y Y I N T C Q A Q X
key         M L
plaintext   l

ciphertext  X Z C Y Y I N T C Q A Q X
key         M L O
plaintext   l o

ciphertext  X Z C Y Y I N T C Q A Q X
key         M L O O
plaintext   l o o

ciphertext  X Z C Y Y I N T C Q A Q X
key         M L O O K
plaintext   l o o k

Eventually this produces look out a comet.
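The Auto Key Algorithm and the "creeping" decipherment of example (3) can be sketched as follows. This is only an illustration under the chapter's a = 0 convention; the function names are ours.

```python
A = ord("A")

def autokey_encipher(plaintext, priming_key):
    """Vigenere encipherment whose key is the priming key followed by the message."""
    plain = [c for c in plaintext.upper() if c.isalpha()]
    key = list(priming_key.upper()) + plain
    return "".join(chr((ord(p) - A + ord(k) - A) % 26 + A)
                   for p, k in zip(plain, key))

def autokey_decipher(ciphertext, priming_key):
    """Creep along: each recovered plaintext letter is appended to the key."""
    cipher = [c for c in ciphertext.upper() if c.isalpha()]
    key = list(priming_key.upper())
    plain = []
    for i, c in enumerate(cipher):
        p = chr((ord(c) - ord(key[i])) % 26 + A)
        plain.append(p)
        key.append(p)  # this plaintext letter becomes a later key letter
    return "".join(plain).lower()

print(autokey_encipher("Beware the hand of fate", "TO"))  # USXENEKLXOEUDBITFYE
print(autokey_decipher("XZCYY INTCQ AQX", "M"))           # lookoutacomet
```

Note that deciphering cannot be done all at once: the key for position i is not known until the plaintext at an earlier position has been recovered.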



The Auto Key Cipher defeats all the types of frequency analysis we’ve seen so far, making it the strongest cipher system yet. Its main disadvantage is one we haven’t seen before: even simple copying misteaks can cause havoc. Since it is easy to make a small error, and any mistake propagates through the rest of the cipher, even a single error can make an Auto Key enciphered text unreadable.

Example: Unfortunately, DSAUB ARIVI FF contains a small error. If the key is D, can you find the error?14
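To see how a single copying mistake propagates, the sketch below deciphers the ciphertext XZCYY INTCQ AQX from example (3) twice: once as given, and once with its fifth letter miscopied as Z. The garbled version is our own illustration, not one from the text.

```python
A = ord("A")

def autokey_decipher(ciphertext, priming_key):
    """Auto Key decipherment: recovered plaintext letters extend the key."""
    cipher = [c for c in ciphertext.upper() if c.isalpha()]
    key = list(priming_key.upper())
    plain = []
    for i, c in enumerate(cipher):
        p = chr((ord(c) - ord(key[i])) % 26 + A)
        plain.append(p)
        key.append(p)
    return "".join(plain).lower()

good = autokey_decipher("XZCYYINTCQAQX", "M")
bad = autokey_decipher("XZCYZINTCQAQX", "M")   # fifth letter miscopied: Y -> Z
print(good)
print(bad)
# Count how many letters differ once the error has been introduced.
print(sum(g != b for g, b in zip(good, bad)))
```

With a priming key of length 1, the wrong plaintext letter becomes the next key letter, so every letter from the mistake onward is deciphered incorrectly.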

How might we go about breaking an Auto Key cipher? Consider the following message, where we assume we know the ciphertext, and have somehow guessed one plaintext letter and know that the priming key was a single letter.

plaintext   ········s···········
key         ····················
ciphertext  YBCMGWRUVSNQITFNWRUV

By working forwards and backwards we fill in some:

plaintext   ·······ds···········
key         ········ds··········
ciphertext  YBCMGWRUVSNQITFNWRUV

and then more

plaintext   ······rdsa··········
key         ·······rdsa·········
ciphertext  YBCMGWRUVSNQITFNWRUV

and before long15

plaintext   backwardsandforwards
key         xbackwardsandforward
ciphertext  YBCMGWRUVSNQITFNWRUV

If the priming key is longer, then we need to know as many consecutive letters of plaintext as there were priming key letters, but the same forwards and backwards stepping will then decrypt the cipher.
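The forwards and backwards stepping for a priming key of length 1 can be written out as a short routine. This is a sketch only: `recover_from_crib` is a hypothetical name, positions are counted from 0, and the guessed plaintext letter is assumed to be correct.

```python
A = ord("A")

def recover_from_crib(ciphertext, index, plain_letter):
    """Given one known plaintext letter (priming key of length 1), recover
    the whole plaintext and the priming key by stepping both ways."""
    c = [ord(ch) - A for ch in ciphertext.upper() if ch.isalpha()]
    p = [0] * len(c)
    p[index] = ord(plain_letter.upper()) - A
    # Forwards: the key at position i is the plaintext at position i - 1.
    for i in range(index + 1, len(c)):
        p[i] = (c[i] - p[i - 1]) % 26
    # Backwards: the key at position i is c[i] - p[i], and it equals p[i - 1].
    for i in range(index, 0, -1):
        p[i - 1] = (c[i] - p[i]) % 26
    priming_key = chr((c[0] - p[0]) % 26 + A)
    return "".join(chr(x + ord("a")) for x in p), priming_key

print(recover_from_crib("YBCMGWRUVSNQITFNWRUV", 8, "s"))
```

Here the known letter s sits at position 8 (the ninth letter), as in the diagram above.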

14a simple error is the message. The R in the ciphertext should be a P.

15This should look quite familiar to those who completed Exercise 7.19.

Using mathematics similar to that used in Friedman’s Index of Coincidence, given enough text any Auto Key cipher can be broken. To sketch how, notice that the plaintext–key pairs (h,O), (o,H), (n,I), (i,N), (r,E) and (e,R) all give rise to the ciphertext letter V, and these are the only ways of creating V from two of the high-frequency letters etaoinshr. So if we suspect that an Auto Key Cipher with priming keyword of length 1 was used, then we can try each of these as the possible plaintext–key pair on each V in the ciphertext. By probability, we very quickly will find one that gives the correct decipherment. In particular, the pairs re and er occur often in the plaintext, producing a large number of V’s in the ciphertext. Thus an Auto Key cipher with a priming key of length 1 is not secure. Likewise, every the in an Auto Key cipher of keylength two has its e enciphered by its t into an X, so such a ciphertext contains many X’s. In general, the etaoinshr letters so frequently encipher one another that they give each other away, leading to a decryption of the cipher.

To summarize, we broke the Vigenère Cipher by exploiting the pattern of its repeating keyword. The Auto Key Cipher can be broken by exploiting the usual frequency patterns of English. Removing all such patterns must be our next goal.
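Which plaintext–key pairs of high-frequency letters produce a given ciphertext letter can be listed by brute force. The sketch below assumes the a = 0 tableau used in this chapter's worked examples; `pairs_producing` is an illustrative name.

```python
HIGH_FREQUENCY = "etaoinshr"

def pairs_producing(cipher_letter, letters=HIGH_FREQUENCY):
    """All (plaintext, KEY) pairs drawn from `letters` that encipher to cipher_letter."""
    target = ord(cipher_letter.upper()) - ord("A")
    found = []
    for p in letters:
        for k in letters:
            if (ord(p) - ord("a") + ord(k) - ord("a")) % 26 == target:
                found.append((p, k.upper()))
    return sorted(found)

print(pairs_producing("V"))
# [('e', 'R'), ('h', 'O'), ('i', 'N'), ('n', 'I'), ('o', 'H'), ('r', 'E')]
print(pairs_producing("X"))
# [('e', 'T'), ('t', 'E')]
```

Running it over all 26 ciphertext letters shows which ciphertext letters have only a handful of plausible explanations, which is where an attack would start.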

8.6 Perfect Secrecy

All the ciphers we’ve studied so far depend, eventually, on some sort of pattern, and this pattern eventually gives them away. What is needed is a cipher system whose keyword is both endless and senseless. The need for endless should be clear after our work with Vigenère ciphers. Once a keyword starts to be repeated, the cipher is in danger of being broken. Hence, for perfect secrecy we can never allow the key to be repeated, i.e., it must be endless. The reason for senseless is nearly the same: an extremely long keyword that is not senseless has some pattern to it. And a pattern is not much different from a repeating keyword. As Kahn puts it, the perfect cipher must “avoid the Scylla of repetition and the Charybdis of intelligibility.”

Joseph O. Mauborgne (1881–1971) had the idea of “endless”. Mauborgne had a long and very distinguished career in cryptography. In 1914 he gave the first recorded solution of a Playfair cipher. (We will study these in Chapter 9.) He eventually rose to the post of Chief Signal Officer in October 1937 and as a Major General built the cryptanalytic abilities of the Signal Corps to the extent that it was reading a flood of Japanese ciphers by his retirement in 1941.

Gilbert S. Vernam (1890–1960) had the idea of “senseless”. Vernam was an employee of AT&T when in 1917 he proposed the use of “ devices” for automatic encryption and decryption of telegram messages. Vernam received 65 patents in the areas of cryptography and telephone switching systems and was well known for his cleverness – supposedly he asked himself “What can I invent now?” each night while relaxing on his sofa.16

Put endless and senseless together and we have the One-time Pad:17 Pick any random keyword of length equal to the length of your message. Treat it as the keyword of a Vigenère cipher. Throw it away after you use it (hence the name). A properly used one-time pad is the only unbreakable cipher, or, in fancier language, is holocryptic.
Why is it unbreakable? Consider the ciphertext UVAET. What is the plaintext? Using this ciphertext and the ciphertext only it is impossible to tell. This is because for every 5 letter block of letters you can pick, there is a (possibly nonsensical) 5 letter Vigenère keyword that will turn your plaintext into UVAET. And unless you have some other knowledge, all are equally possible. So it is impossible to tell what UVAET means.

This idea holds on a much larger scale. If you pick a keyword that is as long as your message, make the keyword a random collection of letters, and use the keyword exactly twice, once to encipher and once to decipher, then there is no way that anyone can break the message. This was a favorite method of Russian spies in the 1950’s.18 It is also popular in movies, mainly because the one-time pads were usually written on very small pieces of paper that were hidden in false shoe bottoms, or inside fake cigarettes, fake nickels, etc. As Friedman put it in his Britannica article on cryptology [Britt, pg 1059]

a letter-for-letter cipher system which employs, once and only once, a keying sequence composed of characters or elements in a random and entirely unpredictable sequence may be considered holocryptic, that is, messages in such a system cannot be read by indirect processes involving cryptanalysis, but only by direct processes involving possession of the key or keys, obtained either legitimately, by virtue of being among the intended communicators, or by stealth.

Examples: Encipher and decipher using a one-time pad.

(1) Encipher holocryptic using the key SLMPQOSUCFC.

(2) Decipher HROJA OPMNZ using the key NEXFA LPLCV.19
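A one-time pad is mechanically just a Vigenère cipher with a key as long as the message. The sketch below (function names are ours, a = 0 convention assumed) also illustrates the UVAET argument: for any 5-letter plaintext one cares to pick, there is a key that turns it into UVAET. Python's `secrets` module is used for the random pad; whether that is random enough for a real pad is outside the scope of this sketch.

```python
import secrets
import string

A = ord("A")

def _letters(text):
    return [c for c in text.upper() if c.isalpha()]

def otp_encipher(plaintext, key):
    p, k = _letters(plaintext), _letters(key)
    if len(k) < len(p):
        raise ValueError("a one-time pad key must be at least as long as the message")
    return "".join(chr((ord(a) - A + ord(b) - A) % 26 + A) for a, b in zip(p, k))

def otp_decipher(ciphertext, key):
    c, k = _letters(ciphertext), _letters(key)
    if len(k) < len(c):
        raise ValueError("a one-time pad key must be at least as long as the message")
    return "".join(chr((ord(a) - ord(b)) % 26 + A) for a, b in zip(c, k)).lower()

def key_explaining(ciphertext, plaintext):
    """The key that would turn the given plaintext into the given ciphertext."""
    c, p = _letters(ciphertext), _letters(plaintext)
    return "".join(chr((ord(a) - ord(b)) % 26 + A) for a, b in zip(c, p))

def random_pad(length):
    """A fresh pad of random capital letters, to be used once and discarded."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(length))

# Two very different 5-letter plaintexts, each with a key that yields UVAET.
for guess in ("hello", "world"):
    key = key_explaining("UVAET", guess)
    print(guess, key, otp_encipher(guess, key))
```

Since every candidate plaintext has exactly one such key and a random pad makes all keys equally likely, the ciphertext alone gives no reason to prefer one plaintext over another.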



How much use do you think this system that allows perfect secrecy gets? The answer is almost none. Consider the problems you would have if you were the Chief Decipherer in Spy Central. For every page of text that one of your spies wishes to send you he or she (and you) must have a page of keyword. If you are collecting 1000 pages of data from your spies every day you must make and distribute that many different pages of keywords every day. You obviously can’t send the keywords via radio or email because they must be private. And you can’t first encipher them before sending them because the only way to safely encipher them is with a one-time pad! The only way is to somehow physically carry the key to the spy.

This is a real key management problem, and it is difficult enough to overcome that one-time pads are used very rarely. For instance, the “hot line” between Moscow and Washington used a form of one-time pads. Unfortunately, the lure of perfect encryption is so great that some still argue that the one-time pad is the future. But as Bruce Schneier recently wrote20

[It] is the only provably secure cryptosystem we know of. It’s also pretty much useless. Because the key has to be as long as the message, it doesn’t solve the security problem. One way to look at encryption is that it takes very long secrets – the message – and turns them into very short secrets: the key. With a one-time pad, you haven’t shrunk the secret any. It’s just as hard to courier the pad to the recipient as it is to courier the message itself. Modern cryptography encrypts large things – Internet connections, digital audio and video, telephone conversations, etc. – and dealing with one-time pads for those applications is just impracticable. ... One-time pads may be theoretically secure, but they are not secure in a practical sense. They replace a cryptographic problem that we know a lot about solving – how to design secure algorithms – with an implementation problem we have very little hope of solving. They’re not the future. And you should look at anyone who says otherwise with deep and profound suspicion.

So one-time pads do provide perfect secrecy, but they do so in a way that is nearly perfectly useless.

16Among the things he did invent was a “Secret Signaling System” that was awarded U.S. Patent 1,310,719. This was, more or less, a teletypewriter that performed Vigenère encryption.

17Ciphers very similar to one-time pads were also discovered in Germany and Russia about this same time [Bauer, page 144].

18Unfortunately for them, the Russians made 9 copies of some of their one-time pads. Even this small lapse was enough for the NSA to break these messages. (This information was only recently declassified, and can be found in the “Venona Breaks” pages at the NSA website.)

19(1) ZZXDS FQJVN E, (2) unreadable.

8.7 Summary

The Index of Coincidence measures the likelihood that two letters, chosen at random from some text, are the same. It is very closely related to the “roughness” of the frequency count of the text. Using this relation, Friedman was able to predict the length of a polyalphabetic cipher’s keyword from the ciphertext’s Index of Coincidence value. While not necessarily precise for short ciphers, when combined with Kasiski’s test, the Index of Coincidence provides an accurate and rapid method for determining the keylength used to produce that ciphertext. Hence polyalphabetic ciphers are not secure.

By repeatedly enciphering a text with a variety of keywords one can produce the equivalent of a Vigenère cipher of keylength equal to the least common multiple of the lengths of the individual keys. While this method may keep short texts secure, the amount of keyword needed is directly related to the amount of text one wishes to send (approximately 1 page of keyword must be used to safely encipher 20 pages of plaintext). Producing and exchanging the huge amount of keys needed by a modern business or government prevents any practical use of a repeated Vigenère system.

Vigenère did not invent the Vigenère Cipher but rather the Auto Key Cipher. This cipher uses a priming key followed by the plaintext as its key, thus producing a key without repetitions. Its security is much better than that of the Vigenère. Nonetheless, the patterns in English provide enough of an entry that Friedman was eventually able to break this system.

The only unbreakable cipher system is the One-Time Pad. A random series of letters (or numbers) is used as the key. As long as each key is used exactly twice, once for enciphering and once for deciphering, and is kept secret, the system cannot be broken. But the key is as long as the message, so the problem of transmitting the message is simply replaced by that of transmitting the key. So the immense quantity of keys that must be distributed to use this system prevents it from being used except in very special circumstances.

20Crypto-gram, October 15, 2002

8.8 Terms and Topics

1. What is a “coincidence?” What is the connection between coincidences and Kasiski’s test?

2. What is the Index of Coincidence? What does it measure?

3. What is the formula for Φ?

4. How do the frequency counts of a monoalphabetic cipher and a polyalphabetic cipher with long keylength differ?

5. What does M.R. stand for? What does it measure?

6. Under what circumstances do we expect M.R. to be a larger number? A smaller number?

7. What is the Friedman test?

8. How can we use Φ to estimate the keylength of a polyalphabetic cipher?

9. What is the Double Vigenère? How does it differ from the Vigenère cipher?

10. Performing multiple encipherments with different keywords is equivalent to enciphering once with a keyword of what length?

11. Who invented the Auto Key cipher?

12. What is the Auto Key cipher? Explain how to use it.

13. What is the key for an Auto Key cipher?

14. Give a strength and a weakness of the Auto Key Cipher.

15. What is “Perfect Secrecy?”

16. How does the One-Time Pad work?

17. Give a strength and a weakness of the One-Time Pad system.

8.9 Exercises

For Problems 1 through 6, do a frequency count and use it to compute Φ. Use this value to estimate the keylength. Try to find a couple of repetitions in the ciphertext that seem to validate your keylength guess. Then decipher the message.

1. COMUB XWYKU OXTIY UBXNQ AFLXC SMIYR BLXUI UFJKF ZXSLX EUKFN ASYXU BTUNA FSUFH HUFTC IKJIN TNHXL BUYTO XKFUW FNABN MIYRC YXJGI PMLJV EFNHE CLDSI IYKBH WJHLP GXDUL FMMIU MUBXZ VXFQB UBHVN LVMIJ NBPH

2. ZYIBX YKXHT ZTENU KVBPX IKIDB YKLAM ZYIDX MIIEH LJICN XZXYU KXVET ZVRON MYXOW KCEYL UCYTB UEFYM NVINX SPJOK YLGHT RVRGM NFJTB SVXHT ZNLEG ZYISH RLXIH TZWFB TRPLR XVECA KUXHX OEJOK SRXIH TKLUL USXAB TVHHT YCSSM GCPIM YMELN K

3. (a) QZIQU DTKUT IWCUG LJZIU MSVHO KIGNZ IDCEW KIMPG VWRHP WVWIS POIOE QSDIW NWVWI STSYS VWQAG HUDMN YLLHH MQEYJ SIFWX WYJWX HVIUY SGKEW CLMLS EYSWV GSPOU KTRMK MEFW (b) What is the meaning of the quote?

4. IKUQN VOCSJ MPYKA BIJFG NXGGS YOYRV RSLCK ZEKIC EEMQY JFCMF RCNTU NHVDK EEFCN MUQIF ZCDWA PADAK EEFYR PQVCY MLGVA DLVFR EIEZE KICEE ECVVD YLZEM LRFCD GQMPC QYNUM KEKTM DFRAR PBROX DYPYK GNUQB TFSMV VDLTY QAOID CSGAL DVZAE SQRWV QLDZR DEIQL TRDKY TTGEW EDOIM LUEXG MZFCD KUKE

5. This one is a bit harder: MOVBA NZLIB GFLTR DWJBW YEJZU BKIXP ZVKHG SWVWA XLPDF RVZBB JUSSE FWOMU WVLTL OXRVY LOXJF NKLJB VNULL VWTZP LPEGW UNPKY OHLVT CSZBV EADSR WRIFM NSKHW VPUVR KVYAY EVLML TTWKV PGHWY LZFMW ABTVS LOKHJ HWKFL KHGBZ OKHWM TBCTD HRPET ZLBYF WFZMB GIVPM F

6. As is this one: KHVBH FXPSP NMJET MGXWF RLIEN OGMOO TRHRV ADUJH KSOGN WXKSU IFYYT XVVRR HTSTF HYNZF KZVCS FJGAC UIPIG GBAML FCZGN LEWVC HWXVV UHNWX KLCHV OUJCU HRVLV UWWER HXWGQ ENWPA GFRRH KLGVN WXJSH HBXFR VISNW ODFGF BOCEH KJVMO RPUWG LILPF PRLID TTCZR MVHCH RJWYI PUNPY DIPHV WQYME VBWYF VCBBC BVVQT GQYDX QCXYU IX

7. Encipher or decipher the following texts using a Vigenère cipher with two keywords.

(a) Encipher Peter Piper picked a peck using the keywords JACK and SPRATT.

(b) Decipher FDUAP XNQHJ LCWOI UIOHF BEUDN C using the keywords LIZARD and CRAWL.

(c) Encipher I need to laugh, and when the sun is out using the keywords Sun and shine.

(d) Decipher RJQMU FJRQV HIRBS OIMEO ELUIJ PAAZ using keywords Good and day.

8. Matteo Argenti used the keyword PIETRO and a slightly mixed alphabet to make a Vigenère cipher:

cipher  H I L M N O P R S T A B C D E F G U
plain1  p r s t u a b c d e f g h i l m n o
plain2  i l m n o p r s t u a b c d e f g h
plain3  e f g h i l m n o p r s t u a b c d
plain4  t u a b c d e f g h i l m n o p r s
plain5  r s t u a b c d e f g h i l m n o p
plain6  o p r s t u a b c d e f g h i l m n

To encipher, find the first letter of plaintext in the first plaintext alphabet, and replace it by the ciphertext letter above it. The second plaintext alphabet is used for the second letter, the third for the third, etc. For example peter becomes HECPH. Decipher LHUAP AHEAN SLMNG TSUUM BPAOT PEBNC NEALP HGSAE MAGEC ANUD.

9. If we are using a double Vigenère, does it matter in which order we use the keywords? Explain.

10. The “Double Vigenère” we discussed in the chapter used two keywords to encipher a message twice. A different and better way to do this is to use the first key to mix the cipheralphabet and the second key as a Vigenère keyword using the newly mixed cipheralphabet (much like PIETRO was used in exercise 8.8).

To do this, use the first keyword to mix the alphabet. Then under the usual alphabet copy the cipheralphabet a number of times so that the first letters of the rows spells the second keyword. Then use the alphabets cyclically.

(a) Use a Keyword Mixed cipher with keyword ONCE to mix the cipheralphabet, and then a Vigenère cipher with keyword TWICE to encipher three times a lady.

(b) Use a Keyword Transposed cipher with keyword AGAIN to mix the cipheralphabet, and then a Vigenère cipher with keyword REPEAT to encipher Over and out.

(c) The ciphertext UIKQEYM BU TN JRBA was enciphered with a Vigenère cipher with keyword SIMPLE, using a cipheralphabet mixed via a Keyword Mixed cipher with keyword TWIST. What was the plaintext?

(d) The plaintext microwave oven was enciphered with a Vigenère cipher with keyword QUICK, using a cipheralphabet mixed via a simple Keyword cipher to produce the ciphertext CCYNYMUZU YFYR. What was the Keyword Cipher’s keyword?

11. A Vigenère Cipher with two keywords has the mathematical formulation C = P + K1 + K2, where K1 and K2 are the first and second keywords. What if we instead used C = P ∗ K1 + K2, perhaps calling this the Linear Vigenère Cipher? The Linear Cipher (from Chapter 4), while not very secure, is certainly harder to break than the Caesar Cipher. Is the Linear Vigenère Cipher more secure than the Vigenère with two keywords? Explain.

12. Encipher or decipher the following definitions using an Auto Key cipher with the given priming key.

(a) Encipher Autobiography is a story with priming key T.

(b) Encipher autoclave is an oven with priming key F.

(c) Decipher OUNHQ TRCAG ASGUJ ZVEZQ RG with priming key O.

(d) Decipher CUNHU OZFWA SHLPT KQDIX V with priming key C.

(e) Encipher autograph is a signature with priming key RI.

(f) Decipher CLTIF CNWMM TWIUA T with priming key CR.

(g) Decipher PYTIF OFIVO GGVSY AMBNL VGGOV LWEWA with priming key PE.

(h) Decipher FLXOH HALWE GVVMC HRSIA FI with priming key FRE.

(i) Decipher SNTSS OZUWM ZOSFB ABBBZ LDGVQ XF with priming key STAE.

(j) Decipher ZSTIF BUFIE IEXWA PTYO with priming key ZY.

13. It has been remarked that one can detect an Auto Key Cipher (or a Vigenère Cipher with long keyword) from the large number of VAISX’s and the lack of DQNJ’s in the ciphertext. Explain this remark.

14. Break the following Auto-Key Ciphers. A hint has been given.

(a) KFGWW SRVVV HYDWZ PXLSR VFMFY Hint: priming key was length 1 and message rhymes with “true”

(b) RZDVQ BUXLR HGKXS SGSTS SHUDE SWMIY QFTZM VXSSG STSSA UWE Hint: priming key had length 2, and message contains several the’s

(c) ORMMI MWIAD HTBAW I. The priming key had length 1, and ee appears somewhere in the plaintext.

(d) SNGHC UDRYI MCFLS UEZAN AGUP. The priming key had length 2 and one of the letters is N.

15. There are many ways of combining the several types of ciphers we have seen into new, and possibly more difficult, ciphers. Here is one way. Pick a Vigenère keyword. For each letter in this keyword pick another keyword. Use these secondary keywords to form mixed alphabets using the Keyword Mixed method. Then to encipher the message use the Vigenère technique, but use the mixed alphabet corresponding to the primary keyword letter. For example, we pick Suit as our primary key, and Club, Diamond, Heart and Spade as secondary keys. Then when enciphering the 1st, 5th, 9th, etc. letters of the message we use as cipheralphabet the one formed from Club and Vigenère key S. For the 2nd, 6th, 10th, etc. letters we use as alphabet the one formed from Diamond and Vigenère key U. Similarly for the rest of the message.

(a) Does this system have an increase of security over a regular Vigenère system with keyword Suit? If so, where is the increase? For example, is it because the Kasiski test fails? Or for other reasons? If the security has not increased, why not?
(b) What if we make five secondary keys, perhaps Wild as the fifth, and cyclically use both the Vigenère key and the five cipheralphabets?

16. Suppose you capture a Vigenère-enciphered text, and somehow you know that it was enciphered using a mixed cipher alphabet. Will a Kasiski test still work, or will it not? Explain.

17. How does the use of a keyword with repeated letters affect the accuracy of Φ? (Hint: What sort of encipherment and roughness would the keyword TTTATTT give?)

18. (a) Finish the computation of Φ for a text consisting of English with the normal distribution started in section 8.3. (Use Figure 1.3. So there are 8.2 A’s, 1.5 B’s, and N = 100.)

(b) Compute the value of Φ, if the plaintext consisted of English with the normal distribution that was enciphered with a Caesar cipher with key B.

(c) Compute the value of Φ, if the plaintext consisted of English with the normal distribution that was enciphered with a monoalphabetic cipher.

(d) Compute Φ for the ciphertext that has all 26 letters occurring an equal number of times.

19. In this exercise we work out the formula for Φ given in Equation 8.3.

Assume we have a ciphertext of N letters enciphered with a Vigenère cipher with keylength k. As we did in Section 7.7, let’s assume we’ve written the ciphertext in an array, so there are k rows, each containing all the ciphertext letters that were enciphered with the same keyletter.

Now Φ is the probability that two letters, chosen at random from the text, are the same. Either these letters were chosen from the same row, or from different rows.

Same row: There are N total letters and k rows, so about N/k letters per row. Hence there are (N/k)(N/k − 1)/2 ways to choose two letters from any particular row. There are k rows, and so k · (N/k)(N/k − 1)/2 ways to choose two letters from the same row. Finally, the likelihood that two letters are the same in any of these rows is 0.0656. So the contribution here is

\[
.0656 \cdot k \cdot \frac{\frac{N}{k}\left(\frac{N}{k} - 1\right)}{2} = .0656\,\frac{N(N-k)}{2k}.
\]

Different rows: There are k(k − 1)/2 ways to pick two rows. There are N/k letters in each row, so there are N/k ∗ N/k ways to pick one letter from each. Finally, we have no way of relating the keyletters from the different rows, so we can only approximate the likelihood that these two letters are the same with 1/26. So the contribution here is

\[
\frac{1}{26} \cdot \frac{N}{k} \cdot \frac{N}{k} \cdot \frac{k(k-1)}{2} = \frac{1}{26}\,\frac{N^{2}(k-1)}{2k}.
\]

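Adding the two contributions, dividing by the N(N − 1)/2 ways to choose two letters from the whole text, and simplifying,

\[
\Phi \approx \frac{.0656\,\dfrac{N(N-k)}{2k} + \dfrac{1}{26}\,\dfrac{N^{2}(k-1)}{2k}}{N(N-1)/2}
= \frac{.0656(N-k) + \frac{1}{26}N(k-1)}{k(N-1)}
= \frac{.0656(N-k) + .0385N(k-1)}{k(N-1)}.
\]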

This is Equation 8.3.

(a) If our work has been correct, substituting k = 1 into 8.3 should produce the value of Φ in a monoalphabetic cipher, 0.0656. Does it?
(b) If k = N, then the keylength is the same as the length of the text. This should produce a very flat frequency count, resulting in a value of .0385 for Φ. Does it?
(c) It is sometimes of value to have a formula for k, the keylength. Multiply both sides of 8.3 by k(N − 1) and then solve for k to find one.

20. (a) In section 8.3 we saw that Φ = .0656 when the language is English. Referring to Figure 1.5, compute Φ for French, German and Spanish.

(b) Jeremiah 25:26 and 51:41 (taken from BibleGateway.org) are

25:26: and all the kings of the north, near and far, one after the other-all the kingdoms on the face of the earth. And after all of them, the king of Sheshach will drink it too.

51:41: How Sheshach will be captured, the boast of the whole earth seized! What a horror Babylon will be among the nations!

(Sheshach is Babel (= Babylon) written with a reversed alphabet, in Hebrew, of course.) Translating the combined verses into German and French, and then enciphering with a monoalphabetic cipher produces

Text 1: FINPC DYCVN TCSPC YGIYU IVTNU GVNKL YCNPY DNTBU YCFPJ PUCYI FPJFP IVYCY IFINP CDYCV NQFPM YCSPM NUSYO PTCNU ICPVD FRFKY SYDFI YVVYY IDYVN TSYCK LYCKL FKANT VFFGV YCYPJ YLOPN TCKLY CKLFK YCIGV TCYKY DDYSN UIDFB DNTVY VYMGD TCCFT IINPI YDFIY VVYYC IKNUO PTCYY LOPNT AFAQD NUYYC ISYIV PTIYF PMTDT YPSYC UFITN UC

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  total
 3  2 27 14  0 19  5  0 21  3 10  7  4 20  4 20  2  1  8 17 11 17  0  0 42  0    257

and

Text 2: PUSGD DYRMU TCYSY AUMVS YUAST YUGKY UPUSS TYEYV UYUYT UYUUG JKSYL GUSYV UPUSG DDYRM UTCVY TJKYS YVYVS YSTYG PESYV EDGJK YSYAY VSBMS YUAAT USPUS SYVRM UTCWM UAJKY AJKGJ KAMDD UGJKT KUYUH VTURY UNTYT AHAJK YAJKG JKYTU CYUML LYUPU SSYVV PKLSY VCGUX YUYVS YYVMB YVHNT YTAHB GBYDX PLYUH AYHXY UCYNM VSYUP UHYVS YUUGH TMUYU

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  total
13  4  6  8  3  0 13  8  0 10 13  5 11  3  0  9  0  4 25 17 40 18  1  3 51  0    265

Which text is in German, and which is in French? (Compute Φ of each and, remembering that monoalphabetic encipherments do not change the Φ value of the text, use part (a).)

21. Here is a variant on the auto-key method. Rather than use the message as the key, which perhaps seems rather dangerous, instead use the secret message. Encipher mysterious with priming key m and use your answer to help you discuss the security of this method.

22. Joseph Willard Brown wrote

“Ciphers are undiscoverable in proportion as their changes are frequent, and as the messages in each change are brief. When alphabetic ciphers are used, the aim should be never to allow any letter to appear twice alike. The number of letters under each key is to be as small as possible [Brown page 99].”

Explain the meaning of this quote.

23. On March 8, 1913, the Secretary of State, William Jennings Bryan, sent the following message to all American Diplomatic and Consular Officers.

“Gentleman:

For the purpose of affording a means of direct secret communication between officers of the Army and Navy and the Diplomatic and Consular Service, the respective Departments are providing the necessary officers with the Larrabee Cipher Code, a copy of which is transmitted herewith.

The following arrangements as to key words will be observed:

(1) Key words may be arranged between officers using code either directly or through their Departments in Washington, when more direct secret communication is not possible.
(2) In the absence of other agreement, the key word is to be the name of the month in which the message is sent.

The Navy Department is supplying the Larrabee Cipher to the commanding officers of all ships in the service.”

Enclosed with this letter was a chart, Figure 8.2, and instructions:

“Write down the words, message, or cipher to be converted, and write over them, letter for letter, the key word agreed upon, repeating the key word as often as necessary. For each letter to be converted enter the table with the letter of the key word found above it, as a marginal letter. If converting message into cipher [read] in the upper line abreast the marginal letter, the letter of the message, the letter of the cipher is [then] directly below it. If converting cipher into message [read] the lower line abreast the marginal letter [, then] the letter of the cipher the letter of the message is directly above it.

A: ABCDE FGH I J K LMNOPQRSTUVWXYZ N: ABCDEFGH I JKLMNOPQR S TUVWXY Z abcdefghi jklmnopqrstuvwxyz nopqrstuvwxyzabcdefghi jklm B: ABCDE FGH I J K LMNOPQRSTUVWXYZ O: ABCDEFGH I JKLMNOPQR S TUVWXY Z bcdefghi jklmnopqrstuvwxyza opqrstuvwxyzabcdefghi jk lmn C: ABCDE FGH I J K LMNOPQRSTUVWXYZ P: ABCDEFGH I JKLMNOPQR S TUVWXY Z cdefghi jklmnopqrstuvwxyzab pqrstuvwxyzabcdefghi jk lmno D:ABCDE FGH I J K LMNOPQRSTUVWXYZ Q: ABCDEFGH I JKLMNOPQR S TUVWXY Z defghi jklmnopqrstuvwxyzabc qrstuvwxyzabcdefghi jklmnop E: ABCDE FGH I J K LMNOPQRSTUVWXYZ R: ABCDEFGH I JKLMNOPQR S TUVWXY Z efghi jklmnopqrstuvwxyzabcd rstuvwxyzabcdefghi jklmnopq F: ABCDE FGH I J K LMNOPQRSTUVWXYZ S: ABCDEFGH I JKLMNOPQR S TUVWXY Z fghi jklmnopqrstuvwxyzabcde stuvwxyzabcdefghi jklmnopqr G:ABCDE FGH I J K LMNOPQRSTUVWXYZ T: ABCDEFGH I JKLMNOPQR S TUVWXY Z ghi jklmnopqrstuvwxyzabcdef tuvwxyzabcdefghi jklmnopqrs H: ABCDE FGH I J K LMNOPQRSTUVWXYZ U: ABCDEFGH I JKLMNOPQR S TUVWXY Z hi jklmnopqrstuvwxyzabcdefg uvwxyzabcdefghi jklmnopqrst I: ABCDE FGH I J K LMNOPQRSTUVWXYZ V: ABCDEFGH I JKLMNOPQR S TUVWXY Z i jklmnopqrstuvwxyzabcde fgh vwxyzabcdefghi jklmnopqr stu J: ABCDE FGH I J K LMNOPQRSTUVWXYZ W:ABCDEFGH I JKLMNOPQR S TUVWXY Z jklmnopqrstuvwxyzabcde fghi wxyzabcdefghi jklmnopqr s tuv K:ABCDE FGH I J K LMNOPQRSTUVWXYZ X: ABCDEFGH I JKLMNOPQR S TUVWXY Z klmnopqrstuvwxyzabcdefghij xyzabcdefghijklmnopqrs tuvw L: ABCDE FGH I J K LMNOPQRSTUVWXYZ Y: ABCDEFGH I JKLMNOPQR S TUVWXY Z lmnopqrstuvwxyzabcdefghijk yzabcdefghijklmnopqrstuvwx M:ABCDE FGH I J K LMNOPQRSTUVWXYZ Z: ABCDEFGH I JKLMNOPQR S TUVWXY Z mnopqrstuvwxyzabcdefghi jkl zabcdefghijklmnopqrstuvwxy

Figure 8.2: Larrabee’s Cipher Code

To apply to figures let

   A B C D E F G H I J
represent
   1 2 3 4 5 6 7 8 9 0

To indicate that figures are used, write the letter Q followed by the number of figures expressed in letters as above that follow: then for the figures write the letters, thus QEICDJF means 93406, Q indicating that figures follow, E 5 the number of figures, and ICDJF representing 93406.

Example
   Key word    Pekin
   Key words   PEKI NPEKI NPE
   Message     SEND QCCJJ MEN
   Cipher      HIXL DRGTR ZTR

Ordinarily however it is to send numbers in words, i.e. three hundred instead of 300. Before transmission the cipher is arranged in five letter groups. Thus, the above example would appear HIXLD RGTRZ TR etc.”

There were plans for a new codebook for the Departments of State, War and Navy, and the Larrabee Code was to be only in temporary use.

(a) On November 8, 1913 The Adjutant General’s Office in the War Department, Washington sent the following to the Chief of Staff, U.S. Army.

HBYID EEKRA VVOIU RROGI FUIIJ ONADJ XKRBO SKPYU VFZGF JRZBV ZYKFS WYOMV MCIVP CYIUO IOPVV RSFSW PWKLQ SQVFG VKQTF VGKZI ZHSMR FIQQO XRYRZ TMSXD RBOWM DOEBK GIPHI KYWN. AAPH.

What did it say?

(b) The HQ of Coast Defenses of Chesapeake Bay, Fort Monroe VA, struggled with this new cipher, as they reported in a letter of Nov 9, 1913. To attempt a decipherment of a Larrabee message they first crossed “out the inadmissible code letters”, and then used (probably) the Red code book to treat the remaining letters as code groups. This gave “Appropriations for the 4th quarter British Minister J. F. Reynolds Landis Henry J. Nichols Facts do not justify having party arrested” as the message. Trying a slightly different technique gave “Congress did not appropriate merited unknown unknown permanent arrangement.” They eventually realized their mistake:

“It was evident that the meaning was not that which was intended. [And, after further consideration] It is thought that this message might be translated by the Larrabee Cipher Code Card which was sent to the Depot Quartermaster, Newport News, Va., on March 10, 1913, but which has apparently been misplacedby [sic] that officer as he is unable to furnish it.”

Then on November 12 the same HQ sent

ESAID FRTRQ DTTFV CRHOI DOSCN FNENF ITBRZ GMSHI RQSMH FHYNJ ZQMEI LFSJJ AOIZA DJWEF WJVCI SRLIG BHLYM SXVLA VWFFV ERKKS DURVJ GQEDF ULRGO PMSVR OSZGU QLVLQ JHQCI WHFIM EIIU

Have they figured it out? What does the message say?

(c) How is the Larrabee cipher related to the Vigenère?

24. (a) Two very long messages have been enciphered with a Vigenère cipher using the same five letter keyword. How do the ciphertexts’ frequency charts compare? (“Very long” here indicates that the plaintext’s frequencies are close to those of typical English.)
(b) Continuing, how do the two ciphertexts’ Φ values compare?
(c) Two very long messages have been enciphered with Vigenère ciphers using five letter keywords. If the two ciphertexts’ frequency charts are very similar, can we conclude that the keywords were the same? If yes, why? If no, why not, and what may we conclude?

25. Explain in your own words why an endless, senseless key is needed for perfect secrecy.

26. Decipher

SVQYU EFNKW BHNRQ VIUZI CVSNR TVEZF WYXFA UNJYW MOADU KSVRS LUSJI JWVKC WCKYX PBVZK CDNOQ BZVGQ BMYMX QARTU G

if the one-time pad key was

FHEUC MFHGO JPNMM NVSRT VRBTE IRMHM PUNBC FGSYE IGIBG YDVAS KJOBV YSIEJ PGCFQ WURNG KLNIM TGDCF WXYVN MJKLB N.

27. Decipher

LYLWZ FDHIG AWYOE GRKTZ TJGWC OQDQK CAEOO VBXLQ ITHJL REZOI POMZT QOMQS GTABM MJRMS BVFQC CNLQD RCAJA AYZM

if the one-time pad key was

PRNDS BQZQN TOGUT NJYTG PHYHV KZQCR UNKBG AXGTQ XZPFK NCZUQ LAHGM MWTWD CGXNS UTXMF INMIY KZGGZ TLWTG SHVJ.

Chapter 9

Digraphic Ciphers

Although Hill’s cipher system itself saw almost no practical use, it had a great impact on cryptology. David Kahn

What has been constant about all the cipher methods that we have tried? All of the monoalphabetic ciphers had one letter replaced by a different letter as indicated in the plainalphabet–cipheralphabet chart. There were different ways to make up the cipheralphabet, but we always ended up with the pairs of alphabets. The polyalphabetic ciphers differed because they didn’t use just one cipheralphabet; they used several, and sometimes very many. However, when we were enciphering any particular letter of the plaintext, there was always some plainalphabet–cipheralphabet chart that we were using. The constant pattern is this letter-for-letter replacement. A single letter has always been replaced by a single letter. The idea behind polygraphic ciphers is to replace strings of several letters by other strings of several letters.

9.1 Polygraphic Ciphers

Consider first the digraphic ciphers, in which pairs of letters are replaced by other pairs.1 Porta used these ciphers in his 1563 De furtivis literarum notis. The idea behind this is simple: replace each pair of possible plaintext letters (plain 1, plain 2) by a pair of ciphertext letters (cipher 1, cipher 2).

1Bigraphic, meaning two letters at a time, might be a more reasonable name, but, for some reason, digraphic is the word used when two letters are enciphered as a pair.


Examples: The algorithm is to consider the letters two-by-two. If the letters are in alphabetical order, replace both by the letters that follow them. If they are not, replace them by the letters before them. So et→ FU and of → NE.

(1) Encipher nice doggy. (2) Decipher EZSBB U.2
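The rule in these examples is small enough to try by machine. The following is only a sketch under stated assumptions: the function names are made up for illustration, a pair of equal letters is treated as "not in alphabetical order" (as the text below suggests), and a lone final letter is padded with x, a choice the text leaves open.

```python
import string

ALPHABET = string.ascii_lowercase

def shift(letter, step):
    """Move one letter forward (+1) or backward (-1), wrapping around the alphabet."""
    return ALPHABET[(ALPHABET.index(letter) + step) % 26]

def encipher_pairs(plaintext, pad="x"):
    """Toy digraphic rule: pairs in alphabetical order move forward, others move back."""
    letters = [ch for ch in plaintext.lower() if ch in ALPHABET]
    if len(letters) % 2:            # assumption: pad a lone final letter with 'x'
        letters.append(pad)
    out = []
    for first, second in zip(letters[0::2], letters[1::2]):
        step = 1 if first < second else -1   # equal letters count as "not in order"
        out.append(shift(first, step) + shift(second, step))
    return "".join(out).upper()

print(encipher_pairs("et"), encipher_pairs("of"))
```

Only enciphering is sketched, since this toy rule does not always determine a unique plaintext when run backwards.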



There are a couple of problems. What do we do with double letters? (Probably they are not in alphabetic order.) Or single letters at the end? (Add a meaningless letter to make a pair.) But basically this is a simple digraphic cipher.

What does a general digraphic cipher look like? For our monoalphabetic ciphers we always ended up with a plaintext alphabet and a ciphertext alphabet. So for digraphic ciphers we’ll have a (really big) chart telling which pairs of letters get replaced by which pairs of letters. To develop a very simple such system we start with a 26 × 26 array of all possible letter–letter pairs. We will use this as our ciphertext pairs. Next we use some method, say the keyword mixed method, to make two orderings of the alphabet. We put these alphabets across the top and down the side of the chart. To encipher a pair of letters, call them (α, β), find α along the top row and β down the side. The pair of letters appearing in α’s column and β’s row is our ciphertext.

Example: Using the keywords first and second gives the alphabets fagmuzibhnvrcjowsdkpxtelqy and sajryebktzcfluogmvnhpwdiqx. Putting these above and along the side of the chart gives Figure 9.1. Then telephones=te-le-ph-on-es is enciphered as VFXFT TOSWA (keeping the usual 5 letter split). LBGML OBW is deciphered as railroad.
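The chart never has to be written out in full, because each ciphertext letter depends only on where its plaintext letter sits in the top or side alphabet. A minimal Python sketch of the example follows; the function names are illustrative, and it assumes the keyword mixed method reads the keyword block down its columns and that the message has an even number of letters.

```python
import string

ALPHABET = string.ascii_lowercase

def keyword_mixed(keyword):
    """Write the keyword, fill in the unused letters in rows beneath it, read down the columns."""
    header = list(dict.fromkeys(keyword.lower()))          # keyword without repeated letters
    rest = [c for c in ALPHABET if c not in header]
    width = len(header)
    rows = [header] + [rest[i:i + width] for i in range(0, len(rest), width)]
    return "".join(row[col] for col in range(width) for row in rows if col < len(row))

def encipher(plaintext, top, side):
    """First letter is looked up along the top, second letter down the side."""
    letters = [c for c in plaintext.lower() if c in ALPHABET]
    return "".join(ALPHABET[top.index(a)] + ALPHABET[side.index(b)]
                   for a, b in zip(letters[0::2], letters[1::2])).upper()

def decipher(ciphertext, top, side):
    """Undo encipher: each ciphertext letter is a position in the top or side alphabet."""
    letters = [c for c in ciphertext.lower() if c in ALPHABET]
    return "".join(top[ALPHABET.index(a)] + side[ALPHABET.index(b)]
                   for a, b in zip(letters[0::2], letters[1::2]))

TOP, SIDE = keyword_mixed("first"), keyword_mixed("second")
print(TOP, SIDE)
print(encipher("telephones", TOP, SIDE), decipher("LBGML OBW", TOP, SIDE))
```

That each output letter is computed from a single input letter is exactly the weakness discussed next: the chart is really two monoalphabetic ciphers used in alternation.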

There are several drawbacks to this system. Since the chart has 676 = 26×26 entries it is not something a person would like to have to repeatedly produce. Further, while deciphering is not bad, because the edges are not in order it is a bit of a pain to encipher. What would be quicker would be to rearrange the chart so that the top and side alphabets were in order. But this means rewriting the entire chart, not an attractive idea.

Even if these difficulties were surmounted, this cipher is simply not a good one. For example, all plaintext pairs f* will be enciphered to A*, and all plaintext pairs *s will be enciphered to *A, with similar results for the other letters. This cipher is a monograph-digraph hybrid of some sort, and is not much more secure than a pure monographic cipher.

2(1) MHDFE NFFXY, (2) fat cat.

  f  a  g  m  u  z  i  b  h  n  v  r  c  j  o  w  s  d  k  p  x  t  e  l  q  y
s AA BA CA DA EA FA GA HA IA JA KA LA MA NA OA PA QA RA SA TA UA VA WA XA YA ZA
a AB BB CB DB EB FB GB HB IB JB KB LB MB NB OB PB QB RB SB TB UB VB WB XB YB ZB
j AC BC CC DC EC FC GC HC IC JC KC LC MC NC OC PC QC RC SC TC UC VC WC XC YC ZC
r AD BD CD DD ED FD GD HD ID JD KD LD MD ND OD PD QD RD SD TD UD VD WD XD YD ZD
y AE BE CE DE EE FE GE HE IE JE KE LE ME NE OE PE QE RE SE TE UE VE WE XE YE ZE
e AF BF CF DF EF FF GF HF IF JF KF LF MF NF OF PF QF RF SF TF UF VF WF XF YF ZF
b AG BG CG DG EG FG GG HG IG JG KG LG MG NG OG PG QG RG SG TG UG VG WG XG YG ZG
k AH BH CH DH EH FH GH HH IH JH KH LH MH NH OH PH QH RH SH TH UH VH WH XH YH ZH
t AI BI CI DI EI FI GI HI II JI KI LI MI NI OI PI QI RI SI TI UI VI WI XI YI ZI
z AJ BJ CJ DJ EJ FJ GJ HJ IJ JJ KJ LJ MJ NJ OJ PJ QJ RJ SJ TJ UJ VJ WJ XJ YJ ZJ
c AK BK CK DK EK FK GK HK IK JK KK LK MK NK OK PK QK RK SK TK UK VK WK XK YK ZK
f AL BL CL DL EL FL GL HL IL JL KL LL ML NL OL PL QL RL SL TL UL VL WL XL YL ZL
l AM BM CM DM EM FM GM HM IM JM KM LM MM NM OM PM QM RM SM TM UM VM WM XM YM ZM
u AN BN CN DN EN FN GN HN IN JN KN LN MN NN ON PN QN RN SN TN UN VN WN XN YN ZN
o AO BO CO DO EO FO GO HO IO JO KO LO MO NO OO PO QO RO SO TO UO VO WO XO YO ZO
g AP BP CP DP EP FP GP HP IP JP KP LP MP NP OP PP QP RP SP TP UP VP WP XP YP ZP
m AQ BQ CQ DQ EQ FQ GQ HQ IQ JQ KQ LQ MQ NQ OQ PQ QQ RQ SQ TQ UQ VQ WQ XQ YQ ZQ
v AR BR CR DR ER FR GR HR IR JR KR LR MR NR OR PR QR RR SR TR UR VR WR XR YR ZR
n AS BS CS DS ES FS GS HS IS JS KS LS MS NS OS PS QS RS SS TS US VS WS XS YS ZS
h AT BT CT DT ET FT GT HT IT JT KT LT MT NT OT PT QT RT ST TT UT VT WT XT YT ZT
p AU BU CU DU EU FU GU HU IU JU KU LU MU NU OU PU QU RU SU TU UU VU WU XU YU ZU
w AV BV CV DV EV FV GV HV IV JV KV LV MV NV OV PV QV RV SV TV UV VV WV XV YV ZV
d AW BW CW DW EW FW GW HW IW JW KW LW MW NW OW PW QW RW SW TW UW VW WW XW YW ZW
i AX BX CX DX EX FX GX HX IX JX KX LX MX NX OX PX QX RX SX TX UX VX WX XX YX ZX
q AY BY CY DY EY FY GY HY IY JY KY LY MY NY OY PY QY RY SY TY UY VY WY XY YY ZY
x AZ BZ CZ DZ EZ FZ GZ HZ IZ JZ KZ LZ MZ NZ OZ PZ QZ RZ SZ TZ UZ VZ WZ XZ YZ ZZ

Figure 9.1: A Simple Digraphic Substitution Chart

To make a better digraphic cipher we need to better scramble the ciphertext pairs. What we really need is to take the 676 main entries of the previous chart and mix them around more randomly, producing a table like Figure 9.2. Enciphering is easy: find the first letter of the pair down the side and the second along the top, so hide becomes CTOW. But now deciphering is hard: try to decipher JGPS.3

  a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z
a RA XW DT JP PU VQ BN HS NO TK ZP FM LI RN XJ DG JC PH VD BA HF NB TX ZC FZ LV
b IB OX UT AQ GV MR SN YS EP KL QQ WM CJ IO OK UG AD GI ME SA YF EC KY QD WZ CW
c ZB FY LU RZ XV DS JO PT VP BM HR NN TJ ZO FL LH RM XI DF JB PG VC BZ HE NA TW
d QC WY CV IA OW US AP GU MQ SM YR EO KK QP WL CI IN OJ UF AC GH MD SZ YE EB KX
e HD NZ TV ZA FX LT RY XU DR JN PS VO BL HQ NM TI ZN FK LG RL XH DE JA PF VB BY
f YD EA KW QB WX CU IZ OV UR AO GT MP SL YQ EN KJ QO WK CH IM OI UE AB GG MC SY
g PE VA BX HC NY TU ZZ FW LS RX XT DQ JM PR VN BK HP NL TH ZM FJ LF RK XG DD JZ
h GF MB SX YC EZ KV QA WW CT IY OU UQ AN GS MO SK YP EM KI QN WJ CG IL OH UD AA
i XF DC JY PD VZ BW HB NX TT ZY FV LR RW XS DP JL PQ VM BJ HO NK TG ZL FI LE RJ
j OG UC AZ GE MA SW YB EY KU QZ WV CS IX OT UP AM GR MN SJ YO EL KH QM WI CF IK
k FH LD RI XE DB JX PC VY BV HA NW TS ZX FU LQ RV XR DO JK PP VL BI HN NJ TF ZK
l WH CE IJ OF UB AY GD MZ SV YA EX KT QY WU CR IW OS UO AL GQ MM SI YN EK KG QL
m NI TE ZJ FG LC RH XD DA JW PB VX BU HZ NV TR ZW FT LP RU XQ DN JJ PO VK BH HM
n EJ KF QK WG CD II OE UA AX GC MY SU YZ EW KS QX WT CQ IV OR UN AK GP ML SH YM
o VJ BG HL NH TD ZI FF LB RG XC DZ JV PA VW BT HY NU TQ ZV FS LO RT XP DM JI PN
p MK SG YL EI KE QJ WF CC IH OD UZ AW GB MX ST YY EV KR QW WS CP IU OQ UM AJ GO
q DL JH PM VI BF HK NG TC ZH FE LA RF XB DY JU PZ VV BS HX NT TP ZU FR LN RS XO
r UL AI GN MJ SF YK EH KD QI WE CB IG OC UY AV GA MW SS YX EU KQ QV WR CO IT OP
s LM RR XN DK JG PL VH BE HJ NF TB ZG FD LZ RE XA DX JT PY VU BR HW NS TO ZT FQ
t CN IS OO UK AH GM MI SE YJ EG KC QH WD CA IF OB UX AU GZ MV SR YW ET KP QU WQ
u TN ZS FP LL RQ XM DJ JF PK VG BD HI NE TA ZF FC LY RD XZ DW JS PX VT BQ HV NR
v KO QT WP CM IR ON UJ AG GL MH SD YI EF KB QG WC CZ IE OA UW AT GY MU SQ YV ES
w BP HU NQ TM ZR FO LK RP XL DI JE PJ VF BC HH ND TZ ZE FB LX RC XY DV JR PW VS
x SP YU ER KN QS WO CL IQ OM UI AF GK MG SC YH EE KA QF WB CY ID OZ UV AS GX MT
y JQ PV VR BO HT NP TL ZQ FN LJ RO XK DH JD PI VE BB HG NC TY ZD FA LW RB XX DU
z AR GW MS SO YT EQ KM QR WN CK IP OL UH AE GJ MF SB YG ED KZ QE WA CX IC OY UU

Figure 9.2: A More Complicated Digraphic Substitution Chart

This method has some problems as well, however. First, it is a pain to construct such a chart. Second, deciphering is hideously slow. Since the ciphertext pairs are all mixed, it takes a full search to find each pair for deciphering. To comfortably use this method one would need to make an additional chart of 676 entries for deciphering. Nonetheless, either by hand or with the use of a computer, two such charts can certainly be created.

What would the security of such a digraphic cipher be? Pretty good, at least compared to our monographic ciphers. As always, this system is broken by frequency analysis, only now we are analysing the frequencies of pairs of letters, or bigrams, rather than of individual letters. Not surprisingly, the most common bigram is th. The top 18, according to the Brown Corpus, are listed in Figure 9.3. So when doing frequency analysis on a digraphic cipher, rather than looking for etaoinshr, as we would in a monographic cipher, we look for th-he-in-er-re-.... But the basic idea is the same: do a (very large) frequency count, guess that the top occurring pairs come from therein, and start working. So while we need more data, the techniques are very similar to the ones with which we are comfortable.

How to make this system better? I guess a 26 × 26 × 26 hyper-chart listing all 17576 three-letter triples. Maybe one would do this as 26 charts, where the first letter of the triple was the name of the chart, the second one was on the left of the chart and the third one was on the top? The extreme cumbersomeness of a system like this would limit its use to those who have a computer to help.4

3seek
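With a computer, the two 676-entry charts mentioned above (one for enciphering, one for deciphering) take only a few lines. The sketch below is illustrative rather than a reconstruction of Figure 9.2: `make_charts`, `substitute`, and the seed value are hypothetical names and choices, and Python's `random` module is not suitable for real secrecy.

```python
import random
import string
from itertools import product

LETTERS = string.ascii_uppercase

def make_charts(seed):
    """Return (encipher_chart, decipher_chart) for a randomly mixed digraphic substitution."""
    pairs = ["".join(p) for p in product(LETTERS, repeat=2)]   # AA, AB, ..., ZZ
    mixed = pairs[:]
    random.Random(seed).shuffle(mixed)                         # the seed plays the role of the key
    encipher_chart = dict(zip(pairs, mixed))
    decipher_chart = {c: p for p, c in encipher_chart.items()} # the "additional chart"
    return encipher_chart, decipher_chart

def substitute(text, chart):
    """Replace each pair of letters by its chart entry (assumes an even number of letters)."""
    letters = [c for c in text.upper() if c in LETTERS]
    return "".join(chart[a + b] for a, b in zip(letters[0::2], letters[1::2]))

enc, dec = make_charts(seed=2013)
secret = substitute("hide", enc)
print(secret, substitute(secret, dec))
```

Building the inverse dictionary once removes the "full search" cost of deciphering: each ciphertext pair becomes a single lookup.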

9.2 Hill Ciphers

4The idea of enciphering letters not individually but in groups, so-called block ciphers, reappears in almost every modern cipher system.

th 2.96    es 1.34    at 1.16
in 1.88    on 1.31    ed 1.14
er 1.75    st 1.24    ti 1.10
an 1.56    nt 1.21    nd 1.07
re 1.45    en 1.20    to 1.04

Figure 9.3: 18 Most Frequent Bigrams, in percent

Lester S. Hill (1891-1961) was an assistant professor of mathematics at Hunter College in New York when, in 1929, he published the method now named for him as “Cryptography in an Algebraic Alphabet” in The American Mathematical Monthly, the expository journal of the Mathematical Association of America. He eventually received U.S. patent 1,845,947 for an apparatus that mechanically performed his cipher. Much of Hill’s work involved the use of mathematics in communications, for example, methods for splicing telephone cables. His cipher provides us with an easy example of a polygraphic cipher, but is more important because it shows that by the late 1920s cryptology was being done primarily by mathematicians.

To explain the cipher we need to introduce matrices (plural of matrix). Matrices are simply rectangular arrays of numbers and are quite important in mathematics, chemistry and physics. We will deal only with two-by-two matrices, such as $\begin{pmatrix} 3 & 0 \\ -1 & 4 \end{pmatrix}$. Matrices are very easy to add and subtract: simply perform the operation entry-by-entry. For example

\[
\begin{pmatrix} 4 & 3 \\ -2 & 7 \end{pmatrix} + \begin{pmatrix} 0 & 5 \\ 2 & 4 \end{pmatrix} = \begin{pmatrix} 4 & 8 \\ 0 & 11 \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} 1 & 9 \\ 0 & 2 \end{pmatrix} - \begin{pmatrix} -2 & 3 \\ 5 & 1 \end{pmatrix} = \begin{pmatrix} 3 & 6 \\ -5 & 1 \end{pmatrix}.
\]

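Multiplication is a bit more complicated, as each row of the first matrix is multiplied by the column entries in the second. (Rows go across and columns go down.) For example,

\[
\begin{pmatrix} 3 & 5 \\ 14 & 23 \end{pmatrix} \times \begin{pmatrix} 2 \\ 9 \end{pmatrix}
= \begin{pmatrix} 3 \times 2 + 5 \times 9 \\ 14 \times 2 + 23 \times 9 \end{pmatrix}
= \begin{pmatrix} 6 + 45 \\ 28 + 207 \end{pmatrix}
= \begin{pmatrix} 51 \\ 235 \end{pmatrix},
\]

and

\[
\begin{pmatrix} 3 & 5 \\ 14 & 23 \end{pmatrix} \times \begin{pmatrix} 15 \\ 19 \end{pmatrix}
= \begin{pmatrix} 3 \times 15 + 5 \times 19 \\ 14 \times 15 + 23 \times 19 \end{pmatrix}
= \begin{pmatrix} 45 + 95 \\ 210 + 437 \end{pmatrix}
= \begin{pmatrix} 140 \\ 647 \end{pmatrix}.
\]

The general form for multiplication is

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \times \begin{pmatrix} e \\ f \end{pmatrix}
= \begin{pmatrix} a \times e + b \times f \\ c \times e + d \times f \end{pmatrix}.
\]

The Hill cipher then simply multiplies pairs of letters by a matrix.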

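Examples:

(1) Encipher bios using $\begin{pmatrix} 3 & 5 \\ 14 & 23 \end{pmatrix}$.

First, $\begin{pmatrix} \mathrm{b} \\ \mathrm{i} \end{pmatrix} = \begin{pmatrix} 2 \\ 9 \end{pmatrix}$. Then multiplying:

\[
\begin{pmatrix} 3 & 5 \\ 14 & 23 \end{pmatrix} \times \begin{pmatrix} 2 \\ 9 \end{pmatrix}
= \begin{pmatrix} 51 \\ 235 \end{pmatrix}
\equiv \begin{pmatrix} 25 \\ 1 \end{pmatrix} \pmod{26}
= \begin{pmatrix} \mathrm{Y} \\ \mathrm{A} \end{pmatrix}.
\]

So bi is enciphered to YA.

Similarly, os becomes

\[
\begin{pmatrix} 3 & 5 \\ 14 & 23 \end{pmatrix} \times \begin{pmatrix} 15 \\ 19 \end{pmatrix}
= \begin{pmatrix} 140 \\ 647 \end{pmatrix}
\equiv \begin{pmatrix} 10 \\ 23 \end{pmatrix} \pmod{26}
= \begin{pmatrix} \mathrm{J} \\ \mathrm{W} \end{pmatrix}.
\]

So bios is enciphered to YAJW.

(2) Use the same matrix to encipher math.

ma $= \begin{pmatrix} 13 \\ 1 \end{pmatrix}$, and th $= \begin{pmatrix} 20 \\ 8 \end{pmatrix}$, so we have

\[
\begin{pmatrix} 3 & 5 \\ 14 & 23 \end{pmatrix} \times \begin{pmatrix} 13 \\ 1 \end{pmatrix}
= \begin{pmatrix} 44 \\ 205 \end{pmatrix}
\equiv \begin{pmatrix} 18 \\ 23 \end{pmatrix}
= \begin{pmatrix} \mathrm{R} \\ \mathrm{W} \end{pmatrix}
\]

and

\[
\begin{pmatrix} 3 & 5 \\ 14 & 23 \end{pmatrix} \times \begin{pmatrix} 20 \\ 8 \end{pmatrix}
= \begin{pmatrix} 100 \\ 464 \end{pmatrix}
\equiv \begin{pmatrix} 22 \\ 22 \end{pmatrix}
= \begin{pmatrix} \mathrm{V} \\ \mathrm{V} \end{pmatrix}.
\]

So the ciphertext is RWVV.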

(3) Encipher hill using the same enciphering matrix.

(4) Multiply by $\begin{pmatrix} 3 & 5 \\ 14 & 23 \end{pmatrix}$ to decipher SWEOG RERQG YV.5

5(3) QGRB, (4) polygraphic.
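The arithmetic in these examples is mechanical, so it can be checked with a short Python sketch. It assumes the numbering used above (a = 1, ..., y = 25, with z = 26 behaving as 0 modulo 26) and an even number of letters; the function names are chosen here for illustration only.

```python
import string

def letter_to_number(ch):
    """a=1, b=2, ..., z=26, matching the examples (b=2, i=9)."""
    return ord(ch.lower()) - ord("a") + 1

def number_to_letter(n):
    """Reduce modulo 26 and convert back; a remainder of 0 stands for Z."""
    return chr(ord("A") + (n - 1) % 26)

def hill(text, matrix):
    """Multiply each pair of letters by a 2x2 matrix, working modulo 26."""
    (a, b), (c, d) = matrix
    nums = [letter_to_number(ch) for ch in text if ch in string.ascii_letters]
    out = []
    for x, y in zip(nums[0::2], nums[1::2]):
        out.append(number_to_letter(a * x + b * y))
        out.append(number_to_letter(c * x + d * y))
    return "".join(out)

M = [[3, 5], [14, 23]]
print(hill("bios", M), hill("math", M))
print(hill("YAJW", M))   # multiplying by the same matrix again undoes the encipherment
```

The last line works because this particular matrix is its own inverse modulo 26, which is why the exercise above can ask the reader to decipher by multiplying with the enciphering matrix.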

We will be using a special type of 2 × 2 matrices known as involutions. They are special in that enciphering and deciphering is done using the same matrix.6 We need to take care in setting up these matrices.

Method for using Hill Ciphers:
1. Pick any number a from 0 to 25.
2. Pick a number b that is relatively prime to 26. Find its multiplicative inverse β from Figure 3.2.
3. Compute c = (1 − a²)β % 26 and d = (−a) % 26.
4. The enciphering and deciphering matrix is

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a & b \\ (1 - a^{2}) \cdot \beta & -a \end{pmatrix}.
\]
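The four steps can be sketched in Python as follows. The names `make_involution` and `square_mod26` are illustrative, and the built-in `pow(b, -1, 26)` (available from Python 3.8) stands in for looking up β in Figure 3.2. Squaring the result shows why the recipe works: with d = −a the off-diagonal entries of the square are multiples of a + d ≡ 0, and both diagonal entries become a² + bc = a² + (1 − a²) = 1.

```python
from math import gcd

def make_involution(a, b):
    """Build the matrix [[a, b], [c, d]] from the method above, reduced modulo 26."""
    if gcd(b, 26) != 1:
        raise ValueError("b must be relatively prime to 26")
    beta = pow(b, -1, 26)            # multiplicative inverse of b modulo 26
    c = ((1 - a * a) * beta) % 26
    d = (-a) % 26
    return [[a % 26, b % 26], [c, d]]

def square_mod26(m):
    """Multiply a 2x2 matrix by itself modulo 26."""
    (a, b), (c, d) = m
    return [[(a * a + b * c) % 26, (a * b + b * d) % 26],
            [(c * a + d * c) % 26, (c * b + d * d) % 26]]

m = make_involution(3, 5)
print(m, square_mod26(m))
```

Getting the identity matrix back from `square_mod26` is precisely the statement that enciphering twice returns the plaintext.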

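Examples: Suppose we choose a = 7 and b = 9. From Figure 3.2 b’s inverse is β = 3. Then (1 − a²) · β = (1 − 7²) × 3 = −144, so c = (−144) % 26 = 12. Since −7 ≡ 19 (mod 26), d = 19. So the matrix is $\begin{pmatrix} 7 & 9 \\ 12 & 19 \end{pmatrix}$.

(1) To encipher matrix we first convert the plaintext into numbers: ma $= \begin{pmatrix} 13 \\ 1 \end{pmatrix}$, tr $= \begin{pmatrix} 20 \\ 18 \end{pmatrix}$, ix $= \begin{pmatrix} 9 \\ 24 \end{pmatrix}$, and then compute

\[
\begin{pmatrix} 7 & 9 \\ 12 & 19 \end{pmatrix} \times \begin{pmatrix} 13 \\ 1 \end{pmatrix} \equiv \begin{pmatrix} 22 \\ 19 \end{pmatrix} = \begin{pmatrix} \mathrm{V} \\ \mathrm{S} \end{pmatrix},
\]
\[
\begin{pmatrix} 7 & 9 \\ 12 & 19 \end{pmatrix} \times \begin{pmatrix} 20 \\ 18 \end{pmatrix} \equiv \begin{pmatrix} 16 \\ 10 \end{pmatrix} = \begin{pmatrix} \mathrm{P} \\ \mathrm{J} \end{pmatrix},
\]
\[
\begin{pmatrix} 7 & 9 \\ 12 & 19 \end{pmatrix} \times \begin{pmatrix} 9 \\ 24 \end{pmatrix} \equiv \begin{pmatrix} 19 \\ 18 \end{pmatrix} = \begin{pmatrix} \mathrm{S} \\ \mathrm{R} \end{pmatrix}.
\]

So matrix → VSPJSR.

(2) Decipher TIDDYVWU.7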



Hill ciphers may be done with matrices of any square size, like 3 × 3 or 5 × 5, and the entries in the array can be chosen with more latitude than our examples suggest. See Exercise 9.11 for consideration of the latter.

6For the use of non-involutions in Hill ciphers see Exercise 9.11.
7multiply.

9.3 Recognizing and Breaking Polygraphic Ciphers

What makes a digraphic cipher digraphic? It enciphers pairs of letters, rather than single letters. For example, the word the will be broken up either as *t-he or as th-e*. In a monoalphabetic cipher each plaintext letter is always replaced by the same ciphertext letter. Don’t let the “di” in digraphic fool you: digraphic ciphers also have this one-to-one replacement. Each pair of plaintext letters is always replaced by the same pair of ciphertext letters. So there should be many copies of the ciphertext versions of th and he in our ciphertext.

Further, and here is the key point, these repetitions will all occur at even distances. If there are an odd number of letters between two occurrences of the, then these two occurrences will be broken differently, and so will not lead to a repetition. It is only when the is broken the same way that we will get a repetition, and this only happens when there are an even number of letters in between, that is, when the distance is even. So a cipher that is not monoalphabetic but has many digraph repetitions occurring at even distances is almost certainly a digraphic cipher. Let’s demonstrate with an example.

Example: Decrypt

XCCNQ FUARE LXELM XSBUM WLKVO KTWJU EELXA NZUQJ KCWFM KSKYN QOEYR QLXFK ELRKQ FYCSK OXELZ GZHQN UPIUM NYNVQ OXBRA VQAAN IPRYJ YKOQO WXUMK JELOZ YSCII EJLXI MQAGJ FNIKO BJJMH EIURL RKQFR SLXSR KJKOW FRQLX FKELR KEZ

Of course, this is a Hill cipher. But how might we determine this if we didn’t know it? As always, we start with a frequency count:

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  total
 6  3  5  0 12  8  2  2  7  9 16 13  7  8  9  2 12 11  6  1  8  3  5 11  7  5    178
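Counts like these are tedious by hand. The Python sketch below reproduces the table for this ciphertext, evaluates Φ in its usual form (the sum of f(f − 1) over all letters, divided by N(N − 1)), and also counts the non-overlapping pairs that a digraphic cipher would have enciphered together. The helper names are illustrative assumptions, not the book's notation.

```python
from collections import Counter

CIPHERTEXT = """
XCCNQ FUARE LXELM XSBUM WLKVO KTWJU EELXA NZUQJ KCWFM KSKYN QOEYR QLXFK
ELRKQ FYCSK OXELZ GZHQN UPIUM NYNVQ OXBRA VQAAN IPRYJ YKOQO WXUMK JELOZ
YSCII EJLXI MQAGJ FNIKO BJJMH EIURL RKQFR SLXSR KJKOW FRQLX FKELR KEZ
"""
letters = "".join(CIPHERTEXT.split())

def phi(text):
    """Chance that two letters picked at random from the text are the same."""
    n = len(text)
    return sum(f * (f - 1) for f in Counter(text).values()) / (n * (n - 1))

def digraph_counts(text):
    """Count pairs starting at positions 0, 2, 4, ... (no overlaps)."""
    return Counter(text[i:i + 2] for i in range(0, len(text) - 1, 2))

print(len(letters), Counter(letters).most_common(4))
print(round(phi(letters), 5))
print([(pair, n) for pair, n in digraph_counts(letters).most_common() if n > 2])
```

The pair count depends on where the pairing starts; starting at the first letter is the natural guess, and a very flat pair count would suggest trying the other alignment.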

This is not a Caesar cipher. Nor does it really look like a monoalphabetic cipher, since the rare letters are not that rare. Is it a polyalphabetic cipher? Computing, Φ = 0.04621 suggests a keylength of about 4. In particular, this is not a monoalphabetic cipher. If it is polyalphabetic, we should be able to find some repetitions. Figure 9.4 contains all repetitions longer than length 2. The repetition RQLXFKELRK is too long to be ignored: if this is a polyalphabetic cipher the keylength must divide 112. But if we pick 7, 14, 28 or 56 as the keylength, we must ignore three nice repetitions (LRKQF, RLFER, and QXKLK), all of whose distance is divisible by 22. This seems strange. Further, there are 57 repetitions of length 2 (which we didn’t list). This is a huge number, far more than we’ve ever seen in a polyalphabetic cipher. And almost all of them occur in lengths divisible by 2. We have ruled out any of the ciphers we have ever seen, except for digraphic 9.3. RECOGNIZING AND BREAKING POLYGRAPHIC CIPHERS 175

Repetition Start Positions Distance Factors ELX 10 22 2 × 11 XEL 12 60 2 × 2 × 3 × 5 RQLXFKELRK 55 112 2 × 2 × 2 × 2 × 7 LRKQF 62 88 2 × 2 × 2 × 11 RLFER 28 56 2 × 2 × 2 × 7 QXKLK 117 56 2 × 2 × 2 × 7 LKF 120 44 2 × 2 × 11

Figure 9.4: Repetitions in the unknown cipher. ones. And the large number of repetitions of length 2 at distances divisible by 2 would lead us to digraphic ciphers anyway. So digraphic it is. Since we’ve recognized it, how do we break it? To attack a digraphic cipher we start as we always did – with a frequency count – except we count pairs of letters, not singletons. The only pairs appearing more than twice are EL and LX, 5 times each, and KO, QF and RK, 3 times each. Now what? We will grant ourselves the knowledge that it is a Hill cipher. In many ways Hill ciphers are a fancy version of the Decimation and Linear Ciphers from Chapters 3 and 4. They do do a better job of mixing up the frequencies than those ciphers did, but, just as correctly guessing two letters of a ciphertext enciphered with a Linear Cipher (and some math) leads to a de- cryption, correctly guessing two bigrams (and some math) leads to a decryption of a Hill cipher. (However, it is generally much harder to guess bigrams than letters.) Here, the pairs EL and LX must be very common bigrams. If we know which, we can solve for the enciphering matrix. For ease, we will make a (miraculously correct) guess, that LX=th and EL=at. This is, actually, fairly reasonable as th is the most common bigraph a b and at is the eleventh most common. So there is some matrix with c d a b L t a b E a = and = . Substituting the values for c d X h c d L t a b 12 20 a b  5   1  the letters, and we have = and = . c d 24 8 c d 12 20 Multiplying out gives two sets of equations

$$12a + 24b = 20, \quad 5a + 12b = 1 \qquad\text{and}\qquad 12c + 24d = 8, \quad 5c + 12d = 20,$$

where all arithmetic is modulo 26. These two sets of two equations in two unknowns may now be solved to see that $\begin{pmatrix} 9 & 5 \\ 10 & 17 \end{pmatrix}$ is the original enciphering matrix. From here, the quote may be deciphered.8
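The solving step can be checked by machine. The following is a minimal brute-force sketch in Python, not the hand method the text has in mind. The function names (num, solve_row, is_involution, candidate_matrices) are illustrative assumptions. It uses the numbering a = 1, ..., z = 26 = 0, and it keeps only involutions, on the assumption that the cipher uses a self-inverse Hill matrix.

```python
from itertools import product

M = 26  # alphabet size; letters are numbered a=1, ..., y=25, z=26 (which is 0 modulo 26)

def num(letter):
    """Number a letter with the a=1, ..., z=26 convention, reduced modulo 26."""
    return (ord(letter.lower()) - ord('a') + 1) % M

def solve_row(cipher_pairs, targets):
    """Every (x, y) with x*c1 + y*c2 = target (mod 26) for each guessed bigram."""
    return [(x, y) for x, y in product(range(M), repeat=2)
            if all((x * c1 + y * c2) % M == t
                   for (c1, c2), t in zip(cipher_pairs, targets))]

def is_involution(a, b, c, d):
    """True when the matrix [[a, b], [c, d]] squared is the identity modulo 26."""
    return ((a * a + b * c) % M == 1 and (a * b + b * d) % M == 0 and
            (c * a + d * c) % M == 0 and (c * b + d * d) % M == 1)

def candidate_matrices(guesses):
    """guesses: (cipher_bigram, plain_bigram) pairs such as ("LX", "th")."""
    cipher_pairs = [(num(c[0]), num(c[1])) for c, _ in guesses]
    top = solve_row(cipher_pairs, [num(p[0]) for _, p in guesses])     # rows (a, b)
    bottom = solve_row(cipher_pairs, [num(p[1]) for _, p in guesses])  # rows (c, d)
    return [(a, b, c, d) for (a, b) in top for (c, d) in bottom
            if is_involution(a, b, c, d)]

print(candidate_matrices([("LX", "th"), ("EL", "at")]))
```

Because 12 · 12 − 5 · 24 = 24 shares the factor 2 with 26, the four congruences alone do not single out one matrix. Among the involutions that satisfy them, the search should report (9, 5, 10, 17), the matrix above, together with one other candidate that a trial decipherment would have to rule out.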

9.4 Playfair

Sir Charles Wheatstone, like several other cipher inventors we have met, was interested in many things. His name is attached to a method for measuring electrical resistance, the “Wheatstone bridge,” but he also constructed an electric telegraph, invented the concertina, and wrote about acoustics. His accomplishments led him to be elected a fellow of the Royal Society and knighted. At the Exposition Universelle in 1867 he displayed his “Wheatstone cipher machine,” a disc cipher device of some complexity that is very simple to use. But it is the Playfair Cipher we are currently interested in. Wheatstone invented it in 1854 and showed it to his friend, Lyon Playfair, Baron of St. Andrews, and in one of the more humorous episodes in cryptologic history they brought it to the attention of British Government officials. As David Kahn tells the story [Kahn, page 201]:

Wheatstone and Playfair explained the cipher to the Under Secretary of the Foreign Affairs. ... When the Under Secretary protested that the system was too complicated, Wheatstone volunteered to show that three out of four boys from the nearest elementary school could be taught it in 15 minutes. The Under Secretary put him off. “That is very possible”, he said, “but you could never teach it to attachés”.

Playfair, reasoning that this reflected more on the diplomats than on the cipher, remained enthusiastic about it.

The cipher was indeed used, perhaps first in the Crimean and Boer Wars. It is easy to use, and was popular as a field cipher because it does not need tables or other equipment. The British attempted to keep it secret, but by the First World War the Germans could routinely solve Playfair ciphers. Wheatstone’s clever idea was to notice that in a rectangular arrangement of the alphabet there are many smaller rectangles, with four letters forming the corners. Exchanging two of the corners for the other two is a way to perform the (plain 1, plain 2) → (cipher 1, cipher 2) substitution at the heart of a digraphic cipher.

8This is the opening quote from Chapter 3.

Playfair Ciphers: Pick a keyword. Using the Keyword Mixed method of Section 5.2 to mix the alphabet, arrange the alphabet in a 5 × 5 array, considering i and j to be the same letter. To encipher, the plaintext is replaced two letters at a time using the following rules.

1. If the letters are in the same column, replace each by the letter below it.

2. If the letters are in the same row, replace each by the letter to its right.

3. If the letters are in neither the same row nor the same column, then use them to form a rectangle. The individual letters are replaced by the opposing corners of the same height, i.e., the opposing corner in the same row.

(Notice that we must treat leaving the bottom of the array as entering its top, and leaving the right-hand side as entering its left-hand side.) Deciphering is simply the opposite: replace by the letters above, to the left, and by the opposing corners.

For example, if we use the alphabet in its usual order (with J omitted) the array is

A B C D E
F G H I K
L M N O P
Q R S T U
V W X Y Z

Then examples of each of the three enciphering steps are as follows:

1. bm→GR, uz→ZE
2. lo→MP, gk→HF
3. ls→NQ, er→BU

Similarly, here are examples of the three deciphering steps.

1. LA→fv, JO→dj
2. QU→ut, CE→bd
3. TW→ry, LI→of

If there are repeated letters in a pair, insert a null, often x, to break them up. So balloon becomes ba lx lo on. Notice that oo is naturally broken, so no x is needed. If the message contains an odd number of letters either send the last letter un-enciphered, or add a final null letter before enciphering.

Examples:

(1) Using keyword square, encipher rectangle. The arrangement of the alphabet is

S Q U A R
E B C D F
G H I K L
M N O P T
V W X Y Z

Then we follow the directions. re forms corners of a rectangle with SF. Since S is on r’s row, and F is on e’s, re becomes SF. ct forms a rectangle with FO, and an with QP. gl are on the same row, so moving each one to the right g becomes H and (circling around) l becomes G. Finally, perhaps adding an x as a null, ex becomes CV. So the final ciphertext is SFFOQ PHGCV.

(2) Use the same keyword to decipher RKNKQ DFM. Each of the pairs RK, NK, QD, and FM forms two of the four corners of a rectangle. Replace each letter by the corner of the rectangle with the same height.

(3) Encipher foreign affairs using the keyword playfair.

(4) Use the same keyword to decipher QMIGH PSZQF BKKN.9
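The three rules and the null convention can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: the function names are hypothetical, the square is filled with the keyword followed by the unused letters (as in the keyword square example above), j is merged into i, and x is the only null.

```python
import string

def build_square(keyword):
    """The 25 letters of a 5 x 5 square, row by row: keyword first, j merged into i."""
    square = []
    for ch in keyword.lower() + string.ascii_lowercase:
        ch = 'i' if ch == 'j' else ch
        if ch.isalpha() and ch not in square:
            square.append(ch)
    return square

def split_pairs(text, null='x'):
    """Break text into pairs, inserting a null between doubled letters and at an odd end."""
    letters = ['i' if ch == 'j' else ch for ch in text.lower() if ch.isalpha()]
    pairs, i = [], 0
    while i < len(letters):
        if i + 1 < len(letters) and letters[i + 1] != letters[i]:
            pairs.append(letters[i] + letters[i + 1])
            i += 2
        else:
            pairs.append(letters[i] + null)
            i += 1
    return pairs

def playfair(text, keyword, decipher=False):
    """Encipher (or decipher) text with the three Playfair rules."""
    square = build_square(keyword)
    pos = {ch: divmod(i, 5) for i, ch in enumerate(square)}
    step = -1 if decipher else 1
    out = []
    for p, q in split_pairs(text):
        (r1, c1), (r2, c2) = pos[p], pos[q]
        if c1 == c2:    # rule 1: same column, move down (up when deciphering)
            out.append(square[((r1 + step) % 5) * 5 + c1] + square[((r2 + step) % 5) * 5 + c2])
        elif r1 == r2:  # rule 2: same row, move right (left when deciphering)
            out.append(square[r1 * 5 + (c1 + step) % 5] + square[r2 * 5 + (c2 + step) % 5])
        else:           # rule 3: rectangle, take the corner in the same row
            out.append(square[r1 * 5 + c2] + square[r2 * 5 + c1])
    result = ''.join(out)
    return result if decipher else result.upper()

print(playfair("rectangle", "square"))
print(playfair("RKNKQ DFM", "square", decipher=True))
```

With the keyword square this reproduces Examples (1) and (2). An empty keyword gives the alphabet in its usual order, so the small enciphering examples earlier in the section can be checked the same way.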



Of course, naming this cipher for Playfair is already unfair to Wheatstone. But further, Wheatstone recommended using the cipher with rectangular matrices, rather than only square ones, and mixing the alphabet as in the Keyword Mixed ciphers rather than as in the simpler Keyword ciphers. Unfortunately, these improvements tended to be forgotten. The Playfair cipher is quick to set up and easy to use. It is harder to break than a monoalphabetic cipher, but only somewhat so. To see why, notice that the only possible substitutes for e are the letters in its row and the one letter directly beneath it. (In the keyword SQUARE example earlier, these are the letters BCDFG.) So the letters in e’s row will be used a lot. This gives some hope of setting up an entire row in an unknown Playfair cipher. Additionally, bigrams such as re and er will always be encoded by a reversal, for example as SF and FS. From here, with some guided guesswork and some training, anyone can be taught to break Playfair ciphers accurately and fairly quickly.
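The observation about e can be checked directly. The short Python sketch below lists every letter that can stand in for a given plaintext letter, namely the other four letters of its row and the letter beneath it. The function names are hypothetical, and the square is assumed to be filled with the keyword followed by the unused letters, as in the SQUARE example.

```python
import string

def build_square(keyword):
    """The 25 letters of a 5 x 5 square, row by row: keyword first, j merged into i."""
    square = []
    for ch in keyword.lower() + string.ascii_lowercase:
        ch = 'i' if ch == 'j' else ch
        if ch.isalpha() and ch not in square:
            square.append(ch)
    return square

def substitutes(letter, keyword):
    """All ciphertext letters that can replace `letter` under the three Playfair rules."""
    square = build_square(keyword)
    row, col = divmod(square.index(letter.lower()), 5)
    row_mates = {square[row * 5 + k] for k in range(5) if k != col}  # rules 2 and 3
    below = square[((row + 1) % 5) * 5 + col]                        # rule 1
    return ''.join(sorted(row_mates | {below})).upper()

print(substitutes("e", "square"))
```

For the keyword square and the letter e this should print BCDFG, the five letters named above.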

9(2) alphabet, (3) LTIGR EQPYZ PYRBX Y (with a final null x), (4) three attaches.

9.5 Summary

Polygraphic ciphers, or block ciphers, work by enciphering groups of letters at a time, rather than one letter at a time. In particular, digraphic ciphers treat pairs of letters, or bigrams, as units when enciphering and deciphering. Since there are 676 pairs of letters, enciphering and deciphering tables for digraphic ciphers are large and cumbersome, and hence never used (outside of introductory examples).

Digraphic ciphers are the “monoalphabetic” ciphers for bigrams. That is, each plaintext bigram is replaced by the same ciphertext bigram throughout the entire message. Hence frequency analysis may be used to attack a digraphic cipher. But whereas e and t, the two most common single letters, make up nearly 20% of all letters, the two most common bigrams, th and in, make up less than 5% of bigrams. So using frequency analysis alone to break digraphic ciphers is difficult unless one has a very large amount of ciphertext. In particular, a good digraphic system is much more secure than a good monoalphabetic system.

The Hill Cipher System enciphers by treating the block of letters as a matrix with only one column and multiplying the column by another matrix. Using a 2 × 2 involution, a matrix with 2 columns and 2 rows that is its own multiplicative inverse, is most common, but non-involutions may be used, as may 3 × 3, 4 × 4 or larger matrices. Hill Ciphers can be thought of as a digraphic version of Decimation Ciphers. In particular, once one knows the plaintext equivalents of two ciphertext bigrams, then one can use some simple mathematics to determine the enciphering matrix, and hence break the message. The importance of the Hill Cipher is that it indicates that by the middle of the 20th century cryptography was almost entirely a mathematical subject.

Playfair Ciphers were invented by Charles Wheatstone and are the most common digraphic ciphers, being used in both World Wars. After setting up a rectangular array of letters, enciphering proceeds by replacing pairs of letters with the pair underneath if they are in the same column, to the right if they are in the same row, or the opposite corners of the rectangle produced otherwise. The strengths of Playfair are its ease of use, its rapidity, and its use of a keyword. Conversely, each letter can be replaced by only 5 others (the rest of its row and the letter beneath it), and this allows entries into the system.
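The bigram frequency count that opens an attack on a digraphic cipher can be sketched in a few lines of Python. The function name is hypothetical and the sample string is made up for illustration. The text is cut into non-overlapping pairs starting from the first letter, which assumes the message was enciphered in pairs from its beginning.

```python
from collections import Counter

def bigram_counts(ciphertext):
    """Count non-overlapping pairs of letters, ignoring spaces and punctuation."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    pairs = [''.join(letters[i:i + 2]) for i in range(0, len(letters) - 1, 2)]
    return Counter(pairs)

# A made-up sample; any ciphertext string can be used instead.
counts = bigram_counts("SFFOQ PHGCV SF")
print(counts.most_common(3))
```

A trailing unpaired letter is dropped. The most common pairs are the ones to match against common plaintext bigrams such as th.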

9.6 Topics and Techniques

1. What is a digraphic cipher? How does it differ from a mono-graphic cipher?

2. What does the use of “di” in digraphic refer to?

3. Is a digraphic cipher monoalphabetic or polyalphabetic? Explain.

4. What is the general appearance of a digraphic cipher? That is, explain how a chart for a digraphic cipher is built and used.

5. Explain how to use two monoalphabetic ciphers to build a digraphic cipher.

6. What is a matrix?

7. How are two matrices of the same size added?

8. What is the Hill Cipher system based on?

9. How does the Hill Cipher encipher? How does it decipher?

10. Does the frequency count of a digraphic cipher have any special appearance?

11. What sort of frequency count is useful when trying to break a digraphic cipher?

12. What should the value of Φ suggest for a digraphic cipher?

13. What sorts of repetitions will Kasiski’s test find in a digraphic cipher? Why?

14. Who was Playfair? Who invented the Playfair cipher?

15. Explain the three enciphering rules for the Playfair Cipher.

9.7 Exercises

1. Here is a proposed digraphic cipher. To encipher a pair of letters $p_1p_2$ keep whichever letter is later in the alphabet and replace the other by the letter equal to the distance between $p_1$ and $p_2$. For example, fe becomes FA, since f is later and f and e are 1 letter apart. Similarly, mr becomes ER since r is 5 letters later than m.

(a) Encipher pair.
(b) Encipher twosomes.
(c) Decipher CRAERING.
(d) Decipher KLPHABOT.
(e) There is a problem with this cipher. Can you determine what it is? Hint: decipher WRLZ.

2. (a) Here is a candidate digraphic cipher: To encipher a pair of letters, swap them if they are in alphabetical order; if not, leave them fixed. So pair becomes PARI. Critique this proposed cipher. (Encipher some words and try to decipher your results.)

(b) The use of “alphabetical” in part (a) didn’t work. Instead, when the letters are in alphabetical order, add 4 to each. When they are not, subtract 3. Leave doubles alone. Will this work? (Again try to encipher and decipher some words.)

3. A slightly fancier digraphic cipher is as follows. If the letters are both vowels, replace each by the vowel before it in the alphabet (ea becomes AU). If the pair is two consonants, replace each by the next consonant in the alphabet (ns becomes PT). And if the pair is one of each, swap them (ho becomes OH).

(a) Encipher vowels.
(b) Encipher breakers.
(c) Decipher ROEAP VLA.
(d) Decipher EBUOVZ.
(e) We’ve learned to recognize monoalphabetic ciphers by their odd appearance – too few good etaoinshr letters, too many bad jkvwxyz letters. What sort of appearance will this cipher produce?

4. To make a combination Caesar-AutoKey Digraphic cipher, encipher each pair of letters by using a Caesar cipher with key k on the first letter, and an Auto Key Cipher on the second letter, using the first letter as the key. Symbolically, for each pair $p_1p_2$ of plaintext letters, the ciphertext pair is $(p_1 + k)\%26$, $(p_1 + p_2)\%26$. So if k = 3, then to becomes WH, as W is three letters past t, and o enciphered with key T is H.

(a) Encipher combinations with key 7.
(b) Decipher JCLWH R, if the key was 12.
(c) Encipher primer if the key is 21.
(d) Decipher KZKVE MZAKV, if the key was 6.

5. In a table-based digraph cipher, like Figure 9.2, it is best if each letter appears in each row exactly once, and similarly in each column exactly once. If we had a three letter language “ABC” or four letter language “ABCD” then examples of such tables are

AB BC CA
CC AA BB
BA CB AC

and

DC AD BB CA
CD BC AA DB
AB DA CC BD
BA CB DD AC

These arrangements are called Greek-Latin Squares, and exist for every size of square, except 2 × 2 and 6 × 6. (The 6 × 6 case is often referred to as the 36 Officers Problem: Suppose we have 36 military officers of 6 ranks from 6 different states, no two of whom have the same rank and are from the same state. Can they be arranged in rows of 6 so that each state and each rank is represented exactly once in each row and each column? In 1779 Leonhard Euler conjectured that the answer is no, which Gaston Tarry confirmed in 1900.)

(a) Explain, maybe by using examples, why AA, AB, BA, BB cannot be put into a 2 × 2 Greek-Latin Square.
(b) Find, through experimentation, a 5 × 5 Greek-Latin Square.
(c) Sometimes the “Greek” part of the name is reserved for those squares whose front and back diagonals also have the “Latin” property. Find a 3 × 3 square of this type.

6. Given $A = \begin{pmatrix} 3 & 11 \\ 19 & 4 \end{pmatrix}$, $B = \begin{pmatrix} -2 & 1 \\ 10 & 21 \end{pmatrix}$, $C = \begin{pmatrix} 4 & 9 \\ -7 & 5 \end{pmatrix}$, $a = \begin{pmatrix} 8 \\ 12 \end{pmatrix}$, and $b = \begin{pmatrix} 9 \\ 2 \end{pmatrix}$, compute the following.

(a) A + B.
(b) B − C.
(c) C + A.
(d) Aa.
(e) Ab.
(f) Ba%26.
(g) Bb%26.

7. Given a and b, find the corresponding Hill Matrix.

(a) a = 8, b = 9.
(b) a = 2, b = 17.
(c) a = 11, b = 5.
(d) a = 21, b = 19.

8. Use the given matrix to encipher and decipher the following.

(a) $\begin{pmatrix} 4 & 21 \\ 3 & 22 \end{pmatrix}$, mountain and ARKTC D.
(b) $\begin{pmatrix} 10 & 15 \\ 9 & 16 \end{pmatrix}$, valley and BXOWI N.
(c) $\begin{pmatrix} 17 & 5 \\ 10 & 9 \end{pmatrix}$, desert and JULGU Y.

9. Complete the Hill Matrix, and then use it to encipher and decipher.

(a) a = 7, b = 11, apogee and EATJU N.
(b) a = 18, b = 3, escarpment and DPWWZ EOLID.
(c) a = 23, b = 25, impediment and AOFTV CGD.

10. Just as we combined addition and multiplication to form Linear Ciphers, we can form Linear Hill Ciphers by first multiplying by a matrix, and then adding another matrix to the result. That is, the pairs m of the message are enciphered to Am + B, where A and B are the multiplier and addend matrices, respectively.

(a) Use $A = \begin{pmatrix} 13 & 5 \\ 8 & 13 \end{pmatrix}$ and $B = \begin{pmatrix} 8 & 10 \\ 12 & 14 \end{pmatrix}$ to encipher linear.
(b) Use the same matrices to decipher CSKKI SEB. (Remember to subtract before multiplying by the inverse.)

11. The matrices we used for Hill Ciphers were involutions – they are their own inverses. Limiting ourselves to just these matrices is rather restrictive.10 In general, the inverse of a matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ exists only when gcd(26, ad − bc) = 1. If this is so, then the inverse matrix is $\begin{pmatrix} ed & -eb \\ -ec & ea \end{pmatrix}$, where e is the inverse of ad − bc modulo 26 found from the Euclidean Algorithm. (The value ad − bc is called the determinant, since it determines whether or not the matrix has an inverse.)

(a) Which of the following matrices have inverses modulo 26? (i) $\begin{pmatrix} 8 & 4 \\ 13 & 9 \end{pmatrix}$. (ii) $\begin{pmatrix} 5 & 10 \\ 16 & 23 \end{pmatrix}$. (iii) $\begin{pmatrix} 11 & 6 \\ 7 & 4 \end{pmatrix}$.
(b) Find the inverse of $\begin{pmatrix} 2 & 5 \\ 3 & 9 \end{pmatrix}$.
(c) Encipher determinants with $\begin{pmatrix} 2 & 5 \\ 3 & 9 \end{pmatrix}$.
(d) Decipher JWOSY QGKVO. It was enciphered with $\begin{pmatrix} 2 & 5 \\ 3 & 9 \end{pmatrix}$.

10Recall that for Decimation ciphers there are 12 proper key pairs modulo 26, but only 2 in which the enciphering and deciphering key are the same (i.e., involutions). In the Hill ciphers there are 157,248 proper pairs of 2 × 2 matrices, but only 312 of these are involutions.

12. Modern cipher systems all encipher blocks of text at a time. (The block size in this chapter has generally been two, meaning two letters at a time.) Here we look at a cipher with block size two. To do this, first, rather than converting the message blocks to 2-12-15-3-11-19 before multiplying, we convert it to 0212-1503-1119. Notice the zeros. With them aa=0101=101 is distinct from k=11; without them it is not. Then performing a Decimation Cipher with a block size of two differs from our previous Decimation Cipher only in that the modulus must be chosen to be larger than 2525. That is, choose a number N larger than 2525, and an enciphering key e. Use the Euclidean Algorithm to check that gcd(e, N) = 1 and to find the deciphering key d. Encipher by multiplying pairs of letters by e modulo N, and decipher by multiplying by d. (The resulting ciphertexts will be numbers, rather than letters.)

(a) If the modulus N is chosen to be 2999, show that e = 100 is a proper enciphering key. Find the corresponding deciphering key d.
(b) Use e = 100 and N = 2999 to encipher blocks.
(c) Use the deciphering key found in part (a) to decipher 2543-1513-1460.
(d) If N = 3001 and e = 29, find d.
(e) Use e = 29 and N = 3001 to encipher number.
(f) Use the deciphering key found in part (d) to decipher 218-2644-17-2037.

13. (Continuing the ideas of Exercise 12.) A problem with the cipher from Exercise 12 is that the ciphertext is numbers not letters. If we want a bigram-decimation cipher, perhaps we should use $676 = 26^2$ as the modulus, rather than 26. Here is one way to try to do this: Choose a key k with 1 ≤ k < 676, relatively prime to 26. For each pair of plaintext letters $p_1p_2$, first convert the letters into their numerical equivalent, then compute the cipher-number $c = \bigl(k * (p_1 * 26 + p_2)\bigr)\%676$. The ciphertext is then $C_1C_2$, where $C_1$ and $C_2$ are the quotient and remainder when c is divided by 26.

For example, let k = 19. Then to = 20 15 has ciphernumber c = 19 ∗ (20 ∗ 26 + 15)%676 = (19 ∗ 535)%676 = 25. Since 25 ÷ 26 = 0 = Z and 25%26 = 25 = Y, to is enciphered to ZY. To decipher, first find the inverse of 19 modulo 676, which is 427. Then multiply ZY = 0 ∗ 26 + 25 = 25 by 427 modulo 676 to receive 535. Finally 535 ÷ 26 = 20 = t and 535%26 = 15 = o.

(a) Use k = 101 to encipher stat.
(b) Use k = 219 to encipher data.
(c) Find the inverse of k = 87 modulo 676, and use it to decipher BONO.
(d) Carefully examine the ciphertexts from this Exercise. What appears to be happening? What does this say about the value of this cipher? In particular, how “digraphic” is this digraphic cipher?

14. It was hinted in the text that Hill Ciphers are simply a fancy Decimation Cipher. In this Exercise we see why this is so.
(a) Find the form the enciphering matrix takes if we choose a = 0.
(b) Encipher the “message” $p_1p_2p_3p_4p_5p_6$ using the matrix you found in part (a).
(c) Is the cipher performed in part (b) a Decimation Cipher? If not, how close is it?
(d) (For those who have done Exercise 11.) The general Hill matrix has the form $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. What choices do we need to make for a, b, c and d so that performing Hill encipherment with this matrix is the same as a Decimation Cipher with key k?

15. You’ve somehow captured a ciphertext sent via Hill Cipher. The signature is CVVOFY and you’ve reason to think that this is Lester. Use this information to compute the matrix used.

16. Encipher or decipher the following texts using a Playfair cipher based on the normal ordering of the alphabet. Add an x if double letters need to be divided.

(a) Blue Balloons
(b) Pat the Bunny
(c) IMYNC OKHIS NPYNSC.
(d) CBSMV DTBYC CLDA.

17. Encipher or decipher the following texts using a Playfair cipher based on the given keyword.

(a) Keyword Northeast. Message Nine battalions will attack from the north at noon.
(b) Keyword South. Message Enemy unit crossing river.
(c) Keyword Western. Message ATCWS WEDDF TWSRW SCEFC HV.
(d) Keyword Eastmarch. Message CPULD OHVFR QEQPH PSKPO.

18. One Army Signal School used as an example of Playfair the following message enciphered with the keyword CLIQUE [SSA1].

CP LE AH RZ PF IG TP NZ GA FG DG ZM PS PN TE CA MT PH CG DA FG OB OA IW CA BY

What was the message?

19. Use a Playfair cipher with keyword Poems to decipher the following three phrases:

(a) QKMSE WPQ.
(b) OZCEMRHSXP.
(c) XHRPGVAMEMUTZY.

20. [Kahn, pg 198] The first known demonstration of the Playfair Cipher was at a dinner party in January 1854 when Baron Playfair explained “Wheatstone’s newly-discovered symmetrical cipher” to Prince Albert, Queen Victoria’s husband, and Lord Palmerston, Home Secretary and future Prime Minister, among others. The earliest remaining written description appears on page 199 of Kahn, and is transcribed as follows.

26 March, 1854 Specimen of a Rectangular Cipher

m b p y a d q z g f r n h s e u t k v i l w c o x

A despatch in the above cipher preserving the separation of the words.

We have received the
xn epis erxhgfrf knr
following telegraphic despatch
gxaabytet inxrdscxekp frhybipk

A dispatch in the same cipher with no external indication of the separation of the words.

We have received the
xnznyinferxhgfrfvgeh
following telegraphic despatch
itmyymxtsqgvrxsfemhpkxutsexckwh

The same cipher arranged in a different rectangle.

m b p y a d q z g f r n h s e j u t k v i l w c o x −

We have received the following telegraphic despatch
cs sycr ugswbpas feh jkddapqjk fhchbtiwnly hynasgle

Key to the permutated alphabet employed in the preceding ciphers

m a g n e t i c b d f h j k l o p q r s u v w x y z
mbpyadqzgfrnhsejutkvilwcox

[signed] C. Wheatstone

The rules that Wheatstone used to encipher his messages are a bit different than the ones we use. Can you figure them out? Hint: consider Playfair’s name for this cipher.

21. Another way to use Playfair is to construct a 6 × 6 square that consists of the 26 letters followed by the 10 numbers 0, 1, ..., 9. Then encipher and decipher as usual. (Below we use z as the double-breaking letter, rather than x. I’ve also used o for the letter “O”, to prevent confusion with the number 0.)

(a) Encipher 1800 Pennsylvannia Avenue.
(b) Encipher The Lodge, the residence of the Australian Prime Minister.
(c) Decipher INWBG RBSoM TUo0.
(d) Decipher FHBCM LPVWA.
(e) Decipher 76CPT QHo HTXNB 2BW.
(f) Decipher Y8Y0T YWAVF oLWD.

22. During World War II there was a widespread network of Australian coast-watchers spread about on the numerous islands in the South Pacific. [Kahn, pages 591–3.]

They observed enemy activity from the peaks and cliffs of enemy-held islands, collected bits of information from native allies, and radioed their information to Allied military commands. They frequently gave valuable early warning of Japanese bombing raids and ship movements, and they assisted in the rescue of downed Allied airmen. In the early morning hours of August 2, 1943, coast-watcher Lieutenant Arthur Reginald Evans of the Royal Australian Naval Volunteer Reserve saw a pinpoint of flame on the dark waters of Blackett Strait from his jungle ridge on Kolombangara Island, one of the Solomons. He did not know then that the Japanese destroyer Amagiri had rammed and sliced in half an American patrol torpedo boat, PT 109, Lieutenant John F. Kennedy, United States Naval Reserve, commanding. But at 9:30 that morning he received the following message:

(a) KXJEY UREBE ZWEHE WRYTU HEYFS KREHE GOYFI WTTTU OLKSY CAJPO BOTEI ZONTX BYBWT GONEY CUZWR GDSON SXBOU YWRHE BAAHY USEDQ

The keyword is Royal new zealand navy. What does it say?

(b) Evans reported back that Object still floating between Meresu and Gizo, and was told later that there was a possibility that survivors landing either Vangavanga or islands. In fact, Kennedy had led his men to Plum Pudding Island. This was behind enemy lines, and only 3 or 4 miles from Gizo Island where there was a Japanese garrison. On Saturday morning, August 7th, Evans received news from a native that the crew had been found and moved to Gross Island. He sent the following message:

XELWA OHWUW YZMWI HOMNE OBTFW MSSPI AJLUO EAONG OOFCM FEXTT CWCFZ YIPTF EOBHM WEMOC SAWCZ SNYNW MGXEL HEZCU FNZYL NSBTB DANFK OPEWM SSHBK GCWFV EKMUE

The key was Physical Examination. Decipher the message.

(c) Another message reused the key Physical Examination, and began

XYAWO GAOOA GPEMO HPQCW IPNLG RPIXL TXLOA NNYCS YXBOY MNBIN YOBTY QYNAI

Later messages explained the rescue arrangements. The combination of messages could have been easily solved in an hour by any moderately experienced cryptanalyst. If solved, the Japanese could have gotten both the shipwrecked crew and the rescue force. For whatever reason, there was no enemy action taken, and Lt. Kennedy was rescued and went on to become President.

23. H.F. Gaines [page 199] proposed a cipher that uses the Saint Cyr Slide to implement a digraphic cipher, calling it, logically enough, “Slidefair”. Start by setting the keyletter under a, as usual. To encipher a pair of letters, find the first letter in the plaintext strip and the second in the cipherstrip. Replace them by the other corners of the rectangle then formed, first letter from the plainstrip and second from the cipherstrip. If the letters happen to occur as a horizontal pair, replace them by the letters directly to their right. Her example used the keyword HERCULES and produced the ciphertext

XZ ZR RU KC TI HO KX US MZ NI JU KO TI PO SC MW PR PM XY RW GZ AT

What was the plaintext?

Chapter 10

Transposition Ciphers

In one word, the transposition methods give a nice mess [salade] of cleartext letters. Etienne Bazeries

So far each of the methods of ciphering we’ve studied have been substitution methods. They obtain their secrecy by hiding the meaning of the letter. In this chapter we turn to transposition ciphers, ciphers which obtain their secrecy by hiding the location of the letter. We start with a couple of simple classics.

10.1 Route Ciphers

Route ciphers are also called rail fence ciphers, for an obvious reason.

Examples: Decipher the route ciphers.

(1) Decipher AEICL BLNON.

a e i c l
 b l n o n

(2) Decipher TEIEP NDSR.
(3) Decipher OBNN LAIECL DLO. (Hint: three rows.)
(4) Decipher ILE ALPITR RST.
(5) Decipher ILVGI IOIAE ITSRN MANHM NG.1 (Try three rows.)



1(2) Two rows, read backwards: president, (3) Old Abe Lincoln, (4) rail splitter, (5) I am leaving this morning – from [HITT, page 28].


10.2 Geometrical Ciphers

Traditionally, the most common way of using transposition was to arrange the plaintext letters in some sort of rectangle. When the insertion and reading of the letters is done in patterns other than rows and columns, we call the ciphers Geometrical Ciphers.

Examples: Decipher the geometrical ciphers.

(1) Decipher ANOOR RDXOA OWEOD UNDAN. (Write in 4 rows and read off in a spiral.)

(2) Decipher IINIZ ATGDZ MTVYY GEERX.

(3) Decipher MOGIN VNNOA IAGLI DAIRC ISTKY.2



10.3 Turning Grilles

A grille is simply a piece of paper in which holes have been cut. To encipher, lay the grille on a clean sheet of paper, write your message in the holes, and, once finished, remove the grille and fill in the leftover spaces with nulls. Deciphering is easy – lay the grille back on the ciphertext to see the meaningful letters. Girolamo Cardano, last seen in Chapter 8.8.4, apparently invented this method in the 16th century. The problem with this method is that to be secure there must be many nulls, maybe up to half, and this makes the ciphertext long in relation to the plaintext. To reduce the number of nulls, we need to be more careful about where we put the holes.3

Start with a piece of paper that has a grid marked on it. (A grid here simply meaning a series of horizontal and vertical lines that visually divides the paper into equal sized squares.) Select some 2n × 2n collection of these small squares that forms a large square and divide it into four n × n squares. Fill up the upper left-hand sub-square with the numbers 1 through $n^2$, with 1 through n on the top row, etc. Then turn your paper one-quarter of the way around and similarly fill the new upper left-hand square with 1 through $n^2$. Repeat this process twice more, so that each of the numbers is repeated four times in the big square.

Next, pick exactly one of the four copies of each of the numbers and highlight or circle it. It’s best if you pick about the same number of numbers from each of the small squares. Take a new sheet of paper and, using the old as a reference, cut out boxes in the new sheet that are in the same places and of the same size as the boxes you highlighted on the first sheet. We have made a turning grille. In Figure 10.1 a set of numbers has been chosen, as the brackets indicate.

 1   2   3  [7]  4   1
 4  [5]  6   8   5   2
 7   8  [9]  9   6  [3]
 3   6   9   9  [8]  7
[2]  5   8  [6]  5   4
 1  [4]  7   3   2  [1]

Figure 10.1: A 3 × 3 turning grille.

To encipher the message lay your grille onto a clean sheet of paper. Moving from left to right and top to bottom, as usual, write your text in the holes in the grille. Once you have filled the holes give your grille a quarter turn and continue with your message. If your message is quite long you may have to repeat the filling process on a second or even third sheet of paper, while if your message is short, you’ll probably want to fill the remaining spots with nulls. The deciphering process is the natural opposite of the enciphering: lay the grille over the sheet and copy the letters off that you can see, turning the cipher-sheet one quarter each time you have copied down all the letters.

2(1) around and around we go, (2) Read off going down and up the columns: i am getting very dizzy, (3) Read off the diagonals: moving in a diagonal is tricky.

3To be precise, the Cardano method actually produces an open cipher rather than a transposition, but we aren’t that precise.

Examples:

(1) Encipher Girolomo Cardano was a mathematician using the grille in Figure 10.1. Nine letters at a time will be entered. After the first nine letters, the partial ciphertext looks like this (dots mark cells not yet filled):

. . . G . .
. I . . . .
. . R . . O
. . . . L .
O . . M . .
. O . . . C

Now give the grille one quarter turn and enter nine more. This makes

. A . G . .
R I . . D .
. . R A . O
. N . . L O
O . W M . .
S O . A . C

Do this twice more. Since the plaintext has only 31 letters we add five null letters at the end. (I chose stopt as the nulls.) Then pull off the ciphertext in rows and regroup.

(2) Decipher UTOWL MHHDE IAFTO NADOG MBIHD OEAEU CNCOS Y using the same grille.4
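The turning-grille procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the hole positions in HOLES are inferred from the ciphertexts of the worked examples (rows and columns counted from 0), and the quarter turn is taken to be clockwise.

```python
SIZE = 6  # a 6 x 6 sheet, built from four 3 x 3 sub-squares

# Hole positions as (row, column), counted from 0.
# Assumption: inferred from the worked examples, one hole for each number 1..9.
HOLES = [(0, 3), (1, 1), (2, 2), (2, 5), (3, 4), (4, 0), (4, 3), (5, 1), (5, 5)]

def quarter_turn(cells):
    """Rotate hole positions a quarter turn clockwise; return them in reading order."""
    return sorted((c, SIZE - 1 - r) for r, c in cells)

def grille_encipher(plaintext, null='x'):
    """Write the message through the holes, turning the grille three times."""
    letters = [ch for ch in plaintext.upper() if ch.isalpha()]
    if len(letters) > SIZE * SIZE:
        raise ValueError("message does not fit on one sheet")
    letters += [null.upper()] * (SIZE * SIZE - len(letters))  # pad with nulls
    grid = [[''] * SIZE for _ in range(SIZE)]
    cells, stream = sorted(HOLES), iter(letters)
    for _ in range(4):
        for r, c in cells:
            grid[r][c] = next(stream)
        cells = quarter_turn(cells)
    return ''.join(''.join(row) for row in grid)

def grille_decipher(ciphertext):
    """Read the letters visible through the holes at each of the four positions."""
    letters = [ch for ch in ciphertext.lower() if ch.isalpha()]
    if len(letters) != SIZE * SIZE:
        raise ValueError("expected exactly one full sheet of letters")
    out, cells = [], sorted(HOLES)
    for _ in range(4):
        out.extend(letters[r * SIZE + c] for r, c in cells)
        cells = quarter_turn(cells)
    return ''.join(out)

print(grille_decipher("UTOWL MHHDE IAFTO NADOG MBIHD OEAEU CNCOS Y"))
```

Applied to Example (2), these holes give whodidanythinghecouldatobecomefamous, that is, the footnoted answer with a single extra a filling the thirty-sixth cell.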



For about four months in 1917 the German Army used Turning Grilles. (The German Army was quite lost, cryptographically, during much of World War I, and tried just about every method they could get their hands on.) ANNA was 5 × 5, BERTA was 6 × 6, all the way up to FRANZ, who was 10 × 10. The French, who were cryptologically far more advanced than the Germans, treated Anna, Berta and Franz as old friends, and happily read most of the messages sent in this manner.

10.4 Columnar Transposition

Columnar transposition is the transposition method. It was widely used, especially in World War I, and fell out of favor only because of the rise of machine ciphers.5 To encipher, pick a keyword and write the message under the keyword in rows with as many columns as letters in the keyword. Number the columns alphabetically from the keyword, just as in the keyword transposed ciphers from Section 5.3. (Remember we decided to consecutively number repeated letters, so REPEATER is numbered 62531847.) Then pull the columns out one by one in this numerical order. Separate the letters into groups of five and you have enciphered your message. (It is possible that your message will not make a perfect rectangle. That is fine. Either ignore the blanks, or fill them with nulls.)

To decipher the message we need to know not only what the proper ordering of the columns is (which we do, as we know the keyword), but also how many letters go into each column. Suppose we have a message N letters long, and the keyword has k letters. Start by drawing a rectangle with k columns. Next divide the columns into rows. There will be N ÷ k full rows (here N ÷ k means the integer part of the division), and a final row that will contain N%k letters at its beginning and be blank at the end. Cross out these blank spaces so you won’t be tempted to put letters into them. Now put in your ciphertext column by column, following the order given by the keyword. Reading across the rows then gives the message.

4(1) MACGA IRITA DHNER ASOMN TALOO OWMTP SOIAT C. (2) who did anything he could to become famous, (which is true about Cardano, as his biographers can attest.)

5Machine ciphers are clearly much faster than ciphering by hand. Further, since transposition generally deals with large groups of letters at once, as opposed to substitution which enciphers small groups at once, the early limits on computer memory made pure transposition machine-ciphers less secure than substitution machine-ciphers.

Examples: Encipher or decipher using columnar transposition.

(1) Encipher Here is my very clever secret message using the keyword cipher.

c i p h e r
H E R E I S
M Y V E R Y
C L E V E R
S E C R E T
M E S S A G
E

Reading down we have the c column: HMCSME, the e column: IREEA, the h column: EEVRS, the i column: EYLEE, the p column: RVECS, and the r column: SYRTG. Putting these together and regrouping we obtain HMCSM EIREE AEEVR SEYLE ERVEC SSYRT G.

(2) Encipher Monday morning arrives too early using the keyword simple, filling the last row with null m’s.

(3) Decipher DKOEY TGHRT OORAE NTFSO TEIAL GE using the keyword square. There are 27 letters in the message and 6 in the keyword, so there will be 27 ÷ 6 = 4 full rows and 27%6 = 3 letters in a fifth row. So we will fill in the following array (x marks a crossed-out blank):

S Q U A R E
_ _ _ _ _ _
_ _ _ _ _ _
_ _ _ _ _ _
_ _ _ _ _ _
_ _ _ x x x

First we fill down the A column (putting in the first four letters of the message),

S Q U A R E
_ _ _ D _ _
_ _ _ K _ _
_ _ _ O _ _
_ _ _ E _ _
_ _ _ x x x

then fill down the E column with the next four,

S Q U A R E
_ _ _ D _ Y
_ _ _ K _ T
_ _ _ O _ G
_ _ _ E _ H
_ _ _ x x x

then the Q column with the next five, etc. This gives us

S Q U A R E
F R I D A Y
S T A K E T
O O L O N G
T O G E T H
E R E

From here we simply read off the message (ignoring the keyword): Fridays take too long to get here.

(4) Decipher VOOGH KEBWN TEEDR FEERY KOWNY SIRED with keyword daily.

(5) Decipher ENMPK EAKWN TESUE VYEES SDAAN with keyword later.6
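The enciphering and deciphering steps used in these examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`column_order`, `key_numbers`, `encipher`, `decipher`) are made up for this sketch, blanks in an incomplete last row are ignored rather than filled with nulls, and the output is returned without the usual grouping into fives.

```python
def column_order(keyword):
    """Column indices in the order the columns are pulled out.

    Repeated letters are numbered consecutively from left to right,
    so sorting (letter, position) pairs gives the reading order.
    """
    key = keyword.upper()
    return sorted(range(len(key)), key=lambda i: (key[i], i))


def key_numbers(keyword):
    """The number written over each column (1 = pulled out first)."""
    numbers = [0] * len(keyword)
    for rank, col in enumerate(column_order(keyword), start=1):
        numbers[col] = rank
    return numbers


def clean(text):
    """Keep letters only, in upper case."""
    return "".join(ch for ch in text.upper() if ch.isalpha())


def encipher(plaintext, keyword):
    letters = clean(plaintext)
    k = len(keyword)
    # Column c of the array holds letters c, c+k, c+2k, ... of the message.
    return "".join(letters[col::k] for col in column_order(keyword))


def decipher(ciphertext, keyword):
    letters = clean(ciphertext)
    k = len(keyword)
    # N // k full rows; the first N % k columns get one extra letter.
    full_rows, extra = divmod(len(letters), k)
    columns = [""] * k
    pos = 0
    for col in column_order(keyword):
        height = full_rows + (1 if col < extra else 0)
        columns[col] = letters[pos:pos + height]
        pos += height
    rows = full_rows + (1 if extra else 0)
    # Read across the rows, skipping the crossed-out blanks.
    return "".join(columns[c][r]
                   for r in range(rows)
                   for c in range(k)
                   if r < len(columns[c]))


print(key_numbers("REPEATER"))
print(encipher("Here is my very clever secret message", "cipher"))
print(decipher("DKOEY TGHRT OORAE NTFSO TEIAL GE", "square"))
```

The column heights in `decipher` are exactly the N ÷ k and N%k bookkeeping described above; everything else is slicing.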



There are two standard tricks to use with transposition. The first we’ve seen before – the use of nulls. As with monoalphabetic substitution, nulls are simply meaningless letters added to the plaintext before enciphering with the hope of making the decrypter’s life more difficult. While nulls may appear anywhere, we will restrict ourselves to putting nulls at the beginnings and endings of messages. As we will see, nulls used in this way add only modestly to the security of columnar transposition. More worthwhile is performing interrupted columnar transposition in which certain spots are skipped during the enciphering and deciphering process.

6(2) YNVEM OOASR AIIOM NRRTL DNROY MMGEA. (4) A completely filled rectangle with 6 rows: Everybody’s working for the weekend. (5) The te and nns are nulls: Seven days make up a week.

Examples:

(1) Encipher some spots are simply skipped using the keyword START with the interruptions as given:

START
∗ ∗ ∗
∗

Enciphering is almost the same as before: put the message into the array, skipping the designated spots. Then when pulling out the columns, remember to forget to write down the ∗’s.

(2) Decipher STPAR STEIA GPSTC NUTMN RNRIU SEENH IRSID EOUET. The keyword is EXPLODE and the interruption pattern is

∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗

7
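An interrupted columnar transposition can be sketched as follows. The function names and the representation of the pattern as a set of (row, column) pairs counted from 0 are choices made for this sketch. The particular pattern `SKIPPED` is an assumption chosen for illustration; it is one pattern for which the keyword START produces the ciphertext given in the footnote for example (1).

```python
def _order(keyword):
    """Column indices in pulling-out order (repeated letters left to right)."""
    key = keyword.upper()
    return sorted(range(len(key)), key=lambda i: (key[i], i))


def interrupted_encipher(plaintext, keyword, skipped):
    letters = [ch for ch in plaintext.upper() if ch.isalpha()]
    k = len(keyword)
    columns = [[] for _ in range(k)]
    cell = 0                                  # cells counted row by row
    for ch in letters:
        while divmod(cell, k) in skipped:     # step over the starred spots
            cell += 1
        columns[cell % k].append(ch)
        cell += 1
    return "".join("".join(columns[c]) for c in _order(keyword))


def interrupted_decipher(ciphertext, keyword, skipped):
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    k = len(keyword)
    # The cells the plaintext occupied, in reading order.
    cells, cell = [], 0
    while len(cells) < len(letters):
        if divmod(cell, k) not in skipped:
            cells.append(divmod(cell, k))
        cell += 1
    # Hand the ciphertext out column by column, top to bottom.
    grid, stream = {}, iter(letters)
    for c in _order(keyword):
        for pos in cells:                     # cells are already in row order
            if pos[1] == c:
                grid[pos] = next(stream)
    return "".join(grid[pos] for pos in cells)


# Hypothetical interruption pattern: (row, column), both counted from 0.
SKIPPED = {(0, 3), (1, 0), (1, 4), (3, 1), (5, 2)}

cipher = interrupted_encipher("some spots are simply skipped", "START", SKIPPED)
print(cipher)
print(interrupted_decipher(cipher, "START", SKIPPED))
```

Both correspondents need the pattern as well as the keyword, which is why it acts as a second piece of key material.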



10.5 Transposition vs. Substitution

How do we recognize that a cipher is possibly from a columnar transposition, rather than from some form of substitution? A message enciphered with any transposition consists of exactly the same letters as the plaintext message. A frequency analysis of the ciphertext will thus result in a chart that matches the a-e-i, no, rst, and uvwxyz patterns exactly, and in their natural positions. So it is easy to detect a transposition cipher: its frequency chart will look like one from common English but the text won’t be readable.8 Conversely, a substitution cipher will generally have too many jvkxyq’s and too few etaoinshr’s. To our eyes, trained to read English, transposition ciphers will look like a salad of letters (to mis-translate Bazeries’ quote), worthwhile components but oddly mixed, whereas substitution ciphertexts look unappetizing, with too many odd letters.

7(1) MPAIS ORMKE STSLP OSSYP EEPID. (2) Missed interruptions cause anger eruptions.
8Since position and not identity is now what is important, we probably want to make any nulls we use to be similar to the other letters in the plaintext, in the hope that the enemy will be confused as to which letters are meaningful. So, while in a substitution cipher we use letters like jkxqz as nulls to try to make frequency analysis harder, in a transposition cipher we use etaoinshr letters to add insignificant significant-looking letters to the ciphertext.
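The recognition test described in this section can be tried out with a short script. It is a rough sketch, not a calibrated classifier: the function name `letter_profile` is made up here, and the cutoff of 0.6 for the share of etaoinshr letters is an assumption chosen for illustration.

```python
from collections import Counter

COMMON = set("ETAOINSHR")   # frequent in English plaintext
RARE = set("JVKXYQ")        # over-represented in many substitution ciphertexts


def letter_profile(text):
    """Return (share of common letters, share of rare letters)."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    common = sum(counts[ch] for ch in COMMON) / total
    rare = sum(counts[ch] for ch in RARE) / total
    return common, rare


def looks_like_transposition(text, cutoff=0.6):
    """Crude guess: English-like letter counts suggest a transposition."""
    common, _ = letter_profile(text)
    return common >= cutoff


plain = "Here is my very clever secret message"
transposed = "HMCSM EIREE AEEVR SEYLE ERVEC SSYRT G"   # columnar, keyword cipher
shifted = "KHUH LV PB YHUB FOHYHU VHFUHW PHVVDJH"      # Caesar shift by 3

print(letter_profile(plain) == letter_profile(transposed))   # same letters
print(looks_like_transposition(transposed), looks_like_transposition(shifted))
```

On messages this short the shares are noisy, so the cutoff should be read as a hint and not as a verdict.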

10.6 Letter Connections

Decrypting a transposition cipher is a bit like putting Humpty Dumpty back together again – all the pieces are given, we just need to determine which goes where. When doing this, we will continually have a letter and will be wondering which letter most likely came just before it in the message, and which letter likely came just after it. These are questions of conditional probability. Notice the difference between

“This letter is a T. How likely is it that the next letter is an H?” and

“How likely is it that this letter is a T and the next letter is an H?”

Figure 10.2 provides the answer to the first question, in percent.9 In the row centered by T, just to the right of T appears H32, which indicates that T is followed by H almost 1/3 of the time. Similarly, the appearance of U4E17O18A22I26 just before N shows that N is almost always preceded by a vowel, with that vowel being I about 1/4 of the time. And when the rare V does appear, it is usually followed by E.10 For comparison, the answer to the second question (“how likely is it that this pair is TH?”) appears in Figure 10.3. This figure is the bigram companion to our standard frequency chart Figure 1.3, as it shows how likely each possible bigram is in standard English.11

9These percentages were computed using “The Brown Corpus.” The Brown Corpus of Standard American English was compiled by W.N. Francis and H. Kucera at Brown University, Providence, RI, from one million words of American English texts printed in 1961. The texts sampled came from fifteen different categories ranging from “Reportage” (The Philadelphia Inquirer, May 10, 1961, p.49) to “Popular Lore” (Jack Kaplan, “The Health Machine Menace: Therapy by Witchcraft”) to “Romance” (Samuel Elkin, “The Ball Player,” Nugget, October, 1961). By modern standards, this corpus is considered small and dated.
10We remarked back in Chapter 6 that vowels like to combine with consonants. Look at the vowel rows to ascertain the validity of this statement. Likewise, note that H is more likely to precede vowels while N and S are more likely to follow them.
11The numbers are in %%, meaning that you should divide by 100 and add % to the end. So the 13 in the BA entry means that BA appears .13% of the time, about 1/10 of 1 percent of the time. Values smaller than 0.01% have been left out.

F2G2I2Y2P3D4C5L5W5M6N6S7T7R8H10E12  A  N19T14L10R10S10C5D4I3M3B2G2P2V2Y2
L3I4N4R4U4Y4M5T5S7O9D10A13E13  B  E30L12O11U11A8Y7I5R5S2
D2Y2T3R4U4O5S7N11A13I16E19  C  O19E15H14A13T9I6K4L3R3U3
D2O4R4L6I7A9N27E28  D  E15I13T10A9O7S6B4H3R3U3W3C2D2F2L2M2
G2I2P2W2B3C3E3D5L5M5N5S5V5T7R11H20  E  R14S10D9N9A7T6C4L4E3I3M3W3F2O2P2
Y2L3R3T3D4N4A5F5S5I6E13O36  F  O17T15I11A9E8R7F5L3U3C2H2S2
D2S2O4R4U6A9E9I11N42  G  E15H12A10I8O8R8T7S4L3U3N2
D2N2E4G4W6S7C8T54  H  E48A16I13O8T4
C2G2V2A3F3M3W4E5N5D7L7R8S8H9T15  I  N25S12T12C7O7L5D3E3M3R3A2F2G2V2
G2Y4O5R5T5A6S6B7D8N10E14F15  J  U28O25E19A13I3T2
L4E7I7S7N8O11R12A14C20  K  E34I15N7S7A6O4T4H2L2W2
C2N2R2S2T3B4P4U6O7I8L12E13A19  L  E16I12L12A11Y9O8D6S3T3U2
Y2D3M3N3T3U4S5R6A10I10E18O18  M  E25A19I11O11P6U4B3M3S3T3
R2U4E17O18A22I26  N  T17D15G11E9O7A6S6C5I5
B2G2W2D3M3O3P3Y3E4L4F5H5N6C7I7S7R8T13  O  N17R13F11U10M6T6L4W4O3S3C2D2P2V2
L2X2Y2D3I3N3T3R4P5U5M8A9O10S11E17  P  E18R16O13A12L10I5P5T4H3U3S2
O2R2C3T3A4D4N5I8S12E43  Q  U98
D2G2F3I4P5T5U6A14O16E28  R  E23A10O10I9S7T7D3Y3C2M2N2
L2Y2D3O4T5U5R6N7S7A12I13E20  S  T18E11A9I9O8S7H6C3P3U3W3M2
H2Y2C3F3U3D4O5R5T5E8I9A12N13S13  T  H32I11O11E10A6T5S4R3U2W2
G2H2N2P2A3C3E3F3L3M3Q3D4R4B6T7S9O28  U  R14S13T13N12L10C5G4P4A3E3I3M3B2D2
D2L2N4R5O15A17I20E24  V  E65I19A9O5
L2A4R4Y4N5D6T10S11O17E20  W  A22H17I17E16O11N4
O5I7A8E74  X  P26T17I12A11C10E7O2
H2M2O2D3S3N6B7E9T10R11A12L21  Y  O13T11A10S9E7I6W5B4C3F3H3M3P3D2L2R2
N2U2T3O4E5Z7A14I51  Z  E44A18I11Z7O5L2Y2

Figure 10.2: Appearances before and after the given letter, in %.

To use the charts correctly, we need to carefully note the meanings of three different possible numbers:

1. If we ask “I have V. How often is V followed by E?”, look for E in the V row of Figure 10.2. The answer is 65% of the time. V is (almost) always followed by a vowel, and E is the most likely one.

2. If we ask “How often is E preceded by V?” the answer is much smaller, only 5%, as seen also in Figure 10.2. Why smaller? That is the “conditional” part of the probability. Looking at a V one feels that a vowel is probably next. Looking at an E we are somewhat surprised if the preceding letter is V.

3. If we ask “Here are two letters. Are they VE?” we find 64 in Figure 10.3, meaning .64% = 0.0064 of all letter pairs are VE’s. So this pair is quite uncommon, which matches our intuition.
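The three numbers in this list can be computed for any sample of text, which may make the distinction concrete. The sketch below is an illustration only: the function name `three_numbers` is made up, pairs are counted straight across word boundaries (an assumption; the charts built from the Brown Corpus may have handled this differently), and the tiny sample string is artificial.

```python
from collections import Counter


def three_numbers(text, first, second):
    """Return (P(next is second | this is first),
               P(previous is first | this is second),
               P(a random pair is first+second))."""
    letters = [ch for ch in text.upper() if ch.isalpha()]
    pairs = Counter(zip(letters, letters[1:]))
    total = sum(pairs.values())
    joint = pairs[(first, second)]
    starts = sum(n for (a, _), n in pairs.items() if a == first)
    ends = sum(n for (_, b), n in pairs.items() if b == second)
    followed = joint / starts if starts else 0.0
    preceded = joint / ends if ends else 0.0
    share = joint / total if total else 0.0
    return followed, preceded, share


# Artificial sample: the pairs are VE, EV, VE, EE, EE.
print(three_numbers("VEVEEE", "V", "E"))
```

In the sample every V is followed by E, yet only half of the E’s that have a predecessor are preceded by V, and VE makes up 2 of the 5 pairs. The same asymmetry, on real text, is what separates the 65%, the 5%, and the .64% above.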

.A .B .C .D .E .F .G .H .I .J .K .L .M .N .O .P .Q .R .S .T .U .V .W .X .Y .Z
A. 2 20 40 37 11 18 3 28 1 9 82 27 156 1 18 86 81 116 9 17 8 1 21 1
B. 13 1 47 8 1 18 17 9 3 1 17 12
C. 41 5 48 46 19 13 12 59 11 2 30 10 2
D. 36 16 8 9 63 9 5 14 52 1 8 9 8 30 7 12 25 40 12 2 12 6
E. 98 20 59 114 45 32 17 22 39 2 4 54 47 120 35 34 4 175 134 80 8 24 39 14 16
F. 22 2 4 2 20 13 1 5 26 2 7 3 1 41 3 18 5 35 8 2 1
G. 21 2 2 1 30 3 3 25 16 6 2 5 17 2 17 7 15 7 3 1
H. 88 2 3 1 260 1 3 72 2 3 3 44 1 8 5 22 6 3 4
I. 19 7 52 28 28 15 21 1 5 36 27 188 54 7 25 89 88 20 1 1 4
J. 2 3 4 5
K. 4 22 1 1 10 1 5 3 5 3 1 1
L. 46 5 5 27 69 7 1 3 52 2 52 4 1 35 5 3 15 15 10 2 4 37
M. 48 8 1 63 1 2 28 1 8 1 30 17 3 9 7 10 2 4
N. 49 7 36 107 63 10 82 11 40 1 5 9 7 9 50 6 4 48 121 7 4 10 10
O. 13 14 16 18 6 86 8 7 10 7 30 48 131 24 21 100 30 49 77 15 32 1 4
P. 25 36 7 11 20 1 27 11 32 4 9 7
Q. 10
R. 64 7 14 18 145 7 9 8 60 8 11 17 15 66 8 12 43 47 11 5 8 19
S. 62 11 22 7 73 13 4 40 63 1 4 11 14 9 57 23 1 6 46 124 25 1 21 5
T. 61 9 10 5 95 7 2 296 110 1 13 9 4 104 6 35 38 49 20 20 18
U. 10 7 13 7 10 1 11 8 27 10 33 1 11 39 36 35
V. 9 64 19 5
W. 41 1 30 32 33 1 1 7 20 3 3 3 1 1
X. 2 2 1 2 5 3
Y. 17 7 6 4 12 5 2 6 11 4 6 3 23 6 4 16 19 1 8
Z. 1 4 1

Figure 10.3: Bigram Frequencies (in %%), from the Brown Corpus.

10.7 Breaking the Columnar Transposition Cipher

Once we know we have a transposition cipher (and for us this means columnar transposition), to start trying to decrypt it first count the length of the message, and use this to find all the possible shapes. (In exchange for making your life easier by giving you only completely filled rectangles to decrypt, I may complicate things by throwing in nulls both at the beginning and end of the message.) So if there are 40 letters in the message we’d expect an 8 × 5 or 5 × 8 rectangle. (Variants like 2 × 20 or 4 × 10 are possible but less likely.) And if the message is only 35 letters the rectangle must be 5 × 7 or 7 × 5.

On thin strips of paper write the message in columns of the proper length. Then start flipping the pieces of paper around until a message starts to form.

How does one “flip around”? Start by putting the strips near one another and making a vowel count of the rows. In English, vowels make up between 35% and 40% of all letters. It is uncommon to find even a ten-letter segment that has fewer than 3 or more than 5 vowels. As a quick test of your rectangle, count the number of vowels in each row. If many of the rows have far fewer than 40% vowels, or far more, then you may wish to instead try a rectangle of a new size.

Once you have a rectangle with a good vowel count turn to any peculiarities of the message you can find, such as those coming from the appearance of low-frequency letters. For example, a q in the message must be followed by a u; j, v and z are almost always followed by a vowel; x is generally preceded by a vowel.

Counting the contacts will help determine which strips fit together the best. For each pair of columns that may possibly fit together, sum the values from Figure 10.3 for that pair. Often the pair of columns with the highest sum will be neighbors in the plaintext.

Finally, look for words. Decrypting a transposition cipher is very similar to the final step in decrypting a complicated monoalphabetic cipher in that you must just work hard, carefully sort through the possibilities, and let your English intuition solve it for you.
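The first two steps, listing the possible shapes and counting vowels in each row, are easy to mechanize. The following is a minimal sketch with made-up function names; it assumes a completely filled rectangle, writes shapes as (rows, columns), and does not count y as a vowel.

```python
VOWELS = set("AEIOU")   # assumption: y is not counted as a vowel


def rectangle_shapes(n):
    """All (rows, columns) with rows * columns == n, omitting single strips."""
    return [(rows, n // rows) for rows in range(2, n) if n % rows == 0]


def vowel_counts(ciphertext, rows):
    """Vowels in each row when the ciphertext is written in columns of `rows` letters."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    if len(letters) % rows:
        raise ValueError("not a completely filled rectangle")
    columns = [letters[i:i + rows] for i in range(0, len(letters), rows)]
    return [sum(col[r] in VOWELS for col in columns) for r in range(rows)]


message = "EERHE LARGE GNEDH IWIDD OERET AIYTT SSERT"
print(rectangle_shapes(35))          # [(5, 7), (7, 5)]
for rows, cols in rectangle_shapes(35):
    print(rows, cols, vowel_counts(message, rows))
```

A shape whose row counts all sit near 40% of the row length is the one to try first; the counts only rank the shapes, they do not prove one of them correct.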

Examples: Decrypt the following transposition ciphers.

(1) EERHE LARGE GNEDH IWIDD OERET AIYTT SSERT.

This has 35 fairly standard letters, so must be a transposition cipher, and is either 5 × 7 or 7 × 5. Let’s guess 5 rows. Then our columns are exactly those 5 letter groupings given.

    1 2 3 4 5 6 7
1   E L G I O A S
2   E A N W E I S
3   R R E I R Y E
4   H G D D E T R
5   E E H D T T T

First, there are seven letters per row, and 40% of 7 is about 3, so we’d expect each row to contain about 3 vowels. Rows 1 and 2 each have 4 vowels, row 3 has 3, and row 5 has 2. Only row 4, with 1 vowel, hints that this rectangle is the wrong one. On the other hand, the 14 vowels among these 35 letters are exactly 40%, and the other four rows are close to what we expect, so row 4 by itself is probably not a reason to discard this arrangement.

Second, column 6 has a y, a very uncommon letter. In English y appears almost exclusively as the final letter of a word, so we should not worry about what letter follows it. Using Figure 10.2, y is mostly preceded by one of the letters NBETRAL. (These are the only significant entries to the left of Y in its row of Figure 10.2.) Can we make any of these pairs? Assuming that y is not the first letter of the row we can make three ry’s, with the column-pairs 16, 26 and 56, and two ey’s, with the column-pairs 36 and 76. (This means five pairs to check, out of a possible (7 × 6)/2 = 21.)

Next we count the total number of contacts we have for each pair of columns, using the chart in Figure 10.3.12

columns   1 6      2 6      3 6      5 6      7 6
          E A 98   L A 46   G A 21   O A 13   S A 62
          E I 39   A I 28   N I 40   E I 39   S I 63
          R Y 19   R Y 19   E Y 16   R Y 19   E Y 16
          H T 22   G T 15   D T 40   E T 80   R T 47
          E T 80   E T 80   H T 22   T T 49   T T 49
totals    258      188      139      200      237

The column pairs 16 and 76 have the highest totals so one of these is probably right. The pair 16 has the largest total but this is built from two big and three small values. The pair 76, on the other hand, has four good pairs and the one given EY pair. For this reason I would probably start with 76 rather than 16, despite the slightly smaller total.13

Since y occurs almost exclusively at the end of words we next try to see which columns could come before our column pair 76.

columns   1 7 6   2 7 6   3 7 6   4 7 6   5 7 6
          E S A   L S A   G S A   I S A   O S A
          E S I   A S I   N S I   W S I   E S I
          R E Y   R E Y   E E Y   I E Y   R E Y
          H R T   G R T   D R T   D R T   E R T
          E T T   E T T   H T T   D T T   T T T

None of these look encouraging. The DTT triplet in 476 and the TTT triplet in 576 force us to reject those combinations. Similarly 276 and 376 have bad GRT and DRT triplets. Only 176 works at all, and even then the HRT is not good. We can always come back, so let’s temporarily abandon the 5 × 7 and try 7 × 5. This gives the columns

1 2 3 4 5
E R H E T
E G I R T
R E W E S
H G I T S
E N D A E
L E D I R
A D O Y T

12Note again the difference between Figures 10.3 and 10.2. When we knew we were working with y but didn’t know which letter came before it we used 10.2. Now that we have definite pairs of letters we use 10.3.
13Some authors advise taking the product of the contact values, rather than their sum. This prevents a column with a couple of very large contact counts and the rest very small from “winning.” The product counts demonstrate much more clearly the superiority of the 76 combination. For simplicity, we will stick with sums.

Since 5 × 40% = 2 we expect about 2 vowels per row, and except for one 1 and one 3, all the rows have exactly 2 vowels, so this rectangle looks good. Starting again with the y, both the 14 and 54 column-pairs fit well together and should be investigated:

columns   1 4       5 4
          E E 45    T E 95
          E R 175   T R 35
          R E 145   S E 73
          H T 22    S T 124
          E A 98    E A 98
          L I 52    R I 60
          A Y 21    T Y 18
totals    558       503

I’d probably continue with the pair 14, except that the word there on the top row leaps off the page, as does the word today on the bottom row. From here we are quickly done. (This is a good example of why we frequently add nulls at the start and end of a message.)

(2) EGONH UITES ERMON QEELI HSAII TNCVF ENATT.

Again we have a 5 × 7. This time, besides the v, we have a q, probably giving us an easy two-column match. Since no u’s will be near the q if we use 7 rows, there must be five rows. Since a vowel must follow the qu this gives us three columns to choose from.

(3) AADOI SEEDD OGRCE TRNGE TTSES ODRSH IBMNA DGHNR ISLOA AVT (Hint: there are nulls at the beginning and/or end.)14
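The contact count from example (1) can be sketched as a small scoring function. The bigram values below are only the handful quoted from Figure 10.3 in that example, the function name `contact_score` is made up, and any pair missing from the small table scores 0, so the sketch is meaningful only for the column pairs that were actually worked out.

```python
# Bigram values (in %%) as quoted from Figure 10.3 in example (1).
BIGRAMS = {
    "EA": 98, "EI": 39, "RY": 19, "HT": 22, "ET": 80,
    "LA": 46, "AI": 28, "GT": 15, "GA": 21, "NI": 40,
    "EY": 16, "DT": 40, "OA": 13, "TT": 49, "SA": 62,
    "SI": 63, "RT": 47,
}


def contact_score(left, right, table=BIGRAMS):
    """Sum of bigram values when column `left` sits directly before `right`."""
    return sum(table.get(a + b, 0) for a, b in zip(left, right))


message = "EERHE LARGE GNEDH IWIDD OERET AIYTT SSERT"
columns = message.split()            # five rows, so each group is a column

y_column = columns[5]                # column 6 holds the y
scores = {n: contact_score(columns[n - 1], y_column) for n in (1, 2, 3, 5, 7)}
for n, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"columns {n} 6: {score}")
```

Columns 1 and 7 come out on top, with 258 and 237. Swapping the sum for a product, as the footnote mentions some authors do, only requires replacing `sum` with `math.prod` and choosing what to do about missing pairs.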



14(1) Three tigers were sighted near Deli today. (2) The three e columns make this one a bit harder: The Queen’s reign came to a violent finish t. (3) sr diamond and gold are good. straight cash is even better. The sr are nulls.

10.8 Double Transposition

A double transposition cipher is one of the most difficult simple ciphers to decrypt. It was used so extensively by the United States that it is sometimes called the US Army double transposition, and was one of the many methods used by the German Army in WWI.

A double transposition is exactly what it sounds like: after using a transposition method to encipher the message, the same keyword is used again to re-encipher the ciphertext. Deciphering is accomplished simply by deciphering twice. For example, Attack your left flank at once enciphered once with keyword bombs gives AKLLT AUTKC TOFNN TYEAO CRFAE. Guessing that this was a 5×5 square would lead to almost immediate decryption. But if we re-encipher it with the same keyword we get AATTC LKNAA LTFEF KUOYR TCNOE. Suppose we try to decrypt this cipher, and for ease we grant ourselves the knowledge that it is 5 × 5. Arranging it gives

1 2 3 4 5
A L L K T
A K T U C
T N F O N
T A E Y O
C A F R E

There are 9 vowels out of 25 letters, or 36%, which is a normal percentage, but it still stands out that the first and third rows each contain only one vowel. Cheating (by looking at the previous paragraph), how would we guess that AUTKC is the correct ordering of the second row?

The purpose of a transposition is to mix up the position of the letters. A single columnar transposition cipher can be broken because this mixing is far from complete. The second enciphering causes many more of the letter connections to be shredded, making decryption very difficult. If the cryptanalyst has a number of messages, all enciphered with the same key, and all of exactly the same length, then, by moving from one message to another and back again, it is possible to decrypt the whole set. (This is the trick that the brilliant French cryptanalyst Georges Jean Painvin used to break the German ÜBCHI double transposition cipher of World War I.) However, even the experts cannot routinely decrypt a single message carefully enciphered with a double transposition.
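Since a double transposition is just the same columnar step applied twice, it can be sketched by composing one function with itself. The function names are made up for this sketch, and blanks in an incomplete last row are simply ignored.

```python
def columnar(text, keyword):
    """One columnar transposition; repeated key letters are taken left to right."""
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    key = keyword.upper()
    order = sorted(range(len(key)), key=lambda i: (key[i], i))
    return "".join(letters[col::len(key)] for col in order)


def double_transposition(text, keyword):
    """Encipher, then re-encipher the result with the same keyword."""
    return columnar(columnar(text, keyword), keyword)


def groups_of_five(text):
    return " ".join(text[i:i + 5] for i in range(0, len(text), 5))


once = columnar("Attack your left flank at once", "bombs")
twice = double_transposition("Attack your left flank at once", "bombs")
print(groups_of_five(once))    # AKLLT AUTKC TOFNN TYEAO CRFAE
print(groups_of_five(twice))   # AATTC LKNAA LTFEF KUOYR TCNOE
```

After the first pass each ciphertext group is still a plaintext column; after the second pass each group mixes letters from all five of those columns, which is why the contact counts lose most of their power.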

10.9 Transposition during the Civil War

The Civil War turned out to be an interesting test of Kerckhoffs’ maxim that the enemy knows the system being used. Not only did the two armies speak the same language, but most of the trained officers knew each other, having attended West Point together, and some even had fought together during the Mexican–American War of 1846–1848. We have already seen that the South put their trust largely in polyalphabetic ciphers, with rather little success. The North would turn to transposition.

Anson Stager was born in 1825, in New York, and worked in a printer’s office and as a bookkeeper before becoming a telegraph operator in Pennsylvania. He was rapidly promoted and by his early thirties he was the general superintendent of the Western Union Telegraph Company, headquartered in Cleveland. After the outbreak of hostilities in the Civil War, Governor Dennison of Ohio gave him responsibility for all telegraph lines in the Ohio military district, and asked him to prepare a cipher so that the governors of Ohio, Indiana and Illinois could conduct secret communications with one another. This was, apparently, “the first telegraphic cipher used for war purposes” [Plum, page 44].

Later George B. McClellan, a recent railroad executive who became a general of volunteers, asked Stager to prepare a cipher for use during the campaign in West Virginia. Stager based this new cipher on his previous one. It was afterwards adopted as the official cipher of the War Department [Weber]. The cipher was simply a route cipher. As J.E. O’Brien, a former US Military Telegraph operator, put it [SSA3, vol 1]

The principle of [this] cipher consisted in writing a message with an equal number of words in each line then copying the words up and down the columns by various routes, throwing in an extra word at the end of each column, and substituting other words for important words and meanings.

The cipher information was printed on cards, about 3 by 5 inches in size. Printed on the cards were the code words, called arbitrary words, the keys, called commencement words, and the nulls, called check words or blind words. The route was also indicated on the card. Successive editions of the ciphers were aided by the practical experience of the users, both in Washington and in the field. More code words, more nulls, and variations on the routes were added. Eventually the cards were abandoned and pamphlets were substituted, in pocket-sized editions, the first with 16 pages, the last with 48 [SSA3].

The genesis of Stager’s system can perhaps be seen in a simple example that appears in [Myers]:

The enemy has changed his position during the night. Deserters say that he is retreating. Smith

We put it into a rectangle in the following given order: down column 1, up column 4, down column 2 and up column 3, and then add a column of nulls.

Column 1   Column 2    Column 3     Column 4   Column 5
The        night.      Smith        the        attacking
enemy      Deserters   retreating   during     summer
has        say         is           position   unchanged
changed    that        he           his        him.

The ciphertext can then be read off:

The night. Smith the attacking enemy Deserters retreating during summer has say is position unchanged changed that he his him.

Notice the use of significant-looking insignificant check words – “attacking” and “unchanged.”
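The Myers example can be reproduced with a short sketch of a word route cipher. The function name `route_encipher` and the step notation (D1 for down column 1, U4 for up column 4) are conveniences of this sketch; the route used is D1, U4, D2, U3, which is the order in which the words actually sit in the rectangle above, and the last column is filled with the check words.

```python
def route_encipher(words, rows, route, nulls):
    """Write `words` into a grid along `route`, add a column of nulls,
    and read the grid off row by row."""
    cols = len(route) + 1                      # one extra column for the nulls
    grid = [[None] * cols for _ in range(rows)]
    stream = iter(words)
    for step in route:
        col = int(step[1:]) - 1
        order = range(rows) if step[0] == "D" else range(rows - 1, -1, -1)
        for r in order:
            grid[r][col] = next(stream)
    for r in range(rows):
        grid[r][cols - 1] = nulls[r]
    return " ".join(word for row in grid for word in row)


message = ["The", "enemy", "has", "changed", "his", "position", "during",
           "the", "night.", "Deserters", "say", "that", "he", "is",
           "retreating", "Smith"]
check_words = ["attacking", "summer", "unchanged", "him."]

print(route_encipher(message, 4, ["D1", "U4", "D2", "U3"], check_words))
```

The route and the grid size play the role of the key; the check words can be anything that looks plausible in context.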

The early form of Stager’s ciphers was similar to the Myers example. Here is one sent at the very beginning of the war.

Example: [Plum]

Parkersburg, VA. June 1, 1861.
To Maj. Gen. G.W. McClellan, Cincinnati, Ohio:
Telegraph the have be not I hands profane right hired held must start my cowardly to an responsible Crittendon to at polite ascertain engine for Colonel desiring demands curse the to success by not reputation nasty state go of superseded Crittenden past kind of up this being Colonel my just the road division since advance sir kill.
(Signed) F.W. Lander

With possession of the codebook we’d look up the keyword “Telegraph” to learn that this message is eight lines long, with seven columns. Further that the plaintext was inserted in the order up the 6th column, down the 1st, up the 5th, down the 2nd, up the 4th, and down the 3rd. (From now on we will abbreviate this as U6, D1, U5, D2, U4, D3.) Finally that the seventh column is filled with nulls and the message must be pulled off row by row in the usual order. To decipher, we reverse the steps. So we first write the message in 8 rows of 7 columns each, and then pull off the columns in order.15

the        have    be           not         I           hands       profane
right      hired   held         must        start       my          cowardly
to         an      responsible  Crittendon  to          at          polite
ascertain  engine  for          Colonel     desiring    demands     curse
the        to      success      by          not         reputation  nasty
state      go      of           superseded  Crittenden  past        kind
of         up      this         being       Colonel     my          just
the        road    division     since       advance     sir         kill

Up the 6th column and down the 1st begins the message:

“Sir. My past reputation demands at my hands the right to ascertain the state of the advance. Colonel Crittendon, not desiring to start, I have hired an engine to go up road. Since being superseded by Colonel Crittendon, [I] must not be held responsible for [the] success of this division.”
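Deciphering the Lander telegram is mechanical once the route is known, as the following sketch shows. The function name `route_decipher` is made up here; the grid size (8 rows of 7 columns) and the route U6, D1, U5, D2, U4, D3 are the ones described above, and the seventh column of nulls is simply never read.

```python
def route_decipher(words, rows, cols, route):
    """Write the cipher words row by row, then read the columns along `route`."""
    grid = [words[r * cols:(r + 1) * cols] for r in range(rows)]
    plain = []
    for step in route:
        col = int(step[1:]) - 1
        order = range(rows) if step[0] == "D" else range(rows - 1, -1, -1)
        plain.extend(grid[r][col] for r in order)
    return plain


telegram = """
the have be not I hands profane
right hired held must start my cowardly
to an responsible Crittendon to at polite
ascertain engine for Colonel desiring demands curse
the to success by not reputation nasty
state go of superseded Crittenden past kind
of up this being Colonel my just
the road division since advance sir kill
""".split()

route = ["U6", "D1", "U5", "D2", "U4", "D3"]
print(" ".join(route_decipher(telegram, 8, 7, route)))
```

The output is the bare word sequence; the bracketed words and the punctuation in the plaintext above have to be supplied by the reader.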



15We have always done transposition by putting the plaintext words or letters in rows in the usual order and pulling off the columns via the keyword order. This is an example of the reverse process: put the plaintext message into a rectangle via some order and then pull it off row by row, and then to decipher put the ciphertext into the rectangle in order and pull off the plaintext by the keyword ordering.

By 1863 the ciphers had become more complicated.

Example: (From [Weber, pages 114-5].) Here is the plaintext message.

Headquarters Department of the Cumberland
Chattanooga, October 16, 1863 - 7 p.m.
Major-General Burnside, Knoxville, Tenn.;
The enemy are preparing pontoons and increasing on our front. If they cross between us you will go up, and probably we too. You ought to move in the direction, at least as far as Kingston, which should be strongly fortified, and your spare stores go into it without delay. You ought to be free to oppose a crossing of the river, and with your cavalry to keep open complete and rapid communications between us, so that we can move combined on him. Let me hear from you, if possible, at once. No news from you in ten days. Our cavalry drove the rebel raid across the Tennessee at Lamb’s Ferry, with loss to them of 2,000 killed, wounded, prisoners, and deserters; also five pieces of artillery.
Yours, Rosecrans
Answer quick.

To encipher we first choose a key: Enemy. This keyword demands that we use an array with 10 rows of 6 columns each. So we enter the first 60 words of the message:

For       Burnside  The         enemy      are      preparing
Pontoons  &         increasing  numbers    on       our
front     If        they        cross      between  us
you       will      go          up         and      probably
we        too       .           You        ought    to
move      in        this        direction  at       least
as        far       as          Kingston   which    should
be        strongly  fortified   and        your     spare
stores    go        into        it         without  delay
You       ought     to          be         free     to

Next, some giveaway words are replaced by their code equivalents as the code indicates: Burnside becomes BURTON and enemy becomes WILEY. The key also indicates that nulls are to be added at the tops and bottoms of certain columns. After doing this we have

-         -         -           boy        greatly  -
For       BURTON    the         WILEY      are      preparing
Pontoons  &         increasing  numbers    on       our
front     If        they        cross      between  us
you       will      go          up         and      probably
we        too       .           You        ought    to
move      in        this        direction  at       least
as        far       as          Kingston   which    should
be        strongly  fortified   and        your     spare
stores    go        into        it         without  delay
You       ought     to          be         free     to
Not       surely    some        -          -        -

Finally, the key tells how the plaintext is to be taken off: D3, U4, D2, U5, D1 and U6. This gives the ciphertext

the increasing they go period this as fortified into some be it and Kingston direction you up cross numbers Wiley boy Burton & If will too in far strongly go ought surely free without your which at ought and between on are greatly For Pontoons front you we move as be stores You Not to delay spare should least to probably us our preparing

Since the entire message has not yet been enciphered, we proceed by choosing a new key and enciphering as the new key demands. In this case (since this was an actual message) Stanton and McDowell were chosen. They each represented the same system: 6 × 6 with nulls atop columns 1, 5, and 6 and below columns 1, 2 and 3.

fortune   -               -         -         the       time
oppose    a               crossing  and       with      your
RELAY     to              keep      open      complete  and
rapid     communications  between   us        so        that
we        can             move      combined  on        him
Let       me              hear      from      you       if
possible  at              once      No        news      from
speed     this            more      -         -         -

(Relay is the code word for cavalry.) The pattern is up the diagonal from bottom right to top left, followed by D1, U6, D2, U5, D3, U4. Putting the final portion of the message into cipher as well gives

To Jaque Knoxville,
Enemy the increasing they go period this as fortified into some be it and Kingston direction you up cross numbers Wiley boy Burton & if will too in far strongly go ought surely free without your which it ought and between or are greatly for pontoons front you we move as be stores you not to delay spare should least to probably us our preparing
Stanton from you combinedly between to oppose fortune roanoke rapid we let possible speed if him that and your time a communication can me at this news in so complete with the crossing keep move hear once more no from us open and
McDowell julia five thousand ferry the you must drove at them prisoners artillery men pieces wounded to Godwin relay horses in Lambs of and yours truly quick killed Loss the our minds ten snow two deserters Bennet Gordon answer also with across day

E.P. Alexander, the founder of the Confederate Army Signal Corps, received this ciphertext and was asked to translate it. His account (from Gary W. Gallagher, Fighting for the Confederacy: The Personal Recollections of General Edward Porter Alexander, pages 302-03) of this incident is quite informative:

I had never seen a cipher of this character before, but it was very clear that it was simply a disarrangement of words, what may be called, for short, a jumble. Each correspondent, of course, had what was practically a list of the natural numbers, say from one up to 50, or whatever limit was used, taken in an agreed jumble, as for instance beginning 19, 3, 41, 22 &c. Then, the first word of the cipher would be the 19th of the genuine message, the 2nd cipher would be 3rd of message, the 3rd cipher the 41st, &c.

If [the jumble] were used twice or three times, I could, by comparison & trial, probably decipher the whole business. But if the jumble were not repeated, I could never decipher it without getting another message in the same jumble in order to compare the two. ... I found one pair of words which certainly belonged together, ‘Lambs’ & ‘ferry’ – for there was a ‘Lamb’s Ferry’ on the Tennessee River. But it only made the demonstration absolute that the jumble was not repeated [and so I could not make sense of it.] I afterward found that the Federals made their jumbles by means of diagrams of row & columns, writing up & down in different orders & then taking the words across ... They also used some blind words to further confuse the cipher. This made, indeed, a most excellent cipher, quick & easy, both to write & to decipher, which is a very great advantage. But there is one objection to it, in that it required a book, & that book might get into the wrong hands.

10.10 The Battle of the Civil War Ciphers

The ciphers used during the Civil War give an interesting comparison. The North used relatively simple word transposition ciphers, which send the plaintext, only jumbled. The South used what was thought to be the unbreakable Vigenère cipher. Why, then, did the North enjoy more success, cryptographically? There are several reasons. The North’s use of meaningful words in their ciphertexts reduced the number of garbles (errors), while Vigenère ciphertexts, which have no context, had many more errors [Weber]. The extensive use of code words in Stager’s system generally prevented the South from getting any easy entries into a message. Conversely, the clumsy use of Vigenère (the failure to encipher the entire message, keeping word lengths) by the South gave the codebreakers of the North many clues about the message. Having very few keys also contributed to the weakness of the South’s cryptography. “Judged by the standards of its own day, [Stager’s] cipher was adequate: it was not too complex to be practicable, and yet it delayed solution for a sufficient time.” [SSA2, page 20].

10.11 Summary

Transposition Ciphers receive their security by the manner in which they mix the letters of the plaintext. There are many methods for providing this mixing; route ciphers, grilles and columnar transposition being the most famous, with the latter being by far the most popular. To perform a columnar transposition first select a keyword. Write the plain- text underneath the keyword, in as many columns as the keyword has letters. 208 CHAPTER 10. TRANSPOSITION CIPHERS

To find the ciphertext pull out the columns of this array in the alphabetical order of the letters of the keyword. Deciphering is the reverse process, with the bit of complication that the length of the ciphertext and length of the keyword must first be used to determine how many letters each column contains.

Recognizing transposition ciphers is easy – the frequency chart of a ciphertext appears to be that of normal English. Breaking transposition ciphers is, obviously, a matter of putting the letters back into their proper positions. For columnar transposition one first decides upon a guess of the size of the ciphertext rectangle and then creates the proposed columns of the plaintext. By knowing which letters are more likely to contact which, one then proceeds by putting the columns next to their proper neighbors.

The North used columnar transposition of words to great success during the Civil War. Keeping the words largely prevented -based garbles, and the extensive use of code words, for proper names and locations, prevented the South from having any easy entry into the messages.
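The columnar transposition steps summarized above can be sketched in Python. This is a minimal illustration, not the text's own procedure written out: the function names are hypothetical, and it assumes that repeated keyword letters are ordered left to right and that an incomplete last row is left short (no nulls are added).

```python
def column_order(keyword):
    """Column indices in pull-out order: alphabetical by keyword letter.
    Assumption: repeated letters are taken left to right."""
    return sorted(range(len(keyword)), key=lambda i: (keyword[i].lower(), i))


def encipher(plaintext, keyword):
    """Write the letters in rows under the keyword, then pull out the columns."""
    letters = [c for c in plaintext.lower() if c.isalpha()]
    n = len(keyword)
    columns = [letters[i::n] for i in range(n)]  # column i holds every n-th letter
    return "".join("".join(columns[i]) for i in column_order(keyword)).upper()


def decipher(ciphertext, keyword):
    """Work out each column's length first, then refill the rectangle."""
    letters = [c for c in ciphertext.lower() if c.isalpha()]
    n = len(keyword)
    full_rows, extra = divmod(len(letters), n)
    # The first `extra` columns (left to right) are one letter longer.
    lengths = [full_rows + (1 if i < extra else 0) for i in range(n)]
    columns = [None] * n
    pos = 0
    for i in column_order(keyword):
        columns[i] = letters[pos:pos + lengths[i]]
        pos += lengths[i]
    rows = full_rows + (1 if extra else 0)
    plain = []
    for r in range(rows):
        for c in range(n):
            if r < len(columns[c]):
                plain.append(columns[c][r])
    return "".join(plain)


if __name__ == "__main__":
    secret = encipher("attack at dawn", "key")
    print(secret, decipher(secret, "key"))
```

The deciphering function mirrors the complication mentioned above: the column lengths have to be computed from the ciphertext length and the keyword length before any letters can be put back.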

10.12 Topics and Techniques

1. From what do transposition ciphers receive their security?

2. What is the main difference between transposition ciphers and substitution ciphers?

3. What is a route cipher? What does its name mean?

4. What is a geometrical cipher?

5. What is a turning grille? Explain how to construct one.

6. Explain how to use a keyword to set up a columnar transposition.

7. In a columnar transposition, how is the plaintext entered?

8. In a columnar transposition, how is the ciphertext removed?

9. Given the number of letters in the ciphertext and the length of the keyword, how are the lengths of the columns used in that transposition determined?

10. How does one decipher a columnar transposition cipher?

11. How does one recognize that a cipher is a transposition?

12. How does one decide whether a ciphertext is from a substitution cipher or from a transposition cipher?

13. Explain how to break a columnar transposition ciphertext.

14. What is a double transposition? Is it a better or a worse cipher than single transposition? Why?

15. Explain the basics of the North’s system of transposition ciphers during the Civil War.

16. Was Stager’s system more like route ciphers or more like columnar transposition ciphers? Explain.

10.13 Exercises

1. Decipher the following geometric ciphers.

(a) SIYEE OLYMS MLYTS ESOOE SQUAR.
(b) ASNRE NDPUT SELZT RESPA UTCLP.
(c) ARBHUN OSOQM IEUSA ASRE.

2. Decipher the following rail fence ciphers.

(a) RIRAS ALOD.
(b) ABDIB BREWRE.
(c) FET ECPSS NO.

3. These ciphers are based on the 12, 3, 6 and 9 o’clock positions on a clock. Decipher them.

(a) HORPCK TEMSERUHEOC EUATL.
(b) OHHLL FRWOTELTLS WNBO.
(c) HSUS URINESCO OMTED.

4. Encipher the following short messages using a columnar transposition based on the given keyword.

(a) let us hear from you at once concerning the jewels. Keyword jewels.
(b) rft diamonds must be received by thursday morning ee. Keyword wealth.
(c) rubies hidden in oatmeal containers in pantry floor. Keyword cereal.
(d) sapphires were fake so was painting weve been setup flee. Keyword manetmonet.

5. Decipher the following short messages using columnar transposition based on the given keyword.

(a) LTIAD IEEUU NRIER SRANI AEROR XEILO VACEE MTBOR VMUUT CO. Keyword SHARP.
(b) SMAES LEHEH AUATN OIIYW CTCET RTAED SOOEK IHNOL MROGI ARGUB MS. Keyword OPENING.
(c) BLWHS NNCTI EQTNF SEOET JEWRR EUIOT RTPSA HDUSE EDAEE XTOTH US. Keyword BROADWAY.
(d) EWLMR RNIIH VOLEO CINLE RNAPD OGGTO CBWTT ESBHT POLIT BRSYT LE. Keyword QUICK.

6. Decrypt the following short messages enciphered using columnar transposition on full rectangles.

(a) OQYHN AGEYE LGAPE SEMIN OLTAF VTUCP HDIOE OUITR RUUEY SSU.
(b) AEWFU HRSLO SOLTE RIENA QTETS NTN. Hint: nulls at end.
(c) HHTHM HMETS GDESU MSRED DNDFE DPXRE ODOON ITENE EIHNX EOTHI LELND DAEOX TSTOR TELNH XIAJA RLPOT.
(d) LTESU AAGEO NEORE OSAGG LWERN IGDSD TNLIA TRIMD ODLLR OOTND GPNE.

7. [Hitt, pages 31–33] Since the ciphertext of a transposition cipher consists of the same letters as the plaintext, knowledge of cribs – words suspected of appearing in the plaintext – is especially helpful. Break the following 108 letter transposition cipher in which the word villa appears.

HIIGF TNGHI NTCVN IEIOT CYIFY LHAEA ESNBA EEEEN RWGBN YDELR OAESG RNEBO VNLDA ICAOA LCNDT IRGVA CDOIE SEREC DVPEI AFIFL RINEH ETT

8. [Hitt, pages 34–35] A weak form of double transposition uses two keywords in the following way. As usual, put the plaintext into a rectangle under the (first) keyword. Then put the (second) keyword alongside the first column. Mix the plaintext by reordering the columns according to the first keyword and then reordering the rows according to the second keyword. For example, to encipher radios may now be used xx with keywords SIGNS and SEND, first set up the rectangle

  S I G N S
S r a d i o
E s m a y n
N o w b e u
D s e d x x

The first rearrangement gives

  I G N S S
S a d i r o
E m a y s n
N w b e o u
D e d x s x

and the second gives

  I G N S S
E m a y s n
D e d x s x
N w b e o u
S a d i r o

So the ciphertext is maysn edxsx wbeou adiro. The following message was enciphered using this technique. Decrypt it. Hint: The size is 7 × 10.

WVGAE EGENL TFTOH TEIEF RBTSE INENG ONWRM GXIXN GOITN ROMRO ESPAL HNEAC UDNNH DERME

9. [Hitt, page 47] Hitt considers the following slightly tricky example.

LT. J. B. SMITH, Royal Flying Corps, Calais, France.
DACFT RRBHA MOOUE AENOI ZTIET ASMOS EOHIE YOCKF NOHOE NOUTH OMEAH NILGO OSAHU OHOUE APCHS TLNDA CFTEN INTWN BAFOH GROHT AEIOH ABRIS ODACF TRREN OSTSM AYBIS DFTEN EFAPH OSMNI ZTIEA HLILL TWSOU GDENO UTHOM EAHBH AMOOU EAYOE QISUU OLEHA DENOE NHOOQ OBBOR TSLHO BAHEO UBHOB IHTSW ENOHO PAHIH ITUAS BIHTL
Graham-White

Is this a transposition or substitution cipher? Explain your answer. For help, here is the frequency count.

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
22 11  5  6 23  7  3 26 16  0  1  8  7 15 37  3  2  7 14 18 11  0  3  0  3  2

10. [Plum, page 51] General Halleck wished the following cipher to be sent.

Washington, 10:30am, July 15, 1863 For Genl S.A. Hurlbut, Memphis: If Gen. W.T. Sherman’s movements have sufficiently occupied the enemy to render your line safe, send all the forces you can spare to Brig.-Gen. Prentiss to operate on Price’s rear if he advances toward Missouri. H.W. Halleck, Maj.-Gen’l.

His clerk, T. T. Eckert, chose McClellan as the indicator. This specifies the codewords given in Figure 10.4. The message is to be pulled off by first starting at the bottom right-hand corner and moving up the diagonal, followed by D1, U6, D2, U5, D3, U4. In addition, there is to be a null added after every six words. That is, after the diagonal, after D1, after U6, after D2, etc. Please encipher the message.

Plain       Code
Hurlbut     bear
Sherman     Blubber
movements   zebras
enemy       wiley
Prentiss    valley
Price       query
rear        world
advances    wafers
Missouri    chorus
10:30am     Clara
Halleck     applause

Figure 10.4: Codewords for keyword McClellan.

11. Cipher No. 7 was used early in the Civil War.

To Louisville, Ky. Sept 29, 1862 Colonel Anson Stager, Washington16 Austria await I in over to requiring orders olden rapture blissful for your instant command turned and instructions and rough looking further shall further the Camden me of ocean September poker twenty I the to I command obedience repair orders quickly pretty Indianapolis your him accordingly my fourth received 1862 wounded nine have twenty turn have to to to alvord hasty. (signed) William H. Drake

The keyword Austria indicates that this has nine lines, and is to be read U1, D6, U2, D3, U5, D4. The only codeword in this message is “Camden,” Major General G. H. Thomas. By this time Stager had regularized the location of the blind words: always insert one at the end of a column being pulled off. Decipher the message.

12. [Antonucci] Stager’s Cipher No. 12 was used from September 1862 until August 1864. It had many code words, for times of days, prominent officers, place names, and even commonly used words and phrases. For example, in the following message Arabia is Major General Don Carlos Buell, Adam is Major General Henry Halleck, and Lincoln is Louisville, KY.

Louisville, Ky. Sept 30, 1862 To George C. Maynard, Washington Regulars ordered of my to public out suspending received 1862 spoiled thirty I dispatch command of continue of best otherwise worst Arabia my command discharge duty of my last for Lincoln September period your from sense shall duties the until Seward ability to the I a removal

16 Since Stager was the head of the Military Telegraph Office in Washington, during some years most messages to Washington were sent to his name. Likewise, Drake must be the telegraph operator stationed with the commander the message is for, Major General D.C. Buell in this case.

evening Adam herald tribune. (signed) Philip Bruner

This response to Exercise 11 uses the keyword Regulars to specify a rectangle of nine lines and five columns. The ordering is U4, D3, U5, D2, U1. Decipher the message. (Hint: remember that each column ends with an extra check word that is then ignored.)

13. [Plum] Cipher No. 4 was the last one of the war, only going into service on March 23, 1865. By now the codesheets were really code books. This one had 1608 codewords, many key words and routes, and even it was in code. Page seven in its entirety is:

3 7 4 2 8 10 14 12 13 11 9 6 5 1

Bedroom.    1.  Lazy.     |  Blonde.  11.  Liniment.
Bedstead.   2.  League.   |  Bloody.  12.  Lion.
Beverage.   3.  Leather.  |  Bosom.   13.  Liquid.
Beyond.     4.  Legacy.   |  Boy.     14.  Loafer.
Big.        5.  Lemon.    |  Bread.   15.  Log.
Bill.       6.  Lesson.   |  Bride.   16.  Lomax.
Billiards.  7.  Let.      |  Brush.   17.  Long.
Bilious.    8.  Library.  |  Bulk.    18.  Lucky.
Blanket.    9.  Life.     |  Bushel.  19.  Luscious.
Bliss.     10.  Linen.    |  Buxom.   20.  Luxury.

Words are the line indicators, only one was used (unless the message was more than 20 lines, in which case others were added.) The route was given, reading left-to-right, using only the top and bottom lines. So here the route is up the 6th, down the 3rd, up the 5th, down the 7th, up the 1st, down the 4th, down the 2nd. The middle lines are only to confuse the enemy. (Remember that there is an extra word at the end of each column.)

Word               Code            Word         Code
.                  flea, hound     battle       knit
,                  ass, bat        commander    locust
,                  bear, bitch     corps        Madrid
,                  cat             the enemy    village
3                  Madison         fight        optic
15th               Brown           fighting     oppressing
18                 Norris          in the       quail
60                 Knox            of the       serenade
July               Steward         one          Harry
Sunday             Tyler           over         skeleton
Couch              ax              night        Rustle
A. Lincoln         Adrian          relieved     trammeled
Meade              Bunyan          the river    turnip
Smith              children        signature    upright
Washington, D.C.   Incubus

Washington, D.C. To A. Harper Caldwell, Cipher Operator, Army of the Potomac: Blonde bless of who no optic to get and impression I Madison square Brown cammer Toby ax the have turnip me Harry bitch rustle silk Adrian counsel locust you another only of children serenade flea Knox County for wood that awl ties get hound who was war him suicide on for was please village large bat Bunyan give sigh incubus heavy Norris on trammeled cat knit striven without if Madrid quail upright martyr Stewart man much bear since ass skeleton tell the oppressing Tyler monkey. (signed) D. Homer Bates17

Please decipher.

14. (a) [Bates] In November 1862 David Homer Bates, one of Lincoln’s telegraph operators in Washington, was worried that a Confederate operator had tapped the telegraph line leading from Washington to Major General Burnside’s location in Virginia. So rather than using one of their usual cipher systems he sent the following message.

Washington, D.C. November 25, 1862.
BURNSIDE, Falmouth, Virginia;
Can Inn Ale me withe 2 oar our Ann pas Ann me flesh ends N.V. Corn Inn out with U cud Inn heave day nest Wed roe Moore Tom Darkey hat Greek Why Hawk of Abbott Inn B Chewed I if.
BATES

Can you make sense of it? Hint: read it backwards.

(b) The same technique was used in this message from the very end of the war, supposedly to conceal the news from the cipher operators who might happen to see it while relaying it to Washington.

City Point, Va., 8:30 A.M., April 3, 1865
TINKER, War Department:
A Lincoln its in fume a in hymn to start I army treating there possible if of cut too forward pushing is He is so all Richmond aunt confide is Andy evacuated Petersburg reports Grant morning this Washington Secretary War.
BECKWITH

What does the message say?

15. On June 1, 1863 President Lincoln sent the following message

GUARD ADAM THEM THEY AT WAYLAND BROWN FOR KISSING VENUS CORRESPONDENTS AT NEPTUNE ARE OFF NELLY TURNING UP CAN GET WHY DETAINED TRIBUNE AND TIMES RICHARDSON THE ARE ASCERTAIN AND YOU FILLS BELLY THIS IF DETAINED PLEASE ODOR OF LUDLOW COMMISSIONER

17 David Homer Bates, died 1926, was from 1861–1866 a telegrapher and cipher clerk in the telegraph office in the old War Department Building [SSA, vol 1]. His war remembrances, Lincoln in the Telegraph Office, make for interesting reading.

Several codewords were in use: VENUS for colonel, WAYLAND for captured, ODOR for Vicksburg, NEPTUNE for Richmond, ADAM for President, NELLY for 4:30pm. The message was also enciphered as a complete rectangle. With these hints (that the South didn’t have), can you decrypt the message?

16. Break the double transposition UELOB DARES RNIOI STOBE LIYVA APSRN TALWY ALE Hint: the word double appears in it.

17. [Hassard] One of the favorite enciphering techniques during the controversy following the presidential election of 1876 was transposition ciphers on words with the important tell-tale words (people’s names, state names, etc.) being replaced by (often) proper nouns, usually geographical names. (In the messages below, however, the names of the states in question – Florida, Louisiana and Oregon – are un-encrypted.) Then the entire message was transposed, using words as units.

Fortunately for the decrypters, there were a very large number of dispatches. A very common word was Warsaw, and a lucky guess confirmed this stood for telegram. The short ciphertext

Warsaw they read all unchanged last are idiots cant situation

is then fairly easily seen to read

Cant read last telegram. Situation unchanged. They are all idiots.

This provides the key (4, 7, 2, 9, 6, 3, 8, 10, 1, 5), meaning, take the 4th word of the plaintext, put it first in the cipher, followed by the 7th word of the plaintext, then the 2nd word, and so on. This ordering helped decrypt most messages of ten words, but not others. However, it was noticed that the telegrams always came in word lengths that were multiples of five. Perhaps each length had its own key? If so, multiple anagramming on a selection of ciphers of the same length should lead to their decryption.

In Figure 10.5 are five messages, written in column form, with their words numbered. Dispatch 1 has adjourned until to-morrow. The only possible noun is London, so 29, 27, 19, 28 must be part of the key. Using this order in Dispatch 2 gives us out if a. Looking at the words in Dispatch 2, intend to count us out if a seems reasonable. So now the key must contain 25, 5, 10, 29, 27, 19, 28. This is an example of multiple anagramming – using a reasonable ordering of one text to help understand the ordering of another, and vice versa. Use multiple anagramming to decrypt these dispatches.

Num  Dispatch 1  Dispatch 2  Dispatch 3   Dispatch 4  Dispatch 5
 1   Me          Very        Figure       To          Rochester
 2   you         news        France       situation   of
 3   do          say         capture      prospects   answer
 4   to          Copenhagen  and          ans         America
 5   did         to          over         Africa      yesterday
 6   to          from        what         desperate   to-day
 7   question    can         see          intend      understands
 8   when        Florida     answer       Thames      Thomas
 9   you         you         Europe       soon        my
10   you         count       Moselle      Europe      Africa
11   to          much        Russia       report      about
12   morning     in          shall        every       but
13   asked       be?he       little       mischief    it
14   want        give        and          the         first
15   where       what        appearances  Warsaw      avail
16   go          Louisiana   about        in          at
17   supposed    am          best         dispatch    my
18   this        placed      hope         in          nothing
19   until       if          Glasgow      acting      Bavaria
20   come        mixed       will         this        as
21   to-night    insure      up           will        will
22   important   London      keep         stall       Copenhagen
23   and         Oregon      Oregon       all         once
24   answer      few         America      concert     fear
25   here        intend      be           morning     reported
26   Warsawed    things      can          parties     small
27   adjourned   out         Potomac      France      by
28   to-morrow   a           behind       in          and
29   London      us          Edinburgh    and         satisfied
30   you         here        I            received    hope

Figure 10.5: Five similar dispatches from 1876.
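To make the word-key notation of this exercise concrete, here is a small Python sketch that applies the ten-word key recovered above. The function names are hypothetical, and the sketch assumes only what the exercise states: the k-th cipher word is the plaintext word whose position is the k-th entry of the key.

```python
# Ten-word key recovered from the "Warsaw ... situation" dispatch.
KEY = [4, 7, 2, 9, 6, 3, 8, 10, 1, 5]


def encipher_words(plain_words, key):
    """The k-th cipher word is plaintext word number key[k] (1-based)."""
    return [plain_words[p - 1] for p in key]


def decipher_words(cipher_words, key):
    """Put each cipher word back into the plaintext position the key names."""
    plain = [None] * len(key)
    for word, p in zip(cipher_words, key):
        plain[p - 1] = word
    return plain


if __name__ == "__main__":
    cipher = "Warsaw they read all unchanged last are idiots cant situation".split()
    print(" ".join(decipher_words(cipher, KEY)))
```

A partial key such as 25, 5, 10, 29, 27, 19, 28 can be tried the same way on each thirty-word dispatch by reading off the words at those positions, which is the core of multiple anagramming.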

18. Robert Patterson was an immigrant to the United States from Ireland, eventually becoming professor of mathematics and vice-provost of the newly established University of Pennsylvania. At some point he made the acquaintance of Thomas Jefferson, becoming vice-president of the American Philosophical Society while Jefferson was its president. In 1805 Jefferson appointed him the Director of the US Mint. On 19 December 1801 he wrote to Jefferson

“A perfect cypher should possess the following properties:

1. It should be equally adapted to all languages.
2. It should be easily learned & retained in memory.
3. It should be written and read with facility & dispatch.
4. (which is the most essential property) It should be absolutely inscrutable to all unacquainted with the particular key or secret for decyphering.”

Patterson believed he had discovered one method that satisfied his first three requirements and was “absolutely impossible, even for one perfectly acquainted with the general system, ever to decypher the writing of another without his key.” In the same letter he sent what we will call Patterson’s Cipher. As revised by Jefferson and himself, and slightly modified to fit our manner of doing things, the cipher works as follows.

(a) Choose two keywords of the same number of letters. Number the letters alphabetically:

B e n j a m i n      F r a n k l i n
2 3 7 5 1 6 4 8      2 7 1 6 4 5 3 8

(Repeated letters are thought of as occurring in a “second” alphabet.)

(b) Write the message (ignoring non-alphabetic characters and spaces) in rows, using a consistent number of letters per row (the last row may have fewer).

(c) Cyclically number the columns in order, with the largest number being the number of letters in the keyword.

(d) Take the columns off in the numerical order indicated by the line keyword. Before transcribing them in rows insert as many nulls as the letter keyword indicates. Add some number of nulls at the end.

As Patterson further explained

It will be proper that the supplementary letters used at the beginning and end of the lines, should be nearly in the same relative proportion to each other in which they occur in the cypher itself, so that no clue may be afforded for distinguishing between them and the significant letters. On calculating the number of changes, and combinations, of which the above cypher is susceptible even supposing that neither the number of lines in a section, nor the number of arbitrary letters at the beginning of the lines, should ever exceed nine, it will be found to amount to upwards of ninety millions of millions ... nearly equal to the number of seconds in three millions of years! Hence I presume the utter impossibility of decyphering will be readily acknowledged.

Jefferson responded on 22 March, 1802 that

I have thoroughly considered your cypher, and find it is much more convenient in practice than my wheel cypher, that I am proposing it to the secretary of state for use in his office

and on 18 April 1802 that although the cypher was difficult to understand, once understood it would be “the easiest to use, the most indecypherable, and varied by a new key with the greatest facility of any one I have ever known.” Although the method did offer splendid security, it was extremely time consuming, and there is no evidence it was ever actually used. The example given by the Francophile Jefferson is about the success of Napoleon.

wsataispapsevhhrntpvntutano
eaaoobceuterdhecebnsbsabdepdno
ehnoeethhnpalaernnotntutioh
nemeyeesannihattoaatieeondoi
rtlrcwreheguriljnesydothdsear
seeobinleihtsheeenaeeartanrm
arpenweertdtaecenhyanoabi
uvclstihiedcfinsxnahenytenrf
sdtrodiesuaunosiahriaeheirp
stoetlsoaoadhsettieuahrdeiuy
ftshopfcfebeintrtttehoxeoypu
perterspiaeeoockrycfnnhtefeyo
tlrlpwruuwipsattwhhnment
erretealeiwarwsaotditofnge
wharcrcnttcwiglnenrdhfowsh
ettdneelkhosuhaxhtjoruiyi
santrhtdiwufgethatdfsltm
adtrodiiegaaiwtistnrhesrnct
nonoaeiedlinysyioseagodllann
sftaewtroiwntwanonxyoureh

The keys are keylines 57328162 and keyletters 81393420. To decipher – (1) cross out the known nulls at the start of each line. (2) Number the lines using the numbers of the line key. (3) Pull out the lines in order of key, putting them into columns. (4) Read the message (the ending nulls will be clear).

Chapter 11

Knapsack Ciphers

Merkle had great confidence in even the single iteration knapsack system and posted a note on his office offering a $100 reward to anyone who could break it. ...[After Shamir broke it,] Merkle, always one to put his money where his mouth was ... paid Shamir the $100 in prize money. ... Merkle’s enthusiasm [wasn’t] dampened. He promptly raised his bet and offered $1000 to anyone who could break a multiple iteration knapsack. It took two years, but in the end, Merkle had to pay.
Whitfield Diffie, The First Ten Years of Public-Key Cryptography

In cryptography one takes plaintext (that is easy to read) and attempts to turn it into ciphertext (that is apparently hard to read). In other words, cryptography involves an easy and hard version of the same problem. There are several mathematical problems that also have both an easy version and a hard version. Can the parallel between an easy/hard math problem and the easy/hard to read text be made useful? As it will turn out, yes.

11.1 The Knapsack Problem

One day, whilst out hiking, you discover a cave containing many gold bars. Being ethically challenged, you wish to carry out as much gold as possible. Unfortunately the owner of the gold, a dragon, will probably return soon and your backpack (knapsack) can only carry so much weight without breaking. How can you decide which gold bars to take? This is the Knapsack problem. (To fit the knapsack problem into the cave-gold-dragon analogy, you can carry however much your knapsack can hold, you have time to make one trip, and, since the dragon will not make the mistake of leaving his/her cave unguarded again, one trip only.)

Example: There are 8 piles of gold bars. Every bar in a given pile weighs the same. Each in pile 1 weighs 18oz. In pile 2 each weighs 29oz. Pile 3: 34oz, Pile 4: 41oz, Pile 5: 67oz, Pile 6: 88oz, Pile 7: 101oz and Pile 8: 119oz. Your backpack can carry a maximum of 300 oz (about 19 lbs). Which bars and how many should you pick?1 

11.2 A Related Knapsack Problem

An apparent simplification is to assume the cave contains exactly one bar of any particular weight. So you only need to decide “yes” or “no” to each bar, rather than deciding how many bars from each pile to take. We also modify the problem to demand an exact amount of gold.

Examples:

(1) The weights of the bars are 4, 7, 12, 19, 22, and 25 lbs. If the total amount demanded is 55 lbs., this weight can be supplied: 4 + 7 + 19 + 25 = 55. However, there is no way to choose bars of total weight exactly 50 pounds.

(2) The weights of the bars are 27, 31, 41, 48, 55, 59, 62, 65, 73, and 77. Total amount wanted is 257. Is this weight possible? What about 364?2



This problem may seem easier than the first, but it is still quite hard. Why? Think of the number of possibilities that must be considered. For each bar we must decide whether we want it or not:

For one weight, say 27 lbs.: Do we want it? yes or no? (2 = 2^1 choices).

For 27 and 31: yes/no and yes/no, or yy, yn, ny or nn (4 = 2^2 choices).

For 27, 31, and 41: y/n and y/n and y/n gives yyy, yyn, yny, nyy, ynn, nyn, nny, and nnn (8 = 2^3 choices).

By the time we have 5 weights we must make 2^5 = 32 choices, 8 weights demand 2^8 = 256, and 10 weights, as in the last example, lead to 2^10 = 1024 possibilities, too many for a quick answer if checking by hand. In fact, if there were 100 bars to choose from (a moderately rich dragon, in other words), there would be

2^100 ≈ 1,000,000,000,000,000,000,000,000,000,000

possible choices, far too many to check even by computer. And maybe none of them give the correct total anyway!! This is a real needle-in-a-haystack problem: there are lots of easily found potential answers, it is simple to determine whether a potential solution is indeed correct, but there is no apparent way to focus in on a solution except to try all the possibilities.

1 I can make 294 as 2 × 18 + 2 × 34 + 3 × 41 + 67. Can you do better?
2 255 is possible (in seven different ways!) but 364 is not.
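The "try all the possibilities" approach can be sketched in Python as follows. This is only an illustration of exhaustive search, with a hypothetical function name; it checks every one of the 2^n subsets, so it is practical only for short lists of bars.

```python
from itertools import combinations


def subsets_with_total(weights, target):
    """Return every subset of `weights` (each bar used at most once)
    whose sum is exactly `target`, found by checking all subsets."""
    found = []
    for size in range(len(weights) + 1):
        for combo in combinations(weights, size):
            if sum(combo) == target:
                found.append(combo)
    return found


if __name__ == "__main__":
    bars = [4, 7, 12, 19, 22, 25]
    print(subsets_with_total(bars, 55))  # subsets weighing exactly 55 lbs
    print(subsets_with_total(bars, 50))  # an empty list means 50 lbs is impossible
    print(2 ** 100)                      # number of subsets of 100 bars
```

The same function can be pointed at the ten weights of Example (2); with 100 bars the loop would never finish in any reasonable time.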

11.3 An Easy Knapsack Problem

We have now seen a “hard” mathematical problem. What is the easy version? Assume the numbers (or weights) are super-increasing, which means that when put into increasing order, the next number on the list is always larger than the sum of all the previous ones.

Example: Numbers are 2, 3, 6, 13, 27, and 55.

3 > 2
6 > 5 = 2 + 3
13 > 11 = 2 + 3 + 6
27 > 24 = 2 + 3 + 6 + 13
55 > 51 = 2 + 3 + 6 + 13 + 27.

So these numbers form a super-increasing set of numbers.

Why does having super-increasing numbers make the knapsack problem easy? Because now the greedy algorithm will work – look over the bars (or weights, or numbers), starting with the largest and working toward the smallest, always taking the largest one that will fit into your knapsack.

Examples:

(1) Can the total 90 be made using {2, 3, 6, 13, 27, 55}? We put the amount that is still needed in parentheses.

90 = (90). Still need 90, so we will use 55.
90 = (35) + 55. Used 55, still need 35 more. So we will use 27.
90 = (8) + 27 + 55. Used 27, still need 8 more. 13 > 8 so don’t use 13.
90 = (8) + 27 + 55. Didn’t use 13, still need 8. Use 6.
90 = (2) + 6 + 27 + 55. Used 6, still need 2. 3 > 2 so don’t use 3.
90 = (2) + 6 + 27 + 55. Didn’t use 3, still need 2. Use 2.
90 = 2 + 6 + 27 + 55. Done.

(2) Same weights. Demand is for 77.

77 = (77). Yes to 55.
77 = (22) + 55. No to 27. Yes to 13.
77 = (9) + 13 + 55. Yes to 6.
77 = (3) + 6 + 13 + 55. Yes to 3.
77 = 3 + 6 + 13 + 55. Done. Found a total that works.

(3) Same weights. Demand is for 81.

81 = (81). Yes to 55.
81 = (26) + 55. No to 27. Yes to 13.
81 = (13) + 13 + 55. Yes to 6.
81 = (7) + 6 + 13 + 55. Yes to 3.
81 = (4) + 3 + 6 + 13 + 55. Yes to 2.
81 = (2) + 2 + 3 + 6 + 13 + 55. Done. We didn’t find a total that was correct, and since we can only use each weight once, this means there is no solution.



In general, when the numbers or weights are super-increasing, the greedy method will find a solution when there is one, and will indicate that no solution is possible when it is impossible.
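The super-increasing check and the greedy method can be written out as a short Python sketch. The function names are hypothetical, and the sketch assumes positive whole-number weights, each usable at most once.

```python
def is_superincreasing(weights):
    """True if, in increasing order, each weight exceeds the sum of all earlier ones."""
    total = 0
    for w in sorted(weights):
        if w <= total:
            return False
        total += w
    return True


def greedy_knapsack(weights, target):
    """Work from the largest weight down, taking each weight that still fits.
    Returns the weights used (largest first), or None if the total is impossible.
    The answer is only trustworthy when the weights are super-increasing."""
    remaining = target
    used = []
    for w in sorted(weights, reverse=True):
        if w <= remaining:
            used.append(w)
            remaining -= w
    return used if remaining == 0 else None


if __name__ == "__main__":
    weights = [2, 3, 6, 13, 27, 55]
    print(is_superincreasing(weights))
    for total in (90, 77, 81):
        print(total, greedy_knapsack(weights, total))
```

Each weight is looked at exactly once, so the work grows with the number of weights rather than with the number of subsets.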

Example: Weights 3, 7, 12, 24, 49, 104, and 215. Which of the totals 298, 421, 358 and 311 can be found? 3 

Thus, if the weights are super-increasing the knapsack problem is very easy. If they are not, it can be very very difficult. Next we transform the easy knapsack problem into (apparently) a hard one.

Example: Consider the five weights W1 = 3, W2 = 7, W3 = 12, W4 = 24 and W5 = 49. These are in super-increasing order and so form the weights of an easy knapsack problem. For example, can these weights produce the total of 80? Yes: 80 = 49 + 24 + 7. Now let e = 39 and P = 101. e is our “enciphering” multiplier. If we multiply the W ’s by e modulo P we obtain new weights, U1 through U5:

W1 × e = 3 × 39 ≡ 16 ≡ U1 (mod 101)

W2 × e = 7 × 39 ≡ 71 ≡ U2 (mod 101)

W3 × e = 12 × 39 ≡ 64 ≡ U3 (mod 101)

W4 × e = 24 × 39 ≡ 27 ≡ U4 (mod 101)

W5 × e = 49 × 39 ≡ 93 ≡ U5 (mod 101)

The weights 16, 71, 64, 27 and 93 are not in super-increasing order! Can the total 90 (= (80 × 39)%101) be produced from them?4 We have succeeded in turning the easy knapsack problem into a hard one. Further, if we use the Euclidean Algorithm to solve 39 × d ≡ 1 (mod 101) for d = 57, then we can turn the hard problem back into the easy one, for then multiplying the U’s by d modulo 101 will give us the W’s back.

3 Only 298 and 358 are possible.
4 Yes, but only modulo 101: 90 = (93 + 27 + 71)%101.
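The disguising step and its reversal can be checked with a few lines of Python. This is a sketch using the numbers of the example above; `inverse_mod` is a hypothetical helper that carries out the extended Euclidean Algorithm.

```python
def inverse_mod(e, P):
    """Extended Euclidean Algorithm: return d with (e * d) % P == 1."""
    old_r, r = e, P
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("e and P must have gcd 1")
    return old_s % P


W = [3, 7, 12, 24, 49]                  # easy (super-increasing) weights
e, P = 39, 101                          # enciphering multiplier and modulus

U = [(w * e) % P for w in W]            # disguised weights
d = inverse_mod(e, P)                   # deciphering multiplier
recovered = [(u * d) % P for u in U]    # multiplying by d undoes multiplying by e

if __name__ == "__main__":
    print(U, d, recovered)
```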

This example suggests somehow using the total weight to send messages. To produce 80 we used W2, W4 and W5, hinting at the pattern no-yes-no-yes-yes or nynyy or 01011. What can 01011 mean?

a = 00001   g = 00111   m = 01101   s = 10011   y = 11001
b = 00010   h = 01000   n = 01110   t = 10100   z = 11010
c = 00011   i = 01001   o = 01111   u = 10101
d = 00100   j = 01010   p = 10000   v = 10110
e = 00101   k = 01011   q = 10001   w = 10111
f = 00110   l = 01100   r = 10010   x = 11000

Figure 11.1: Binary Equivalents for the Alphabet
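The table of Figure 11.1 is simply each letter's position in the alphabet (a = 1, ..., z = 26) written as a five-digit binary number. A minimal Python sketch, with hypothetical function names, that converts in both directions:

```python
def letter_to_bits(letter):
    """Five-bit binary equivalent of a letter, with a = 1, ..., z = 26."""
    return format(ord(letter.lower()) - ord("a") + 1, "05b")


def bits_to_letter(bits):
    """Inverse of letter_to_bits; expects a string of five 0s and 1s."""
    return chr(int(bits, 2) + ord("a") - 1)


if __name__ == "__main__":
    print(letter_to_bits("k"), bits_to_letter("01011"))
```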

11.4 The Knapsack Cipher System

Many modern ciphers are built around a mathematical problem that is (hopefully) impossible to solve unless you have some special information that makes the problem very simple. One of the first was the suitably named Knapsack Cipher System. It was invented by Ralph Merkle and Martin Hellman [MH].5 In this cipher the intractable problem is the general knapsack problem, and the simple version is the super-increasing knapsack problem. The Knapsack Cipher attempts to exploit this method for turning the simple problem into an apparently impossible one. The first four steps of the set-up we’ve already seen. The final two make this into what was then a radically new type of cipher.

5 M. E. Hellman and R. C. Merkle received U.S. Patent 4,218,582, filed October 6 1977, issued August 19 1980 for “Public Key Cryptographic Apparatus and Method.” It expired in 1997.

Knapsack Cipher Setup:

1. Pick a set of super-increasing numbers: W1, W2, W3, W4, W5 and P.

2. Pick e < P with gcd(P, e) = 1. Use the Euclidean Algorithm to find d such that e × d ≡ 1 (mod P).

3. Compute the U’s: Ui = (e × Wi)%P.

4. Make the U’s public. Keep the W’s, e, d and P secret.

To Encipher:

1. Turn each letter of the message into its binary equivalent (a1, a2, a3, a4, a5).

2. Use the U’s to find the enciphered number M of each letter:

M = a1 · U1 + a2 · U2 + a3 · U3 + a4 · U4 + a5 · U5.

To Decipher:

1. For each received cipher number M, compute N = (d · M)%P.

2. Solve N = x1 · W1 + x2 · W2 + x3 · W3 + x4 · W4 + x5 · W5 for the x’s.

3. Convert (x1, x2, x3, x4, x5) into its letter equivalent.

Examples:

(1) Use the weights U1 = 16, U2 = 72, U3 = 64, U4 = 27 and U5 = 93 to encipher swim.

s = 10011 → 1 · U1 + 0 · U2 + 0 · U3 + 1 · U4 + 1 · U5 = 16 + 27 + 93 = 136.

w = 10111 → 1 · U1 + 0 · U2 + 1 · U3 + 1 · U4 + 1 · U5 = 16 + 64 + 27 + 93 = 200.

i = 01001 → 0 · U1 + 1 · U2 + 0 · U3 + 0 · U4 + 1 · U5 = 72 + 93 = 165.

m = 01101 → 0 · U1 + 1 · U2 + 1 · U3 + 0 · U4 + 1 · U5 = 72 + 64 + 93 = 229.

The ciphertext message is 136, 200, 165, 229. (2) Decipher 136, 200, 165, 229. Of course, we know the answer. But how could we find it if we didn’t already know it? First, if we didn’t know e and P (the private key) but 11.4. THE KNAPSACK CIPHER SYSTEM 225

did know the W ’s, we would be stuck solving a hard knapsack problem. This hints that this cipher is a good one. However, if we do know e = 39 and P = 101 we may use the Euclidean Algorithm to find that d = 57 is solution to 39 × d ≡ 1 (mod 101). Then, as in Decimation Ciphers, since multiplying by e enciphered, multiplying by d must (mostly) decipher. Starting with 136, first multiply by d:(d × 136 = 57 × 136)%101 = 76. Then write 76 as a sum of the original W ’s:

76 = 3 + 24 + 49

= 1 · W1 + 0 · W2 + 0 · W3 + 1 · W4 + 1 · W5 → 10011 = s.

So 136 → s. For the other letters we do the same. First, 200 → (d · 200) % 101 = (57 · 200) % 101 = 88, and then

88 = 3 + 12 + 24 + 49

= 1 · W1 + 0 · W2 + 1 · W3 + 1 · W4 + 1 · W5 → 10111 = w.

So 200 → w. Similarly, 165 → i and 229 → m.



This cipher system (also known as the Merkle–Hellman knapsack, to differentiate it from other knapsack systems) is a public key cipher. We haven’t seen any public key ciphers since Section 3.6, so let us briefly remind ourselves of the advantage of such a cipher. Each person who wishes to be part of a conversation may compute and make public a set of U’s, as in the Knapsack Cipher Setup. As we will see, this allows anyone to send them a secret message, even though they have never before communicated. There is no need to somehow pass a secret private key: just look up the person’s public key and start enciphering!

Examples: Using the values W={5, 7, 15, 30, 59}, P = 179, e = 33:

(1) Encipher dark cave. (2) Decipher 302, 260, 294, 157, 417, 459, 260, 294, 252, 52, 294, 417, 302. 6



Now with five weights the hard knapsack problem has only 2^5 = 32 possible answers to check, and so it is not really very hard. (In the previous example, we could have determined that 136 consisted of U1, U4 and U5 with only a little trial-and-error.) But it is easy to use the Knapsack System as a polygraphic cipher.

6(1) 137, 157, 260, 304, 252, 157, 397, 294. (2) treasure chest.

Examples:

(1) Use the weights {3, 4, 7, 15, 30, 47, 93, 279, 466, 749}, and the values e = 211 and P = 1507 to encipher swim as pairs of letters. First, we compute the U’s:

U1 = (3 · 211)%1507 = 633

U2 = (4 · 211)%1507 = 844

U3 = (7 · 211)%1507 = 1477

U4 = (15 · 211)%1507 = 151

U5 = (30 · 211)%1507 = 302

U6 = (47 · 211)%1507 = 875

U7 = (93 · 211)%1507 = 32

U8 = (279 · 211)%1507 = 96

U9 = (466 · 211)%1507 = 371

U10 = (749 · 211)%1507 = 1311

Then we encipher as before, just with more weights:

sw = {10011}{10111} = 1001110111

→ 1 · U1 + 1 · U4 + 1 · U5 + 1 · U6 + 1 · U8 + 1 · U9 + 1 · U10 = 633 + 151 + 302 + 875 + 96 + 371 + 1311 = 3739

Similarly, im becomes 2585.

(2) Decipher 2585. We first need the deciphering number. From the Euclidean algorithm d = 50 is the solution to (d × 211)%1507 = 1. Next, 2585 becomes (d × 2585)%1507 = 1155. Finally, write 1155 as a sum of the original weights:

1155 = 4 + 30 + 93 + 279 + 749

= W2 + W5 + W7 + W8 + W10 → 0100101101 = {01001}{01101} → im

Even with 10 weights our adversary needs to check at most 2^10 = 1024 possible combinations to determine each piece of plaintext, long by hand but quick by computer. Suppose, however, we used 100 weights. We could then encipher and decipher 20 = 100/5 letters at a time. Further, once we have computed the disguised weights, each block of 20 letters is enciphered with at most 100 quick additions. Likewise, deciphering involves only one modular multiplication and then a quick greedy algorithm, per block of letters. So enciphering and deciphering are very quick. On the other hand, there are 2^100 = 1267650600228229401496703205376 ways for the 100 weights to be combined. Even with the use of a supercomputer, our adversary will be unable to determine the meanings of the blocks of letters by brute force.
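To make the brute-force count concrete, here is a minimal sketch of the adversary's search for five public weights. It assumes the U's obtained from W = (5, 7, 15, 30, 59), P = 179 and e = 33, values used in one of this chapter's examples; the function name is hypothetical.

```python
from itertools import product

# Assumed public weights: (33 * W) % 179 for W = (5, 7, 15, 30, 59).
U = [165, 52, 137, 95, 157]


def brute_force(target, weights):
    """Return every 0/1 pattern whose selected weights add up to target."""
    hits = []
    for bits in product((0, 1), repeat=len(weights)):
        if sum(b * w for b, w in zip(bits, weights)) == target:
            hits.append(bits)
    return hits


print(brute_force(304, U))   # at most 2**5 = 32 patterns are tried
print(2 ** 100)              # number of patterns for 100 weights
```

The loop is instant for five or ten weights, but its running time doubles with every added weight, which is the whole point of using 100 of them.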

11.5 Public Key Cipher

This is our first real example of a Public Key Cipher.7 Modern companies and governments must strive to overcome not only the breakability of their ciphersystems but also the key-management problem: there are simply too many people to whom they would like to send messages. It is very difficult to do all of (1) meeting with everyone to whom you’d like to send messages and agreeing upon a ciphersystem and a private key, (2) keeping all of that information very secret (so no one who shouldn’t learns it), and (3) keeping all that information easily accessible (so you can send messages back and forth). “It’d sure be nice”, someone thought one day, “if there was a secret message system that worked just like the phone book: you just look up someone’s cipher information and send them a message.” 8 It seems contradictory, making public the very information used to send secret messages. But if it were possible, difficulties (1) and (3) would vanish.9

The knapsack cipher method was the first example that seemed plausible to use. Once everyone made their U’s public, anyone could send anyone else a message, and the only items that needed to be kept secret were each individual’s W’s, e, d and P. Unfortunately, the step that was supposed to make this method viable didn’t work. Multiplying by e is meant to destroy the super-increasing nature of the W’s but doesn’t completely do it. Using some fancy Linear Algebra [Shamir] it is possible to recover the original weights even without knowing e, d or P. The knapsack ciphers seemed the wave of the future, but suddenly became merely an interesting relic from the past.

7In Chapter 3 we discussed Koblitz’s Kid-RSA. It is only a toy system, never used. The Knapsack Ciphers, however, were (nearly) very important.

8The someones were Martin Hellman, a Stanford professor of Electrical Engineering, and Whitfield Diffie, then (1976) an undergraduate.

9As a very simple example (due to Salomaa), think of a telephone book. To call someone on the phone all you need to know is their name: their telephone number you can always look up. Similarly, once you’ve gotten a telephone number anyone, including people you’ve never met, can call you up.

11.6 Summary

In the past 25 years it has become fashionable to take “intractable” mathematical problems and try to turn them into cryptosystems. One of the first examples of a problem put to such use was the Knapsack Problem. In its full generality (from a large set of items choose some subset that maximizes value while keeping the total weight under some given bound) the Knapsack Problem, also known as the Subset Sum Problem, is known to be such an intractable problem.

In 1977 Merkle and Hellman announced a cipher system based on a version of the Knapsack Problem. It had the advantages of being a public key system and of being relatively fast. This system starts with a set of super-increasing weights (each successive weight is larger than the sum of all the previously chosen ones). After a modular multiplication (and a permutation of their order) the weights are made public. To send a binary message, send the sum of the weights corresponding to the 1’s in the message. The creator of the system can undo the modular multiplication, and so to read the message needs only to solve the subset sum problem for a super-increasing set, which is easy. It was quickly suspected that hiding the super-increasing nature of the weights by a modular multiplication and permutation was not really sufficient, and this proved to be the case. By 1984 most forms of the Knapsack Ciphers had been shown to be insecure. Nonetheless, they provide a lovely case study of a modern cryptosystem.

11.7 Topics and Techniques

1. What is the general Knapsack Problem? What is the meaning of “knapsack” in the name?

2. What is super-increasing?

3. Why is a Knapsack Problem with super-increasing weights easy to solve? How to solve it?

4. How can a set of super-increasing weights be modified into a set of not super-increasing weights?

5. Explain the setup of a Knapsack Cipher.

6. How does a Knapsack Cipher encipher? What steps are involved?

11.8 Exercises

1. Given the weights 41, 28, 39, 57, 49 and 31, and being allowed to use them possibly multiple times, can the weight 782 be achieved, and how?

2. Given the weights 63, 39, 81, 57, 48 and 30, and being allowed to use them possibly multiple times, can the weight 283 be achieved? If so, how?

3. Given the weights 3, 7, 12, 23, 47, 95 and 190, and using each weight at most once, which of the following amounts can be realized, and how? T = 323, T = 310, T = 117, T = 270.

4. Given the weights 3, 8, 13, 29 and 60, and P = 129, e = 10:
(a) Encipher gold.
(b) Decipher 32, 84, 62, 146.

5. Given the weights 5, 8, 14, 29 and 58, and P = 127, e = 17, decipher 111, 70, 97, 66, 75, 105.

6. Given the weights 6, 8, 17, 35, 81, 200, 403, 800, 1589, 3223 and e = 322, P = 6551:
(a) Find the corresponding U’s.
(b) Encipher stalactite.
(c) Decipher 20632, 13837, 11968, 22524, 12265. Hint: it’s often the next word in the dictionary.

7. [Hellman] Hellman’s 1979 Scientific American article, The Mathematics of Public-Key Cryptography, was one of the very first public announcements of this new kind of cipher. Included in this article is an overview of public vs. private key, and an introduction to the Knapsack Ciphers and RSA. Here we look at his knapsack example.
(a) Show that 3, 5, 11, 20, 41, 83, 169, 340, 679, 1358 forms a super-increasing sequence.
(b) Use P = 2731 and e = 764 to find the corresponding U’s.
(c) Hellman’s sample message was enciphered as pairs of letters (we have 10 weights), with the binary starting at 00000, so a=00000 and z=11001. He also used the binary equivalent for 26 to represent a space, and 29 to represent ?. Encipher Hellman’s message: How are you?.
(d) Find d.
(e) Decipher 2908, 7643, 9799.

8. The public key codes tend to be slower on a computer than the traditional “private key” codes are. So they are most frequently used only to send the key for a traditional method. Decipher DNTFG EASCN HAFAO NNSBH SIOAI HEPTA CBROE EGFSR RBTGE ISIOT LAIEM C. It was enciphered using the double transposition method with keyword 94, 240, 155, 46, 197, 188, 151, 131.
Hint: the W’s are 2, 7, 13, 31, 55, with P = 157 and e = 23.

9. A different way to hide the super-increasing nature of the weights would be to put the “superincreasingness” inside the numbers. For example

W1 = 2900104

W2 = 3200209

W3 = 5500401

W4 = 1300810

W5 = 2401604.

(The boldness only serves to show where the super-increasing is hidden.) The modulus would be correspondingly large, say P = 200785471.

(a) Use e = 41233 to find the corresponding U’s.
(b) Encipher huge.
(c) Which of the following may be written as a sum of the W’s? a) 36902109 b) 33501123 c) 10001420 d) 9705114? (Hint: only three digits really matter. Which three?)
(d) Decipher 195046101, 76263779, 175978897, 160072206.

Chapter 12

RSA

Take two large prime numbers, q and p.
Find the product n, and the totient φ.
If e and φ have GCD one
and d is e’s inverse, then you’re done!
For sending m raised to the e
reduced mod n gives secre-c.

Daniel G. Treat

We have now reached our final chapter. So it is natural that we have reached the most difficult and most useful cipher in the book. Fortunately, our cipher also brings us back around to the beginning of the book. We started out, in Chapter 1, with the Caesar Cipher: translate letters into numbers and add three. This was too easy to decrypt, so in Chapters 3 and 4 we turned to Decimation Ciphers: translate letters into numbers and multiply by three. This also turned out to be easy to decrypt. After several other stops along the way, we now complete the triple: having used addition and multiplication, how about exponentiation?

Example: Encipher power by raising to the 3-rd power modulo 26.

plaintext             p      o      w      e      r
plainnumbers         16     15     23      5     18
cubed              4096   3375  12167    125   5832
ciphernumbers %26    14     21     25     21      8
ciphertext            N      U      Y      U      H

The ciphertext is NUYUH.

At this point in discussing a new cipher there are always two questions we ask: “How to decipher?” and, “What is the security?” Answering the first, if you think about it for a bit, will be difficult. To undo cubing we must take the cube root. But what we want to take the cube root of is not the ciphernumber but the cube of the plainnumber. So we have the problem of trying to determine the cubed plainnumber from the ciphernumber.

As for the security, just as always adding 3 or always multiplying by 3 is insecure, always raising to the 3rd power is insecure. We should choose as our key some integer to take powers with. And large integers would have to be a possibility if we are to prevent the enemy from simply trying each possible power. But this leads to a new problem: what letter does the power 33 encipher S to? 19^33 = 1580770532156861979997149793605296459437459 consists of 43 digits, and so is far too big for a calculator to handle. Similarly, 19^125 has 160 digits, and 125 is not a very large number. Or how can we determine the remainder when 26 divides 19^392, a number of over 500 digits?

There is a third, more devious, problem. 1^3 = 1 so a is enciphered to A, but 3^3 = 27 ≡ 1 (mod 26) so c is also enciphered to A. Similarly, d and j both become L. Why is this, and how can we prevent this?

Despite these difficulties, the idea of developing a cryptosystem around the raising of numbers to powers is a very good one. It will lead us to one of the most popular cryptosystems of all time. But before we get there we need to surmount the difficulties we’ve just discussed.
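The collision problem is easy to explore by machine. The following sketch (the function names are ours, and it assumes the usual a=1, ..., z=26 numbering with a remainder of 0 written as Z) groups the plaintext letters by the cipher letter that cubing modulo 26 sends them to. It also shows that Python's integers have no size limit, and that the built-in three-argument pow finds a remainder without ever forming the huge power.

```python
from collections import defaultdict


def power_cipher(text, e, n=26):
    """Encipher letters a=1..z=26 by raising to the e-th power modulo n."""
    out = []
    for ch in text.lower():
        c = pow(ord(ch) - ord("a") + 1, e, n)
        out.append(chr((c - 1) % n + ord("A")))   # a remainder of 0 is written as Z
    return "".join(out)


def collisions(e, n=26):
    """Group plaintext letters by the cipher letter they are sent to."""
    groups = defaultdict(list)
    for ch in "abcdefghijklmnopqrstuvwxyz":
        groups[power_cipher(ch, e, n)].append(ch)
    return {c: ps for c, ps in groups.items() if len(ps) > 1}


print(collisions(3))
print(len(str(19 ** 33)))      # number of digits of 19 to the 33rd power
print(pow(19, 392, 26))        # remainder found without the 500-digit number
```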

12.1 Fermat’s Theorem

Our first need is to learn how to compute values like 133^719 % 101. As you know, the “mod” is just shorthand for “find the remainder when divided by”, so in some sense we are not learning anything new. All one has to do is to compute 133^719, divide by 101, and find the remainder. However, 133^719 is galactically far too large for your calculator to directly compute.1 How then to do this? Some help is provided by a 360 year-old theorem.

Theorem 3 Fermat’s Little Theorem2 (1640). If p is a prime number and p does not divide a, then a^(p−1) ≡ 1 (mod p).

1It has over 1500 digits in it. For comparison, the traditional estimate for the number of elementary particles in the universe is 10^80, i.e., a number of about 80 digits.

2Pierre (de) Fermat, 1601-1665, was one of the last truly great amateur mathematicians. During the day he was a lawyer/councilor for his local parliament. At night he read famous math books and added his thoughts and comments to the margins. It took over 350 years before one of these notes, Fermat’s Last Theorem, was determined to actually be true. Another, Fermat’s Little Theorem (so called to differentiate it from the Last Theorem), is what interests us here.

Example: Since 13 is prime and doesn’t divide 5, we have (5^12) % 13 = 1. Similarly, 37 and 101 are primes, and they don’t divide 52 and 133, respectively, so (52^36) % 37 = 1 and (133^100) % 101 = 1.

This is a beautiful, deep and powerful theorem. The beauty lies in its simplicity: Choose any prime number and any other number that is not a multiple of that prime. Then the second number, when raised to the power one less than the prime, will be one more than a multiple of the prime. In other words, the remainder upon division is always 1. No special conditions, no separate cases, just a 1. OK, there is one condition: that p doesn’t divide the number. But if p does divide a, then a ≡ 0 (mod p) so a^(p−1) ≡ 0 (mod p). Thus, we perfectly understand a^(p−1) % p for all a’s: If a is a multiple of p then the answer is 0, otherwise it is 1. Since 1 ∗ a ≡ a (mod p) we may multiply a^(p−1) ≡ 1 by a to see that a^p ≡ a (mod p) when a is not a multiple of p. And, since both sides of this equivalence are 0 when a is a multiple of p we have the following corollary:

Corollary 1 If p is a prime number and a is any integer, then a^p ≡ a (mod p).

The depth of this theorem is what it tells us about arithmetic. One might think that raising integers to powers leads to somewhat arbitrary results. But Fermat’s Theorem says otherwise: raising to powers modulo a prime number produces a definite structure.3

The power of the theorem lies in our application of it. Suppose we wish to compute 33^125 % 41. Since 41 does not divide 33, by Fermat’s Theorem we know that 33^40 ≡ 1 (mod 41). So (33^40)^2 ≡ 1^2 ≡ 1 (mod 41) and, likewise, (33^40)^3 ≡ 1^3 ≡ 1 (mod 41). Because 125 = 3 ∗ 40 + 5, we then have

33^125 = 33^(3∗40+5) = 33^(3∗40) · 33^5 = (33^40)^3 · 33^5 ≡ 1^3 · 33^5 ≡ 33^5 (mod 41).

So 33^125 is the same as 33^5 modulo 41. Because 33^5 % 41 = 32 is small enough to compute on a calculator, we conclude that 33^125 % 41 = 32. Fermat’s Theorem has turned the seemingly impossible 33^125 (mod 41) into a calculation that is easy.

Example: Compute 20^236 % 59. Since 59 − 1 = 58 and 236 = 4 ∗ 58 + 4, we have 20^236 = 20^(4∗58+4) = 20^(4∗58) · 20^4 = (20^58)^4 · 20^4 ≡ 1^4 · 20^4 ≡ 20^4 ≡ 51 (mod 59).

3In fact, it can be said that Fermat’s Little Theorem is the first important result in the subject of Abstract Algebra, a subject that, along with Analysis, constitutes most of modern mathematics.

In both of these examples the quotient played an insignificant role. What, then, of the old exponent remains? The remainder. Both 125 and 236 were replaced by their remainder when divided by p − 1. But “replacing by the remainder” is just another way of saying that we are doing modular arithmetic! We can summarize this as the following:

Theorem 4 Fermat’s Theorem Restated: If p is a prime number and p does not divide a, then a^b ≡ a^(b′) (mod p), where b′ = b % (p − 1).

Less formally, when doing powers modulo p, we may work on the exponent modulo p − 1.

Examples:

(1) Compute 6^191 % 19. Since 19 doesn’t divide 6 we can use Fermat’s theorem. Since 191 % 18 = 11, we know that 6^191 ≡ 6^11 (mod 19). This last is easily computed to be 17, which is our answer.

(2) Compute 12^360 (mod 17). 17 doesn’t divide 12, and 360 % 16 = 8, so 12^360 ≡ 12^8 ≡ 16 (mod 17). So the answer is 16.

(3) Compute 23^465 (mod 43). Since 465 % 42 = 3, we have 23^465 ≡ 23^3 ≡ 41 (mod 43). The answer is 41.
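The exponent reduction in these examples can be checked with a short sketch. The function name is our own, and the code simply assumes that p is prime (it does not test this); the comparison against Python's built-in pow is there only as a sanity check.

```python
def fermat_power(a, b, p):
    """Compute a**b % p for a prime p that does not divide a,
    reducing the exponent modulo p - 1 first."""
    if a % p == 0:
        raise ValueError("p must not divide a")
    small_b = b % (p - 1)          # work on the exponent modulo p - 1
    return (a ** small_b) % p


print(fermat_power(33, 125, 41))
print(fermat_power(33, 125, 41) == pow(33, 125, 41))
```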



12.2 Complication I: a small one

The examples we’ve done were carefully chosen so that we ended up with a fairly small number raised to a fairly small number. What if the base was too large for a calculator to handle this computation? For example, what if we wanted 233^125 (mod 41)? By Fermat’s Theorem this is the same as 233^5 (mod 41). But 233^5 is still too large for most calculators. What can we do? Easy: we are doing modular arithmetic, and 233 % 41 = 28. We then have the string of equivalences

233^125 ≡ 233^5 ≡ 28^5 ≡ 3 (mod 41).

We are now doing “double modular arithmetic”, modulo p on the base and modulo p − 1 on the exponent.

Examples:

(1) Compute 130^97 % 31. First, 130 % 31 = 6. Next, 97 % 30 = 7. Putting them together we have 130^97 ≡ 6^7 ≡ 6 (mod 31). The answer is 6.

(2) Compute 220^136 % 67. 220 % 67 = 19 and 136 % 66 = 4. So 220^136 ≡ 19^4 ≡ 6 (mod 67).

(3) Compute 5891^213 % 53. 5891 % 53 = 8 and 213 % 52 = 5. So 5891^213 ≡ 8^5 ≡ 14 (mod 53).



12.3 Complication II: a substantial one

The examples we have seen have been carefully chosen so that after (at most) two reductions we end up with a small base and small exponent. What if even after both of the reduction steps are taken the numbers are still too big?

Examples:

(1) Compute 13^27 % 37. 13 is already reduced modulo 37, as is 27 modulo 36. But 13^27 has some 31 digits in it, far too many to work with. Now what? Clearly 27 = 16 + 8 + 2 + 1. So 13^27 = 13^(16+8+2+1) = 13^16 · 13^8 · 13^2 · 13. Several of these powers are small and computable but 13^16 is too big. To find it we start with the smaller powers. Of course 13 ≡ 13 (mod 37) and 13^2 ≡ 21 (mod 37). Next, instead of directly computing 13^4, we use that 13^4 = (13^2)^2, so 13^4 ≡ (13^2)^2 ≡ 21^2 ≡ 34 (mod 37). Similarly, 13^8 ≡ (13^4)^2 ≡ 34^2 ≡ 9 (mod 37). Finally, 13^16 ≡ (13^8)^2 ≡ 9^2 ≡ 7 (mod 37). The last step is to put it all together: 13^27 = 13^16 · 13^8 · 13^2 · 13 ≡ 7 · 9 · 21 · 13 ≡ 31 (mod 37). So the final answer is 31.

(2) Compute 22^57 % 61 (with fewer explanations). We have 57 = 32 + 16 + 8 + 1. Computing the powers,

22^2 ≡ 57 (mod 61),
22^4 = (22^2)^2 ≡ 57^2 ≡ 16 (mod 61),
22^8 = (22^4)^2 ≡ 16^2 ≡ 12 (mod 61),
22^16 = (22^8)^2 ≡ 12^2 ≡ 22 (mod 61), and
22^32 = (22^16)^2 ≡ 22^2 ≡ 57 (mod 61).

Using the powers that we need gives

22^57 = 22^32 · 22^16 · 22^8 · 22 ≡ 57 · 22 · 12 · 22 ≡ 9 (mod 61).

Therefore 22^57 % 61 = 9.



In each of these examples we broke down the exponent into a sum of powers of 2.4 To make this process easier we need a way of determining which powers of 2 are needed. A simple one is as follows. At each step divide by 2. If this division has a remainder (the quotient has a .5 as a decimal) then indicate this with a 1, otherwise indicate this with a 0. Then repeat starting with the integer part of the quotient. End when the final quotient is .5, as it has no integer part to continue with.

Examples: Find the needed powers for an exponent of 27 and 57.

(1) For 27:

exponent   quotient      remainder?   power of 2
27         ÷2 = 13.5     1            1
13         ÷2 = 6.5      1            2
6          ÷2 = 3        0            4
3          ÷2 = 1.5      1            8
1          ÷2 = .5       1            16

So 27 has a 16, 8, 2 and 1 in it (that is, 27 = 16 + 8 + 2 + 1).

(2) For 57:

exponent   quotient      remainder?   power of 2
57         ÷2 = 28.5     1            1
28         ÷2 = 14       0            2
14         ÷2 = 7        0            4
7          ÷2 = 3.5      1            8
3          ÷2 = 1.5      1            16
1          ÷2 = .5       1            32

So 57 = 32 + 16 + 8 + 1.

4Why 2 and not, say, 3? Because every number can be expressed as a sum of powers of 2, allowing us to compute 22^57 % 61 by performing simply a number of squarings. If we had used 3 instead we would have needed to write 57 = 2 · 27 + 3, a sum of multiples of powers of 3. Using 2 means we don’t need multiples, which makes the calculations a bit easier.



In fact, every positive integer can be written as a sum of the powers of 2: 1, 2, 4, 8, ..., using each power at most once. This is the binary expansion of the number, and the process we have just described finds this expansion. We can use the two processes, squaring using the powers of two, and the binary expansion process, together.
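The halving process just described is short enough to write out as a sketch. The function name is ours; divmod returns the integer part of the quotient together with the remainder, which plays the role of the “.5” test.

```python
def powers_of_two(n):
    """List the powers of 2 that add up to n, via repeated division by 2."""
    powers, power = [], 1
    while n > 0:
        n, remainder = divmod(n, 2)   # integer part of the quotient, and 0 or 1
        if remainder:
            powers.append(power)
        power *= 2
    return powers


print(powers_of_two(27))
print(powers_of_two(57))
```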

Example: Find the binary expansion of 159. Use it to compute 23^159 % 171.

159 ÷ 2 = 79.5    1    1
79 ÷ 2 = 39.5     1    2
39 ÷ 2 = 19.5     1    4
19 ÷ 2 = 9.5      1    8
9 ÷ 2 = 4.5       1    16
4 ÷ 2 = 2         0    32
2 ÷ 2 = 1         0    64
1 ÷ 2 = .5        1    128

Thus 159 = 128 + 16 + 8 + 4 + 2 + 1. Next, compute the required powers of 23:

23 ≡ 23 (mod 171),
23^2 ≡ 16 (mod 171),
23^4 = (23^2)^2 ≡ (16)^2 ≡ 85 (mod 171),
23^8 = (23^4)^2 ≡ (85)^2 ≡ 43 (mod 171),
23^16 = (23^8)^2 ≡ (43)^2 ≡ 139 (mod 171),
23^32 = (23^16)^2 ≡ (139)^2 ≡ 169 (mod 171),
23^64 = (23^32)^2 ≡ (169)^2 ≡ 4 (mod 171), and
23^128 = (23^64)^2 ≡ (4)^2 ≡ 16 (mod 171).

Using the powers that we need gives

23^159 = 23^128 · 23^16 · 23^8 · 23^4 · 23^2 · 23 ≡ 16 · 139 · 43 · 85 · 16 · 23 ≡ 125 (mod 171).

Neither of these processes is very hard, but each is a bit time-consuming, especially because there are several things we wrote again and again, like ÷2 and (mod 171). Fortunately, we can combine the two steps and cut out much of the unnecessary writing. As a first example we redo the previous one.

Example: Compute 23^159 % 171.

exponent   quotient     remainder?   power of 2   modulo 171
159        ÷2 = 79.5    1            1            23 ≡ 23
79         ÷2 = 39.5    1            2            23^2 ≡ 16
39         ÷2 = 19.5    1            4            16^2 ≡ 85
19         ÷2 = 9.5     1            8            85^2 ≡ 43
9          ÷2 = 4.5     1            16           43^2 ≡ 139
4          ÷2 = 2       0            32           139^2 ≡ 169
2          ÷2 = 1       0            64           169^2 ≡ 4
1          ÷2 = .5      1            128          4^2 ≡ 16

Thus 23^159 ≡ 16 · 139 · 43 · 85 · 16 · 23 ≡ 125 (mod 171).

From now on we will abbreviate by not writing the “÷2 =” or “power” columns.5

Example: Compute 45^61 % 79. First, 45 < 79 and 61 < 78 so the base and the power are already reduced. Since 45^61 is far too big we must build what we will call the binary chart.

exponent   quotient   remainder   mod 79
61         30.5       1           45 ≡ 45
30         15         0           45^2 ≡ 50
15         7.5        1           50^2 ≡ 51
7          3.5        1           51^2 ≡ 73
3          1.5        1           73^2 ≡ 36
1          .5         1           36^2 ≡ 32

Thus 45^61 ≡ 45 · 51 · 73 · 36 · 32 ≡ 2 (mod 79).
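The binary chart is really a small algorithm (often called square-and-multiply), and it can be sketched as follows. The function name and the row layout are our own choices; each row records the current exponent, its remainder bit, and the current repeated square, just as in the charts above.

```python
def binary_chart(base, exponent, modulus):
    """Return the chart rows and base**exponent % modulus.

    Each row is (exponent, remainder bit, current square mod modulus)."""
    rows, result = [], 1
    square = base % modulus
    while exponent > 0:
        bit = exponent % 2
        rows.append((exponent, bit, square))
        if bit:                                  # this power of 2 is needed
            result = (result * square) % modulus
        exponent //= 2                           # integer part of the quotient
        square = (square * square) % modulus     # next repeated square
    return rows, result


rows, answer = binary_chart(45, 61, 79)
for row in rows:
    print(row)
print(answer)
```

Multiplying into the running result as soon as a row has remainder 1 keeps every intermediate number below the square of the modulus, so nothing ever grows too large.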

12.4 Complication III: a mini one

The computation we just did, 45 · 51 · 73 · 36 · 32 ≡ 2 (mod 79), almost is too much for many calculators. It certainly can happen that the final computation’s product is too big. Then what? If there are too many factors, or if they are too large to multiply together all at once, simply group them into families of two or three, multiply and reduce them, and then multiply and reduce the results.

5In fact, one of the interesting results of this method is that we don’t need to explicitly find the proper powers of 2.

Example: Compute 372 · 361 · 19 · 281 · 107 · 81 · 239 · 301 (mod 401)

372 · 361 · 19 · 281 · 107 · 81 · 239 · 301 (mod 401)
≡ (372 · 361 · 19) · (281 · 107 · 81) · (239 · 301) (mod 401)
≡ 386 · 154 · 160 (mod 401)
≡ 122 (mod 401).
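The grouping trick can be sketched in a few lines. The function name and the default group size of three are our own choices; any group size gives the same final remainder.

```python
def product_mod(factors, modulus, group_size=3):
    """Multiply the factors a few at a time, reducing after each group."""
    result = 1
    for i in range(0, len(factors), group_size):
        partial = 1
        for f in factors[i:i + group_size]:
            partial *= f                      # product of one small family
        result = (result * (partial % modulus)) % modulus
    return result


print(product_mod([372, 361, 19, 281, 107, 81, 239, 301], 401))
```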



12.5 Complication IV: the last one

The reason we are interested in computations like 133^172 % 323 in the first place is that they form the heart of the most popular public key cryptography system in use today. The final complication is that the computations in this system are not of the form a^b % p, where p is prime, but of the form a^b % pq, where p and q are different primes. How does the change from a prime modulus to a product of two primes affect what we’ve done so far?

We know that 12^6 ≡ 1 (mod 7) and 12^10 ≡ 1 (mod 11). How should we complete 12^? ≡ 1 (mod 77)? It turns out this is equivalent6 to finding a value k so that 12^k ≡ 1 (mod 7) and 12^k ≡ 1 (mod 11). From Fermat’s theorem 12^6 ≡ 1 (mod 7) and 12^10 ≡ 1 (mod 11). In fact, any exponent that is a multiple of 6 or 10, respectively, produces the same result: 12^(6n) ≡ (12^6)^n ≡ 1^n ≡ 1 (mod 7) and 12^(10m) ≡ (12^10)^m ≡ 1^m ≡ 1 (mod 11). The simplest such common multiple7 is the product, (7 − 1)(11 − 1) = 60 in this case. So 12^60 ≡ 1 (mod 7) and 12^60 ≡ 1 (mod 11), hence 12^60 ≡ 1 (mod 77).

Now there is nothing special about 7 and 11. If p and q are distinct prime numbers, neither of which divides a, then a^k ≡ 1 (mod p) and a^k ≡ 1 (mod q) whenever k is a multiple of both p − 1 and q − 1. In particular, this is true when k = (p − 1)(q − 1). This gives us

6From Theorem 1 way back in Chapter 3, a ≡ b (mod pq) is the same as pq dividing a − b. But when p and q are relatively prime, this needs both p and q to divide a − b. That is, a ≡ b modulo both p and q.

7The smallest such multiple is the least common multiple, which we recall from Section 8.4. For simplicity we will use the product.

Theorem 5 Euler’s Theorem8 (1760): If p and q are two distinct primes and neither one of them divides a, then

a^((p−1)(q−1)) ≡ 1 (mod pq).

Leonhard Euler (1707–1783) is history’s most prolific mathematician. Except for 1741–1766 when he was at the Royal Academy in Berlin, from 1727 until his death Euler lived in St. Petersburg and worked at the Imperial Academy there. He contributed to most fields of mathematics, in particular geometry, calculus and number theory, as well as to physics, especially acoustics, hydraulics and the theory of light. As if more is needed, he became blind in 1766 at age 59 but, despite this, continued his work on optics, algebra and lunar motion, producing almost half of his total works while blind. Euler wrote over 700 books and papers, piling new papers atop older ones. The Imperial Academy, which published the papers, published the top ones first, causing later, more advanced results to appear before the ones they superseded or depended upon!

Examples: Of Euler’s Theorem.

(1) 12^60 ≡ 1 (mod 77), since 77 = 11 · 7 and (11 − 1)(7 − 1) = 60.

(2) 19^252 ≡ 1 (mod 301), since 301 = 43 · 7 and (43 − 1)(7 − 1) = 252.

(3) 23^24 ≡ 1 (mod 35) and 49^64 ≡ 1 (mod 85).



Where will these theorems affect our work? Only in one place. The only times we’ve used “p − 1” so far is when doing modular arithmetic on the exponent. So when the modulus is the product of two primes pq, we must simply be careful to consider the exponent modulo (p − 1)(q − 1).

Examples:

(1) Compute 17^1803 % 671. 671 factors as 61 · 11. Since (61 − 1)(11 − 1) = 600, we must reduce 1803 modulo 600. Since this is 3, we have 17^1803 ≡ 17^3 ≡ 216 (mod 671).

(2) Compute 25^448 % 253. 253 = 11 · 23, and 10 · 22 = 220. 448 % 220 = 8, so 25^448 ≡ 25^8 ≡ 49 (mod 253).
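Euler’s Theorem and the resulting exponent reduction are easy to check by machine. In this sketch the function name is ours, p and q are assumed to be distinct primes (this is not tested), and Python's built-in three-argument pow does the final small computation.

```python
from math import gcd


def euler_power(a, b, p, q):
    """Compute a**b % (p*q) for distinct primes p, q that do not divide a."""
    n, k = p * q, (p - 1) * (q - 1)
    if gcd(a, n) != 1:
        raise ValueError("neither p nor q may divide a")
    return pow(a % n, b % k, n)     # exponent reduced modulo (p-1)(q-1)


print(pow(12, 60, 77), pow(19, 252, 301))   # both illustrate Euler's Theorem
print(euler_power(25, 448, 11, 23))
```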



8Euler actually proved his theorem for any modulus, rather than for the much simpler case we are concerned with here. In general it takes the form a^φ(n) ≡ 1 (mod n), where φ(n) is an easily computed value. This φ is called “Euler’s phi function” and is not to be confused with Friedman’s Φ.

12.6 Putting It All Together

We summarize the steps we’ve taken and then do one final example. To compute a^b (mod p) or a^b (mod pq):

1. Reduce the base a either modulo p or modulo pq.

2. Reduce the exponent b either modulo p − 1 or modulo (p − 1)(q − 1).

3. If the numbers involved are still too large for your calculator, build and complete the binary chart modulo p or modulo pq.

4. If the final product contains too many or too large numbers, group them, multiply and reduce the groups, then multiply and reduce again to get the final answer.

Example: Compute 1971^1149 % 391. Hint: 391 = 17 · 23.

(1) Reduce the base: 1971 % 391 = 16.

(2) Reduce the exponent: (17 − 1)(23 − 1) = 352. 1149 % 352 = 93.

(3) The binary chart:

93   46.5   1   16 ≡ 16
46   23     0   16^2 ≡ 256
23   11.5   1   256^2 ≡ 239
11   5.5    1   239^2 ≡ 35
5    2.5    1   35^2 ≡ 52
2    1      0   52^2 ≡ 358
1    .5     1   358^2 ≡ 307

(4) Group and multiply: 1971^1149 ≡ (16 · 239 · 35) · (52 · 307) ≡ 118 · 324 ≡ 305 (mod 391).
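The four steps can be strung together in one short sketch. The function name and its signature are our own, and the code assumes (without checking) that p and q are distinct primes that do not divide a. The final comparison with Python's built-in pow is only a sanity check.

```python
def power_mod(a, b, p, q=None):
    """Follow the four steps for a**b modulo p (or modulo p*q if q is given).

    Assumes p (and q) are distinct primes that do not divide a."""
    n = p if q is None else p * q
    k = p - 1 if q is None else (p - 1) * (q - 1)
    a, b = a % n, b % k                 # steps 1 and 2: reduce base and exponent
    factors, square = [], a
    while b > 0:                        # step 3: the binary chart
        if b % 2:
            factors.append(square)
        b //= 2
        square = (square * square) % n
    result = 1
    for f in factors:                   # step 4: multiply, reducing as we go
        result = (result * f) % n
    return result


print(power_mod(1971, 1149, 17, 23) == pow(1971, 1149, 391))
```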

12.7 Exponential Problems (and answers)

At the beginning of this chapter we hinted at a new cryptosystem, one based on taking the power of the message. To use it we would choose as key an integer e and encipher our message m as m^e (mod 26). There were, however, a number of difficulties with this idea. First, there was the question of decryption: what reverses raising to the e-th power in modular arithmetic? Second, to prevent a possible enemy from simply trying all possible exponents we need to be able to choose arbitrarily large e’s. Finally, it is possible for this system to encipher different letters to the same cipherletter (both d and j become L if e = 3), making deciphering rather problematic.

Perhaps not surprisingly, given the last several sections, the solution to all three of these difficulties will come from the same modification: instead of working modulo 26 we will work modulo PQ, where P and Q are two (large) primes. From Euler’s generalization of Fermat’s Theorem, we know that a^((P−1)(Q−1)) ≡ 1 (mod PQ) whenever neither P nor Q divides a. So if we cleverly pick d (using the Euclidean Algorithm) to be the solution to ed ≡ 1 (mod (P − 1)(Q − 1)), then we have

(a^e)^d ≡ a^(ed) ≡ a (mod PQ).

That is, raising to the d-th power will reverse the effect of raising to the e-th power.9 We will be able to decipher messages. Further, having two letters that become the same when enciphered will be impossible: if m^e ≡ n^e (mod PQ), then

m ≡ m^(ed) ≡ (m^e)^d ≡ (n^e)^d ≡ n^(ed) ≡ n (mod PQ).

So two letters (or messages) that are enciphered to the same letter (or message) were actually the same to start. That is, different letters are enciphered differently. Finally, there is the question of choices of e and d: are there enough so that our supposed enemy cannot stumble upon d simply by trying all the possibilities? Notice first that different choices of e can lead to the same actual encryption. We pointed this out in the case of prime moduli: if e ≡ e′ (mod P − 1) then m^e ≡ m^(e′) (mod P). The same is true modulo (P − 1)(Q − 1): if e ≡ e′ (mod (P − 1)(Q − 1)) then m^e ≡ m^(e′) (mod PQ). So it doesn’t actually add more choices to allow e and d to be larger than (P − 1)(Q − 1), and so there are at most (P − 1)(Q − 1) different choices for e and d. To provide for a large number of choices for e, then, we will use very large P’s and Q’s.

12.8 RSA

The RSA10 crypto-system was invented by Ronald L. Rivest, Adi Shamir and Leonard Adleman in 1977. We have given the basics of the system. The only thing left to add is that if we are going to work modulo a large modulus, there is no need to encipher one letter at a time. We can instead use this as a polygraphic cipher.

9This statement is the “trick” behind the cipher system we are about to explain. Make sure you understand it, looking back at Theorems 12.1 and ??, if necessary.

10U.S. Patent No. 4,405,829, September 20, 1983, expired on September 20, 2000.

The RSA Algorithm

Setup: Pick two prime numbers P and Q, and let N = PQ. Choose e so that 1 < e < (P − 1)(Q − 1) with gcd(e, (P − 1)(Q − 1)) = 1. Find d such that ed ≡ 1 (mod (P − 1)(Q − 1)) via the Euclidean Algorithm.

To encipher: Split the message into segments M each of which is smaller than N. Compute and send the numbers M^e % N.

To decipher: To decipher a message block E, compute E^d % N.
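The whole algorithm fits in a few lines of Python. This is a toy sketch under stated assumptions: the values P = 19, Q = 13 and e = 23 are borrowed from the first example that follows, the function names are ours, and pow(e, -1, k) (Python 3.8 and later) stands in for the Euclidean Algorithm. Primes this small offer no security at all.

```python
from math import gcd

# Assumed toy values, borrowed from the first example that follows.
P, Q, e = 19, 13, 23
N, k = P * Q, (P - 1) * (Q - 1)
assert gcd(e, k) == 1               # e must have an inverse modulo (P-1)(Q-1)
d = pow(e, -1, k)                   # deciphering exponent (Python 3.8+)


def encipher(blocks):
    """Raise each message block M < N to the e-th power modulo N."""
    return [pow(m, e, N) for m in blocks]


def decipher(blocks):
    """Raise each cipher block E to the d-th power modulo N."""
    return [pow(c, d, N) for c in blocks]


message = [3, 15, 4, 5]             # c, o, d, e with a=01, ..., z=26
cipher = encipher(message)
print(d, cipher, decipher(cipher))
```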

Before we do examples there are a couple of comments we need to make. Notice that when converting the plaintext into numbers we need to translate letters like a into 01 rather than simply 1. This way we can tell concatenations like sab = 190102 and sl = 1912 apart.[11][12] Next, people tend to pick P and Q to be massively large primes, one having 300 digits or a bit less, and one having 300 digits or a bit more. This makes N = P · Q about 600 digits long. It’s certainly not any trouble for a computer to store a 600 digit number (in a text file this is only about eight lines of numbers). In contrast, it is quite common to choose P and Q so that 3 does not divide either P − 1 or Q − 1, and then simply use e = 3 for enciphering.[13] Finally, the RSA code seems quite difficult to break (for reasons we will see in a moment) as long as P and Q are this large. However, RSA is very slow compared to the popular private key codes available today. So most messages are sent in two parts. The first part of the message would say something like “Use DES with key key” and be enciphered using RSA, while the second part, the much longer portion, would contain the actual message enciphered using DES with the key sent in part one.

[11] This translation means that the largest two-letter block is 2626, the largest three-letter block is 262626, etc. We need to make sure to pick N so it is larger than the largest block in whatever block size we pick.

[12] There are more compact ways to translate letter blocks into numbers. For example, setting z = 0 rather than 26, and then using $(p_1, p_2) \to 26 p_1 + p_2$ for two-letter blocks and $(p_1, p_2, p_3) \to 676 p_1 + 26 p_2 + p_3$ for three-letter blocks provides for more compact usage. But we will stick with concatenation.

[13] $2^{16} + 1$ is another popular choice for e, due to its simple binary expansion.

Examples:

(1) Use P = 19, Q = 13, e = 23.

1. Encipher code as a monographic cipher (i.e., one letter at a time). First N = P · Q = 247. Then, performing the necessary computations (but not writing the details, such as the binary charts), we have

c = 3 → $3^{23} \% 247 = 243$.
o = 15 → $15^{23} \% 247 = 59$.
d = 4 → $4^{23} \% 247 = 36$.
e = 5 → $5^{23} \% 247 = 47$.

So code is sent as 243, 59, 36, 47.

2. Decipher 47, 123, 61, 59. To decipher we must find d. First, (P − 1)(Q − 1) = 18 · 12 = 216, and then from the Euclidean algorithm d = 47 is the solution of 23x ≡ 1 (mod 216). Then

47 → $47^{47} \% 247 = 5$ → e.
123 → $123^{47} \% 247 = 24$ → x.
61 → $61^{47} \% 247 = 16$ → p.
59 → $59^{47} \% 247 = 15$ → o.

So the deciphered message is expo.

(2) Use N = 2747 and e = 19 to encipher exponentiation in two-letter pairs.

ex = 0524 → $0524^{19} \% 2747 = 1567$.
po = 1615 → $1615^{19} \% 2747 = 1084$.
ne = 1405 → $1405^{19} \% 2747 = 1461$.
nt = 1420 → $1420^{19} \% 2747 = 1323$.
ia = 0901 → $901^{19} \% 2747 = 901$.
ti = 2009 → $2009^{19} \% 2747 = 2009$.
on = 1514 → $1514^{19} \% 2747 = 1818$.

So 1567, 1084, 1461, 1323, 901, 2009, 1818 is the ciphertext.

(3) Use that 2747 = 67 · 41 to decipher 1032, 1469, 1821, 1551, 2020 into two-letter pairs. After finding (P − 1)(Q − 1) = 2640 and d = 139, we decipher:

1032 → $1032^{139} \% 2747 = 1921$ → su.
1469 → $1469^{139} \% 2747 = 1605$ → pe.
1821 → $1821^{139} \% 2747 = 1816$ → rp.
1551 → $1551^{139} \% 2747 = 1523$ → ow.
2020 → $2020^{139} \% 2747 = 518$ → er.

The answer is superpower.
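Examples (2) and (3) can be reproduced mechanically. The sketch below assumes lowercase messages with no spaces and uses the concatenation scheme a = 01, ..., z = 26; the helper names `text_to_blocks` and `blocks_to_text` are invented for this illustration.

```python
def text_to_blocks(text, letters_per_block):
    """Translate a=01, ..., z=26 and concatenate letters_per_block letters."""
    codes = [f"{ord(ch) - ord('a') + 1:02d}" for ch in text]
    return [int("".join(codes[i:i + letters_per_block]))
            for i in range(0, len(codes), letters_per_block)]


def blocks_to_text(blocks, letters_per_block):
    """Invert text_to_blocks; short blocks are left-padded with zeros."""
    letters = []
    for block in blocks:
        digits = str(block).zfill(2 * letters_per_block)
        for i in range(0, len(digits), 2):
            code = int(digits[i:i + 2])
            if code:                      # 00 is padding, not a letter
                letters.append(chr(code + ord('a') - 1))
    return "".join(letters)


N, e, d = 2747, 19, 139     # N = 67 * 41, and 19 * 139 = 2641 = 2640 + 1

# Example (2): encipher "exponentiation" in two-letter pairs.
plain_blocks = text_to_blocks("exponentiation", 2)
cipher_blocks = [pow(M, e, N) for M in plain_blocks]
print(cipher_blocks)   # [1567, 1084, 1461, 1323, 901, 2009, 1818]

# Example (3): decipher the intercepted blocks.
recovered = [pow(E, d, N) for E in [1032, 1469, 1821, 1551, 2020]]
print(blocks_to_text(recovered, 2))   # superpower
```

Note that 901 and 2009 are enciphered to themselves. Every RSA system has a few such fixed points, and with a modulus this small a short message can easily hit some of them.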


There are at least two interesting things about these examples. First, because of the choice of small numbers for P and Q in Example (1), the RSA cipher became a complicated monographic cipher. Frequency analysis of the type we practiced in Chapter 6 would easily break this cipher. However, by choosing P and Q so that N was larger than 2626 in Examples (2) and (3) we used RSA as a digraphic cipher. And if we had chosen P = 487, Q = 541 then N = 263467 would have let us use RSA as a trigraphic cipher.

The second thing was how encryption and decryption, while very similar processes, involve very different amounts of knowledge. To encipher we only need to know e and N, for the only thing we must do is to compute $M^e \% N$. In particular, we do not need either P or Q. To decipher, however, we do need to know P and Q, because to find d we must use (P − 1)(Q − 1). The application of this differential knowledge is what allows RSA to be a public key cipher system.
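The relationship between the size of N and the usable block length can be stated as a one-line rule: with the concatenation scheme, the largest k-letter block is 26 repeated k times, and N must exceed it. A small sketch, with a function name invented for illustration:

```python
def max_letters_per_block(N):
    """Largest k such that every k-letter block (at most 2626...26) is below N."""
    k = 0
    while int("26" * (k + 1)) < N:
        k += 1
    return k


for N in (247, 2747, 263467):
    print(N, max_letters_per_block(N))
# 247 1
# 2747 2
# 263467 3
```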

12.9 RSA and Public Keys

The information needed to use any particular RSA cipher is very different for the encryptor than it is for the decryptor. To encipher a message one only needs the power e and the modulus N. The values of d and (P − 1)(Q − 1) are unnecessary. When deciphering we need N, but also need d, and need e and (P − 1)(Q − 1) to determine d. That is, the decipherer needs P and Q to determine d. This allows RSA to be used as a public key code. Alice chooses the two primes P and Q and computes their product N. Then she chooses e and uses P and Q to compute d. Alice then makes public the values N and e. Since all that is needed to encipher a message is e and N, anyone can send Alice a message using her system.

Alice Anderson
Phone: 1-800-CALL-ALC
Email: alice@mymail.com
I use RSA. My public keys are e = 17 and N = 549992441.

Conversely, Alice keeps P, Q, (P − 1)(Q − 1) and d all secret. Since she knows d she can decipher any message sent to her.[14]

[14] Since to decipher she only needs to raise to the d-th power modulo N, she should throw P, Q and (P − 1)(Q − 1) away, erase them from any computers they are on and burn any papers they are written on.

12.10 How to break RSA

Suppose we capture an enciphered message E that is intended for our enemy. How can we read the message?

First, what about our old standby, frequency analysis? If N consists of around 600 decimal digits, then our ciphertext segments will also be about 600 digits long. How many possible 600 digit segments are there? Lots: $10^{600}$. Recall that a traditional estimate for the number of elementary particles in the universe is $10^{80}$. So even if we wanted to perform frequency analysis, there wouldn’t be enough room in the universe to write down our frequency count! So we must try another method to break a message.

As usual, once we have tried the brute force method of frequency analysis, we then turn to the specifics of the system itself. Again, how to decrypt an RSA-enciphered message?

Well, we can look up our enemy’s e and N, since these are public information. We want M, the true message, we know the value of N and we know that $E^d \% N = M$. The only thing we don’t know is d. So we only need to discover d.

Well, we know that e and d are chosen so that e · d ≡ 1 (mod (P − 1)(Q − 1)), and we know e. The only thing we don’t have is (P − 1)(Q − 1). So we only need to discover (P − 1)(Q − 1).

Well, we know N, and we know that P · Q = N. Also,

(P − 1)(Q − 1) = P · Q − P − Q + 1 = N − P − Q + 1.

We know the N and 1 parts of this, but don’t know the P or Q. So we only need to discover P or Q.

Well, N/P = Q and N/Q = P, so if we know either P or Q then we know the other and so know them both. But N has only two factors, P and Q. So we only need to factor N.

Thus the entire security of the RSA system apparently comes down to the ease or difficulty of factoring N. If we can factor N we can easily decrypt any message enciphered modulo N. And the chain of “well”s above is meant to convince you that factoring N is the only way to break RSA.[15]

How hard can this be? After all, we all spent several weeks in 5th or 7th grade talking about primes and factors and breaking down numbers into their prime factors. So why can’t smart people using fast machines just factor N? In fact, why not just set up a really fast computer and do the obvious thing: see if 2 divides N, then see if 3 divides N, then see if 5 divides N, and so on, working your way along the primes until you find either P or Q?

Remember that N is about 600 digits long. An important theorem, called, logically enough, the Prime Number Theorem, says that less than x there are about x/ln(x) primes. That means that to find P or Q, which means checking up to about $10^{300}$, we must check about $10^{300}/\ln(10^{300}) \approx 10^{297}$ primes. But, again, there are only $10^{80}$ particles in the whole universe! Imagine that I said I hid one specially marked atom somewhere in the universe and you had to find it. This task is almost infinitely easier than factoring N using a brute force factoring method!

OK, perhaps you say that we know that none of the small prime numbers divide N, so let’s not waste our time with those. In fact, let’s only check the primes that have between 299 and 301 digits. This can’t be so many, right? Well, the Prime Number Theorem, again, tells us there are

$$10^{301}/\ln(10^{301}) - 10^{299}/\ln(10^{299}) > 10^{298}$$

primes in this region. We haven’t eliminated too many![16]

OK, perhaps you will argue that somebody will someday figure out how to factor such big numbers. Really, it is a very simple idea: just factor the darned thing. There are indeed many many people working on exactly this question, developing fancy methods with exotic names like “Pollard’s ρ-method” and the “Number Field Sieve”, and these methods have been shown to have the ability to factor numbers of up to 155 digits.[17] So it is possible that in the future[18] people will be able to quickly factor 600 digit numbers. I guess our hope is that by then whatever messages we send today will be so outdated that no one will care to go back and break them. And by then we will have switched so that P and Q are about 300 digits each.

The final “OK”. OK, perhaps you will say but isn’t there this thing called the “National Security Agency” and isn’t the US government spending billions of dollars to fund them every year to do cryptographic work for the FBI and CIA and weren’t they smart enough to break the Russians’ one-time pads when the Russians didn’t use them properly and don’t they hire many many really smart mathematicians that they swear to secrecy? Might not they have figured out a way to break RSA ciphers and not be telling us? Huh? Hey smart guy, what’s your answer to this?

And I’d have to answer, “dunno”. Maybe they have. No one seems to have any real evidence that they did, but we just don’t know since they aren’t telling.

The upshot is that, outside of possibly the NSA (or its equivalents in other countries), as far as I know no one is currently able to break a carefully constructed and properly used RSA cipher in any reasonable amount of time. This is an area of much research in mathematics, so I’m making no promises about the future. But for the time being a well-constructed RSA system appears to be quite secure.

[15] Of course there are other ways to break any particular RSA system. Perhaps our enemy will make some grievous mistake in enciphering, like leaving part of the message unenciphered. Or will allow us to time his/her computer while it is deciphering as many messages as we wish. But, for most practical purposes, the security of RSA comes down to the factoring problem.

[16] In fact, intuitively this makes sense. $10,000 is a lot of money, but removing that much from $1,000,000 still leaves a whole lot left over. Those two extra 0’s are a big deal!

[17] See the rsasecurity.com homepage.

[18] Two big shots in the field, Arjen Lenstra and Eric Verheul, have guessed that in another 5 or so years (2009) it will be possible to build a computer that can break a 1024-bit RSA key in about a day for $250 million. The National Institute of Standards and Technology recommends that to protect information until 2015 one should use primes of roughly 300 digits.
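For a toy modulus, the chain of reasoning in this section (factor N, compute (P − 1)(Q − 1), find d, decipher) can be carried out directly. The sketch below uses the public values N = 2747, e = 19 and the ciphertext from Example (3) of Section 12.8. The function names are invented for illustration, and the trial division shown is exactly the "obvious thing" that becomes hopeless once N has hundreds of digits.

```python
def smallest_prime_factor(N):
    """Trial division: return the smallest prime factor of N (N >= 2)."""
    if N % 2 == 0:
        return 2
    p = 3
    while p * p <= N:
        if N % p == 0:        # the first odd divisor found is necessarily prime
            return p
        p += 2
    return N                  # no divisor found, so N itself is prime


def break_rsa(N, e):
    """Recover the deciphering exponent d from the public values N and e."""
    P = smallest_prime_factor(N)      # factor N ...
    Q = N // P                        # ... which gives the other prime
    phi = N - P - Q + 1               # equals (P - 1)(Q - 1)
    return pow(e, -1, phi)            # modular inverse (Python 3.8 or later)


N, e = 2747, 19
d = break_rsa(N, e)
intercepted = [1032, 1469, 1821, 1551, 2020]
print(d, [pow(E, d, N) for E in intercepted])
# 139 [1921, 1605, 1816, 1523, 518]
```

The loop in `smallest_prime_factor` runs at most about the square root of N times, which is instant for a 4-digit N and utterly infeasible for a 600-digit one.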

12.11 Authenticity – Proof of Authorship

An important disadvantage of all private key cipher systems is that they do nothing to help solve the key management problem. How can we secretly exchange secret keys with people we’ve never met? And how can we keep all these secret keys straight, let alone secret? As we have discussed, a Public Key cipher takes care of these difficulties. To send someone a private message we don’t have to know them, or have met them, or have agreed on a key or a method. We just look up and use their RSA information from their Internet site.

This leads to a problem: If anyone can send you a message, there is no direct way of knowing who sent you that message! It means little for an email to be signed “Bob” as anyone can type B-o-b. How can we be sure that the person whose name is at the bottom of the message really sent it? With the knapsack ciphers this is a fairly difficult problem to overcome but with the RSA it’s easy.

As usual, we’ll refer to the two parties exchanging messages as Alice and Bob. We will use a subscript A to denote “Alice’s”, so $N_A$ is the value of N in Alice’s system. Similarly, $e_B$ is the enciphering exponent in Bob’s system. Alice knows her $d_A$ but no one else does. Similarly only Bob knows $d_B$. However, they both know $e_A$, $N_A$, $e_B$ and $N_B$. When Alice sends a message to Bob she wants to make sure that only he can decipher it, and that he knows it is actually from her. So she writes her message M and then computes

$$\left(M^{d_A} \,\%\, N_A\right)^{e_B} \,\%\, N_B, \quad \text{if } N_A < N_B, \text{ or}$$

$$\left(M^{e_B} \,\%\, N_B\right)^{d_A} \,\%\, N_A, \quad \text{if } N_B < N_A,$$

and sends this ciphertext to Bob. (Unfortunately, while $\left(M^{d_A}\right)^{e_B}$ and $\left(M^{e_B}\right)^{d_A}$ are equal, they may not be equivalent, that is, when working modulo $N_A$ and $N_B$ the order of the moduli matters.)

Bob needs to be similarly careful when deciphering. If $N_A < N_B$, he undoes the $e_B$ exponentiation (using $d_B$) and then undoes the $d_A$ exponentiation (with $e_A$). When $N_A > N_B$, the order must be reversed.

This is quite clever, and so let’s take a moment to see what each party now knows. Bob receives a ciphertext supposedly from Alice. Since he can look up her modulus $N_A$ he knows which order to apply $d_B$ and $e_A$ to decipher the message. So he can read it. Further, since only he knows $d_B$, Bob is the only person that can decipher this message. Finally, since applying $e_A$ led to a readable message, $d_A$ must have been applied to the plaintext, and since Alice is the only person that knows $d_A$, it really must have been her that sent the message.

Alice can come to quite similar conclusions: Bob is the only person who can read the message she sent, and she knows he will know it was she who sent it. (Make sure you understand why she knows these things.) Finally, Ed the Adversary, even though he knows all of Alice and Bob’s public information, cannot read the message; in fact, he cannot even determine if it is actually from Alice!

This process is called sending a digital signature. In fact, it is a bit like a person’s signature in that your signature is something that you can easily make and that anyone can compare with others of yours to convince themselves that you are you, but it is something that is hard for other people to fake. Similarly, it is easy for Alice to compute the number $(\text{Alice})^{d_A}$, after which anyone else can compute $\left((\text{Alice})^{d_A}\right)^{e_A}$ to convince themselves she is really who she says she is.[19]
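The ordering rule for the two moduli can be made concrete with a short sketch. It assumes the toy keys from Examples (1) and (3) of Section 12.8 for Alice and Bob, assumes the two moduli are different, and uses function names invented for this illustration.

```python
def sign_and_encipher(M, d_sender, N_sender, e_recipient, N_recipient):
    """Apply the sender's private d and the recipient's public e,
    working with the smaller modulus first."""
    if N_sender < N_recipient:
        return pow(pow(M, d_sender, N_sender), e_recipient, N_recipient)
    return pow(pow(M, e_recipient, N_recipient), d_sender, N_sender)


def decipher_and_verify(C, d_recipient, N_recipient, e_sender, N_sender):
    """Undo the two exponentiations in the reverse order."""
    if N_sender < N_recipient:
        return pow(pow(C, d_recipient, N_recipient), e_sender, N_sender)
    return pow(pow(C, e_sender, N_sender), d_recipient, N_recipient)


# Toy keys, assumed for illustration.
e_A, d_A, N_A = 23, 47, 247      # Alice: N_A = 19 * 13
e_B, d_B, N_B = 19, 139, 2747    # Bob:   N_B = 67 * 41

M = 15                           # must be smaller than both moduli

C = sign_and_encipher(M, d_A, N_A, e_B, N_B)          # Alice -> Bob
print(decipher_and_verify(C, d_B, N_B, e_A, N_A))     # 15

C2 = sign_and_encipher(M, d_B, N_B, e_A, N_A)         # Bob -> Alice
print(decipher_and_verify(C2, d_A, N_A, e_B, N_B))    # 15
```

Working with the smaller modulus first guarantees that the intermediate number is smaller than the second modulus, so nothing is lost when the second exponentiation is undone.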

Example: We receive 277763, 165924, 169282 from someone who claims to be David. We know that $N_{\text{David}} = 273487$ and $e_{\text{David}} = 3$. Our values are N = 279827, e = 293 and d = 4757. Did this message come from him and what did it say?

First, since $N_{\text{David}} = 273487 < 279827$, if it was David who sent this message he would first have computed $E = M^{d_D} \% N_D$ and sent us $E^e \% N$. So we must first raise each received number to the $d$-th power modulo $N$ to recover $E$, and then raise this to the $e_D$-th power modulo $N_D$. Raising the message to the d = 4757-th power modulo 279827 produces 198427, 240953, 178426. Raising these numbers to the $e_D = 3$-rd power modulo $N_D = 273487$ gives 23, 5, 5, 11, 5, 14, 4 or weekend. This message makes sense, so yes, David actually sent the message.

With the idea of digital signatures understood we should modify one of our earlier sections. When one is using the RSA one really sends a message in several parts. Part 1 is a quick note saying who you are and what private key method and what key you will use to encipher part 2 of the message. Part 1 is double enciphered using your d and your recipient’s e as just indicated, which both keeps it secure and proves you sent it. Part 2 is the true message and is enciphered using the method and key you just indicated. Finally, Part 0 is a cover letter saying who you are and where you are sending the message.

[19] There is a subtlety here. If Alice simply sends $\text{Alice}^{d_A}$ as her signature for each of her messages, Ed can steal this number and pretend to be her. In practice, Alice would apply a hash function to the message, producing a very short version of it, and it is this that she “signs.” Since Bob can apply the same hash function to the un-deciphered message, he can compare these “digital fingerprints” to see if they are the same, and will then know if it was indeed Alice who sent the message. For our purposes, you can instead think that Alice uses as her name Alice coupled with some nulls that change from message to message: xAliceet or mnAliceW.

12.12 Summary

After trying addition (Caesar Ciphers) and multiplication (Decimation Ciphers), it is natural to try exponentiation to create a cipher system, and RSA is exactly that. It enciphers messages by raising them to a power modulo the product of two large primes. Euler’s Theorem allows us to decipher messages by raising the ciphertext to a power modulo the same product. The deciphering power is the multiplicative inverse of the enciphering power modulo a number easily computed from the two original primes.

When dealing with large primes, and here large means approximately 300 digits each, exponentiation must be done carefully so that the size of the numbers does not overwhelm the computer or calculator being used. In general, to compute the value $(a^b) \% n$, where a, b and n could all be sizable values and n is either prime or the product of two primes, first reduce a modulo n, then reduce b modulo n − 1 (if n is prime) or modulo (p − 1)(q − 1) (if n = pq is a product of primes). Next, a “binary chart” is used to simultaneously write the remainder of b in binary and compute the powers of the remainder of a. Finally, the necessary powers of the remainder of a are combined to produce the final value.

The RSA system has become the world’s most widely-known Public Key cryptosystem. The private information is the deciphering key and the two primes, and the public information is the product of the primes and the enciphering exponent. RSA also provides for digital signatures, a method by which the sender can prove their identity.
Although RSA is slow when compared with popular private key systems, it pairs easily with any such system: encipher your message with a private key system, encipher the key with RSA and then send the enciphered key and the ciphertext as a two-part message. Outside of poor uses of the system (e.g., a bad choice of parameters), which are generally easy to avoid, the RSA system seems very difficult to break. Its security seems to depend on the difficulty of factoring the product of the two large primes. While factoring has been studied for many, many years, there is no publicly known general method that will factor the product of two well-chosen primes in any realistically small amount of time. So, for the time being at least, RSA is one of the most secure cipher systems known.
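The binary chart procedure recalled in this summary amounts to repeated squaring. The following is a minimal sketch of that idea (the function name is invented for illustration, and the optional step of first reducing the exponent is left out); each pass through the loop corresponds to one row of a binary chart.

```python
def binary_chart_power(a, b, n):
    """Compute (a ** b) % n by repeated squaring.

    Each pass handles one binary digit of b, lowest digit first,
    mirroring one row of a binary chart.
    """
    result = 1
    square = a % n            # a^(2^0), reduced modulo n
    while b > 0:
        if b % 2 == 1:        # this binary digit of b is 1
            result = (result * square) % n
        square = (square * square) % n   # next power: a^(2^(k+1)) % n
        b //= 2
    return result


print(binary_chart_power(15, 23, 247))   # 59
```

Because every product is reduced modulo n immediately, no intermediate number is ever larger than n squared, which is what keeps the computation manageable even for 600-digit moduli.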

12.13 Topics and Techniques

1. What is Fermat’s Theorem? How is it used? Under what conditions does it apply?

2. Is raising a number to a large power modulo a prime ever the same as raising that same number to a smaller power modulo the same prime? Explain.

3. What is “double modular arithmetic”?

4. If I have a large number raised to a large power that I want to reduce modulo a smaller prime, how do I go about making this problem more manageable?

5. What is the binary expansion of a number? How is it found?

6. What is a “binary chart”? How is it used?

7. Outline the steps involved in computing $a^b \% p$, where p is a prime number and a and b both might be quite large.

8. What is Euler’s Theorem? How does it differ from Fermat’s Theorem?

9. Outline the steps involved in computing $a^b \% n$, where n = pq is the product of two prime numbers and a and b both might be quite large.

10. Explain the basics of the RSA algorithm.

11. Can RSA be used as a polygraphic cipher? How? When?

12. Is RSA a public key system? What is public? What is private?

13. What is involved in breaking the RSA system? What knowledge must the enemy obtain to break an RSA-enciphered message?

14. What size of numbers are used in an RSA system? Why does size matter?

15. How can the author of a message sign the message? How does the recipient of a message become convinced the author is who he/she says he/she is?

12.14 Exercises

1. Find the following numbers. The numbers in the modulus are all primes.

(a) 340187 (mod 37). (b) 19195 (mod 5 · 17). (c) 541330 (mod 7 · 19). (d) 184144 (mod 59). (e) 2467 (mod 5 · 23). (f) 54042203 (mod 29 · 37).

2. Encipher large in one-letter blocks using P = 7, Q = 19 and e = 5.

3. Decipher 1108, 494 if it was enciphered with parameters P = 31, Q = 41 and e = 7.

4. Using the parameters P = 31, Q = 13, e = 7:

(a) Encipher primes in one-letter pieces.

(b) Compute d and decipher 391, 135, 208, 128, 346, 164.

5. Using the parameters P = 71, Q = 37, and e = 13:

(a) Encipher arithmetic in two-letter segments.

(b) Compute d and decipher 2024, 1553, 469, 299.

6. Decipher 498, 1, 280, 248, 143, 37, if the RSA system used P = 19, Q = 29 and e = 13.

7. Encipher composite in three-letter blocks using the parameters P = 1223, Q = 563 and e = 23.

8. (a) [RSA] The “Small Example” given in the original description of RSA uses p = 47, q = 59 and d = 157. (This paper chooses d first and then computes e.) Compute e.

(b) Using the usual translation of letters to two-digit numbers, with 00 representing a space, translate Its all greek to me into numbers.

(c) Encipher the message.

9. [Hellman] In his Scientific American paper about Knapsack Ciphers and RSA, Hellman uses the letter-to-number translation in which a to z are 0 to 25, A to Z are 26 to 51, a space is represented by 62, and ? is 66.

(a) Hellman’s example message is How are you? What is the numerical equivalent?

(b) Use e = 11 and N = 11023 to encipher the message in two-letter pairs. (Hellman called these E and n.) Hint: 73 divides 11023.

(c) Find the deciphering exponent.

10. In the following examples, the sender has tried to prove his identity by signing the message, as described in the text. Decipher the message.

(a) The message is 54, 135, 112, 112. It is to Andy (parameters P = 17, Q = 11, e = 7) from Bridget (parameters P = 19, Q = 13, e = 5).

(b) The message is 14756, 9105, 191. It is to Alice (parameters P = 137, Q = 191, e = 7) from Bob (parameters P = 173, Q = 127, e = 5).

11. We started the chapter by considering raising a message to the 3rd power. That is, translating letters to numbers and raising each number to the 3rd power modulo 26. Why didn’t we instead raise 3 to the letter’s power? So, for example, e would encipher to $3^5 = 243 \equiv 9 \pmod{26}$, which is I. Would this method produce a good cipher? (Hint: encipher a couple of words and study the outcome.)

12. Your RSA parameters are P = 67, Q = 97, e = 59 and d = 179. One day you receive the message

5689, 3415, 347.

Bob is going to be sending you the name of a new business contact. But so is Carley. If Bob’s RSA parameters are N = 7979 and e = 47 and Carley’s are N = 3569 and e = 53, who sent the message and what does it say?

13. The statement of Euler’s Theorem begins “If p and q are two distinct primes.” Why is the distinctness needed? In other words, what happens if p = q?

14. By some amazing coincidence Alice and Bob independently chose the same primes P and Q when making their RSA parameters. Fortunately they chose different encryption keys $e_A$ and $e_B$ (and so different decryption keys). Show that given the ciphertexts $C_A = (M^{e_A}) \% N$ and $C_B = (M^{e_B}) \% N$, if $e_A$ and $e_B$ are relatively prime then M can be recovered.

15. Show that $P + Q = N - (P - 1)(Q - 1) + 1$ and $P - Q = \sqrt{(P + Q)^2 - 4N}$. Thus by adding or subtracting, P and Q can be found directly. This shows that knowledge of N and (P − 1)(Q − 1) suffices to break RSA.

16. Shamir, Rivest and Adleman suggested that their RSA method could be used to play poker over the phone [SRA]. Explain how using techniques similar to signing a message, two players may use RSA to play poker. Hint: Have one player begin by encrypting the fifty-two messages ace of clubs, two of clubs, three of clubs, ..., king of spades, sending them to the other player. There will need to be a lot of messages sent.

Bibliography

[Antonucci] Michael Antonucci, Code-Crackers, Civil War Times Illustrated, July/August 1995.
[Abeles] Francine F. Abeles, The Mathematical Pamphlets of Charles Lutwidge Dodgson and Related Pieces, 1994, page 326.
[Abeles2] Francine Abeles and Stanley H. Lipson, Some Victorian Periodic Polyalphabetic Ciphers, Cryptologia, April 1990, Vol. XIV, No. 2, pages 128–134.
[Abelels3] Francine Abeles and Stanley H. Lipson, The Matrix Cipher of C.L. Dodgson, Cryptologia, January 1990, Vol. XIV, No. 1, pages 28–36.
[Bates] David Homer Bates, Lincoln in the Telegraph Office.
[Bauer] Bauer’s Book.
[Belaso] Giovan Batista Belaso, La Cifra del. Sig. Giovan Batista Belaso, 1553 (see page 137, Kahn).
[Brown] Joseph Willard Brown, The Signal Corps, U.S.A. in the War of the Rebellion, Boston, U.S. Veteran Signal Corps Association, 1896.
[Elements] United States of America War Office, Elements of Cryptanalysis, Training Pamphlet No. 3, May 1923.
[Friedman] William Friedman, Cryptography and Cryptanalysis, Vol. 2, 1937.
[Gaddy] David Winfred Gaddy, Signals and Ciphers, C.S.A. A Study in Confederate Cryptology.
[Gaines] Helen Fouche Gaines, Cryptanalysis.
[Glasby] S.P. Glasby, Extended Euclid’s Algorithm via Backward Recurrence Relations, Math. Mag., Vol. 72, No. 3, pages 228–230.
[Graves] Robert Graves, I, Claudius, page 212, Random House, 1934.
[Gray] Jacques B. M. Gray, Vowel identification: an old (but good) algorithm, Cryptologia, July 1991, Vol. XV, No. 3, pages 258–61.
[Hassard] John R. G. Hassard, Cryptography in Politics, pp. 315–26, 1879, The North American Review.
[Hellman] Martin Hellman, The Mathematics of Public-Key Cryptography, Scientific American, 1979, Vol. 241, pages 146–57.
[Kahn] David Kahn, The Codebreakers.
[Kasiski] Friedrich Kasiski, Die Geheimschriften und die Dechiffrir-kunst.
[Koblitz] Neal Koblitz, Cryptography as a teaching tool, Cryptologia, Vol. 21, No. 4 (1997), pp. 317–326.
[Merkle] R. C. Merkle, M. E. Hellman, Hiding Information and Signatures in Trapdoor Knapsacks, IEEE Transactions on Information Theory, Vol. 24, No. 5, pp. 525–530, September 1978.
[Morris] Richard Morris, John Jay: The Making of a Revolutionary, Unpublished papers 1745–1780, Edited by Richard B. Morris, Harper & Row, pages 656–666.
[Myers] General Albert James Myer, A Manual of Signals: for the use of Signal Officers in the field.
[Nichols] Randall K. Nichols, ICSA Guide to Cryptography, page 70.
[Schooling] John Holt Schooling, The Pall Mall Magazine, Vol. VIII, 1896 I. page 119–??D
[Peckham] Howard H. Peckham, British Secret Writing in the Revolution, The Quarterly Review, pages 125–130. (Michigan Alumnus Quarterly Review, 9 November 1938, winter 1938 VF 25-43)
[Plum] William Rattle Plum, The Military Telegraph during the Civil War in the United States, Arno Press, New York, 1974.
[Pratt] Fletcher Pratt, Secret and Urgent.
[RSA] Rivest, Shamir, Adleman, A Method for Obtaining Digital Signatures and Public-Key Cryptosystems, Communications of the ACM, February 1978, Volume 21, Number 2, pages 120–126.
[Sinkov] Abraham Sinkov, Elementary Cryptanalysis.
[SSA1] SSA, The History of Codes and Ciphers in the United States Prior to World War I, page 132.
[SSA2] Codes and Ciphers during the Civil War, Prepared under the Direction of the Chief Signal Officer, 20 April 1945.
[SSA3] Historical Background of the Signal Security Agency, Army Security Agency, Washington, D.C., 12 April 1946, Volume I.
[Shamir] A. Shamir, A Polynomial-Time Algorithm for Breaking the Basic Merkle-Hellman Cryptosystem, Advances in Cryptology - CRYPTO ’82 Proceedings, pp. 279–288, Plenum Press, 1983. IEEE Transactions on Information Theory, Vol. IT-30, pp. 699–704, 1984.
[SRA] A. Shamir, R. Rivest, L. Adleman, Mental Poker, The Mathematical Gardner, pages 37–43.
[Treat] Daniel G. Treat, RSA: A Limerick, Mathematics Magazine, Vol. 75, No. 4, October 2002, page 255.
[Vigenere] Vigenère, Traicté des Chiffres.
[Weber] Ralph E. Weber, Masked Dispatches: Cryptograms and Cryptology in American History, 1775–1900, Series I Pre-World War I Volume I, MSA, CSS 1992.
[Yardleyb] Yardley book
[Yardleya] Herbert Yardley, Are we giving away our state secrets?, Liberty Magazine, Dec 19, 1931, pages 8–13.