California State University, San Bernardino CSUSB ScholarWorks
Electronic Theses, Projects, and Dissertations Office of aduateGr Studies
6-2016
The Evolution of Cryptology
Gwendolyn Rae Souza California State University - San Bernardino
Follow this and additional works at: https://scholarworks.lib.csusb.edu/etd
Part of the Intellectual History Commons, Number Theory Commons, Other Applied Mathematics Commons, Other History Commons, and the Other Mathematics Commons
Recommended Citation Souza, Gwendolyn Rae, "The Evolution of Cryptology" (2016). Electronic Theses, Projects, and Dissertations. 572. https://scholarworks.lib.csusb.edu/etd/572
This Thesis is brought to you for free and open access by the Office of aduateGr Studies at CSUSB ScholarWorks. It has been accepted for inclusion in Electronic Theses, Projects, and Dissertations by an authorized administrator of CSUSB ScholarWorks. For more information, please contact [email protected]. The Evolution of Cryptology
A Thesis
Presented to the
Faculty of
California State University,
San Bernardino
In Partial Fulfillment
of the Requirements for the Degree
Master of Arts
in
Mathematics
by
Gwendolyn Rae Souza
June 2016 The Evolution of Cryptology
A Thesis
Presented to the
Faculty of
California State University,
San Bernardino
by
Gwendolyn Rae Souza
June 2016
Approved by:
Shawnee McMurran, Committee Chair Date
Jeremy Aikin, Committee Member
Corrine Johnson, Committee Member
Charles Stanton, Chair, Corey Dunn Department of Mathematics Graduate Coordinator, Department of Mathematics iii
Abstract
We live in an age when our most private information is becoming exceedingly difficult to keep private. Cryptology allows for the creation of encryptive barriers that protect this information. Though the information is protected, it is not entirely inacces- sible. A recipient may be able to access the information by decoding the message. This possible threat has encouraged cryptologists to evolve and complicate their encrypting methods so that future information can remain safe and become more difficult to decode. There are various methods of encryption that demonstrate how cryptology continues to evolve through time. These methods revolve around different areas of mathematics such as arithmetic, number theory, and probability. Another concern that has brought cryp- tology into everyday use and necessity is user authentication. How does one or a machine know that a user is who they say they are? Living in the age where most of our infor- mation is sent and accepted through computers, it is crucial that our information is kept safe, and in the appropriate care. iv
Acknowledgements
I had started this project around the same time I was reading Stieg Larsson’s The Girl With The Dragon Tattoo. A lot of my inspiration to research this topic came from the character Lisbeth Salander, a young woman who is a notorious computer hacker who will stop at nothing to find out the truth. I would like to thank my mom for not only helping me take care of my daughter while I worked on my project, but for always believing I could do this. The third woman I owe many thanks to is Shawnee McMurran. Thank you for helping me make this paper the best it can be, and for taking the time to be my advisor. You have been an inspiration to me since I first took your class in 2013. Thanks to Dr. Aikin for looking over everything and making sure all of the sections flowed nicely. I also owe a huge thank you to Dr. Johnson for her meticulous editing and for giving me so many great suggestions. Lastly, I would like to acknowledge the biggest contributor to my interest in mathematics, my dad: problem solver extraodinaire, and the person who showed me my first mathematical magic trick ever. My love for numbers and puzzles would not exist without you. v
Table of Contents
Abstract iii
Acknowledgements iv
List of Tables vi
List of Figures vii
1 Introduction 1
2 The First Ciphers 4 2.1 Monoalphabetic Ciphers: The Caesar and Affine Ciphers ...... 4 2.2 Decryption Using Frequency Analysis ...... 6
3 Polyalphabetic Ciphers and The Vigen`ereCipher 8 3.1 The Vigen`ereCipher ...... 8 3.2 The Kasiski Test ...... 10 3.3 Index of Coincidence and The Friedman Test ...... 13 3.4 Other Classical Ciphers ...... 17
4 Cryptology and Computers 21 4.1 Shift Registers and Pseudorandom Sequences ...... 21 4.2 User Authentication ...... 29 4.3 Public-Key Cryptosystems ...... 31 4.4 The RSA Algorithm ...... 34 4.5 An Insecure Public-key Cryptosystem: The Knapsack Cryptosystem . . . 40
5 Conclusion 47
Bibliography 49 vi
List of Tables
2.1 Frequency Distribution of all 26 Letters in English Text ...... 6
4.1 Some ASCII Binary Representations ...... 22 vii
List of Figures
3.1 The Vigen`ereSquare ...... 9
4.1 A Shrinking Generator Process ...... 28 1
Chapter 1
Introduction
In this paper we will analyze a few breakthroughs of cryptology, beginning from the BC era to present day. The earliest development of cryptology stemmed from the need to keep information safe while being sent between two recipients. This is still the primary reason why cryptology is used today. The general process revolves around two actions: disguising the original message a user wants to send and decoding a disguised message. These actions are known as encryption and decryption, or deciphering. The disguised form of the message is known as the cipher text. Throughout my project, I will introduce and examine the methods of ciphering that I found most compelling. While introducing each cipher, I will additionally provide historical information and significance. Chapter 2 will introduce some of the first forms of encryption, starting with simple substitution methods. These methods evolved with time as users needed more secure ways of communicating. (Fortunately, as cryptology advances, cryptanalysts do as well.) Affine ciphers provided a little more security by enciphering messages under more than one operation. Near the early 1400s, it was time to develop a new means of ciphering, where cipher texts were not as easy to decipher. Thus, the famous Vigen`erecipher was born and was considered a breakthrough as one of the first polyalphabetic ciphers. Its complexity is derived from being a composition of multiple substitution ciphers. The Vigen`erecipher will be a prominent focal point in my thesis project. I will explain in depth two methods commonly used for decrypting a text under this cipher, the Kasiski and Friedman Tests. These two tests inspired all cryptologists alike by showing that different areas of mathematics can be used in cryptology, not just simple algebra. 2
(However, one’s underlying knowledge in number theory and probability help tremen- dously.) The Kasiski Test deals with recurring strings of letters found in the cipher text, giving us clues as to what the length of the keyword might be. (The keyword is a tool for encryption.) The Friedman Test not only provides an estimate of the length of the key- word, but also proves whether or not our cipher text is monoalphabetic or polyalphabetic, which should always be the first question at hand. When the earliest computers were created, it was much easier to use a binary numeral system, a system in which information is solely expressed in combinations of 0s and 1s, where each digit is referred to as a bit. The use of computers and this system opened many doors for cryptologists. Creating ciphers and algorithms was just a click away, and recreating it on paper was relatively easy once one knew what to do. Thus, another method of enciphering was created using shift registers. Shift registers are able to produce pseudorandom binary sequences, which are very useful in cryptography and provide extra camouflage/security for a message. They work like miniature machines, where the shift register, i.e., the machine, requires the most work to build. The section on shift registers will provide a detailed example, with outlined steps on how our machine is constructed and how it processes. In Section 4.2 we will see that cryptology can be used for other purposes besides encrypting messages or information. Continuing on the evolution of cryptography with computers, one section will define and focus on message authentication codes (MAC) and digital signatures. MAC and digital signatures deal with verification that a user is in fact who they proclaim to be, a topic that we will revisit in the final sections of the thesis. Public-Key Cryptosystems will also be introduced, as they are crucial to understand before comprehending the final sections. This section, will prepare the reader for the next which will cover significant breakthroughs in cryptology that are still used today. Section 4.4 will introduce the reader to concepts and methods necessary to comprehend before approaching the most famous algorithm pertaining to cryptology, the RSA algorithm. We will discuss the derivation and significance of the algorithm, as well as the founders and other protocols that the founders contributed to cryptology. Proofs will be provided and outlined, and examples will be included that show that the algorithm does indeed work and can be highly effective and secure if applied appropriately. One of the first commercial applications of the RSA algorithm that will be examined is key exchange, 3 which consists of deriving a mutual key between sender and receiver by using values that they initially choose. We will primarily focus on the Diffie-Hellman key exchange scheme. The last cryptosystem that will be discussed is the knapsack cryptosystem. In Section 4.5 we will present one method of attack on this cryptosystem, showing that any information being sent through this system is not secure. We will conclude the project with a chapter that gives an overview of current cryptology and how crucial it is that the world of cryptology continues to evolve. 4
Chapter 2
The First Ciphers
2.1 Monoalphabetic Ciphers: The Caesar and Affine Ci- phers
Monoalphabetic ciphers are defined by each letter in the alphabet being mapped to exactly one other letter during the encryption process. These types of ciphers are sometimes called substitution ciphers. One of the most famous and earliest monoalpha- betic ciphers was the Caesar cipher, which consists of shifting the letters in the alphabet. Thus, there are a total of 26 possible additive shifts [Beu94].
Example 2.1.1. Encrypting a message with an additive shift of 3: By letting the letters of the English alphabet correspond to the numbers 0, 1, ..., 25, we get A = 0, B = 1, ..., Z = 25. To encrypt our given cleartext/message with a shift of 3, one would add 3 to each number that represents the letters in the given cleartext. For example, since A = 0, encrypting A from cleartext to ciphertext would give D, since D= 3 and 0+3 = 3. Using the same additive shift of 3, C, which is 2 in cleartext, would translate to F in ciphertext, since F corresponds to 5.
A cipher that is a bit more complex, requiring multiplication as well as addition, is the affine cipher. An affine cipher can be represented in the form,
ax + b (mod 26), where a is the multiplicative shift, b is the additive shift, and x is the number representing the letter in cleartext that one is encrypting. There are only 12 possible multiplicative 5 shifts: a must be an element of the set {1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25}. This set consists of numbers inclusively between 1 and 26 that are relatively prime to 26. One is able to perform multiplicative shifts from the above set because each element of this set has a multiplicative inverse a−1, considered modulo 26, such that aa−1 ≡ 1 (mod 26). This is a necessary condition for a one-to-one mapping, or in this case, for the affine cipher to be monoalphabetic. This condition of having a multiplicative inverse also allows us to decipher the text. The following table shows the set of positive integers smaller than and relatively prime to 26, with their inverses.
possible a’s 1 3 5 7 9 11 15 17 19 21 23 25 a−1 1 9 21 15 3 19 7 23 11 5 17 25
It is also worth noting that affine ciphers provide significantly more security than Caesar ciphers alone. Since there are a total of 26 additive ciphers, and 12 possible multiplicative ciphers, there are a total of 26 · 12 = 312 possible affine ciphers.
In the following formula, and throughout this paper, we will frequently have need to refer to residues in a modular congruence class. In such cases we will use the notation x = y mod n to mean x equals the least nonnegative residue y (mod n).
To decrypt a given ciphertext with a known [a, b] is a matter of substituting a−1 and b into the formula:
p = a−1(c − b) mod 26, where c is the number corresponding to a letter in ciphertext, and p is the desired number that corresponds to a letter in cleartext.
Example 2.1.2. Ciphertext: HGGVUB and [a, b]= [5,7].
HGGVUB 7 6 6 21 20 1 a−1(c − b) mod 26 = p 21(7-7) = 0 21(6 − 7) mod 26 = 5 5 21(21 − 7) mod 26 = 8 21(20 − 7) mod 26 = 13 21(1 − 7) mod 26 = 4 Cleartext A F F I N E
If [a, b] is not known, then frequency analysis may be used to assist us in de- crypting a given ciphertext. 6
2.2 Decryption Using Frequency Analysis
Frequency analysis in cryptology is the process of analyzing the frequencies in the letters in ciphertext and making the assumption that the letter that appears most frequently maps from the letter that has the highest frequency in an English text: the letter E. (See Table 2.1 [Beu94].)
letter relative frequency(%) letter relative frequency(%) a 8.167 n 6.749 b 1.492 o 7.507 c 2.782 p 1.929 d 4.253 q 0.095 e 12.702 r 5.987 f 2.228 s 6.327 g 2.015 t 9.056 h 6.094 u 2.758 i 6.966 v 0.978 j 0.153 w 2.360 k 0.772 x 0.150 l 4.025 y 1.974 m 2.406 z 0.074
Table 2.1: Frequency Distribution of all 26 Letters in English Text
One can conclude that this approach is more accurate if the ciphertext is rea- sonably large so that it makes sense to consider frequency among letters. Assuming that the most frequent letter in the ciphertext maps from E, if the ciphertext was encrypted using a Caesar cipher, we can count the additive shift used in the cipher from E to the letter we believe maps from E. Thus, we can reverse this shift on all the other letters in the ciphertext to see if one has a coherent message. If an affine cipher was used, one can still use frequency analysis to determine both shifts, a and b.
Example 2.2.1.
Given a text encrypted with an affine cipher: 7
JZQ DQN IQY UXQRC JZQ CZQN ERN JZQ HLMQ IQY UXQRC JZQ HEKI NUUD
By frequency analysis, the letter that appears the most in our ciphertext is Q. We may assume that Q maps from E. Since Q = 16 and E = 4, to find our a and b, we set up the mapping equation from E to Q:
4a + b (mod 26) ≡ 16.
Now, as stated earlier, a must be an element of the set {1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25}. So let us first assume that a = 3:
4(3) + b = 16 12 + b = 16 b = 4.
This gives us a plausible combination of [a, b] to test. Putting [3,4] and each letter in our ciphertext into p = a−1(c − b), we get a coherent message:
THE RED KEY OPENS THE SHED AND THE BLUE KEY OPENS THE BACK DOOR
If a = 3 had not worked, the next reasonable choice would be to test a = 5, and so on. If a = 1, then the cipher would have been encrypted by a simple substitution cipher. This method of trying all possibilities is called a brute force attack. A brute force attack of trying all possible a’s would take only about 10 attempts.
Frequency analysis will continue to be a valuable tool for decryption in the next chapter, as we discuss one of the first known polyalphabetic ciphers, the Vigen`ereCipher. 8
Chapter 3
Polyalphabetic Ciphers and The Vigen`ereCipher
“[Enciphered writing] must not rely upon secrecy, and it must be able to fall into the enemy’s hands without disadvantage.” -Auguste Kerckhoff, La Cryptographie Militaire, 1883.
3.1 The Vigen`ereCipher
The Vigen`erecipher was a significant breakthrough for early 16th century cryp- tology. Prior to this discovery, cryptologists were having difficulty keeping messages secure and undecodable. Since frequency analysis had become a popular code breaking strategy, monoalphabetic ciphers were becoming less of an issue for all cryptanalysts alike, and a new method of ciphering was sought. This led several cryptography enthusiasts to attempt to produce a new cipher where it was possible for two different letters in cleartext to become the same letter in ciphertext through the encryption process. From their efforts, French diplomat Blaise de Vigen`erewas inspired to devise the new cipher fitting that which the enthusiasts had in mind [Beu94]. Thus the cipher was coined the “Vigen`erecipher,” and the table that breaks down the cipher that can be used as a tool for encryption and decryption was coined the “Vigen`eresquare.” (See Figure 3.1.) 9
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z A A B C D E F G H I J K L M N O P Q R S T U V W X Y Z B B C D E F G H I J K L M N O P Q R S T U V W X Y Z A C C D E F G H I J K L M N O P Q R S T U V W X Y Z A B D D E F G H I J K L M N O P Q R S T U V W X Y Z A B C E E F G H I J K L M N O P Q R S T U V W X Y Z A B C D F F G H I J K L M N O P Q R S T U V W X Y Z A B C D E G G H I J K L M N O P Q R S T U V W X Y Z A B C D E F H H I J K L M N O P Q R S T U V W X Y Z A B C D E F G I I J K L M N O P Q R S T U V W X Y Z A B C D E F G H J J K L M N O P Q R S T U V W X Y Z A B C D E F G H I K K L M N O P Q R S T U V W X Y Z A B C D E F G H I J L L M N O P Q R S T U V W X Y Z A B C D E F G H I J K M M N O P Q R S T U V W X Y Z A B C D E F G H I J K L N N O P Q R S T U V W X Y Z A B C D E F G H I J K L M O O P Q R S T U V W X Y Z A B C D E F G H I J K L M N P P Q R S T U V W X Y Z A B C D E F G H I J K L M N O Q Q R S T U V W X Y Z A B C D E F G H I J K L M N O P R R S T U V W X Y Z A B C D E F G H I J K L M N O P Q S S T U V W X Y Z A B C D E F G H I J K L M N O P Q R T T U V W X Y Z A B C D E F G H I J K L M N O P Q R S U U V W X Y Z A B C D E F G H I J K L M N O P Q R S T V V W X Y Z A B C D E F G H I J K L M N O P Q R S T U W W X Y Z A B C D E F G H I J K L M N O P Q R S T U V X X Y Z A B C D E F G H I J K L M N O P Q R S T U V W Y Y Z A B C D E F G H I J K L M N O P Q R S T U V W X Z Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
Figure 3.1: The Vigen`ereSquare
The square in Figure 3.1 [Beu94] displays all of the possible monoalphabetic ciphers/shifts. In comparison with Caesar ciphers, messages encrypted with Vigen`ere ciphers are exceedingly more secure. Within a Caesar cipher, for example, if the letter A is encrypted as the letter Y, then it is impossible for another letter to be encrypted as Y. This is not the case with Vigen`ereciphers, or any other polyalphabetic cipher. In Vigen`ereciphers, each letter in the ciphertext has 26 possible outcomes when decrypted, depending on the keyword that is used during encryption. As with affine ciphers, knowing the different shifts used from each letter in the keyword helps when decrypting a Vigen`ere cipher, and is essential when encrypting each letter in a message.
Example 3.1.1.
Let our cleartext be: THERESTROUBLECLOSEBY and our keyword be: 10
LEAVE
Let n denote the length of the keyword. In this case n = 5. We will then proceed with encryption by writing out the keyword directly under the first 5 letters of our cleartext, and continue writing it underneath until the string of our repeated keyword is equivalent in length to our cleartext:
T H E R E S T R O U B L E C L O S E B Y L E A V E L E A V E L E A V E L E A V E
In order to obtain the ciphertext, the next step is to convert all letters to their corre- sponding assigned number, 0 to 25, as visible in the previous chapter:
19 7 4 17 4 18 19 17 14 20 1 11 4 2 11 14 18 4 1 24 11 4 0 21 4 11 4 0 21 4 11 4 0 21 4 11 4 0 21 4
The following step is to add the numbers in each column and compute each sum (mod 26). Doing this for each column will give the ciphertext in numerical form. Converting these sums to letters will give us the desired final ciphertext. For example, the first entry would be (19 + 11) mod 26 = 4, so the first letter would be E. Thus the complete ciphertext would be:
ELEMIDXRJYMPEXPZWEWC
If the keyword is known to both parties, then decrypting the ciphertext is only a matter of reversing the number of shifts used for each letter. If the keyword is not known, then the first task at hand would be to find the length of the keyword.
3.2 The Kasiski Test
One method of possibly finding the length of the keyword is the Kasiski test. The Kasiski test was first published by Prussian colonel and cryptologist Friedrich Wilhelm Kasiski in 1863. However, it is known that the earliest discovery of this particular method is credited to mathematician and inventor Charles Babbage. Babbage first thought of 11 the method of finding regularities and patterns in ciphertext while studying the cipher in the mid-1840s. His first successful attack using this method was made in the 1850s during the Crimean War, however, the method was kept as a military secret by British Intelligence. Despite not being publicly credited for this particular discovery, Babbage continued to invent and is most famous for his work in designing mechanical calculators [Sta10]. The Kasiski test consists of 3 steps: finding repeated strings of letters in a ciphertext, finding the distances between these sequences, and calculating the common divisors of the distances between all sequences [Beu94]. One of these common divisors is suggested to be the length of the keyword. The best way to see why this method works is through a particularly (short) example where the method might be of use.
Example 3.2.1.
Let our cleartext be: THECATSEESTHECATNEXTDOOR and our keyword be: WATER
In this example, the string THECAT is repeated twice in the cleartext. When strings of letters or words are repeated in the cleatext, the Kasiski test is more effective since the possibility of seeing a repeated sequence of letters in the ciphertext increases. We can also presume that the larger the ciphertext, the greater the possibility that repeated sequences are likely to occur. This is because there are words in English that occur regularly in text, e.g., THE, AND and THAT [Beu94]. Encrypting our cleartext from Example 3.1.1, we get:
T H E C A T S E E S T H E C A T N E X T D O O R W A T E R W A T E R W A T E R W A T E R W A T E
19 7 4 2 0 19 18 4 4 18 19 7 4 2 0 19 13 4 23 19 3 14 14 17 22 0 19 4 17 22 0 19 4 17 22 0 19 4 17 22 0 19 4 17 22 0 19 4
Computing the sums of each column, modulo 26, and converting our numbers back to letters, we get our ciphertext: 12
PHXGRPSXIJPHXGRPNXBKZOHV
We see that the only repeated sequence is PHXGRP, and the distance, or the number of letters, between each set is 10, with prime factors of 2 and 5. So, by the Kasiski test, the length of the keyword is suggested to be 2, 5 or 10. Note that 5 is the length of the keyword in our example.
After finding the length n of the keyword, we can complete the deciphering process by analyzing the frequencies of letters in our ciphertext. If we separate the ciphertext in groups of n, we know that the first letter in each group of n must come from the same column in the Vigen`eresquare because the same shift was used for those letters.
Using the example from above: PHXGR PSXIJ PHXGR PNXBK ZOHV
We begin by applying frequency analysis for the first letters of every group. Therefore, we consider the set: {P, P, P, P, Z}. We first assume the letter that appears most frequently corresponds to E in the original message, revealing what shift might have been used and thus giving us the first letter of the keyword. So we assume that P corresponds to the letter E in cleartext. We subtract the numbers that represent P and E and reduce modulo 26: (15 − 4) mod 26 = 11. Since 11 corresponds to the letter L, we assume that the first letter of the keyword is L. To continue this process, frequency analysis would be applied to the second letter of each group, then the third, until we reach the nth letter of each group. If at any point the deciphered message clearly becomes incomprehensible, we try to determine which letter of the key may have been derived incorrectly, and start over from that location using the letter with the next highest relative frequency from Table 2.1. In this example, we would start over from the beginning, and assume that P corresponds to the letter T in cleartext. Subtracting P and T , we get (15 − 19) mod 26 = 22. Thus, our next assumption would be that the first letter of the keyword is W , which, in this case, it is! We continue by assuming that the most frequent letter in the second set of letters corresponds to E in plaintext, and repeat the process of trying to find a coherent keyword.
This process becomes more reliable and accurate as the ciphertext grows larger and the keyword stays relatively short in length, perhaps of length 8 or less. 13
3.3 Index of Coincidence and The Friedman Test
One significant contributor to the US government’s cryptographic efforts was William Friedman. During World War I, Friedman worked with the US government and led courses that consisted of training prospective cryptanalysts. His cryptological methods became classics in cryptology history, making Friedman an important figure in the early NSA (National Security Agency) [Lev01]. In order to proceed to the next method of solving the Vigen`erecipher, the Friedman test, we must first introduce the Index of Coincidence (IOC), a technique also developed by William F. Friedman. The IOC represents the probability that, after selecting a pair of letters from a text, the two letters are the same.
Derivation of the Index of Coincidence [Beu94]:
Consider an arbitrary sequence of letters of length n, n ≥ 2. Let n1 denote the number of a’s, n2 the number of b’s, ... , n26 the number of z’s. We are interested in how often a randomly selected pair of letters would both be a.
There are n1 ways to choose the first a. There are n1 − 1 possibilities for choosing the second a. Since we neglect the order of the letters in the pair, the number of pairs of a’s equals:
n (n − 1) 1 1 . 2
Therefore, the total number of pairs that consist of equal letters (that is both a, or both b, ..., or both z) equals:
26 n1(n1 − 1) n2(n2 − 1) n26(n26 − 1) X ni(ni − 1) + + ··· + = . (3.1) 2 2 2 2 i=1 We compute the probability of choosing a pair of equal letters at random by dividing Formula (3.1) by the number of all possible cases:
26 26 P P ni(ni − 1)/2 ni(ni − 1) i=1 = i=1 n(n − 1)/2 n(n − 1) 14
Thus we have, 26 P ni(ni − 1) I = i=1 , (3.2) n(n − 1) where 0 ≤ I ≤ 1.
Another representation of the IOC is:
26 P 2 p1 · p1 + p2 · p2 + ··· + p26 · p26 = pi , i=1 where pi · pi is the probability that two chosen letters in the text will be the same letter.
For the English language, the IOC is:
26 P 2 pi ≈ 0.065. i=1 By this calculation, one can determine if a monoalphabetic or polyalphabetic cipher was used by comparing the IOC to 0.065. If a ciphertext has an IOC close to 0.065, it is likely that the text was encrypted with a monoalphabetic cipher. On the other hand, if the text was encrypted using a polyalphabetic cipher, each letter theoretically occurs 1 with the same probability of . In this case the IOC would be estimated at: 26
26 26 X X 1 1 1 p2 = = 26 · = ≈ 0.038. (3.3) i 262 262 26 i=1 i=1 The Index of Coincidence can also assist us in finding the length of our keyword and is a shorter process than the Kasiski test. The formula to find the length of a keyword is
0.027n l = , (3.4) (n − 1)I − 0.038n + 0.065 where l is the length of the keyword, I is the calculated IOC of the ciphertext, and n is the number of letters in our ciphertext. Before we derive Formula (3.4), we must first visualize what exactly the length of a keyword does: it divides our ciphertext into groups of length l. If l is known, we can 15 organize our ciphertext into l columns, such that the first letter in every group of length l belongs in one column, the second letter of each group belongs in the second column, and so on. If we organize the ciphertext from Example 3.1.1 in this manner, knowing that the length of the keyword is 5, the ciphertext would look like this:
P H X G R P S X I J P H X G R P N X B K Z O H V
Note that there are the same number of rows as groups of l’s; this will always be true. Since we are now able to visualize a ciphertext divided into columns, we may begin the steps to derive the formula used to find the length of the keyword.
Steps for the derivation of Formula (3.4) [Beu94]:
We start by deriving how many pairs of letters must be in each column. Let n denote the number of letters in the ciphertext. Then each column has n/l letters. There are exactly n possibilities for a randomly chosen letter. In the column the chosen letter is located, there are exactly n/l − 1 other letters. Thus there are n/l − 1 ways to choose the second letter from the same column, and the number of pairs of letters that are in the same column equals n − 1 n(n − l) n · l = . (3.5) 2 2l
Since there are exactly n − n/l letters outside the chosen letter’s column, the number of pairs of letters that are in different columns equals
n n − n2(l − 1) n · l = . (3.6) 2 2l
Now, recall that the English language has an IOC of 0.065 and a completely jumbled text, where every letter occurs with the same probability, has an IOC of approximately 0.038 (by Formula 3.3). Combining these IOCs with our observations, we see that the expected number, A, of pairs of equal letters is 16
n(n − l) n2(l − 1) A = · 0.065 + · 0.038. (3.7) 2l 2l
Therefore, the probability that a pair of letters chosen at random are the same letter equals
A n − l n(l − 1) = · 0.065 + · 0.038 n(n − 1)/2 l(n − 1) l(n − 1)
1 = · [0.027n + l(0.038n − 0.065)]. l(n − 1)
Since the IOC approximately equals the above probability, we obtain
0.027n 0.038n − 0.065 I ≈ + . (3.8) l(n − 1) n − 1
Solving for our variable l, we finally obtain the desired formula by Friedman,
0.027n l = . (3.9) (n − 1)I − 0.038n + 0.065
To apply Formula (3.9), we first determine the length n of the text and the frequencies, n1, . . . , n26, among the letters. We next substitute these values into Formula (3.2) to obtain I. Lastly, we substitute the values I and n into Formula (3.9) to obtain an estimate of the length of the keyword.
Example 3.3.1. In the following table [Bar02], we have the counts ni of the various letters in a given ciphertext:
ABCDEFGHIJKLM 3 2 7 2 4 1 8 9 10 5 5 5 11
NOPQRSTUVWXYZ 5 5 5 3 8 8 12 2 8 9 13 1 1 17
The total number of letters is n = 152, and
26 P ni(ni − 1) = 3 · 2 + 2 · 1 + 7 · 6 + ··· + 13 · 12 + 1 · 0 + 1 · 0 = 1048. i=1
Thus the IOC is
26 P n (n − 1) i i 1048 I = i=1 = ≈ 0.0457. n(n − 1) 152 · 151
Since I is not estimated at about 0.065, it is likely that we were given a text enciphered with a polyalphabetic cipher. By putting n and I into the formula for deriving l (3.9), we get 0.027(152) 4.104 l = = ≈ 3.45, (152 − 1)0.0457 − 0.038(152) + 0.065 1.1897 suggesting that the length of the keyword must be the next whole number, 4. After obtaining the length of the keyword, one can apply the same steps used in Section 3.1 to find the shifts/letters of our keyword.
3.4 Other Classical Ciphers
A cipher that inspired many other cryptological ideas is the Hill Cipher, some- times known as Hill’s System. This cipher was invented by mathematician and educator Lester S. Hill in 1929. Hill served as a lieutenant in the US Navy in World War I, and continued to invest time and send suggestions regarding cryptological techniques to naval communications until retiring in 1960 [Kah67]. The Hill Cipher was one of the first known polygraphic ciphers, a cipher in which it is possible to operate on large groups of symbols at once. Like the Vigen`ereCipher, the Hill Cipher requires a key for both encryption and decryption. Unlike the Vigen`ere cipher, this key is in the form of a matrix. The key for this cipher will be in the form of a 2 × 2 matrix, consisting of the elements a, b, c, and d, such that these elements are in Z26. We then allow pk and pk+1 to be two characters in the kth and (k + 1)st positions of our plaintext message.
Two ciphertext characters in corresponding positions, ck and ck+1, will be enciphered 18 by multiplying our key matrix by our two plaintext letters as a 2 × 1 matrix as shown [Lew00]: For k an integer, ck a b pk = mod 26. ck+1 c d pk+1
To decrypt a given ciphertext while knowing the key matrix, we simply put each pair of ciphertext characters into the equation
−1 pi a b ci = mod 26. pi+1 c d ci+1
A classical cipher that one might glimpse in the morning paper, widely known as an anagram, is the transposition cipher (or permutation cipher). A transposition cipher consists of rearranging the positions of the elements of the message without changing the identities of the elements [Mao04]. The German ADFGVX cipher (a cipher used during World War I), is a transposition cipher that consists of substitutions as well. Though it was considered to be difficult to crack at the time, the cipher was broken by French cryptanalyst Georges Painvin [Sch94]. We present one example of a transposition cipher as follows [Mao04]:
Example 3.4.1. Consider that the elements of a plaintext message P are numbers in
Z26. Let b be a fixed positive integer representing the size/length of a message, let C b be our ciphertext so that P = C = (Z26) , and let k (our key) be all permutations, i.e., rearrangements of (1, 2, . . . , b). Let π ∈ k, for a permutation π, such that π =
(π(1), π(2), ..., π(b)). For a plaintext string (x1, x2, ...xb) ∈ P , the encryption algorithm of this transposition cipher is