Evolving Keys for Periodic Polyalphabetic Ciphers
Total Page:16
File Type:pdf, Size:1020Kb
Evolving Keys for Periodic Polyalphabetic Ciphers Ralph Morelli and Ralph Walde Computer Science Department Trinity College Hartford, CT 06106 [email protected] Abstract recover increasingly longer portions of the key. As described more fully in the discussion, our results compare favorably A genetic algorithm is used to find the keys of Type to other approaches reported in the literature. II periodic polyalphabetic ciphers with mixed primary alphabets. Because of the difficulty of the ciphertext only cryptanalysis for Type II ciphers, a multi-phased Polyalphabetic Ciphers search strategy is used, each phase of which recovers a Following (King 1994), a polyalphabetic cipher can be de- bigger portion of the key. fined as follows. Given d cipher alphabets C1 ...Cd, let gi : A → Ci be a mapping from the plaintext alphabet A to th Introduction the i cipher alphabet Ci(1 ≤ i ≤ d). A plaintext message M = m . m m . m ... is enciphered by repeat- A periodic polyalphabetic cipher with period d encrypts a 1 d d+1 2d ing the sequence of mappings g1, . , gd every d characters, plaintext message into ciphertext by replacing each plaintext giving letter with a letter from one of d substitution alphabets. The d alphabets are typically selected from a sequence of related alphabets that is repeated as often as necessary to encrypt E(M) = g1(m1) . gd(md)g1(md+1) . gd(m2d) ... (1) the message. Although in the most general case the cipher alphabets are In polyalphabetic encryption a message is broken into d completely unrelated, in many classical polyalphabetic ci- columns, each encrypted by a different alphabet. The longer phers the alternative alphabets C , ..., C are shifted ver- the period, d—the more alphabets used–the more difficult it 2 d sions of the mixed primary cipher alphabet, C . Thus, in is to unscramble the message. 1 addition to the mixed primary alphabet, C , a second key, Ciphertext only cryptanalysis is the process of recovering 1 the shift keyword, is used to specify both the period and the the plaintext directly from the ciphertext message without shifts used to generate the other cipher alphabets, with ’a’ access to the message key. The search space for this type of representing a shift of 0, ’b’ a shift of 1, and so on.1 problem is much too large for brute force approaches. For In shifted polyalphabetic ciphers, the alternative alphabets a polyalphabetic cipher with a 26-letter alphabet and period are derived by performing a shift modulo 26 on each letter d the key space contains 26! × 26d−1 possible keys. That of the mixed primary alphabet. In the following example, gives approximately 1032 keys for a period of length 5 and the second alphabet is derived form the first by shifting each 1053 keys for a period of length 20. of its letters by one modulo 26: Several studies have used the genetic algorithm (GA) suc- cessfully in ciphertext only cryptanalysis (Clark and Daw- zerosumgabcdfhijklnpqtvwxy son 1997, Matthews 1993, Morelli, Walde and Servos 2004, afsptvnhbcdegijklmoqruwxyz Morelli and Walde 2003, Spillman 1993 and Spillman, In this case the primary alphabet is a simple substitution al- Janssen and Nelson 1993). Delman (2004) provides a recent phabet generated from the primary keyword zerosumgame. study that compares various efforts to apply GAs cryptogra- Thus, a polyalphabetic cipher with shifted alphabets en- phy. GA approaches evolve the message key(s) by repeat- cryption can be represented as the composition, or product, edly decrypting the message and measuring how close the of two operations: a substitution step, to create the mixed resulting text is to plaintext. primary alphabet, and a shift step, to derive a secondary For polyalphabetic ciphers, the search is made more diffi- alphabets. Substitution involves replacing a plaintext letter cult by the fact that each letter in two-letter (bigram), three- letter (trigram), or four-letter (tetragram) sequences is en- 1This type of cipher, which is sometimes called a double-key crypted from a different alphabet. In this study, we show cipher, was first introduced in the 15th century by Alberti and ex- that a collection of GAs working in parallel can be used to tended with contributions by Trithemius, Belaso, Porta, and Vi- genere. Such ciphers have been well studied, both from an histor- Copyright c 2006, American Association for Artificial Intelli- ical perspective (Kahn 1967, Bauer 1997) and as the basis behind gence (www.aaai.org). All rights reserved. secure, contemporary ciphers (Rubin 1995). 445 with the corresponding letter from the mixed primary alpha- E C:rosumgabcdfhijklnpqtvwxyze bet. The shift step involves shifting a letter by an arbitrary Y. amount (modulo 26). S:npqtvwxyzerosumgabcdfhijkl The order of these operations is significant and leads to Y:xyzerosumgabcdfhijklnpqtvw two different ciphers known as Type I and Type II (Gaines M:fhijklnpqtvwxyzerosumgabcd 1939). In Type I, where substitution is performed first, the . Z:yzerosumgabcdfhijklnpqtvwx encryption and decryption operations can be represented as: To encrypt “the” using the shift keyword SYMBOL, we would select the corresponding letters from the S, Y, and M cj = gi(mj) = (C1(mj) + ki)(mod26) (2) rows, giving “duk”. This is equivalent to taking a letter, say −1 −1 ’t’, shifting it by, say s, giving ’l’, and replacing the result mj = gi (cj) = C1 ((cj + 26 − ki)(mod26)) (3) with the corresponding letter from the C1 alphabet, giving where C1(mj) represents substitution of the jth plaintext ’d’. character, mj, from the primary cipher alphabet C1 and ki represents the shift of the resulting ciphertext letter. In de- Index of Coincidence cryption the inverse operations are applied. Type II ciphers are much more difficult to analyze than Type In Type II the shift step is performed before substitution, I. To show this it is necessary to describe the Index of Coin- leading to the following encryption and decryption opera- cidence. The IC is a measure used to distinguish text that has tions: been encrypted with a polyalphabetic cipher from plaintext or from text that was encrypted with a single alphabet. First c = g (m ) = C ((m + k )(mod26)) (4) described by (Friedman 1920), the IC is the probability that j i j 1 j i two symbols chosen at random from a text will be equiva- −1 −1 mj = gi (cj) = (C1 (cj) + 26 − ki)(mod26) (5) lent. For a text of length N written in the standard alphabet, A...Z, where f represents character frequencies, the IC is A familiar way to represent a set of shifted alphabets is to use expressed as: a Vigenere tableau, in which the mixed primary alphabet, C1, is placed in the first row and alternative alphabets are placed in successive rows. The following table corresponds fi(fi−1) IC = Pi=A,Z (6) to a Type I cipher: N(N−1). Plaintext As Friedman showed, the IC value for English plaintext and ABCDEFGHIJKLMNOPQRSTUVWXYZ for a simple substitution cipher will be around 0.066. Text A:zerosumgabcdfhijklnpqtvwxy encrypted by a polyalphabetic substitution cipher would K B:afsptvnhbcdegijklmoqruwxyz have an IC value less than 0.066 depending on the number E C:bgtquwoicdefhjklmnprsvxyza of alphabets used. Polyalphabetics with periods greater than Y . Ciphertext 10 would have IC values that approach 0.038. Z:ydqnrtlfzabceghijkmopsuvwx The rows of the table represent the shifted alphabets. For Finding the Period example, the shift keyword SYMBOL would select the fol- Regardless of whether Type I or Type II encryption was used lowing alphabets from the above table: in creating a ciphertext, the IC can be used to find the ci- Plaintext pher’s period by performing an exhaustive search on each ABCDEFGHIJKLMNOPQRSTUVWXYZ possible period from 1 to some maximum length. For each S:rwjgkmeystuvxzabcdfhilnopq possible period, di, the plaintext is broken into di columns Y:xcpmqskeyzabdfghijlnortuvw and the IC is computed for each column. If di is the cor- K M:lqdaegysmnoprtuvwxzbcfhijk rect period, the average IC of the di columns will be close to E B:afsptvnhbcdegijklmoqruwxyz 0.066. If none of the values get close enough to 0.066, we Y O:nsfcgiauopqrtvwxyzbdehjklm take the di associated with the highest IC value. While this L:kpczdfxrlmnoqstuvwyabeghij algorithm is not guaranteed to find the correct period, we To encrypt the word “the” using the above table, we re- found it to be successful in nearly 100 percent of the cases. place ’t’ with its corresponding letter from the S alphabet. This is equivalent to replacing ’t’ by ’p’ from the mixed pri- Type I versus Type II Cryptanalysis mary alphabet and then shifting ’p’ by s places (modulo 26). Once the cipher’s period is known, cryptanalysis of a Type In either case, ’t’ is encrypted as ’h’. The word “the” would I cipher is very straightforward. Consider the following de- be encrypted as “hee.” piction of Type I encryption (substitution then shift): The table corresponding to a Type II cipher would be rep- resented as follows: 1 2 P0.066 ⇒ C0.066 → C0.038 (7) Plaintext ABCDEFGHIJKLMNOPQRSTUVWXYZ When plaintext P, which has an IC of 0.066, is replaced by A:zerosumgabcdfhijklnpqtvwxy substitution (⇒) from the mixed primary alphabet, the re- K B:erosumgabcdfhijklnpqtvwxyz sulting ciphertext, C1, will have an IC value approximately 446 equal to plaintext (IC ≈ 0.066). When the ciphertext Thus, once the correct period is found, we perform GA is then shifted (→), resulting in a new ciphertext, C2, its search on two, then three, then four adjacent columns of ci- IC value will be characteristic of a polyalphabetic cipher phertext. At each phase we generate a partial decryption of (IC ≈ 0.038). the columns by composing the shift and substitution steps Given this characterization of a Type I cipher, it is easy to (← + ⇐).