Solving the Running Key Cipher with the Viterbi Algorithm
Alexander Griffing
Cryptologia, 30(4):361-367, 2006. DOI: 10.1080/01611190600789117

Abstract: The Viterbi algorithm is able to reconstruct most of the plaintext from running key ciphertext most of the time using 6-gram letter statistics.

Keywords: cryptanalysis, running key cipher, Vigenère cipher, Viterbi algorithm

Introduction

Under the simplifying assumption that the running key cipher is a random cipher [11], it should have a unique solution if and only if the redundancy of the message and key is at least 50%. This is because the output of the encryption (ciphertext) is half as long as the input (message plus key). In this paper we use a model of written English that can detect this much redundancy, and we note that the Viterbi algorithm [10, 13] finds the most likely solution to a running key cipher with respect to this model. The algorithm was tested on 100 English (message, key) pairs, and usually the message and key were mostly recovered. This method is shown to be an improvement on an earlier approach [1]. This article is a follow-up to [5], which shows how the Viterbi algorithm can solve running key ciphers in which the spaces are kept in the message and key, and for which the letters of the message and key are combined using bitwise XOR.

Running Key Cipher

The running key cipher combines a message and a key, each N letters long, to make a ciphertext of length N. During enciphering, each letter of the message is shifted against the corresponding letter of the key, as shown in Figure 1. It is equivalent to a Vigenère cipher with a period as long as the message. For this article, both the message and the key are assumed to consist of English language text from which all spaces and punctuation have been removed, and for which the remaining 26 letters are considered without regard to case. This paper explains how to estimate the message and key from the ciphertext.

Figure 1. To encipher a message using a running key cipher, letters at corresponding positions in the message and the key are converted to numbers and added together modulo 26. The resulting numbers are then converted back into letters. Letters 'A' through 'Z' are numbers 0 through 25. As shown here, it is possible for different (message, key) pairs to generate the same ciphertext.
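To make the encipherment shown in Figure 1 concrete, here is a minimal Python sketch of the modulo-26 addition; the function name encipher and the printed example are illustrative assumptions, not code from the article.

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encipher(message, key):
    """Running key encipherment: add corresponding letters modulo 26."""
    pairs = zip(message.upper(), key.upper())
    return "".join(ALPHABET[(ALPHABET.index(m) + ALPHABET.index(k)) % 26]
                   for m, k in pairs)

# The (message, key) pair that Figure 2 later recovers:
print(encipher("STORE", "TOTHE"))  # -> LHHYI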
Summary of a Previous Automated Attack

Bauer and Tate [1] enciphered the first 1000 letters of Dracula using the first 1000 letters of The Cask of Amontillado as the running key [12, 9]. The resulting ciphertext was partitioned into disjoint n-letter blocks that were each solved separately. To solve a block, they iterated through all n-letter keys, recording the corresponding message that would produce the observed ciphertext block. The key for which the product of the probability of the key and the probability of the corresponding message was highest was chosen as the recovered key for that block. Unsmoothed frequencies of n-grams observed in a training set were used as the probabilities. They used n-gram sizes up to 6 and training sets including and excluding the message and key, but no more than about a third of the unordered letter pairs were recovered.

Maximum Likelihood

Finding the best solution to a running key cipher means finding the (message, key) pair that maximizes the value of some objective function, given the ciphertext. Using the principle of maximum likelihood estimation, we choose the objective function to be the probability of observing the (message, key) pair given a language model. Because a ciphertext and a key uniquely determine the message, the problem is equivalent to finding a key such that the probability of observing the key and the corresponding message is maximized. The key and the message are both assumed to be English language texts, and the probability of observing text x is P(x_1 \ldots x_N), where x_i is letter i of the N-letter text. This can be rewritten using conditional probabilities:

P(x_1 \ldots x_N) = P(x_1) \prod_{i=2}^{N} P(x_i \mid x_1 \ldots x_{i-1})

The key and message are assumed to have been picked independently by the encipherer, so the joint probability is the product of their marginal probabilities. So how are the conditional probabilities calculated, and how do we search efficiently for the maximum likelihood key? By making simplifying assumptions about English language text, a Markov chain can be used as the language model that generates the conditional probabilities. This model allows the Viterbi algorithm to efficiently find the maximum likelihood key.

Markov Chain

A Markov chain models a sequence of random variables by assuming the sequence has local structure [10, 13]. In particular, a Markov chain of order n assumes that if i > n then P(x_i \mid x_1 \ldots x_{i-1}) = P(x_i \mid x_{i-n} \ldots x_{i-1}). An (n + 1)-gram language model is a Markov chain of order n.

Viterbi Algorithm

The Viterbi algorithm is a dynamic programming algorithm used for finding the most probable sequence of hidden states, assuming that this sequence follows a Markov model [10, 13]. It uses recursion to compute this sequence. Let n be the order of the Markov chain used in the language model. Let V_{i-n \ldots i}(k_{i-n} \ldots k_i) be the probability of the most probable partial key beginning at position i, under the condition that the key values at positions i-n \ldots i are k_{i-n} \ldots k_i. Let m_i(k_i) be the message letter at position i determined by key letter k_i and the ciphertext letter at position i. Then each entry is the product of three terms:

V_{i-n \ldots i}(k_{i-n} \ldots k_i) = T_1 \cdot T_2 \cdot T_3
T_1 = P(k_i \mid k_{i-n} \ldots k_{i-1})
T_2 = P(m_i(k_i) \mid m_{i-n}(k_{i-n}) \ldots m_{i-1}(k_{i-1}))
T_3 = \max_{k_{i+1}} V_{i-(n-1) \ldots i+1}(k_{i-(n-1)} \ldots k_{i+1})

In practice, this is done in log space, where probabilities P(x) are replaced by costs -\log P(x), multiplication is replaced by addition, and maximization of probability is replaced by minimization of cost.
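The recursion can be written as a short dynamic program. The sketch below is a left-to-right variant (the article fills its table from right to left, which reaches the same optimum), with the language model abstracted behind a caller-supplied cost function that returns -log probabilities; the names decipher, viterbi_running_key, and cost are illustrative assumptions, not the article's implementation.

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def decipher(cipher_letter, key_letter):
    """m_i(k_i): the message letter fixed by a ciphertext letter and a key letter."""
    return ALPHABET[(ALPHABET.index(cipher_letter) - ALPHABET.index(key_letter)) % 26]

def viterbi_running_key(ciphertext, cost, order):
    """Most likely (message, key) pair under an order-n Markov language model.

    cost(context, letter) must return -log P(letter | context) for contexts of
    up to `order` letters.  This naive dictionary version keeps up to 26^order
    states per position, so it is only practical for small orders.
    """
    # best[state] = (accumulated cost, best key prefix ending in `state`),
    # where a state is the last `order` key letters chosen so far.
    best = {"": (0.0, "")}
    for i, c in enumerate(ciphertext):
        new_best = {}
        for state, (acc, key_prefix) in best.items():
            # The message context is implied by the key context and the ciphertext.
            cipher_ctx = ciphertext[i - len(state):i]
            msg_ctx = "".join(decipher(ci, ki) for ci, ki in zip(cipher_ctx, state))
            for k in ALPHABET:
                m = decipher(c, k)
                # T_1 and T_2 of the recursion, added in log space.
                total = acc + cost(state, k) + cost(msg_ctx, m)
                new_state = (state + k)[-order:] if order else ""
                if new_state not in new_best or total < new_best[new_state][0]:
                    new_best[new_state] = (total, key_prefix + k)
        best = new_best
    _, key = min(best.values())
    message = "".join(decipher(c, k) for c, k in zip(ciphertext, key))
    return message, key

# Example with a toy uniform model (runs, but recovers nothing useful):
#   import math
#   uniform = lambda context, letter: math.log(26.0)   # -log(1/26)
#   viterbi_running_key("LHHYI", uniform, order=2)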
Figure 2 illustrates how this method finds the most likely solution to a running key ciphertext. In this figure, the areas of the dark gray rectangles associated with each entry are proportional to the T_1 and T_2 components of the cost, and the light gray area is proportional to the T_3 component. For each position i > n, the table V_{i-n \ldots i} has 26^{n+1} entries.

Figure 2. This table is filled by the Viterbi algorithm from right to left during solution of the running key ciphertext 'LHHYI' using a 3-gram language model (an order-2 Markov chain). After all entries have been filled, the optimal solution can be read from left to right, following the indicated path. Here the most likely (message, key) pair is ('STORE', 'TO THE'). In each column, each (message, key) pair is repeated as a (key, message) pair when the partial key and the corresponding partial message are not identical. This symmetry can be exploited to reduce the size of the table by about half.

Procedure

Two experiments were performed. The first experiment was a comparison with the approach in [1], so it used the same training set and attempted to solve the same ciphertext using the same n-gram sizes. The second experiment attempted a more general evaluation by using many texts from Project Gutenberg [6]. For both experiments, time and memory requirements restricted the n-gram size to a maximum of 6 letters.

Comparison Procedure

As in [1], the first 1000 letters of Dracula were enciphered using the first 1000 letters of The Cask of Amontillado as the running key [12, 9]. This ciphertext was solved by the Viterbi algorithm using n-gram sizes between 1 and 6, where the conditional probabilities for the Markov chain were calculated using maximum likelihood estimation, i.e., without smoothing, using the union of [12] and [9] as the training text. Results of this comparison are shown in Figure 3.

Figure 3. This is a direct comparison between the results of the Viterbi algorithm (solid line) and the results reported in [1] (dashed line). Both methods used n-gram statistics to solve a running key cipher made by combining the first 1000 characters of Dracula [12] with the first 1000 characters of The Cask of Amontillado (in [9]).

General Evaluation Procedure

All of the English language text from Project Gutenberg that had been converted to the etext id naming system was downloaded, very small files were removed, the headers and footers were stripped, and the set of files was divided evenly and randomly into a training set and a testing set. An n-gram language model was trained using the training set for each n between 1 and 6. The conditional probabilities were calculated by Witten-Bell smoothing rather than by maximum likelihood estimation [14].
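To make the smoothing step concrete, here is a minimal sketch of interpolated Witten-Bell estimation for letter n-grams, following a standard formulation of the method the article cites as [14]; the function names, the uniform base distribution, and the way it plugs into the earlier Viterbi sketch are assumptions rather than the article's implementation.

from collections import defaultdict

def train_witten_bell(text, n, alphabet_size=26):
    """Return a Witten-Bell smoothed P(letter | context) for an n-gram model.

    `text` is a training string of uppercase letters; contexts of length
    0 .. n-1 are counted, and unseen events back off toward a uniform
    distribution over `alphabet_size` letters.
    """
    counts = defaultdict(int)           # counts[(context, letter)]
    context_totals = defaultdict(int)   # letters observed after each context
    followers = defaultdict(set)        # distinct letters seen after each context
    for order in range(n):
        for i in range(order, len(text)):
            ctx, letter = text[i - order:i], text[i]
            counts[(ctx, letter)] += 1
            context_totals[ctx] += 1
            followers[ctx].add(letter)

    def prob(context, letter):
        context = context[max(0, len(context) - (n - 1)):]   # keep at most n-1 letters
        lower = 1.0 / alphabet_size if not context else prob(context[1:], letter)
        total, types = context_totals[context], len(followers[context])
        if total == 0:
            return lower                # unseen context: back off entirely
        lam = total / (total + types)   # Witten-Bell interpolation weight
        return lam * counts[(context, letter)] / total + (1 - lam) * lower

    return prob

# A smoothed model like this could supply the cost function for the earlier
# Viterbi sketch (an n-gram model is an order-(n-1) Markov chain), e.g.:
#   import math
#   prob = train_witten_bell(training_text, n=3)
#   cost = lambda context, letter: -math.log(prob(context, letter))
#   viterbi_running_key(ciphertext, cost, order=2)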