Cryptologia, 30:361–367, 2006. DOI: 10.1080/01611190600789117

Solving the Running Key Cipher with the Viterbi Algorithm

ALEXANDER GRIFFING

Abstract The Viterbi algorithm is able to reconstruct most of the plaintext from running key ciphertext most of the time using 6-gram letter statistics.

Keywords: running key cipher, Vigenère cipher, Viterbi algorithm

Introduction

Under the simplifying assumption that the running key cipher is a random cipher [11], it should have a unique solution if and only if the redundancy of the message and key is at least 50%. This is because the output of the cipher (ciphertext) is half as long as the input (message plus key). In this paper we use a model of written English that can detect this much redundancy, and we note that the Viterbi algorithm [10, 13] finds the most likely solution to a running key cipher with respect to this model. The algorithm was tested on 100 English (message, key) pairs, and usually the message and key were mostly recovered. This method is shown to be an improvement on an earlier approach [1]. This article is a follow-up to [5], which shows how the Viterbi algorithm can solve running key ciphers in which the spaces are kept in the message and key, and for which the letters of the message and key are combined using bitwise XOR.

Running Key Cipher

The running key cipher combines a message and a key, each N letters long, to make a ciphertext of length N. During enciphering, each letter of the message is shifted against the corresponding letter of the key, as shown in Figure 1. It is equivalent to a Vigenère cipher with a period as long as the message. For this article, both the message and the key are assumed to consist of English language text from which all spaces and punctuation have been removed, and for which the remaining 26 letters are considered without regard to case. This paper explains how to estimate the message and key from the ciphertext.
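As a concrete illustration of this encipherment rule, here is a minimal Python sketch (illustrative, not the paper's own code) that maps letters 'A' through 'Z' to 0 through 25, adds message and key letters modulo 26, and maps back. The example pair is the one shown later in Figure 2.

def encipher(message: str, key: str) -> str:
    # Shift each message letter by the corresponding key letter modulo 26.
    return "".join(
        chr((ord(m) + ord(k) - 2 * ord("A")) % 26 + ord("A"))
        for m, k in zip(message, key)
    )

def decipher(ciphertext: str, key: str) -> str:
    # Subtract the key letters modulo 26 to recover the message.
    return "".join(
        chr((ord(c) - ord(k)) % 26 + ord("A"))
        for c, k in zip(ciphertext, key)
    )

# Addition modulo 26 is commutative, so swapping message and key
# yields the same ciphertext; this is the symmetry noted in Figure 2.
assert encipher("STORE", "TOTHE") == "LHHYI"
assert encipher("TOTHE", "STORE") == "LHHYI"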

Summary of a Previous Automated Attack

Bauer and Tate [1] enciphered the first 1000 letters of Dracula using the first 1000 letters of The Cask of Amontillado as the running key [12, 9]. The resulting ciphertext was partitioned into disjoint n-letter blocks that were each solved separately.

Address correspondence to Alexander Griffing, 3930 Jackson St., Apt. Q-311, Raleigh, NC 27607, USA. E-mail: [email protected]


Figure 1. To encipher a message using a running key cipher, letters at corresponding positions in the message and the key are converted to numbers and added together modulo 26. The resulting numbers are then converted back into letters. Letters ‘A’ through ‘Z’ are numbers 0 through 25. As shown here, it is possible for different (message, key) pairs to generate the same ciphertext.

To solve a block, they iterated through all n-letter keys, recording the corresponding message that would produce the observed ciphertext block. The key for which the product of the probability of the key and the probability of the corresponding message was highest was chosen as the recovered key for that block. Unsmoothed frequencies of n-grams observed in a training set were used as the probabilities. They used n-gram sizes up to 6 and training sets including and excluding the message and key, but no more than about a third of the unordered letter pairs were recovered.
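A hedged sketch of that block attack follows; the helper ngram_prob is hypothetical and stands in for an unsmoothed n-gram frequency model trained as described in [1].

from itertools import product

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def solve_block(cipher_block: str, ngram_prob) -> str:
    # Try every possible n-letter key for an n-letter ciphertext block
    # and keep the key maximizing P(key) * P(message).  With 26**n
    # candidates per block, brute force is feasible only for small n.
    best_key, best_score = None, -1.0
    for key in map("".join, product(ALPHABET, repeat=len(cipher_block))):
        message = "".join(
            chr((ord(c) - ord(k)) % 26 + ord("A"))
            for c, k in zip(cipher_block, key)
        )
        score = ngram_prob(key) * ngram_prob(message)
        if score > best_score:
            best_key, best_score = key, score
    return best_key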

Maximum Likelihood

Finding the best solution to a running key cipher means finding the (message, key) pair that maximizes the value of some objective function, given the ciphertext. Using the principle of maximum likelihood estimation, we choose the objective function to be the probability of observing the (message, key) pair given a language model. Because a ciphertext and a key uniquely determine the message, the problem is equivalent to finding a key such that the probability of observing the key and the corresponding message is maximized. The key and the message are both assumed to be English language texts, and the probability of observing text $x$ is $P(x_1 \ldots x_N)$, where $x_i$ is letter $i$ of the $N$-letter text. This can be rewritten using conditional probabilities:

$$P(x_1 \ldots x_N) = P(x_1) \prod_{i=2}^{N} P(x_i \mid x_1 \ldots x_{i-1})$$

The key and message are assumed to have been picked independently by the encipherer, so the joint probability is the product of their marginal probabilities. So how are the conditional probabilities calculated, and how do we search efficiently for the maximum likelihood key? By making simplifying assumptions about English language text, a Markov chain can be used as the language model that generates the conditional probabilities. This model allows the Viterbi algorithm to efficiently find the maximum likelihood key.

Markov Chain

A Markov chain models a sequence of random variables by assuming the sequence has local structure [10, 13]. In particular, a Markov chain of order $n$ assumes that if $i > n$ then $P(x_i \mid x_1 \ldots x_{i-1}) = P(x_i \mid x_{i-n} \ldots x_{i-1})$. An $(n+1)$-gram language model is a Markov chain of order $n$.
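For example, the order-n conditional probabilities can be estimated from letter counts in a training text. The sketch below (illustrative, not the paper's code) computes the unsmoothed maximum likelihood estimate used in the comparison experiment.

from collections import defaultdict

def train_ngram_model(text: str, n: int):
    # Maximum likelihood estimate of an order-n Markov chain over letters:
    # P(x_i | x_{i-n} ... x_{i-1}) = count(context + letter) / count(context).
    context_counts = defaultdict(int)
    ngram_counts = defaultdict(int)
    for i in range(n, len(text)):
        context = text[i - n:i]
        context_counts[context] += 1
        ngram_counts[context + text[i]] += 1

    def prob(letter: str, context: str) -> float:
        if context_counts[context] == 0:
            return 0.0  # unsmoothed: unseen contexts get probability zero
        return ngram_counts[context + letter] / context_counts[context]

    return prob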

Viterbi Algorithm

The Viterbi algorithm is a dynamic programming algorithm used for finding the most probable sequence of hidden states, assuming that this sequence follows a Markov model [10, 13]. It uses recursion to compute this sequence. Let $n$ be the order of the Markov chain used in the language model. Let $V_{i-n \ldots i}(k_{i-n} \ldots k_i)$ be the probability of the most probable partial key beginning at position $i$ under the condition that the key values at positions $i-n \ldots i$ are $k_{i-n} \ldots k_i$. Let $m_i(k_i)$ be the message letter at position $i$ determined by key letter $k_i$ and the ciphertext letter at position $i$. Then each entry is the product of three terms:

$$V_{i-n \ldots i}(k_{i-n} \ldots k_i) = T_1 \cdot T_2 \cdot T_3$$

$$T_1 = P(k_i \mid k_{i-n} \ldots k_{i-1})$$

$$T_2 = P(m_i(k_i) \mid m_{i-n}(k_{i-n}) \ldots m_{i-1}(k_{i-1}))$$

$$T_3 = \max_{k_{i+1}} V_{i-(n-1) \ldots i+1}(k_{i-(n-1)} \ldots k_{i+1})$$

In practice, this is done in log space where probabilities $P(x)$ are replaced by costs $-\log(P(x))$, multiplication is replaced by addition, and maximization of probability is replaced by minimization of cost. Figure 2 illustrates how this method finds the most likely solution to a running key ciphertext. In this figure, the areas of the dark gray rectangles associated with each entry are proportional to the $T_1$ and $T_2$ components of the cost, and the light gray area is proportional to the $T_3$ component. For each position $i > n$, the table $V_{i-n \ldots i}$ has $26^{n+1}$ entries.

Figure 2. This table is filled by the Viterbi algorithm from right to left during solution of the running key ciphertext 'LHHYI' using a 3-gram language model (an order 2 Markov chain). After all entries have been filled, the optimal solution can be read from left to right, following the indicated path. Here the most likely (message, key) pair is ('STORE', 'TO THE'). In each column, each (message, key) pair is repeated as a (key, message) pair when the partial key and corresponding partial message are not identical. This symmetry can be exploited to reduce the size of the table by about half.
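The recursion above can be written down compactly for the smallest nontrivial case. The following sketch assumes a bigram model (order n = 1), so a state is a single key letter; it runs left to right, which is equivalent to the right-to-left table fill described above. The function prob(letter, context) is assumed to come from a trained model such as the train_ngram_model sketch earlier; none of this is the paper's own code.

import math

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def viterbi_running_key(ciphertext: str, prob):
    # Message letter implied by a ciphertext letter and a key letter.
    def msg(c, k):
        return chr((ord(c) - ord(k)) % 26 + ord("A"))

    # Log space: probabilities become costs, products become sums.
    def cost(p):
        return -math.log(p) if p > 0 else math.inf

    # For simplicity, start from a uniform prior over the first key letter.
    best = {k: 0.0 for k in ALPHABET}
    back = []
    for i in range(1, len(ciphertext)):
        c, cp = ciphertext[i], ciphertext[i - 1]
        new_best, ptr = {}, {}
        for k in ALPHABET:
            # T1 (key bigram) + T2 (message bigram) + T3 (best predecessor).
            total, prev = min(
                (best[kp]
                 + cost(prob(k, kp))
                 + cost(prob(msg(c, k), msg(cp, kp))), kp)
                for kp in ALPHABET
            )
            new_best[k], ptr[k] = total, prev
        best = new_best
        back.append(ptr)

    # Trace back the lowest-cost key, then decipher it into the message.
    k = min(best, key=best.get)
    key = k
    for ptr in reversed(back):
        k = ptr[k]
        key = k + key
    return key, "".join(msg(c, k) for c, k in zip(ciphertext, key))

A bigram model recovers little text in practice; the paper's results use up to 6-grams, where the same recursion applies over n-letter key contexts and the table at each position has $26^{n+1}$ entries.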

Procedure

Two experiments were performed. The first experiment was a comparison with the approach in [1], so it used the same training set and attempted to solve the same ciphertext using the same n-gram sizes. The second experiment attempted a more general evaluation by using many texts from Project Gutenberg [6]. For both experiments, time and memory requirements restricted the n-gram size to a maximum of 6 letters.

Comparison Procedure

As in [1], the first 1000 letters of Dracula were enciphered using the first 1000 letters of The Cask of Amontillado as the running key [12, 9]. This ciphertext was solved by the Viterbi algorithm using n-gram sizes between 1 and 6, where the conditional probabilities for the Markov chain were calculated using maximum likelihood estimation, i.e., without smoothing, using the union of [12] and [9] as the training text. Results of this comparison are shown in Figure 3.

General Evaluation Procedure

All of the English language text from Project Gutenberg that had been converted to the etext id naming system was downloaded, very small files were removed, the headers and footers were stripped, and the set of files was divided evenly and randomly into a training set and a testing set. An n-gram language model was trained using the training set for each n between 1 and 6. The conditional probabilities were calculated by Witten-Bell smoothing rather than by maximum likelihood estimation [14].

Figure 3. This is a direct comparison between the results of the Viterbi algorithm (solid line) and the results reported in [1] (dashed line). Both methods used n-gram statistics to solve a running key cipher made by combining the first 1000 characters of Dracula [12] with the first 1000 characters of The Cask of Amontillado (in [9]). The training text for both methods consisted of the union of [12] and [9]. Although both methods used maximum likelihood estimates, only the Viterbi algorithm considered overlapping ciphertext n-grams.

Figure 4. The scatter plot for each n-gram size from 1 to 6 shows the optimal solution found by the Viterbi algorithm for each of the 100 ciphertexts. The vertical axis is the fraction of letters correct in the optimal solution. The number above each scatter plot is the n-gram size used with the Viterbi algorithm. In each scatter plot, the x axis is the fraction of redundancy in the original message and key with respect to the language model. Box plots show the median, first and third quartiles, and extrema of each marginal distribution.

Witten-Bell smoothing is one of many ways of recursively interpolating between the maximum likelihood n-gram estimate and the smoothed $(n-1)$-gram estimate [2, 7, 14]. Two hundred contiguous substrings, each 1000 letters long, were chosen from the testing set by choosing a file at random and choosing a starting position randomly within the file. These were divided evenly and randomly into a set of keys and a set of messages, which were paired off to generate 100 ciphertexts. Each ciphertext was solved using the Viterbi algorithm with each n-gram language model, and the fraction of unordered (message, key) letter pairs that were correct in the solution was recorded. The fraction of redundancy of each (message, key) pair with respect to each n-gram language model was also recorded. This was calculated as $D/R$, where $D = R - r$ is the redundancy, $R = \log_2(26)$ is the absolute rate of the language, and $r$ is the calculated rate [3]. Here $r = -\log_2(P(\text{message}, \text{key} \mid \text{model}))/2000$. Results are shown in Figure 4.
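One way to implement this recursive interpolation for letter n-grams is sketched below, following the Witten-Bell formula P(w | h) = (c(hw) + T(h) · P(w | h')) / (c(h) + T(h)), where h' drops the oldest letter of the context h and T(h) is the number of distinct letters observed after h. The uniform add-one floor in the base case is a simplifying assumption of this sketch, not necessarily the paper's choice.

from collections import defaultdict

def witten_bell(text: str, n: int):
    # Per-order statistics for context lengths 0 .. n-1: counts of
    # (context + letter), counts of contexts, and distinct followers.
    counts = [defaultdict(int) for _ in range(n)]
    ctx = [defaultdict(int) for _ in range(n)]
    followers = [defaultdict(set) for _ in range(n)]
    for o in range(n):
        for i in range(o, len(text)):
            h, w = text[i - o:i], text[i]
            counts[o][h + w] += 1
            ctx[o][h] += 1
            followers[o][h].add(w)

    def prob(w: str, h: str) -> float:
        # h must have fewer than n letters.
        o = len(h)
        if o == 0:
            # Base case: unigram estimate with a uniform add-one floor.
            return (counts[0][w] + 1) / (ctx[0][""] + 26)
        t = len(followers[o][h])
        if ctx[o][h] + t == 0:
            return prob(w, h[1:])  # context never seen: back off entirely
        return (counts[o][h + w] + t * prob(w, h[1:])) / (ctx[o][h] + t)

    return prob

Calling witten_bell(training_text, 6) would give a prob(letter, context) function for contexts of up to five letters, pluggable into the Viterbi sketch above.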

Results

For the comparison experiment, Figure 3 shows that the results of the Viterbi algorithm are similar to the results reported in [1] when small n-grams are used, but that the Viterbi algorithm gives much better results when n-grams longer than four letters are considered. Using 6-grams, the Viterbi algorithm recovered all unordered pairs of (message, key) letters, and the ordering was recovered correctly except for the single crossover point shown in Figure 5. The results of the more general evaluation in Figure 4 show that while the degree of accuracy obtained by the comparison experiment cannot be expected when the language model has been trained on text that does not include the message and key, most of the text can still be recovered. When 6-grams were used, the median percentage of unordered (message, key) letter pairs found correctly was about 87%. For 6-grams, 90% of the observed accuracy variation can be explained by the fraction of redundancy in the message and key relative to the language model, using a simple linear regression model.

Discussion

Because the running key cipher produces an output (ciphertext) that is half as long as the input (message plus key), we might guess that a solution is possible if and only if the input is at least fifty percent redundant. The assumption underlying this guess is that the running key cipher is a random cipher. If this assumption were correct, then the rightmost scatter plot in Figure 4 should show a step function giving a constant low background accuracy when the perceived redundancy is less than 50%, and a 100% accuracy when the perceived redundancy is greater than 50%. The actual plot shows a curve that is smoother than this, but the 50% figure still seems to be a reasonable guideline for determining whether most of the message will be found.

Figure 5. This shows solutions found by the Viterbi algorithm for a running key cipher that was made by combining a sequence of letters from [9] with a sequence of letters from [12]. The original sequences are shown at the top and solutions found by the Viterbi algorithm follow. To help show text for which the correct solution has been found, letters equal to the corresponding letter in the first original sequence are darkened. These darkened letters are shown graphically to the right.

Of the 100 messages and 100 keys of English language text taken randomly from Project Gutenberg, not all have the same redundancy relative to a given n-gram model. The most extreme example, an outlier in the scatter plot, is the excerpt from [8] comprising a list of words: "... sting pivot spring diffident bliss splinter twitch pinafore inch tinder thick infamy strip wicked sphinx liturgy ..." Less than half of the plaintext was recovered from the ciphertext made from this message.

Conclusion

The work in this article demonstrates that the Viterbi algorithm usually recovers most of the key and message from a running key ciphertext, provided that the key and message are English language text and that a high order language model is used.

Future Work

This method could be improved by using a more appropriate smoothing function or larger n-grams. Although the work of the Viterbi algorithm increases exponentially with n, the work may be somewhat decreased using a lazy variant [4]. A variable order Markov model may give better results than a fixed order Markov model, and a more practical algorithm may result from removing the requirement of optimality.

Using an iterative search could improve the algorithm by modifying the language model depending on the solutions that are found. For example, if much of the message text in the solution matches Shakespearean text statistics, then perhaps the search could be improved by using a greater weighting for Shakespearean texts when the conditional probabilities are calculated.

Rather than using the Viterbi algorithm, which finds the most likely (message, key) pair for a given ciphertext, posterior decoding could be used to find the most likely (message, key) pair for each position in the ciphertext.
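In symbols (a standard formulation of posterior decoding, not taken from the paper), this would choose each key letter independently as

$$\hat{k}_i = \arg\max_{k} P(k_i = k \mid c_1 \ldots c_N),$$

where the posterior probabilities are computed with the forward-backward algorithm rather than the Viterbi maximization. This can raise per-letter accuracy at the cost of no longer guaranteeing a single globally most probable (message, key) pair.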

Acknowledgments

Thanks are due to Eric Stone and Cryptologia's editors and anonymous reviewers for commenting on the manuscript.

About the Author

Alexander Griffing has an undergraduate degree from Texas A&M University, and he is enrolled in the Bioinformatics Ph.D. program at North Carolina State University.

References

1. Bauer, C. and C. N. S. Tate. October 2002. "A Statistical Attack on the Running Key Cipher," Cryptologia, 26(4):274–282.
2. Chen, S. F. and J. Goodman. August 1998. "An Empirical Study of Smoothing Techniques for Language Modeling," Technical Report TR-10-98, Center for Research in Computing Technology, Harvard University.
3. Denning, D. E. 1982. Cryptography and Data Security, Addison-Wesley Publishing Company, Inc., Reading, MA, pp. 26–27.
4. Feldman, J., I. Abou-Faycal, and M. Frigo. September 2002. "A Fast Maximum-Likelihood Decoder for Convolutional Codes," Proc. IEEE Semiannual Vehicular Technology Conference, September 24–28, Vancouver, Canada. IEEE Publishers.
5. Griffing, A. R. July 2006. "Solving XOR Plaintext Strings with the Viterbi Algorithm," Cryptologia, 30(3):258–265.
6. Hart, M. and Volunteers. 1971–2006. Project Gutenberg Literary Archive Foundation, http://www.gutenberg.net. Accessed on September 2, 2006.
7. Jelinek, F. and R. L. Mercer. May 1980. "Interpolated Estimation of Markov Source Parameters from Sparse Data," in Proceedings of the Workshop on Pattern Recognition in Practice, May 21–23, Amsterdam, The Netherlands: North-Holland.
8. McGuffey, W. H. 2005. McGuffey's Eclectic Spelling Book, Project Gutenberg Literary Archive Foundation, EText 15456. http://www.gutenberg.org/files/15456/15456.txt. Accessed on September 2, 2006.
9. Poe, E. A. 2000. The Works of Edgar Allan Poe in Five Volumes, Project Gutenberg Literary Archive Foundation, ETexts 2147–2151. http://www.gutenberg.org/dirs/etext00/poelv10.txt. Accessed on September 2, 2006.
10. Rabiner, L. R. 1989. "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, 77(2):257–285.
11. Shannon, C. E. October 1949. "Communication Theory of Secrecy Systems," Bell System Technical Journal, 28:656–715.
12. Stoker, B. 1995. Dracula, Project Gutenberg Literary Archive Foundation, EText 345. http://www.gutenberg.org/dirs/etext95/dracu13.txt. Accessed on September 2, 2006.
13. Viterbi, A. J. April 1967. "Error Bounds for Convolutional Codes and an Asymptotically Optimal Decoding Algorithm," IEEE Transactions on Information Theory, 13(2):260–267.
14. Witten, I. H. and T. C. Bell. July 1991. "The Zero-Frequency Problem: Estimating the Probabilities of Novel Events in Adaptive Text Compression," IEEE Transactions on Information Theory, 37(4):1085–1094.