An Automatic Cryptanalysis of Playfair Ciphers Using Compression

Noor R. Al-Kazaz¹, School of Computer Science, Bangor University, Bangor, UK
Sean A. Irvine, Real Time Genomics, Hamilton, New Zealand
William J. Teahan, School of Computer Science, Bangor University, Bangor, UK

¹ Computer Science Department, College of Science for Women, Baghdad University, Baghdad, Iraq.

Proceedings of the 1st Conference on Historical Cryptology, pages 115–124, Uppsala, Sweden, 18-20 June, 2018

Abstract

This paper introduces a new compression-based approach to the automatic cryptanalysis of Playfair ciphers. More specifically, it shows how the Prediction by Partial Matching (‘PPM’) data compression model, a method that shows a high level of performance when applied to different natural language processing tasks, can also be used for the automatic decryption of very short Playfair ciphers with no probable word. Our new method is the result of an efficient combination of data compression and simulated annealing. The method has been tried on a variety of cryptograms with different lengths (starting from 60 letters) and a substantial majority of these ciphers are solved rapidly without any errors, with 100% of ciphers of length over 120 being solved. In addition, as the spaces are traditionally omitted from the ciphertext, we have also tried a compression-based approach in order to achieve readability by adding spaces automatically to the decrypted texts. The PPM compression model is used again to rank the solutions and almost all the decrypted examples were effectively segmented with a low average number of errors. Furthermore, we have also been able to break a Playfair cipher for a 6 × 6 grid using our method.

1 Introduction

Compression can be used in several ways to enhance cryptography and cryptanalysis. For example, many cryptosystems can be broken by exploiting statistical regularities or redundancy in the source. Since compression removes redundancy from a source, it is immediately apparent why compression is advocated prior to encryption (Irvine, 1997). However, this paper considers another application of compression: tackling the plaintext identification problem for cryptanalysis. This is an approach that has resulted in relatively few publications compared to the many other methods that have been proposed for breaking ciphers. The purpose of this paper is to explore the use of a compression model for the automatic cryptanalysis of Playfair ciphers.

The primary motivation for data compression has always been making messages smaller so they can be transmitted more quickly or stored in less space. Compression is achieved by removing redundancy from the message, resulting in a more ‘random’ output. There are two main classes of adaptive text compression techniques: dictionary based and statistical (Bell et al., 1990). Prediction by Partial Matching (‘PPM’), first described in 1984 (Cleary and Witten, 1984), is an adaptive statistical coding approach, which dynamically constructs and updates fixed order Markov-based models that help predict the upcoming character relying on the previous symbols or characters being processed. PPM models are one of the best computer models of English and rival the predictive ability of human experts (Teahan and Cleary, 1996).

Our new approach to the automatic cryptanalysis of Playfair ciphers uses PPM compression to tackle the plaintext recognition problem. We rank the quality of the different plaintexts using the size of the compressed output in bits as the metric. We also use another PPM-based algorithm to automatically insert spaces into the decrypted texts in order to achieve readability.

This paper is organised as follows. Section 2 covers the basics of Playfair ciphers and also includes a general overview of previous research on the cryptanalysis of Playfair ciphers as well as a discussion of its weaknesses. Our PPM-based method and the simulated annealing search we use are explained in Section 3. Section 4 covers the experimentation and results obtained, with the conclusions to our findings presented in the final section.

2 Playfair Ciphers

The Playfair cipher is a symmetric encryption method which is based on bigram substitution. It was invented by Charles Wheatstone in 1854. The cipher was named after Lord Lyon Playfair who published it and strongly promoted its use. It was considered a significant improvement on existing encryption methods. A key is written into a 5 × 5 grid and this may involve using a keyword (as in the example below). For English, the 25 letters are arranged into the grid with one letter omitted from the alphabet. Usually, the letter ‘I’ takes the place of letter ‘J’ in the text to be encrypted.

To generate the key that is used, spaces in the grid are filled with the letters of the keyword and then the remaining spaces are filled with the rest of the letters from the alphabet in order. The key is usually written into the top rows of the grid, from left to right, although some other patterns can be used instead. For example, if the keyword ‘CRYPTOLOGY’ is used, the key grid would be as below:

C R Y P T
O L G A B
D E F H I
K M N Q S
U V W X Z

To encrypt any plaintext message, all spaces and non-alphabetic characters must first be removed from the message, then the message is split into groups of two letters (i.e. bigrams). If any bigram contains repeated letters, an ‘X’ is used to separate them. (It is inserted between the first pair of repeated letters, and then bigram splitting continues from that point.) This process is repeated (as necessary) until no bigram contains repeated letters. If the plaintext has an odd number of letters, an ‘X’ is inserted at the end so that the last letter is in a bigram (Klima and Sigmon, 2012). For example, the message “To be or not to be that is the question” would end up as:

“TO BE OR NO TX TO BE TH AT IS TH EQ UE ST IO NX”.

There are three basic encryption rules to be applied (Klima and Sigmon, 2012):

• If both letters of the bigram occupy the same row, replace each with the letter to its immediate right, wrapping from the end of the row to the start if the plaintext letter is at the end of the row.

• If both letters occupy the same column, then replace them with the letters immediately below them. So ‘IS’ enciphers to ‘SZ’. Wrapping in this case occurs from the bottom to the top if the plaintext letter is at the bottom of the column.

• If both letters occupy different rows and columns, replace them with the letters at the free end points of the rectangle defined by both letters. Thus ‘TO’ enciphers to ‘CB’. The order is important: the letters must correspond between the encrypted and plaintext pairs (the one on the row of the first letter of the plaintext should be selected first).

Following these rules, the encrypted message would be:

“CB LI LC KG PZ CB LI PI BP SZ PI HM VD ZB DB QW”

The Playfair cipher is one of the most well known multiple letter enciphering systems. However, despite the high efficiency demonstrated by this cipher, it suffers from a number of drawbacks. The existing Playfair method is based on 25 English alphabetic letters with no support for any numeric or special characters. Several algorithms have been proposed aiming to enhance this method (Srivastava and Gupta, 2011; Murali and Senthilkumar, 2009; Hans et al., 2014). One particular extended Playfair cipher method (Ravindra Babu et al., 2011) is based on 36 characters (26 alphabetical letters and 10 numeric characters). Here, a 6 × 6 key matrix was constructed with no need to replace the letter ‘J’ with ‘I’. By using the same keyword ‘CRYPTOLOGY’, the key matrix in this case would be:

C R Y P T O
L G A B D E
F H I J K M
N Q S U V W
X Z 0 1 2 3
4 5 6 7 8 9

Plaintexts containing numerical values, such as a contact number, house number or date of birth, can be easily enciphered using this extended method (Ravindra Babu et al., 2011).

2.1 Cryptanalysis of Playfair Ciphers

Different cryptanalysis methods have been invented to break Playfair ciphers using computer methods. An evolutionary method for Playfair cipher cryptanalysis was presented by Rhew (2003). The fitness function was based on a simple version of dictionary look-up with the fitness calculated based on the number of words found. However, results obtained from this method were poor with run-time requiring several hours. A genetic algorithm was proposed by Negara (2012) where character unigram and bigram statistics were both used as a basis of calculating the fitness function. The efficiency of the algorithm is affected by different parameters such as the genetic operators, ciphertext length and fitness function.

[...] short Playfair ciphers are extremely difficult to break without some known words. In our paper, even Playfair ciphertexts as short as 60 letters (without a probable crib) have been successfully decrypted using our new universal compression-based approach. We use simulated annealing in combination with compression for the automatic decryption. Moreover, we have also effectively managed to break extended Playfair ciphers that use a 6 × 6 key matrix.

2.2 Playfair’s Weaknesses

The Playfair cipher suffers from some major weaknesses. An interesting weakness is that repeated bigrams in the plaintext will create repeated bigrams in the ciphertext. Furthermore, a ciphertext bigram and its reverse will decipher to the same pattern in the plaintext. For example, if the ciphertext bigram “CD” deciphers to “IS”, then the ciphertext “DC” [...]

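
The ranking idea from the Introduction, scoring each candidate plaintext by the number of bits a language model needs to encode it, can be illustrated with a much simpler model than PPM. The sketch below is not PPM: it swaps in a fixed order-1 letter model with add-one smoothing, and the training sample is a toy that deliberately contains the example message, so the numbers only illustrate the principle. The names `train_order1` and `code_length_bits` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def train_order1(sample):
    """Count letter-to-letter transitions in a training sample (non-letters dropped)."""
    letters = [c for c in sample.upper() if c in ALPHABET]
    pair_counts = Counter(zip(letters, letters[1:]))
    context_counts = Counter(letters[:-1])
    return pair_counts, context_counts

def code_length_bits(text, model):
    """Bits needed to encode `text` under the order-1 model with add-one smoothing.
    A stand-in for the compressed size that a PPM coder would report."""
    pair_counts, context_counts = model
    letters = [c for c in text.upper() if c in ALPHABET]
    if not letters:
        return 0.0
    bits = math.log2(len(ALPHABET))  # first letter has no context: uniform cost
    for prev, cur in zip(letters, letters[1:]):
        p = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + len(ALPHABET))
        bits -= math.log2(p)
    return bits

# Toy training sample (assumption: a real model would use a large English corpus).
sample = ("To be or not to be that is the question "
          "whether tis nobler in the mind to suffer")
model = train_order1(sample)

candidates = [
    "CBLILCKGPZCBLIPIBPSZPIHMVDZBDBQW",  # the ciphertext from Section 2
    "TOBEORNOTXTOBETHATISTHEQUESTIONX",  # the correct decryption
]
ranked = sorted(candidates, key=lambda t: code_length_bits(t, model))
print(ranked[0])  # the candidate with the shortest code length
```

Lower code length means the text looks more like the training language, so a search procedure such as simulated annealing can use it as the quantity to minimise when comparing candidate keys.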