Breaking Transposition with Genetic Algorithm

C. Obi Reddy, National University of Singapore (NUS), ECE Department, [email protected]

Abstract: In recent years a number of optimisation algorithms have emerged which have proven to be effective in solving a variety of NP-complete problems. This paper presents one example of this kind: breaking a transposition cipher with a heuristic genetic algorithm.

Introduction:

In a brute force attack the attacker tries every possible key (k! keys for a key of order k) on a piece of cipher text until an intelligible translation into plaintext is obtained. Many types of classical ciphers exist, although most fall into one of two broad categories: substitution ciphers and transposition ciphers. In the former, every plaintext character is substituted by a cipher character using a substitution alphabet; in the latter, plaintext characters are permuted using a predetermined permutation.

A transposition cipher works by breaking the plain text into blocks of fixed size and then shuffling the characters in each block according to the agreed key K. The key is just a permutation of the positions 1 to n, where n is the order of the key (e.g., key = 4 2 5 1 3 with an order of 5).

Example:

Plain text position: 1 2 3 4 5
Key:                 1 4 2 5 3
Plain text: This is a simple message

Remove all spaces and convert to upper case. There are two cases: irregular, meaning the number of characters in the plain text is not a multiple of the order of the key, and regular, meaning it is exactly divisible by the order of the key. In the above example the total number of letters is 20, which is the regular case with an order of 5. If the plain text is irregular, it is padded with X at the tail to make it regular. Encryption of the plain text is done as follows: the text is written row by row under the column positions, and the columns are then read out one after another, with the key entry under each column giving that column's position in the cipher text.

Regular:

Plain text   1 2 3 4 5
Key          1 4 2 5 3
             T H I S I
             S A S I M
             P L E M E
             S S A G E

Cipher text: TSPS ISEA IMEE HALS SIMG

Irregular:

Plain text   1 2 3 4 5
Key          1 4 2 5 3
             T H I S I
             S A S I M
             P L E M E
             S S A G E
             E X A M P
             L E X X X

Cipher text: TSPSEL ISEAAX IMEEPX HALSXE SIMGMX

To decipher the cipher text, write it down as a matrix as above and use the same key with the inverse mapping of the permutation. Read the text column-wise.

In cryptanalysis the attacker tries either to break the cipher text or to find the symmetric key used for encryption. This paper carries out a cipher-text-only attack, which takes the cipher text as input and requires no other knowledge about the plain text or the key. Classical attacks search the key space at random to find the optimal key, which makes them computationally intensive. Genetic algorithms add direction to random search problems.

Genetic algorithm Attack

Representation: The permutation representation suits this problem very well. Phenotypes and genotypes are the same here.

Selection: Binary probabilistic tournament selection is used, in which two pairs of keys are selected randomly from the population and undergo the tournament.

Crossover: Single-point crossover is used, shown here with the crossover point at 4.

Parent 1: 8 9 1 3|6 5 4 2 7
Parent 2: 1 5 4 9|7 8 2 3 6
Child 1:  8 9 1 3|5 4 7 2 6
Child 2:  1 5 4 9|8 3 6 2 7
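A minimal Python sketch of this crossover, assuming keys are stored as lists of integers. The function name `single_point_crossover` and its `cut` parameter are illustrative choices and are not taken from the paper's Matlab implementation.

```python
def single_point_crossover(parent1, parent2, cut):
    """Single-point crossover for permutation keys (illustrative sketch).

    Each child keeps the first `cut` entries of one parent and then takes
    the entries it is still missing in the order they occur in the other
    parent, so every child is again a valid permutation.
    """
    head1, head2 = parent1[:cut], parent2[:cut]
    child1 = head1 + [g for g in parent2 if g not in head1]
    child2 = head2 + [g for g in parent1 if g not in head2]
    return child1, child2


# The parents from the example, with the crossover point at 4.
p1 = [8, 9, 1, 3, 6, 5, 4, 2, 7]
p2 = [1, 5, 4, 9, 7, 8, 2, 3, 6]
c1, c2 = single_point_crossover(p1, p2, cut=4)
print(c1)  # [8, 9, 1, 3, 5, 4, 7, 2, 6]
print(c2)  # [1, 5, 4, 9, 8, 3, 6, 2, 7]
```

Filling the tail from the other parent, rather than copying it directly, is what keeps each child free of repeated digits.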

The first part of the first child is the first part of the first parent, and its second part is the remaining digits in the order in which they appear in the second parent. Likewise, the first part of the second child is the first part of the second parent, and its second part is the remaining digits in the order of the first parent.

Mutation: Swap mutation is used, as shown below.

Child:         1 5 4 9 8 3 6 2 7
Mutated child: 1 5 3 9 8 4 6 2 7

Stopping criteria: The algorithm stops when the maximum number of generations is reached.

Elitism: The number of individuals in the population is kept constant by replacing the lowest-fitness individuals with higher-fitness children, and the highest-fitness individuals are protected for the next generation.

Fitness Function: Fitness is evaluated based on the frequency of digrams (two-letter sequences) in the decrypted cipher text. The most popular digrams of English (http://www.cryptograms.org/letter-frequencies.php) are shown in the table below with the scores given to them in the fitness calculation. Because transposition only permutes the characters, the frequency of unigrams (single letters) does not change in the cipher text, so unigram frequencies carry no information about the key. Trigrams and quadgrams are computationally expensive, so they are not considered in the fitness calculation.

S.No  Digram  Score    S.No  Digram  Score
1     TH      3.88     11    OU      1.28
2     HE      3.68     12    ED      1.27
3     IN      2.28     13    HA      1.27
4     ER      2.18     14    TO      1.17
5     AN      2.14     15    OR      1.15
6     RE      1.75     16    IT      1.13
7     ND      1.57     17    IS      1.11
8     ON      1.42     18    HI      1.09
9     EN      1.38     19    ES      1.09
10    AT      1.33     20    NG      1.05

The fitness function works as follows: to calculate the fitness of a chromosome, first decrypt the cipher text with it, then compute the frequencies of the digrams in the decrypted text and sum the scores.

Flowchart of the attack:

1. Start: input the cipher text, the key size and the genetic operator values.
2. Generate a population of random permutation keys and calculate their fitness by decrypting the cipher text with them.
3. If the maximum number of generations is reached, display the decrypted cipher text with the highest-fitness key and stop.
4. Select population/2 pairs of keys with binary tournament.
5. Cross them over using single-point crossover and generate as many children as the population size.
6. Mutate the children by swap mutation.
7. Decrypt with each child key, calculate its fitness, replace the lowest-fitness parents by elitism, and return to step 3.

Experimentation & Results

The algorithm is implemented in Matlab on a dual-core Intel P4 processor with 3 GB RAM and a speed of 2.14 GHz. The table below shows the keys recovered by the genetic algorithm cipher-text-only attack.

Cipher text length: 250 letters

Key size (order of the key)   5   6   7   8   9  10  11  12  13  14  15
Key recovered                 5   6   7   8   9   9   9   9  10  10  10

The algorithm was run for 50, 100, 200 and 400 generations and for populations of 20, 30, 50 and 100. As the population increased there was a significant improvement in performance, but only a small difference in performance was observed when the number of generations was increased. Because the algorithm works on the permutation representation, the crossover probability was kept low, at 0.4, so as not to break up the better chromosomes. Mutation, in contrast, has to supply a large amount of randomness, so the mutation probability was kept higher than usual, at 0.6.

The time taken for the algorithm to run is shown in the table below.

Key size   5   6   7   8   9  10  11  12  13  14  15
Time       2   2   2   3   4   5   8  15  20  30  50

Conclusion:

The same problem has been solved by many authors, Matthews[1], Bethany Delman[2] and R. Tomeoh & S. Arumugam[3], but I have used a new fitness function in a different manner. The results are not very good compared to those of the other authors, but the advantage is that the algorithm is faster. Future work should implement adaptive mutation and crossover, and a fitness function with trigrams and penalties.

References

1. "Genetic algorithms in cryptography" by Bethany Delman, Rochester Institute of Technology.
2. "Attacks on Transposition Ciphers Using Optimization Heuristics" by A. Dimovski & D. Gligoroski, Ss. Cyril & Methodius University.
3. "Breaking Transposition cipher with Genetic algorithm" by R. Tomeoh, Government College of Technology, India & S. Arumugam, Directorate of Technical Education, Chennai.
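As a supplementary illustration, the transposition cipher and the digram fitness evaluation described in this paper can be sketched in Python. This is a sketch under stated assumptions, not the Matlab implementation used for the experiments. The names `encrypt`, `decrypt`, `fitness` and `DIGRAM_SCORES` are hypothetical, and the fitness is assumed to be the table score summed over every digram occurrence in the decrypted text.

```python
# Digram scores copied from the paper's table.
DIGRAM_SCORES = {
    "TH": 3.88, "HE": 3.68, "IN": 2.28, "ER": 2.18, "AN": 2.14,
    "RE": 1.75, "ND": 1.57, "ON": 1.42, "EN": 1.38, "AT": 1.33,
    "OU": 1.28, "ED": 1.27, "HA": 1.27, "TO": 1.17, "OR": 1.15,
    "IT": 1.13, "IS": 1.11, "HI": 1.09, "ES": 1.09, "NG": 1.05,
}


def encrypt(plaintext, key, pad="X"):
    """Write the text row by row, then read the columns in key order."""
    text = "".join(ch for ch in plaintext.upper() if ch.isalpha())
    n = len(key)
    text += pad * (-len(text) % n)  # pad the irregular case at the tail
    rows = [text[i:i + n] for i in range(0, len(text), n)]
    columns = ["".join(row[c] for row in rows) for c in range(n)]
    # key[c] is the 1-based position of column c in the cipher text.
    order = sorted(range(n), key=lambda c: key[c])
    return "".join(columns[c] for c in order)


def decrypt(ciphertext, key):
    """Invert encrypt(); assumes len(ciphertext) is a multiple of len(key)."""
    n = len(key)
    depth = len(ciphertext) // n
    blocks = [ciphertext[i:i + depth] for i in range(0, len(ciphertext), depth)]
    columns = [blocks[key[c] - 1] for c in range(n)]  # inverse mapping
    return "".join(columns[c][r] for r in range(depth) for c in range(n))


def fitness(ciphertext, key):
    """Assumed fitness: sum of table scores over all digram occurrences."""
    text = decrypt(ciphertext, key)
    return sum(DIGRAM_SCORES.get(text[i:i + 2], 0.0)
               for i in range(len(text) - 1))


key = [1, 4, 2, 5, 3]
cipher = encrypt("This is a simple message", key)
print(cipher)                           # TSPSISEAIMEEHALSSIMG
print(decrypt(cipher, key))             # THISISASIMPLEMESSAGE
print(round(fitness(cipher, key), 2))   # 8.28
```

With the correct key the decrypted text contains TH, HI, IS (twice) and ES, which gives the score of 8.28. A wrong key such as 1 2 3 4 5 scores lower on this example.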