Of Ciphers and Neurons: Detecting the Type of Ciphers Using Artificial Neural Networks
Nils Kopal
University of Siegen, Germany
[email protected]

Proceedings of the 3rd International Conference on Historical Cryptology, HistoCrypt 2020

Abstract

There are many (historical) unsolved ciphertexts for which we do not know the type of cipher that was used to encrypt them. A cryptanalyst's first step is to try to identify their cipher types using different (statistical) methods. This can be difficult, since a multitude of cipher types exist. To help cryptanalysts, we developed a first version of an artificial neural network that is currently able to differentiate between five classical ciphers: simple monoalphabetic substitution, Vigenère, Playfair, Hill, and transposition. The network is based on Google's TensorFlow library as well as Keras. This paper presents the current progress in the research of using such networks for detecting the cipher type. We tried to classify all ciphers of a new MysteryTwister C3 challenge called "Cipher ID" created by Stamp in 2019. The network is able to classify about 90% of the ciphertexts of the challenge correctly. Furthermore, the paper presents the current state of the art of cipher type detection. Finally, we present a method which shows that one can save about 54% computation time for the classification of cipher types when using our artificial neural network instead of trying different solvers for all ciphertext messages of Stamp's challenge.

1 Introduction

Artificial neural networks (ANNs) experienced a renaissance over the past years, supported by the development of easy-to-use software libraries, e.g. TensorFlow and Keras, as well as the wide range of new powerful hardware (especially graphic card processors and application-specific integrated circuits). ANNs found usages in a broad set of different applications and research fields. Their main purpose is fast filtering, classifying, and processing of (mostly) non-linear data, e.g. image processing, speech recognition, and language translation. Besides that, scientists were also able to "teach" ANNs to play games or to create paintings in the style of famous artists.

Inspired by the vast growth of ANNs, cryptologists also started to use them for different cryptographic and cryptanalytic problems. Examples are the learning of complex cryptographic algorithms, e.g. the Enigma machine, or the detection of the type of cipher used for encrypting a specific ciphertext.

In late 2019 Stamp published a challenge on the MysteryTwister C3 (MTC3) website called "Cipher ID". The goal of the challenge is to assign the type of cipher to each ciphertext out of a set of 500 ciphertexts, where 5 different types of ciphers were used to encrypt these ciphertexts using random keys. Each cipher type was used exactly 100 times and the different ciphertexts were then shuffled. While the intention of the author was to motivate people to start research in the field of machine learning and cipher type detection, all previous solvers solved the challenge by breaking the ciphertexts using solvers for the 5 different cipher types. Thus, after revealing the plaintext of each cipher, the participants knew which type of encryption algorithm was used.

We started to work on the cipher type detection problem in 2019 with the intention to detect the ciphers' types solely using ANNs. TensorFlow (Abadi et al., 2016) and Keras (Chollet, 2015) were used. TensorFlow is a free and open-source data flow and math library developed by Google, written in Python, C++, and CUDA, and was publicly released in 2015. Keras is a free and open-source library for developing ANNs, developed by Chollet and also written in Python. In 2017 Google's TensorFlow team decided to support Keras in the TensorFlow core library. While we were working on the cipher type detection problem, Stamp's challenge was published. We then adapted our code and tools to the requirements of the challenge. Therefore, in this paper, we present our current progress of implementing a cipher type detection ANN with the help of the aforementioned libraries, especially for the MTC3 challenge. At the time of writing this paper, we are able to classify the type of ciphers of the aforementioned challenge at a success rate of about 90%. Despite this relatively good detection rate, it is still not good enough to solve the challenge on its own. Therefore, we also propose a first idea of a detection (and solving) method for ciphertexts with unknown cipher types.

The contributions and goals of this paper are:

1. First public ANN classifier for classical ciphers developed with TensorFlow and Keras.

2. Presentation of the basics of ANNs to the audience of HistoCrypt, who come from different research areas, e.g. history and linguistics (but are mostly not computer scientists).

3. Example Python code which can be used to directly implement our methods in TensorFlow and Keras.

4. Overview of the existing work in the field of ANNs and cryptanalysis of classical/historical ciphers and cipher type detection.

5. Presentation of a first idea of a method which does both, cipher type detection and solving of classical ciphers.

The rest of this paper is structured as follows: Section 2 presents the related work in the field of machine learning and cryptanalysis with a focus on ANNs. Section 3 shows the foundation on which we created our methods. Here, firstly we discuss ANNs in general. Secondly, we briefly present TensorFlow as well as Keras. After that, Section 4 presents our cipher type detection approach based on the aforementioned libraries. Then, Section 5 discusses our first ideas for a cipher type detection and solving method. Finally, Section 6 briefly concludes the paper and gives an overview of planned future work with regards to ANNs and cryptology.

2 Related Work

In this section, we present different papers and articles which deal with ANNs and cryptology. The usage of ANNs in these papers ranges from the emulation of ciphers and the detection of the cipher type to the recovering of cryptographic keys. There are also papers where the authors worked with other techniques to detect the cipher type.

1. Ibrahem (Khalel Ibrahem Al-Ubaidy, 2004) presents two ideas: First, to determine the key from a given plaintext-ciphertext pair. He calls this the "cryptanalysis approach". Second, the emulation of an unknown cipher. He calls this the "emulation approach". He used an ANN with two hidden layers in his approach. For training his model he used Levenberg-Marquardt (LM). He successfully trained networks for the Vigenère cipher as well as for two different stream ciphers (GEFFE and THRESHOLD, both based on linear feedback shift registers).

2. Chandra et al. (Chandra et al., 2007) present their method of cipher type identification. They created different ANNs which are able to distinguish between different modern ciphers, e.g. RC6 and Serpent. Their ANN architecture is comparatively small, consisting of only 2 hidden layers, where each layer has at most 25 neurons. They used different techniques to map from the ciphertext to 400 "input patterns", which they fed to their network.

3. Sivagurunathan et al. (Sivagurunathan et al., 2010) created an ANN with one hidden layer to distinguish between Vigenère cipher, Hill cipher, and Playfair cipher. While their network was able to detect Playfair ciphers with an accuracy of 100%, the detection rate of Vigenère and Hill was between 69% and 85%, depending on their test scenarios.

4. The BION classifiers from BION's gadget website (see https://bionsgadgets.appspot.com/) are browser-based classifiers that integrate two well-working cipher type detection methods built in JavaScript. The first one works with random decision forests and the second one is based on a multitude of ANNs. The basic idea of the second (ANN-based) classifier is that the different networks (different layers, activation functions, etc.) each have a "vote" for the cipher type. In the end, the votes are shown, and the correct cipher type probably has the most votes. The classifiers are able to detect the cipher types defined by the American Cryptogram Association (ACA).

5. Nuhn and Knight's (Nuhn and Knight, 2014) extensive work on cipher type detection used a support vector machine based on the libSVM toolkit (Chang and Lin, 2011). In their work, they used 58 different features to successfully classify 50 different cipher types out of 56 cipher types specified by the American Cryptogram Association (ACA).

6. Greydanus (Greydanus, 2017) used recurrent neural networks (RNN) to learn the Enigma. An RNN has connections going from successive hidden layer neurons to neurons in preceding layers. He showed that an RNN with a 3000-unit long short-term memory cell can learn the decryption function of an Enigma machine with three rotors, of a Vigenère cipher, and of a Vigenère Autokey cipher. Furthermore, he created an RNN network which

Figure 1: A single neuron of an ANN with inputs, outputs, bias, and activation function

3 Foundation

In this section, we describe the foundation used for our detection method. First, we discuss the ANN in general. Then, we give an introduction to TensorFlow and Keras and show some example Python code building an ANN.

3.1 Artificial Neural Network

Artificial neural networks (ANNs) are computing models (organized as graphs) that are in principle inspired by the human brain.
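To make the single neuron of Figure 1 more concrete, the following is a minimal sketch in plain Python, not the code used for the network of this paper. It assumes the common formulation in which each input is multiplied by a weight, the bias is added to the weighted sum, and the result is passed through an activation function. The function names, the choice of the sigmoid activation, and all numeric values are hypothetical and serve only as illustration.

```python
import math


def sigmoid(z):
    """Logistic activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Compute activation(sum(input_i * weight_i) + bias) for one neuron."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of all inputs plus the bias term.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)


# Hypothetical inputs, weights, and bias chosen only for illustration.
output = neuron(inputs=[0.5, -1.0, 2.0], weights=[0.4, 0.3, -0.2], bias=0.1)
print(output)
```

In a full network, the output of such a neuron would typically be forwarded as an input to the neurons of the next layer.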