Decyphering the Geheimschreiber, a Machine Learning Approach
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2019

Decyphering the Geheimschreiber, a Machine Learning approach
Recreating and breaking the Siemens and Halske T52 used during World War II to secure communications in Sweden

ORIOL CLOSA MÁRQUEZ

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Bachelor in Computer Science
Date: 28th June 2019
Supervisor: Richard Glassey
Examiner: Örjan Ekeberg
Titel på svenska: Att dechiffrera Geheimschreiber med hjälp av maskininlärning
Títol en català: Desxifrant la Geheimschreiber, un enfocament d’aprenentatge automàtic

Abstract

Historically, rotor cyphers have been used to secure written communications. Mechanical machines provided continuous streams of characters for encoding secret messages that were sent to other parts of the continent by telephone cable or radio. Many people tried in vain to break them, and only the boldest succeeded. In Sweden, the Siemens and Halske T52 was used by the Germans during World War II, and Arne Beurling was one of the bright people who successfully broke it. This thesis aims to retrace his steps, applying modern concepts to the task of breaking the Geheimschreiber. To that end, a virtual recreation of the machine has been built and several German texts have been encyphered with it. The techniques used, involving Recurrent Neural Networks, have proven effective in breaking all XOR wheels with different crib sizes, removing the random factor introduced by the cypher. However, whether this method can be applied to real war intercepts remains to be seen.

Sammanfattning

Historically, rotor machines have been used to secure written communication. Mechanical machines supplied continuous streams of characters to encrypt secret messages that were sent to other parts of the continent over telephone cables or radio. Several people tried to break them, but only a few were bold enough to succeed. In Sweden, the Siemens and Halske T52 was used by the Germans during the Second World War, and Arne Beurling was one of the first to break it successfully. The thesis aims to recreate his steps by applying modern concepts to the task of breaking the Geheimschreiber. To achieve this, the machine has been recreated in a virtual environment and a number of German texts have been encyphered. The techniques used, which involve Recurrent Neural Networks, have proven effective at breaking the XOR wheels by removing the random factor introduced by the cypher. Whether this method can be applied in real wartime situations, however, remains to be seen.

Resum

Historically, rotor encryption machines were used to protect all written communications. They generated a continuous stream of characters to encode secret messages that were sent to the other side of the continent through telephone cables or by radio. Several people tried in vain to tackle them, but only those sharp enough succeeded. In Sweden, the Siemens and Halske T52 was used by the Germans during the Second World War, and Arne Beurling was one of those intelligent people who succeeded in breaking it. This thesis aims to recreate his steps by applying modern concepts to the task of breaking the Geheimschreiber. To do so, a recreation of the machine has been built virtually and several German texts have been encyphered. The techniques used, including Recurrent Neural Networks, have proven effective at breaking all the positions corresponding to the XOR with different predictions, removing the random factor introduced by the machine. Nevertheless, whether this method can be applied to real messages intercepted during the war remains to be seen.

To Arne Beurling (1905-1986) for his amazing feat in beating the Geheimschreiber and his leadership in decyphering German traffic, which played a big but secret role during World War II.

To Erika Schwarze (1917-2003) for her bravery and determination in spying on the Nazis during the Second World War and providing the Swedes with information on Gestapo operatives and active agents, as well as Geheimschreiber messages in plain text, under the code name Onkel.

To Bengt Beckman (1925-2012) for his interest in and publications on cryptography, which have been indispensable for this thesis.

To every single individual who played a role in this marvelous feat, for their work and contribution to modern democracy.

Contents

Acknowledgements
1 Introduction
  1.1 Problem statement
    1.1.1 Objectives
2 Background
  2.1 Intercepting German signals
  2.2 Evolution of cyphers
  2.3 XOR cypher
  2.4 Breaking the Geheimschreiber
  2.5 The Siemens and Halske T52
    2.5.1 Models
      2.5.1.1 Model T52a/b
      2.5.1.2 Model T52c/ca
      2.5.1.3 Model T52d
      2.5.1.4 Model T52e
      2.5.1.5 Model T52f
    2.5.2 Irregular stepping
    2.5.3 Klartextfunction
  2.6 The App
  2.7 Artificial Neural Networks
    2.7.1 Artificial neuron
    2.7.2 Activation function
    2.7.3 Learning processes
    2.7.4 Backpropagation
      2.7.4.1 The Delta rule
    2.7.5 Regularisation
      2.7.5.1 LASSO regression
      2.7.5.2 Ridge regression
      2.7.5.3 Early stopping
    2.7.6 Recurrent Neural Networks
      2.7.6.1 Long Short-Term Memory
3 Methods and results
  3.1 The Vigenère
    3.1.1 Unknown key and plaintext
      3.1.1.1 Training with a fixed key
      3.1.1.2 Training with variable keys
      3.1.1.3 Training with a German dictionary
    3.1.2 Unknown key and known plaintext
  3.2 The Geheimschreiber
    3.2.1 Cryptanalysis
    3.2.2 Unknown XOR and permutation wheels
    3.2.3 Unknown XOR and known permutation wheels
      3.2.3.1 Training with a short crib
      3.2.3.2 Training with a long crib
4 Discussion
  4.1 Obtained results
    4.1.1 The Vigenère
    4.1.2 The Geheimschreiber
5 Conclusions
A T52 simulator
  A.1 Text encyphering
  A.2 Interactive version
B Cloud computing
  B.1 Virtual machine setup
C International Teleprinter Alphabet 2
D Historical images
E Chronological timeline of the events
Bibliography

Acknowledgements

Although this project was developed over only a few months, several people have contributed to the realisation of this thesis. Without their help the results would have been very different, so I would like to thank the following people and institutions.

Richard Glassey, my supervisor, for his enthusiasm and feedback on this project.

Ingrid Karlsson, archivist from the Riksarkivet, and Martina Brisman and Lars Rune, from Försvarsmakten, for their quick interest in pointing me in the right direction.

Kári Ólafsson from the legal unit at Försvarets Radioanstalt, who has helped me from the first day by providing data and background on the matter. Her dedication to all my requests has been outstanding. As a consequence, original data has been gathered and inspected, grounding the results in real material.
Christine, Daniel and everyone else working at the Krigsarkivet, who have helped me find the corresponding material for this thesis. The archives related to this project have been preserved and maintained through the years thanks to them, and consequently I have been able to examine this marvelous material proof of another epoch. Furthermore, they have provided the necessary resources and allowed me to publish original material in this thesis.

Herman Byström for helping with the translation of the abstract into Swedish at very short notice.

KTH Biblioteket for providing me access to the printed and online material used to develop the background.

Chapter 1
Introduction

During World War II, the transmission of information through secure means became an important concern. Radio could be intercepted and telegraph lines tapped. This led to the development of encyphering methods in different parts of the planet. The most famous systems include the Enigma, the Type B1 and the Lorenz machine, all used during wartime. The latter, codenamed Tunny by British cryptanalysts[1], was the main target of the first programmable electronic digital computer in the world[2]. However, a lesser-known cyphering machine named the Siemens and Halske T52 was also used by the Germans to secure communications through neutral Sweden. This machine was of little interest to Bletchley Park, as much of its traffic was also encoded with other systems that were easier to break. Because of its complexity compared with the others (whose layout they had reverse-engineered from intercepts alone), they believed it was impossible to break.

The Siemens and Halske T52, known as the Geheimschreiber or, as codenamed by the British GC&CS at Bletchley Park, Sturgeon[2], was both a cypher machine and a teleprinter, produced by the company that gave it part of its name.
Unlike the Enigma, heavily used by the Germans during the first part of WW2, this machine was not as portable but offered a more automated way of securing and sending transmissions. The operators needed no actual knowledge of cryptography: they would type and receive plain text at all times. Nevertheless, anyone listening in between would not be able to understand what was being said, as the information would appear encyphered.
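
The operator's view described above can be illustrated with a toy example. The following is a minimal sketch, assuming only the XOR part of the cypher and ignoring the permutation wheels and stepping of the real T52. The function name `xor_stream`, the 5-bit code values and the randomly drawn key stream are all hypothetical choices made for illustration; in the actual machine the key stream would come from the wheels.

```python
import secrets


def xor_stream(codes, key):
    """XOR each 5-bit code with the key value at the same position."""
    return [c ^ k for c, k in zip(codes, key)]


# Hypothetical 5-bit teleprinter codes (values 0..31), not real ITA2 assignments.
plain = [3, 25, 14, 9, 1]

# Hypothetical key stream, drawn at random here; both ends must share it.
key = [secrets.randbelow(32) for _ in plain]

# What travels on the line, and what an eavesdropper would see.
on_the_line = xor_stream(plain, key)

# The receiving machine applies the same key stream and recovers the plain text.
received = xor_stream(on_the_line, key)

assert received == plain
```

Because XOR is its own inverse, the same operation serves for encyphering and decyphering, which is consistent with operators at both ends only ever handling plain text.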