
Neural Cryptanalysis for Cyber-Physical System Ciphers

Emma M. Meno

Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of

Master of Science in Computer Science and Applications

Danfeng Yao, Chair
Matthew Hicks
Bimal Viswanath

April 30, 2021
Blacksburg, Virginia

Keywords: Neural Networks, Cryptanalysis, Black-Box Evaluation, Block Ciphers, Symmetric Ciphers, Lightweight Ciphers, CPS Ciphers

Copyright 2021, Emma M. Meno

Neural Cryptanalysis for Cyber-Physical System Ciphers

Emma M. Meno

(ABSTRACT)

A cryptographic research interest is developing an automatic, black-box method to provide a relative security strength measure for symmetric ciphers, particularly for proprietary cyber-physical system (CPS) and lightweight block ciphers. This thesis work extends the work of the recently-developed neural cryptanalysis method, which trains neural networks on a set of plaintext/ciphertext pairs to extract meaningful bitwise relationships and predict corresponding ciphertexts given a set of plaintexts. As opposed to traditional cryptanalysis, the goal is not key recovery but achieving a mimic accuracy greater than a defined base match rate. In addition to reproducing tests run with the Data Encryption Standard, this work applies neural cryptanalysis to round-reduced versions and components of the SIMON/SPECK family of block ciphers and the Advanced Encryption Standard. This methodology generated a metric able to rank the relative strengths of rounds for each cipher as well as algorithmic components within these ciphers. Given the current neural network suite tested, neural cryptanalysis is best-suited for analyzing components of ciphers rather than full encryption models. If these models are improved, this method presents a promising future in measuring the strength of lightweight symmetric ciphers, particularly for CPS.

Neural Cryptanalysis for Cyber-Physical System Ciphers

Emma M. Meno

(GENERAL AUDIENCE ABSTRACT)

Cryptanalysis is the process of systematically measuring the strength of ciphers, algorithms used to secure data and information. Through encryption, a cipher is applied to an original message, or plaintext, to generate a muddled message, or ciphertext. The inverse of this operation, translating ciphertext back into plaintext, is decryption. Symmetric ciphers require only one key that is used for both encryption and decryption. Machine learning is a data analysis method that automates computers to learn certain data properties, which can be used to predict outputs given a set of inputs. Neural networks are one type of machine learning used to uncover relationships, chaining a series of nodes together that individually perform some operations to determine correlations. The topic of this work is neural cryptanalysis, a new approach to evaluating cipher strength relying on machine learning. In this method, the goal is to "learn" the ciphers, using machine learning to predict what the ciphertext will be for an inputted plaintext. This is done by training the networks on plaintext/ciphertext pairs to extract meaningful relationships. If a cipher is easier to predict, it is easier to crack and thus less secure. In this work, neural cryptanalysis was applied to different real-world symmetric ciphers to rank their relative security. This technique worked best on analyzing smaller components of the cipher algorithms rather than the entire ciphers, as the ciphers were complex and the neural networks were simpler.

Dedication

I dedicate this to all my loved ones who supported, motivated, and believed in me, even when I did not believe in myself. I also want to dedicate this to the teachers and professors who encouraged me to pursue my passions in higher education. Mine has certainly been a unique journey, but I am grateful for every step and lesson along the way.

Acknowledgments

I first want to acknowledge Dr. Danfeng (Daphne) Yao for her mentorship and guidance throughout this thesis process. I would also like to acknowledge Ya Xiao, whose project was the launching point for my work, for her help in deciphering and understanding the neural cryptanalysis methodology and source code. Further, I want to thank my committee members Dr. Matthew Hicks and Dr. Bimal Viswanath for their time and input. Finally, I would like to acknowledge Dr. Cliff Shaffer, who assisted me throughout my Accelerated Masters' program experience.

Contents

List of Figures x

1 Introduction 1

1.1 Introduction to Cyber-physical Systems ...... 1

1.2 Motivation for Neural Cryptanalysis ...... 2

1.3 Research Contributions ...... 3

1.4 Thesis Layout ...... 3

2 Review of Literature 5

2.1 Symmetric Cipher Cryptanalysis ...... 5

2.2 Lightweight Cipher Cryptanalysis ...... 5

2.3 Deep Learning in Cryptanalysis ...... 6

2.4 Neural Cryptanalysis ...... 7

3 Experimental Setup 9

3.1 Methodology & Metrics ...... 9

3.2 Neural Network Architectures ...... 11

3.3 Testing Environment and Implementation ...... 12

4 Data Encryption Standard 14

4.1 Background ...... 14

4.1.1 DES Structure ...... 15

4.1.2 DES Previous Cryptanalysis ...... 17

4.2 Neural Cryptanalysis Results ...... 18

4.2.1 Round-Reduced DES Across Different Networks ...... 19

4.2.2 DES Decryption ...... 22

5 SIMON and SPECK Lightweight Ciphers 25

5.1 Background ...... 25

5.2 SIMON/SPECK Previous Cryptanalysis ...... 26

5.3 SIMON Ciphers ...... 26

5.3.1 SIMON Structure ...... 27

5.3.2 Neural Cryptanalysis Results ...... 28

5.4 SPECK Ciphers ...... 32

5.4.1 SPECK Structure ...... 32

5.4.2 Neural Cryptanalysis Results ...... 35

6 Advanced Encryption Standard 39

6.1 Background ...... 39

6.1.1 AES Structure ...... 40

6.1.2 AES Previous Cryptanalysis ...... 42

6.2 Neural Cryptanalysis Results ...... 43

6.2.1 AES Across Different Networks ...... 43

6.2.2 Round-Reduced AES ...... 44

6.2.3 AES Algorithm Components ...... 47

7 Discussion 54

7.1 Fat/Shallow Network Architecture ...... 54

7.2 Encryption vs. Decryption Mode ...... 55

7.3 Relative Security of Cipher Rounds ...... 55

7.4 Security of Algorithmic Components ...... 56

7.5 Neural Cryptanalysis on Full Cipher Algorithms ...... 57

7.6 Application of Neural Cryptanalysis to CPS ...... 58

8 Conclusion & Future Work 59

8.1 Future Work ...... 59

8.1.1 Fine-Tuning Architectures ...... 59

8.1.2 Testing/Training Split ...... 60

8.1.3 Incorporating White-Box Knowledge ...... 60

8.1.4 AI-Based Attack Capabilities ...... 61

8.1.5 Comparative Metric to Traditional Cryptanalysis ...... 61

8.1.6 NIST Lightweight Cryptography ...... 62

8.2 Conclusions ...... 62

Bibliography 64

Appendices 70

Appendix A Neural Network Code Implementation 71

List of Figures

3.1 Cipher Data Collection Process ...... 10

3.2 Security Indicator Generation ...... 11

3.3 Three neural network architectures applied in experiments [1] ...... 11

3.4 Tensorflow model training in Ubuntu terminal ...... 13

4.1 General DES structure [2] ...... 16

4.2 DES encryption round [2] ...... 17

4.3 DES function [2] ...... 18

4.4 DES Encryption and Decryption Algorithms [2] ...... 19

4.5 Predicted Accuracy on 1-round DES ...... 21

4.6 Attack capacity summary for round-reduced DES [1] ...... 22

4.7 Predicted Accuracy on 1-round DES for Encryption vs. Decryption Mode .. 23

4.8 Predicted Accuracy on 2-round DES for Encryption vs. Decryption Mode .. 24

5.1 Feistel stepping within SIMON round function [3] ...... 27

5.2 SIMON three-word key expansion [3] ...... 28

5.3 Predicted Accuracy of SIMON64/96 Across Different Network Architectures 30

5.4 Predicted Accuracy of 2-round SIMON64/96 on Different Network Architectures 31

5.5 Predicted Accuracy of Round-Reduced SIMON64/96 ...... 33

5.6 SPECK round function after i encryption steps [3] ...... 34

5.7 SPECK key expansion [3] ...... 34

5.8 SPECK round function split into Feistel-like steps [3] ...... 35

5.9 Predicted Accuracy of SPECK64/96 on Different Network Architectures .. 37

5.10 Predicted Accuracy of Round-Reduced SPECK64/96 ...... 38

6.1 State array input and output for AES [4] ...... 40

6.2 AES Cipher Round Structure [4] ...... 41

6.3 Predicted Accuracy of AES Across Different Network Architectures ..... 45

6.4 Predicted Accuracy of Round-Reduced AES ...... 46

6.5 Predicted Accuracy on AES SubBytes Across Different Network Architectures 47

6.6 Predicted Accuracy on AES ShiftRows Across Different Network Architectures 48

6.7 Predicted Accuracy on AES MixColumns Across Different Network Architec- tures ...... 49

6.8 Predicted Accuracy on AES RoundKey Across Different Network Architectures 50

6.9 Predicted Accuracy of AES Components ...... 51

6.10 Predicted Accuracy of AES Components (sans RoundKey) ...... 52

A.1 Fat/Shallow Network Source Code for 64-bit ciphers ...... 71

A.2 Deep/Thin Network Source Code for 64-bit ciphers ...... 72

A.3 Cascade Network Source Code for 64-bit ciphers ...... 73

Chapter 1

Introduction

1.1 Introduction to Cyber-physical Systems

Cyber-physical systems (CPS) integrate sensing, networking, and computing capabilities together in physical infrastructure. Examples of CPS applications include autonomous vehicles, the smart grid, medical monitoring, robotics, and sustainability technologies. The applicability and importance of CPS are continuing to grow in the computer science field.

CPS technologies are increasingly employing lightweight symmetric ciphers for encrypting sensitive or vulnerable data and communications. Due to the commercial nature of this application, many of these ciphers are proprietary and/or difficult to reverse engineer, obscuring the algorithmic structure. One such example (studied in [1]) is Hitag2, a technology for keyless modern vehicle entry. As assessed in [1], the security of this cipher is fairly weak, somewhere between 1-round and 3-round DES.

Given the relatively weak security of this and similar ciphers, it is important to determine security metrics for different CPS cipher implementations and/or candidates. The goal of cryptanalysis is to systematically measure cipher strength to determine security. The end goal of this thesis work is to apply the neural cryptanalysis technique to CPS full encryption algorithms, round-reduced versions, and cipher components.


1.2 Motivation for Neural Cryptanalysis

A key goal in the research and industry sphere is to quantitatively analyze the relative strength of CPS ciphers and their components. However, traditional cryptanalysis methods for symmetric ciphers use empirical evaluation and intuition to extract the key. An attack is only successful if the key recovery process requires less complexity than a brute-force (exhaustive search) method. The overall security of the cipher is measured by the best-effort attack with the lowest required complexity.

Such a traditional cryptanalysis approach is difficult to apply to proprietary CPS ciphers and cipher components. Additionally, such a method does not provide a clear relative security metric easily comparable across algorithms.

This thesis extends the work of [1] to propose neural cryptanalysis as a scalable black-box approach to symmetric cipher cryptanalysis, requiring no algorithmic knowledge and generating a comparable mimic difficulty metric. Neural cryptanalysis operates by training neural networks to mimic a cipher algorithm, conducting supervised learning on plaintext/ciphertext pairs to predict the cipher operation. If the mimic accuracy is higher than a predefined base accuracy, then some useful information was extracted from the pairings and the algorithm is deemed less secure. Using the testing accuracy as indicative of the prediction accuracy, we can produce a relative cipher strength measure applicable to the black-box nature of proprietary CPS symmetric ciphers.

The key novelty of the neural cryptanalysis method is producing a relative quantitative security metric that is simple to run and simple to interpret. After constructing a suite of neural network models, a researcher can simply feed plaintext/ciphertext pairs to the pre-written code base to determine cipher strength, without requiring heuristics or intuition. With additional testing, this method is a promising direction for future symmetric cryptanalysis research.

This thesis reproduces some of the work in the original proposal paper [1] and conducts additional experiments with different configurations and ciphers.

1.3 Research Contributions

This work makes the following contributions to neural cryptanalysis:

1. Expanding the original 64-bit neural network architecture suite to evaluate 128-bit plaintext/ciphertext pairs.

2. Applying the neural cryptanalysis methodology to components and round-reduced versions of additional lightweight and symmetric ciphers.

3. Providing novel relative security measures of cipher components across bit lengths and algorithm types.

1.4 Thesis Layout

The remainder of this thesis is organized as follows. First, Chapter 2 conducts a literature review on cryptanalysis approaches related to neural cryptanalysis and CPS ciphers. Next, Chapter 3 covers the neural cryptanalysis methodology and metrics as well as the specific neural network architectures and testing tools used. Chapter 4 focuses on the Data Encryption Standard, introducing the established security, reproducing previous experiments, and conducting neural cryptanalysis on new variations. Then, Chapter 5 shifts the focus to SIMON/SPECK ciphers, describing their structure and background and performing more neural cryptanalysis. Chapter 6 wraps up the neural cryptanalysis experimentation with the Advanced Encryption Standard, including its round-reduced versions and algorithmic components. Chapter 7 discusses the experimental findings, and Chapter 8 presents the conclusion and areas for future work.

Chapter 2

Review of Literature

The following sections contain a brief overview of related approaches to symmetric cipher cryptanalysis. Note that this review is not entirely comprehensive but instead provides insight into current state-of-the-art approaches.

2.1 Symmetric Cipher Cryptanalysis

Traditionally, symmetric cipher cryptanalysis involves one of two representative methods or some related variant: linear cryptanalysis [5] and differential cryptanalysis [6]. As cited in [1], these approaches require 2^55 encryptions for differential cryptanalysis and 2^47 encryptions for linear cryptanalysis.

Previous symmetric cryptanalysis approaches applied to each individual cipher algorithm studied in this work are detailed in their individual sections (Section 4.1.2, Section 5.2, and Section 6.1.2 respectively).

2.2 Lightweight Cipher Cryptanalysis

Many of the cryptanalysis approaches applied to symmetric ciphers can also be applied to lightweight block ciphers. [7] presents the design and cryptanalysis of lightweight symmetric algorithms, covering three different attack approaches:

• Mixed Integer Linear Programming (MILP)-based attacks, an automatable process which analyzes the cipher to find a "balanced superpoly" property to query and unveil key bits.

• Meet-in-the-Middle Attack using Correlated Sequences, a variation on a generic cryptographic primitive cryptanalysis approach that is applicable to Feistel and SPN ciphers like SIMON.

• Practical Forgery Attacks, involving either an associated data-only, ciphertext-only, or combined associated data and ciphertext attack exploiting data relationships.

[8] performs cryptanalysis on the lightweight KLEIN-64 cipher, shrinking the cipher by halving the number of bits and analyzing attacks for a differential. This approach uses knowledge of the cyclic structure of the cipher algorithm and its properties. [9] applies biclique cryptanalysis, a type of meet-in-the-middle attack for block ciphers without using related keys, to a suite of popular lightweight ciphers.

All of the approaches cited above require some intuition and expertise in the algorithmic structure of the cipher. A primary goal of the neural cryptanalysis approach is to remove this requirement.

2.3 Deep Learning in Cryptanalysis

As cited in [1], machine learning applications to cryptanalysis have limited scope. One area drawing major attention is learning-aided side-channel attacks, such as that of [10]. Rather than the traditional statistical profiling in side-channel attacks, deep learning is used as a component. Note that this still requires algorithmic knowledge.

Other applications are not reliant on side-channel information but are still reliant on the white-box algorithm assumption. One such application is [11], which successfully attacks the 2-round and 3-round HypCiphers using neural networks to perform traditional cryptanalysis key recovery.

2.4 Neural Cryptanalysis

One work closely related to the neural cryptanalysis methodology is [12], which treats DES and Triple DES as black boxes. Plaintexts are fed as input to a cascade neural network architecture, which trains and predicts corresponding ciphertexts. However, as cited in [1], these results were not reproducible and the networks had too few parameters.

Another work on lightweight block ciphers [13] uses deep learning models to try to determine the key given known plaintext-ciphertext pairs. This still requires knowledge of the algorithm, as a series of DL-based attacks are launched to "break" the cipher and recover the key.

Another set of works closely related to this work's neural cryptanalysis approach is neural-network based plaintext restoration, such as that in [14] and [15]. These techniques utilize a black-box approach to simulate decryption as a neural network, which trains to predict plaintexts from ciphertexts. In contrast, this work mimics both encryption and decryption cipher operations, and the goal is to achieve a match rate above a base match rate rather than a benchmark of plaintext recovery.

The foundation of this thesis is inspired by [1], which proposes the methodology and metrics for the neural cryptanalysis technique used throughout the experimental configurations. This technique is a black-box, automated, scalable method that trains neural networks to predict and mimic cipher operations. [1] experiments with lightweight cryptographic algorithms for proof of concept, including the round-reduced DES algorithm replicated and furthered in Section 4.2.

Chapter 3

Experimental Setup

3.1 Methodology & Metrics

Neural cryptanalysis as defined in [1] addresses traditional symmetric cryptanalysis deficiencies such as limited scalability and required algorithm knowledge. The neural cryptanalysis methodology is scalable, black-box, and automatic in measuring cipher strength. The key idea guiding this approach is measuring cipher strength by how well a neural network can mimic cipher operations without knowledge of the inner workings of that cipher. Rather than key extraction, the goal is to predict how the cipher algorithm will transform the inputted text. As stated in [1], the only required information is plaintext and ciphertext pairs, not the encryption algorithm or keys themselves. A suite of neural networks trains on these input/output pairs, then predicts the corresponding outputs. How well the neural network is able to predict these outputs is indicative of the cipher algorithm's security; easier to mimic implies less secure, while harder to mimic implies more secure.

The formal definition of the neural cryptanalysis system as posed in [1] is a four-element tuple (M1, M2, N, S), where each element is defined below:

• M1, M2 = mutually exclusive finite sets of plaintext/ciphertext pairs

• N = neural network suite trained on M1 and tested on M2

• S = strength indicator


There are three primary operations involved in the neural cryptanalysis methodology [1], summarized as follows:

1. Cipher Data Collection produces the plaintext/ciphertext pairs used for the neural network to train and test under supervised learning. A random bit generator is used to output binary plaintexts, which are then passed through the black-box encryption cipher to generate the matching binary ciphertexts. This process is shown in Figure 3.1. M1 is collected for training and M2 is collected for testing.

Figure 3.1: Cipher Data Collection Process

2. Mimic Model Training trains each neural network in the set N (described in Section 3.2) on the plaintext/ciphertext pairs in M1. After each training epoch, the model predicts the encrypted output from the input using the testing set M2.

3. Security Indicator Generation calculates the security metric defined by the tuple S = (Cmr, Compdata, Comptime), where Cmr is the cipher match rate (bitwise prediction accuracy), Compdata is the minimum training data required, and Comptime is the converged training time. The final security indicator is defined by the maximum Cmr and the minimum Compdata and Comptime. The specifics are shown in Figure 3.2.

For this thesis, the primary focus for the security indicator is the cipher match rate Cmr, as this was the most straightforward metric to measure in the given time period. The amount of training data is viewed as a variable influencing the cipher match rate rather than as a component of S.

Figure 3.2: Security Indicator Generation

3.2 Neural Network Architectures

The neural network architectures selected to train on plaintext/ciphertext input/output pairs for this thesis were consistent with those chosen in [1]. A suite of three multi-layer fully connected neural networks with softmax classifiers was used, posed to best represent each plaintext bit's impact on every ciphertext bit and the ciphertext randomness. These neural network architectures are shown in Figure 3.3. Note that the two networks under opposite shape settings (fat/shallow and deep/thin) are fully connected just between adjacent layers, while the cascade network also fully connects interval layers.

Figure 3.3: Three neural network architectures applied in experiments [1]

For the 64-bit ciphers studied in this work (i.e. DES, SIMON/SPECK), the dimensions are:

Memory: 31.1 GiB
Processor: Intel Core i9-9900 CPU @ 3.10 GHz × 16
Graphics: Mesa Intel UHD Graphics 630 (CFL GT2)
Disk Capacity: 2.3 TB

Table 3.1: Test environment hardware summary

• Deep/thin = 4 hidden layers: 128, 128, 128, 128 neurons

• Fat/shallow = 1 hidden layer: 1000 neurons

• Cascade = 4 hidden layers: 128, 256, 256, 128 neurons

For the 128-bit ciphers analyzed in this work (i.e. AES), the neural network dimensions are:

• Deep/thin = 4 hidden layers: 256, 256, 256, 256 neurons

• Fat/shallow = 1 hidden layer: 1000 neurons

• Cascade = 4 hidden layers: 256, 512, 512, 256 neurons
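For a rough sense of model capacity, the two plain fully connected configurations can be compared by parameter count, assuming standard dense layers with biases (this simple formula does not cover the cascade network's extra interval connections; all names below are illustrative):

```python
def dense_param_count(layer_sizes):
    """Total weights plus biases for a stack of fully connected layers.

    layer_sizes runs input -> hidden layers -> output, e.g. [64, 1000, 64].
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 64-bit cipher configurations (64-bit plaintext in, 64-bit ciphertext out):
fat_shallow = dense_param_count([64, 1000, 64])              # 1 hidden layer of 1000
deep_thin = dense_param_count([64, 128, 128, 128, 128, 64])  # 4 hidden layers of 128
```

Under this accounting, the fat/shallow network carries roughly twice as many trainable parameters as the deep/thin one, despite having a single hidden layer.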

Snippets of the source code for the neural network architectures built for 64-bit plaintexts and ciphertexts are attached in Appendix A.

3.3 Testing Environment and Implementation

A 64-bit Ubuntu 20.04.1 LTS machine was used to run all of the plaintext-generation, encryption, and neural network training tasks for this thesis work. For reference, the hardware information has also been included in Table 3.1.

Note that the majority of the plaintext-generation and encryption tasks can be run fairly efficiently. For this set of experiments, a Python script invoking the random library was used to generate random bits (0s or 1s) concatenated together to build a plaintext string of length 64 or 128, depending on the encryption algorithm tested.
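A generation step of this kind can be sketched with the Python standard library alone; the function below is an illustrative stand-in, not the actual thesis script:

```python
import random

def generate_plaintexts(count, bits=64, seed=None):
    """Build `count` random binary plaintext strings, each `bits` characters of 0/1."""
    rng = random.Random(seed)  # seeding is optional; used here for reproducibility
    return ["".join(rng.choice("01") for _ in range(bits)) for _ in range(count)]
```

For instance, generate_plaintexts(2**16, bits=64) would produce a training-sized batch of 64-bit plaintexts to feed through the encryption program.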

For encryption, C scripts were built based on existing or publicly-available libraries and repositories. The DES code was written by the previous researcher on the project, Ya Xiao. The SIMON64/96 and SPECK64/96 code (and test vector set) was taken from the implementation document [16]. The AES-128 code was sampled from the tiny-AES-c GitHub repository at [17].

The neural network training tasks took significantly longer computation time than the input/output pair generation stage, averaging a few hours to one day for each 350-epoch training session. All neural network architectures were built in Tensorflow 2.4.1 (with GPU support) and trained in a virtual environment using a Python script (version 3.8.5). Note that Tensorflow and Python versions must be compatible when running deep learning algorithms, a key factor to check before implementation. A screenshot taken from running one of the scripts is shown in Figure 3.4 for reference.

Figure 3.4: Tensorflow model training in Ubuntu terminal

Chapter 4

Data Encryption Standard

4.1 Background

In 1972, the National Bureau of Standards (NBS) – now NIST (National Institute of Standards and Technology) – launched a project to construct a class of standards for data encryption and computer security [18]. Working from the predecessor Lucifer cipher [19][20] developed by IBM in the late 1960s and early 1970s, a joint IBM-NSA task force worked to modify the algorithm for standardization [21]. After recommendation by the NBS, the Data Encryption Standard (DES) was published as Federal Information Processing Standards Publication (FIPS)-46 on January 15, 1977, with provisions allotted for NBS review every five years. The Conference on Computer Security and Data Encryption Standard [22] was held on February 15, 1977 by NBS and the Civil Service Commission to present available computer security technology information and address any DES-related concerns. Six months after publication, on July 15, 1977, DES was officially adopted as a standard.

DES underwent revisions in 1988, 1993, and 1999, outlined in the following publications: FIPS 46-1 [23], FIPS 46-2 [24], and FIPS 46-3 [25].

Eventually, both significant security deficiencies revealed through cryptanalysis and the approved federal use of the Advanced Encryption Standard (to be discussed in Chapter 6) led to the deprecation of the DES standard. A notice by NIST on May 19, 2005 in the Federal Register


[26] announced the official withdrawal of the following DES-related standards: FIPS 46-3, Data Encryption Standard (DES) [25]; FIPS 74, Guidelines for Implementing and Using the NBS Data Encryption Standard [27]; and FIPS 81, DES Modes of Operation [28].

While the traditional DES standard is no longer approved for encryption of federal government data, Triple DES (3DES) or the Triple Data Encryption Algorithm (TDEA), which uses three keys and applies the DES algorithm three times, remained a secure variation utilized in the banking community up through the early 21st century. However, after security analysis and real-world testing of practical attacks, NIST announced in July 2017 its plans to reduce the allowable plaintext length from 2^32 to 2^20 blocks and strongly suggest migration to AES instead [29].

This work focuses on round-reduced DES, the structure and security of which are discussed in the following subsections.

4.1.1 DES Structure

DES is a block cipher that takes a 64-bit plaintext as input and converts it to a 64-bit ciphertext using a 56-bit cipher key. The overall encryption algorithm, as shown in Figure 4.1, is composed of an initial and final permutation (P-box) and sixteen Feistel rounds [2]. A round key function is used to generate sixteen different 48-bit round keys, one for each individual round.

Figure 4.1: General DES structure [2]

This work uses a round-reduced version of DES, which chains together rounds of DES and reduces the overall complexity of the encryption. The structure of a single round of DES is depicted in Figure 4.2. Each DES round is a Feistel structure where the left (L_{I-1}) and right (R_{I-1}) 32 bits from the previous round are used to generate the left (L_I) and right (R_I) 32 bits of the following round. As defined in [2], each round consists of two invertible cipher elements: a mixer and a swapper. The swapper simply swaps the left half with the right half of the text. The mixer involves the XOR operation and the DES function f(R_{I-1}, K_I), where most of the complexity of the algorithm lies. Note the relationship L_I = R_{I-1}, which appears as a simple bit shift in the outputs between rounds. This correlation will factor into calculating the base match rate for 1-round DES, as discussed in Section 4.2.1.
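The mixer/swapper relationship can be captured in a few lines of code. The sketch below uses a toy stand-in for the DES function f (all names are illustrative), but the round structure and its inverse are the real Feistel pattern:

```python
def feistel_round(left, right, round_key, f):
    """One Feistel round: L_i = R_{i-1}; R_i = L_{i-1} XOR f(R_{i-1}, K_i)."""
    return right, left ^ f(right, round_key)

def feistel_round_inverse(left, right, round_key, f):
    """Undo one round by recomputing f on the copied half and swapping back."""
    return right ^ f(left, round_key), left

def toy_f(r, k):
    # Stand-in for the DES function; any deterministic 32-bit mixing works here.
    return ((r * 31) ^ k) & 0xFFFFFFFF
```

Because f is only ever XORed in, the round is invertible regardless of whether f itself is, which is what makes the Feistel construction work with the non-invertible DES function.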

The DES Function, summarized in Figure 4.3, takes the rightmost 32 bits as input (R_{I-1}), applies a 48-bit round key, and outputs a 32-bit value. There are four components of this function:

• Expansion D-box: Expand R_{I-1} from 32 bits to 48 bits using permutation.

• Whitener (XOR): XOR operation applied on expanded R_{I-1} and 48-bit round key.

• S-Boxes: Source of mixing (confusion) using 8 S-boxes, each with a 6-bit input and a 4-bit output.

• Straight D-Box: Final permutation to the data with 32-bit input and 32-bit output.

Figure 4.2: DES encryption round [2]

Overall, this DES function results in a complex output at each iteration, making one half of the round output (R_I) difficult to attack or predict.

Note that the DES encryption and decryption algorithms are symmetric in nature, as shown in Figure 4.4. This is a key characteristic studied in Section 7.2, where neural cryptanalysis is performed in both "encryption" and "decryption" mode.

4.1.2 DES Previous Cryptanalysis

After extensive cryptanalysis spanning two decades since its inception, the DES cipher was ultimately broken by three theoretical attacks with higher efficiency than the "brute force" search [21]:

• Differential Cryptanalysis [30], 1990 – This technique involves collecting data by feeding the ciphertexts encrypted with the target machine's unknown key through a data analysis algorithm to extract its differential behavior and determine the key.

Figure 4.3: DES function [2]

• Linear Cryptanalysis [31], 1993 – An essentially known-plaintext attack, this method aims to find a cipher's linear approximation expression by statistically building a linear path between S-box input and output bits.

• Improved Davies' Attack [32][33], 1995/1997 – This methodology relies on intuition of the nonuniform distribution resulting from S-box output during the DES expansion operation. This is used to calculate the empirical distribution of plaintext and determine the amount of necessary data to collect.

Note that the above approaches all involved some level of intuition, heuristics, and knowledge of the algorithm, which limits the scalability, efficiency, and automation of this cryptanalysis.

4.2 Neural Cryptanalysis Results

In order to establish and verify proof of concept for the neural cryptanalysis methodology, DES was used to generate input and output data for the three fully connected neural networks.

Figure 4.4: DES Encryption and Decryption Algorithms [2]

Section 4.2.1 replicates a series of tests from [1] and Section 4.2.2 uses the original input data as output data (and vice versa) for training.

4.2.1 Round-Reduced DES Across Different Networks

For the first set of experiments, the goal was to reproduce a subset of the results from the original neural cryptanalysis work in [1]. To study the impact of the three different neural network architectures in Figure 3.3, the TensorFlow networks trained on 64-bit (8-byte) plaintext-ciphertext pairs generated from 1-round DES.

For the experimental method, a Python script randomly generated 2^16 strings of 1s and 0s, each 64 characters long, for the training data input. This dataset became the binary 64-bit plaintext input fed through the DES encryption C program. Note that the key was kept the same for each encryption iteration so that the network could attempt to learn a consistent algorithm. Each binary ciphertext produced was the result of running only one DES encryption round (whose structure is shown in Figure 4.2). These ciphertexts were used as the training data output vectors. This process was replicated for the testing (or validation) dataset, yielding 2^12 input vectors (plaintexts) and 2^12 output vectors (ciphertexts). Note that in [1], tests were run with 2^16 plaintexts for training and 2^16 for testing. The training/testing split for these replicated experiments is heavily skewed, as the testing set contains only 6.25% as many entries as the training set, but experimentation showed that the results remained consistent when the validation percentage increased. This is acknowledged as a possible limitation and is further addressed in Section 8.1.

The base match rate for 1-round DES is 75%. As discussed in [1], this stems from the Feistel structure of DES rounds (shown in Figure 4.2). Half of the bits of the plaintext undergo a simple bit-shift permutation while the other half are passed through the more complex round function. Thus, it is assumed that half the bits can be predicted with 100% accuracy while the other half has a prediction threshold equivalent to random guessing, or 50%. In total, this equates to the 75% base match rate, the threshold used to determine whether the neural network can extract any useful information after training.
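The arithmetic behind that threshold can be stated in one line (a trivial check rather than real analysis code):

```python
# half the plaintext bits pass through a trivial permutation
# (predictable with probability 1.0); the other half pass through
# the round function (no better than a coin flip, 0.5)
base_match_rate = 0.5 * 1.0 + 0.5 * 0.5  # = 0.75
```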

The three different neural network architectures (fat/shallow, deep/thin, and cascade) underwent supervised learning, training on the given input plaintexts and predicting the corresponding ciphertexts, which were evaluated for accuracy bit by bit. Figure 4.5 shows the performance of each network architecture, charting the corresponding cipher match rate for each. The fat and shallow network architecture performed best (consistent with the results in [1]), converging to an accuracy of approximately 99.8% after 350 training epochs. However, neither the deep and thin nor the cascade network extracted any useful information after training, as the cipher match rate converged at or below the base match rate, approximating 75%. This implies that the fat and shallow neural network architecture is best-suited for cryptanalysis of cipher algorithms or their components which exhibit a Feistel structure.

Figure 4.5: Predicted Accuracy on 1-round DES
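The bit-by-bit evaluation used throughout these experiments can be sketched as a short helper (illustrative only; the thesis computes this metric inside its TensorFlow evaluation loop):

```python
def cipher_match_rate(predicted, actual):
    """Fraction of bit positions predicted correctly, over all test pairs."""
    total = correct = 0
    for pred, true in zip(predicted, actual):
        total += len(true)
        correct += sum(p == t for p, t in zip(pred, true))
    return correct / total
```

A match rate at the base match rate means the network learned nothing beyond the structurally "free" bits; anything above it counts as extracted information.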

One additional test from the original paper, while not replicated in this set of experiments, displays a key result and is shown in Figure 4.6. The researchers ran three different round-reduced variations of DES using the fat and shallow neural network configuration.

The base match rates for the 1-round, 2-round, and 3-round DES algorithms were 75%, 50%, and 50% respectively. As displayed in Figure 4.6, the fat/shallow network learned both 1-round and 2-round DES, mimicking them with 99.7% and 60% cipher match rates respectively. However, no useful information about the algorithm was gained from the 3-round DES variation, which converged only to a 50% accuracy. This implies that the fat/shallow network constructed is restricted to mimicking and evaluating ciphers weaker than 3-round DES.

Figure 4.6: Attack capacity summary for round-reduced DES [1]

4.2.2 DES Decryption

Another set of experiments was performed on DES to determine the variance in cipher match rate when training the fat/shallow neural network on the algorithm in encryption vs. decryption mode. The encryption mode is equivalent to the methodology described in Section 4.2.1, where 2^16 plaintext binary strings are fed as input data to the neural network, which is trained to predict the corresponding ciphertext bits. The decryption mode is modelled by the inverse relationship, where 2^16 ciphertext binary strings are the input data used to train the network to predict plaintext bits. [1] predicts that the accuracies for both methods should measure the same, as the DES encryption and decryption operations are equivalent. This theory is tested for both 1-round and 2-round DES.
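The only difference between the two modes is which side of each pair the network sees as input. A hedged sketch of the dataset construction (the function name is illustrative, not the thesis's code):

```python
def build_dataset(plaintexts, ciphertexts, mode='encrypt'):
    # encryption mode: train plaintext -> ciphertext
    # decryption mode: the same pairs with the roles reversed
    if mode == 'encrypt':
        return plaintexts, ciphertexts
    return ciphertexts, plaintexts
```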

Figure 4.7: Predicted Accuracy on 1-round DES for Encryption vs. Decryption Mode

As shown in Figure 4.7, the 1-round DES encryption and decryption methods both converge to the same cipher match rate, averaging between 99% and 100%. Figure 4.8 shows the results for 2-round DES. Note that the decryption method (i.e., training on ciphertexts as input) does not converge to as high a cipher match rate as the encryption method (i.e., training on plaintexts as input). However, the two curves have similar shapes, and if run with multiple iterations of data, the converged values may average closer together.

Figure 4.8: Predicted Accuracy on 2-round DES for Encryption vs. Decryption Mode

Overall, the above results for the encryption and decryption modes of round-reduced DES suggest that the neural cryptanalysis method maintains algorithmic equivalency relations as anticipated; both encryption and decryption for DES operate similarly, so this black-box method should converge to the same prediction accuracy.

Chapter 5

SIMON and SPECK Lightweight Ciphers

5.1 Background

In 2011, with the potential for US government lightweight cipher requirements and circulating concerns that existing solutions were overly restrictive, a dedicated cryptographic team was established at the Information Assurance Research lab within the National Security Agency's (NSA's) Research Directorate [34]. This research effort eventually produced the SIMON and SPECK cipher families in 2013. The original aim of these block ciphers was to secure highly constrained devices against chosen-plaintext or chosen-ciphertext attacks and any related-key attacks. Since their inception, SIMON and SPECK have been presented at the Design Automation Conference in June 2015 and the NIST Lightweight Cryptography Workshop in July 2015.

In 2015, the SIMON and SPECK block ciphers were recommended for a joint effort by the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC): ISO/IEC 29192-2 Information technology – Security techniques – Lightweight cryptography – Part 2: Block Ciphers [35]. In April 2018, after years of deliberation by the ISO and IEC Joint Technical Committee (JTC 1) for information technology, the SPECK and SIMON ciphers were ultimately rejected for this standardization. The justification for cancelling the project reads that "both algorithms included in the amendment are not properly motivated and their security properties are not sufficiently understood" [35].

5.2 SIMON/SPECK Previous Cryptanalysis

Multiple symmetric cryptanalysis techniques and attacks have been used to evaluate SIMON/SPECK. [36] attacks round-reduced versions of SIMON using linear cryptanalysis and knowledge of patterns in the encryption algorithm. [37] performs differential cryptanalysis on both SIMON and SPECK, using derived differential characteristics to launch a series of differential key-recovery attacks. [38] presents a fairly novel key-recovery attack using the cube attack approach, which uses a distinguisher to determine a non-random cipher property. [39] is a differential-linear cryptanalysis approach, which also involves a distinguisher built by concatenating a linear approximation and a differential. Additional cryptanalysis work is listed on the Bibliography page of the SIMON and SPECK main website at [40].

Note that all of the above approaches are heuristically involved, limiting their scalability. All require uncovering some property of the algorithm itself, the goal being key recovery. Neural cryptanalysis measures security without requiring such knowledge.

5.3 SIMON Ciphers

This work conducts a series of experiments on the variant of SIMON with a 64-bit block size and 96-bit key size, termed SIMON64/96. The following sections introduce the structure and pre-determined security of SIMON64/96 and present the results from neural cryptanalysis.

5.3.1 SIMON Structure

The SIMON family of lightweight block ciphers is built on the Feistel structure and defined in [3]. The algorithm is designed to run efficiently in hardware without sacrificing performance in software.

The encryption (and decryption) operation for SIMON64/96 utilizes three operations over 32-bit words: bitwise XOR ⊕, bitwise AND &, and left circular shift S^j (shifted by j bits). The SIMON64/96 version involves 42 rounds with word size 32 and 3 key words.

Given the round key k ∈ GF(2^32), the round function for SIMON64 is a two-stage Feistel map R_k: GF(2^32) × GF(2^32) → GF(2^32) × GF(2^32) defined by Equation 5.1 [3], where f(x) = (S^1 x & S^8 x) ⊕ S^2 x:

R_k(x, y) = (y ⊕ f(x) ⊕ k, x)    (5.1)

The effect of the round function R_{k_i}, where k_i is the key at the i-th step, on the first word and second word of the cipher is shown in Figure 5.1.

Figure 5.1: Feistel stepping within SIMON round function [3]
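To make the round concrete, here is a minimal Python sketch of one SIMON64 round and its inverse. The inner function f(x) = (S^1 x & S^8 x) ⊕ S^2 x is taken from the SIMON specification [3]; the function names are illustrative, not the thesis's C implementation.

```python
MASK32 = 0xFFFFFFFF

def rol(x, r, n=32):
    # left circular shift S^r on an n-bit word
    return ((x << r) | (x >> (n - r))) & MASK32

def f(x):
    # f(x) = (S^1 x & S^8 x) xor S^2 x, per the SIMON specification [3]
    return (rol(x, 1) & rol(x, 8)) ^ rol(x, 2)

def simon_round(x, y, k):
    # Equation 5.1: R_k(x, y) = (y xor f(x) xor k, x)
    return y ^ f(x) ^ k, x

def simon_round_inv(x, y, k):
    # the Feistel structure makes the round trivially invertible
    return y, x ^ f(y) ^ k
```

Because only the round keys differ between rounds, iterating `simon_round` over the 42 expanded keys would yield the full SIMON64/96 encryption.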

Note that the structure of each round is equivalent, with symmetric operations; the difference lies in the round keys. The SIMON key schedule ensures that any symmetric properties are obscured. This key expansion is shown in Figure 5.2.

Figure 5.2: SIMON three-word key expansion [3]

5.3.2 Neural Cryptanalysis Results

Similar to the testing suite run in [1] and reproduced in Section 4.2, two primary factors were tested during the SIMON64/96 neural cryptanalysis: the impact of neural network architectures on predicted accuracy and the relative cipher match rate of different round-reduced versions. The results of this neural cryptanalysis are discussed in the following subsections.

Note that the input/output data collection for this set of experiments was slightly different than that for DES. A set of randomly generated plaintexts was still fed through the black-box algorithm (SIMON64/96) to generate corresponding ciphertext output data. The training dataset remained as large as that for DES at 2^16 input/output pairs. However, the testing dataset was larger at 6554 (or 10% of 2^16) input/output pairs. This adjustment was made by the researcher to reduce the potential for overfitting during training, though no significant impacts were observed.

The source code for the SIMON implementation in C used to encrypt random plaintext bit streams is derived from [16], which also includes a set of test vectors for validation.

SIMON Across Different Networks

The same three neural network architectures depicted in Figure 3.3 and discussed in Section 3.2 were used to train on the input/output (plaintext/ciphertext) pairs and mimic the SIMON64/96 algorithm: fat/shallow, deep/thin, and cascade.

Given the complexity of the full 42-round algorithm, the estimated base match rate for the experiments run on SIMON64/96 is 50%, where each bit has equal probability of encrypting to 1 or 0 given a plaintext input. As shown in the results in Figure 5.3, none of the three neural networks converged to a prediction accuracy greater than the base match rate, all hovering at 50%, or random guessing. Therefore, no distinguishable information was extracted to mimic SIMON64/96. This result was anticipated, as the security of SIMON64/96 is certainly greater than that of 3-round DES, which was not predictable by any of these neural network architectures.

To better distinguish neural network performance on the SIMON cipher components, the three architectures were trained on 2-round SIMON64/96. To generate these pairs, a variation of the encryption algorithm in [16] was used, in which the resulting ciphertext was output after the first 2 of the 42 total rounds. Note that, as provided in the source code, Feistel ciphers like SIMON are often encrypted in increments of two rounds to prevent word swapping, so the 2-round encryption tests represent a basic encryption block. Just as with full encryption, the base match rate is 50% since both halves of the ciphertext undergo sufficient obfuscation.

The results of mimicking 2-round SIMON64/96 using the three different neural networks are shown in Figure 5.4.

Figure 5.3: Predicted Accuracy of SIMON64/96 Across Different Network Architectures

Similar to the predicted accuracy on 1-round DES (shown in Figure 4.5), the fat/shallow network achieved the highest cipher match rate for 2-round SIMON64/96, converging to almost 85%. The deep/thin architecture also mimicked 2-round SIMON64/96 to some extent, converging to just over 55% predicted accuracy. However, the cascade network was unable to extract any useful information from the plaintext/ciphertext pairs to mimic the cipher, converging at but not above the base match rate of 50%. Ultimately, this result supports the conclusion from [1] that a fat/shallow neural network is the best of the given architectures for Feistel-based cipher algorithms.

Figure 5.4: Predicted Accuracy of 2-round SIMON64/96 on Different Network Architectures

Round-Reduced SIMON

To measure the relative security of round-reduced SIMON variations, the fat/shallow network (determined in the previous section to be most suitable for this cipher structure) was trained on 1-round, 2-round, and 3-round SIMON64/96. The results are depicted in Figure 5.5. Note that 1-round SIMON64/96 has a base match rate of 75%. Similar to 1-round DES, 1-round SIMON64/96 only performs significant obfuscation on one half of the plaintext input while the other half is permuted by a trivial bit shift. However, the base match rates after multiple iterations (2-round and 3-round) are 50%.

As seen in the graph of the cipher match rate, the fat/shallow network mimics 1-round SIMON64/96 (with base match rate of 75%) with a high prediction accuracy, converging to 99.9%. The neural network also mimicked the 2-round version to some extent, converging to 85% predicted accuracy. Some useful information from the 3-round variation was also extracted, resulting in a 63.5% cipher match rate.

Given that neural cryptanalysis provides a comparable security measure as established in [1], these results suggest that 1-round, 2-round, and 3-round SIMON64/96 are all less secure (weaker) than 3-round DES, especially given that the same neural network was used to operate on both sets of input/output pairs. Note, though, that this measure, as with those produced by other cryptanalysis methods, is relative, a "best-effort" strength evaluation; future neural network architecture designs may be able to crack these algorithms more successfully and reverse their relative ordering. With this limitation in mind, the conclusion of this work remains that 1-round, 2-round, and 3-round SIMON64/96 are all less secure than 3-round DES.

5.4 SPECK Ciphers

For this work, neural cryptanalysis of SPECK was run on the variant with a 64-bit block size and 96-bit key size (SPECK64/96), consistent with the sizes run for SIMON. The following sections introduce SPECK64/96's structure and security and summarize the neural cryptanalysis results.

5.4.1 SPECK Structure

The SPECK family of ciphers was built for both hardware and software applications, but is optimized for microcontroller performance. SPECK’s structure is defined in [3].

The SPECK64/96 version runs over 26 rounds with word size 32 and 3 key words.

Figure 5.5: Predicted Accuracy of Round-Reduced SIMON64/96

The operations on 32-bit words during the SPECK64 encryption map are: bitwise XOR ⊕, addition modulo 2^n (+), and left/right circular shifts S^j/S^−j (shifted by j bits).

Given the key k ∈ GF(2^32), the round function for SPECK64 is the map R_k: GF(2^32) × GF(2^32) → GF(2^32) × GF(2^32) defined by Equation 5.2 [3]:

R_k(x, y) = ((S^{−8}x + y) ⊕ k, S^3 y ⊕ (S^{−8}x + y) ⊕ k)    (5.2)

The effect of the round function R_{k_i}, where k_i is the key at the i-th step, is shown in Figure 5.6.
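Equation 5.2 maps directly onto a few lines of code. The sketch below is illustrative (not the thesis's C implementation from [16]) and includes the inverse round to show that the add-rotate-xor steps run backwards cleanly:

```python
MASK32 = 0xFFFFFFFF

def rol(x, r):
    # left circular shift S^r on a 32-bit word
    return ((x << r) | (x >> (32 - r))) & MASK32

def ror(x, r):
    # right circular shift S^-r on a 32-bit word
    return ((x >> r) | (x << (32 - r))) & MASK32

def speck_round(x, y, k):
    # Equation 5.2: x' = (S^-8 x + y) xor k,  y' = S^3 y xor x'
    x = ((ror(x, 8) + y) & MASK32) ^ k
    y = rol(y, 3) ^ x
    return x, y

def speck_round_inv(x, y, k):
    # undo each add-rotate-xor step in reverse order
    y = ror(x ^ y, 3)
    x = rol(((x ^ k) - y) & MASK32, 8)
    return x, y
```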

The SPECK key schedule generates 26 keys, one for each round, using the round function.

Figure 5.6: SPECK round function after i encryption steps [3]

For SPECK64/96, the key K is written as K = (l_1, l_0, k_0), with sequences defined by [3] in Equation 5.3 (note that m = 3 for SPECK64).

l_{i+m−1} = (k_i + S^{−8} l_i) ⊕ i
k_{i+1} = S^3 k_i ⊕ l_{i+m−1}    (5.3)
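Equation 5.3 translates into a short key-expansion sketch (illustrative names, not the thesis's C code from [16]):

```python
MASK32 = 0xFFFFFFFF

def rol(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK32

def ror(x, r):
    return ((x >> r) | (x << (32 - r))) & MASK32

def speck64_96_key_schedule(k0, l0, l1, rounds=26):
    # Equation 5.3 with m = 3: the round counter i acts as the round
    # "key", stretching (l1, l0, k0) into 26 round keys
    k, l = [k0], [l0, l1]
    for i in range(rounds - 1):
        l.append(((k[i] + ror(l[i], 8)) & MASK32) ^ i)
        k.append(rol(k[i], 3) ^ l[i + 2])
    return k
```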

A diagram of the SPECK key expansion algorithm is shown in Figure 5.7.

Figure 5.7: SPECK key expansion [3]

Recall that both DES and SIMON are based on the Feistel model. SPECK is also related to this structure, since it can be represented as the composition of two Feistel-like maps, shown in Equation 5.4 and depicted in Figure 5.8.

(x, y) ↦ (y, (S^{−8}x + y) ⊕ k)
(x, y) ↦ (y, S^3 x ⊕ y)    (5.4)

Figure 5.8: SPECK round function split into Feistel-like steps [3]

5.4.2 Neural Cryptanalysis Results

As for the previous cipher algorithms, both the impact of neural network architectures on cipher match rate and the relative predicted accuracies between different round-reduced variations of SPECK64/96 were evaluated through neural cryptanalysis. The training/testing data split for supervised training on SPECK64/96 was the same as that for SIMON64/96 in Section 5.3: 2^16 input/output training pairs and 6554 (or 10% of 2^16) input/output testing pairs. As with SIMON, the SPECK C implementation was taken directly from [16], and the accompanying test vector set was used to verify the code behavior.

The results of the SPECK64/96 neural cryptanalysis are detailed in the following subsections.

SPECK Across Different Networks

To determine the impact of the neural network architecture choice in mimicking SPECK, the same fat/shallow, deep/thin, and cascade architectures constructed in [1] and depicted in Figure 3.3 were trained to predict fully-encrypted SPECK64/96 ciphertexts. Given that this cipher is loosely based on the Feistel structure, the round-reduced version was not used in this testing suite, though this does present potential for future verification.

Given the complexity of obfuscation over the 26 rounds in the full SPECK64/96 algorithm, the base match rate for this set of tests is 50%.

The mimicking performance of each of the three neural networks on the full SPECK64/96 cipher algorithm is shown in Figure 5.9. None of the networks was able to extract any meaningful information for prediction, all converging to a cipher match rate of 50% (equal to the base match rate). The fat/shallow network had slightly higher performance, but the differences among the networks were negligible. This result was expected given the complexity of the algorithm. Note the caveat to the neural cryptanalysis method that this conclusion on complexity is based on the current suite of architectures tested; more complex designs in the future may yield better mimicking results.

Round-Reduced SPECK

To compare the relative security of different round-reduced versions of SPECK to one another and to previous cipher algorithms (i.e., round-reduced DES and SIMON64/96), the same fat/shallow neural network from previous experiments was trained on a set of plaintext/ciphertext pairs for 1-round, 2-round, and 3-round SPECK64/96. As the neural cryptanalysis methodology established in [1] is comparable, this procedure supports systematically comparing these ciphers.

Figure 5.9: Predicted Accuracy of SPECK64/96 on Different Network Architectures

The base match rate for each round-reduced SPECK version tested in this experiment is 50%. The round operation is complex enough that neither half of the input text is easily predictable. If any useful information is extracted from the cipher, the neural network's cipher match rate should simply exceed that achieved by random bit-by-bit guessing.

The predicted accuracy results of the fat/shallow neural network training on round-reduced SPECK are shown in Figure 5.10. 1-round SPECK64/96 was mimicked with high accuracy, converging to a 92% match rate. However, the neural network failed to learn any relationships in either 2-round or 3-round SPECK; the cipher match rate remained at 50%. Compared to the previous round-reduced ciphers analyzed, this result implies that 2-round and 3-round SPECK are more secure than 1-round/2-round DES and 1-round/2-round/3-round SIMON64/96. Additionally, looking at the shape of the learning curve in Figure 5.10 compared to those for 1-round DES in Figure 4.6 and 1-round SIMON64/96 in Figure 5.5, the cipher match rate changed more gradually over the 350 epochs for 1-round SPECK64/96. This shows that the neural network learned the relationships between plaintext and SPECK-encrypted ciphertext bits more slowly than for 1-round DES or 1-round SIMON64/96. This is an interesting result, suggesting that the number of epochs may be a variable to consider when analyzing ciphers with neural cryptanalysis in the future, possibly providing an additional comparative indicator.

Figure 5.10: Predicted Accuracy of Round-Reduced SPECK64/96

Chapter 6

Advanced Encryption Standard

6.1 Background

The Advanced Encryption Standard dates back to 1997 when the National Institute of Standards and Technology (NIST) announced the AES development initiative in January and formally requested algorithmic candidates in September [41]. The goal of the effort was to establish a Federal Information Processing Standard (FIPS) for a symmetric key block cipher encryption algorithm equipped to secure sensitive U.S. government and (voluntary) private sector data. After fifteen original candidates, three AES candidate conferences in Jan/Feb 1999, Jul/Aug 1999, and April 2000, and two rounds of public comments, NIST announced on October 2, 2000 that Rijndael had been selected for the AES proposal. The accompanying FIPS document was published in February 2001 for public comments, followed by a review and approval process.

Ultimately, in November 2001, FIPS 197 [4] announcing AES was published. All conformance operations are run under the Cryptographic Algorithm Validation Program (CAVP), which has validated over 5700 AES algorithm implementations as of 2020 [41]. AES remains in practice today as an encryption standard for securing information.


6.1.1 AES Structure

This thesis focuses on the AES-128 specification as defined in [4]. All of the algorithm's internal operations are performed on a two-dimensional array representation of bytes termed the State. For AES-128, the State is a 4 byte × 4 byte array. The input, output, and State array are all 4 words (or 128 bits). The input bytes are transformed into an initial State array, which undergoes the series of 10 rounds executed during the AES-128 encryption algorithm and is then transformed into the output byte arrangement. These stages are depicted in Figure 6.1.

Figure 6.1: State array input and output for AES [4]

Each of the 10 rounds (except for the slightly different final round) of AES involves four separate transformations, shown in Figure 6.2: SubBytes, ShiftRows, MixColumns, and AddRoundKey. The independent and modularized nature of these cipher components makes them interesting mimic targets for cryptanalysis (which is performed in Section 6.2.3). Each of these algorithms is discussed briefly in the subsections to follow.

SubBytes

Figure 6.2: AES Cipher Round Structure [4]

The SubBytes algorithm implements a non-linear byte substitution where each byte is independently changed following an invertible substitution table, or S-box. This S-box is built using mathematical concepts including multiplicative inverses, finite fields, and affine transformations. Ultimately, the S-box is simply indexed by the hexadecimal digits of the current byte.
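As an illustration of that construction, the sketch below derives S-box entries from first principles: a multiplicative inverse in GF(2^8) followed by the FIPS-197 affine transformation. A real implementation would precompute the 256-entry table; the brute-force inverse here is for clarity only.

```python
def gf_mul(a, b):
    # multiplication in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p

def gf_inv(a):
    # brute-force multiplicative inverse (0 maps to 0 by convention)
    if a == 0:
        return 0
    return next(x for x in range(1, 256) if gf_mul(a, x) == 1)

def sbox(a):
    # FIPS-197 affine transformation of the inverse, with constant 0x63
    x = gf_inv(a)
    out = 0
    for i in range(8):
        bit = ((x >> i) ^ (x >> ((i + 4) % 8)) ^ (x >> ((i + 5) % 8))
               ^ (x >> ((i + 6) % 8)) ^ (x >> ((i + 7) % 8))
               ^ (0x63 >> i)) & 1
        out |= bit << i
    return out
```

For example, `sbox(0x00)` yields 0x63, the first entry of the published table.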

ShiftRows

The ShiftRows operation involves cyclically shifting rows of the State array where the offset is equal to the row number as follows:

• row = 0 → shift = 0 bytes left

• row = 1 → shift = 1 byte left

• row = 2 → shift = 2 bytes left

• row = 3 → shift = 3 bytes left
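Representing the State as a list of four rows, the offsets above reduce to one rotation per row (a minimal sketch, not the thesis's implementation):

```python
def shift_rows(state):
    # rotate row r of the 4x4 State left by r byte positions
    return [row[r:] + row[:r] for r, row in enumerate(state)]
```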

MixColumns

The MixColumns stage operates on each of the columns of the State array, viewing them as four-term polynomials over GF(2^8) that are multiplied modulo x^4 + 1 by a fixed polynomial.
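Working through one column makes this concrete. The sketch below multiplies a column by the fixed polynomial {03}x^3 + {01}x^2 + {01}x + {02} using the standard xtime trick; it is a textbook formulation, not the thesis's code.

```python
def xtime(a):
    # multiply by x in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a

def mix_column(col):
    # the matrix rows are circular shifts of ({02}, {03}, {01}, {01});
    # {03}*b is computed as xtime(b) ^ b
    a, b, c, d = col
    return [
        xtime(a) ^ xtime(b) ^ b ^ c ^ d,
        a ^ xtime(b) ^ xtime(c) ^ c ^ d,
        a ^ b ^ xtime(c) ^ xtime(d) ^ d,
        xtime(a) ^ a ^ b ^ c ^ xtime(d),
    ]
```

For example, the column [0xDB, 0x13, 0x53, 0x45] maps to [0x8E, 0x4D, 0xA1, 0xBC].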

AddRoundKey

The AddRoundKey transformation adds a round key generated by a key expansion algorithm to the State using bitwise XOR. This key expansion routine involves both substitution with an S-box and cyclic permutation.
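Since the transformation itself is a plain XOR, it is nearly a one-liner (an illustrative sketch assuming a 4 × 4 list-of-rows byte representation of the State):

```python
def add_round_key(state, round_key):
    # bitwise XOR of each State byte with the matching round-key byte
    return [[s ^ k for s, k in zip(s_row, k_row)]
            for s_row, k_row in zip(state, round_key)]
```

Because XOR is its own inverse, applying the same round key twice restores the original State, which is exactly how decryption undoes this step.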

6.1.2 AES Previous Cryptanalysis

AES has undergone thorough cryptanalysis since its inception. One survey [42] lists some pre-existing attack strategies, including linear and differential cryptanalysis as well as an "upgraded" differential cryptanalysis method, truncated differentials (a generalized differential approach), an attack reliant on properties of chosen plaintexts, and interpolation attacks (involving high-order polynomials).

According to the 2018 overview [43], there are five primary approaches to current cryptanalysis of AES:

1. Algebraic Attacks solve a system of multivariate polynomials over a Galois field modeling AES to find the key.

2. XL and XSL Attacks apply linearization to solving multivariate quadratic systems.

3. Cube Attacks determine a low-order polynomial model of the cipher output to find key bits.

4. Side-Channel Attacks exploit physical implementation vulnerabilities rather than algorithmic cryptographic vulnerabilities.

5. Related-Key and Distinguishing Attacks rely on chosen plaintext pairs, a black-box oracle, and differences of "related" keys to reveal the unknown keys.

Again note that all of the traditional approaches discussed here, as with the DES and SIMON/SPECK ciphers, rely on operations to recover the key by exploiting some algorithmic or physical property. This approach has limited scalability and requires a level of intuition.

6.2 Neural Cryptanalysis Results

Similar to the experiments run on DES and SIMON/SPECK previously, both the impacts of different neural network architectures and the relative security of round-reduced versions were analyzed through neural cryptanalysis. Additionally, due to the complexity of each full AES round and its modularization, input/output pairs were also generated and analyzed for individual transformations - SubBytes, ShiftRows, MixColumns, and AddRoundKey - to generate more conclusive results. The results of performing neural cryptanalysis on these different variations of AES are discussed in the following subsections.

6.2.1 AES Across Different Networks

To determine the impact of neural network architecture choice on mimicking accuracy (and thus attack success), the three baseline architectures were tested on input/output pairs generated from the full AES encryption algorithm. Recall that AES-128 operates on a 128-bit plaintext and produces a 128-bit ciphertext while the previously analyzed algorithms had a block size of only 64 bits. Thus, the 128-bit versions of the neural network architectures depicted in Figure 3.3 and described in Section 3.2 are used for these tests.

The testing/training data split for AES was equivalent to that for DES: 2^16 input/output training pairs and 2^12 input/output testing pairs. Note that this split may have resulted in some overfitting, but this limitation is left for future work as no severe impact was noted. The ciphertext data was generated by encrypting the plaintext bits with the full 10-round AES algorithm, which greatly obfuscates the input data. Because of the overall algorithmic complexity, the base match rate is set to 50%, equivalent to randomly guessing bit by bit.

The predicted accuracy of mimicking AES using the three different fully connected architectures (i.e., fat/shallow, deep/thin, and cascade) is shown in Figure 6.3. Note that no single network is most successful in attacking this cipher; both the fat/shallow and deep/thin networks converge to just below the base match rate (at about 49.95%) while the cascade network converges to right around the base match rate (at 50%). Thus, no useful information was extracted from the plaintext/ciphertext input/output pairs trained on by these network architectures. Given the complexity of the full AES encryption algorithm and the simplicity of the architectures tested, this result was anticipated.

6.2.2 Round-Reduced AES

Consistent with the neural cryptanalysis of DES and the SIMON/SPECK cipher family, round-reduced versions of AES were also studied. A set of plaintext/ciphertext pairs was generated for 1-round, 2-round, and 3-round AES with the following testing/training data split: 2^16 input/output training pairs and 2^12 input/output testing pairs. Recall that the full-encryption AES-128 algorithm involves 10 rounds, as discussed in Section 6.1.1, while full DES is 16 rounds, full SIMON64/96 is 42 rounds, and full SPECK64/96 is 26 rounds.

Figure 6.3: Predicted Accuracy of AES Across Different Network Architectures

Although the previous runs on the full AES encryption produced no conclusive result on the best-suited neural network, the fat/shallow network was selected to run the tests on round-reduced AES due to its promising results on round-reduced versions of both DES and SIMON/SPECK. Note that given the complexity of the algorithm applied at each round, the base match rate is set to 50%.

The predicted accuracy on round-reduced versions of AES using the fat/shallow network is shown in Figure 6.4.

Note the small scale for cipher match rate on the y-axis, ranging only 0.2% in either direction of the base match rate.

Figure 6.4: Predicted Accuracy of Round-Reduced AES

After 350 epochs, no useful information was extracted to predict the encryption for 1-round, 2-round, or 3-round AES, all converging to a cipher match rate at or barely above the base match rate of 50%. This result suggests that, based on the current suite of network architectures tested, 1-round AES is more secure than 1-round DES, 1-round SIMON64/96, and 1-round SPECK64/96. Note that the number of rounds required for full encryption is smaller for AES than for the other tested algorithms, so it can be conjectured that each round of AES is stronger than a round of the other encryption methods, a conclusion supported by neural cryptanalysis.

6.2.3 AES Algorithm Components

To determine the impact of neural network architecture choice in training on each of the AES round algorithm components, and to guide which architecture to choose when comparing their relative security, training/testing sets were generated by running a set of 128-bit plaintexts through the different transformation algorithms: SubBytes, ShiftRows, MixColumns, and AddRoundKey (all described in Section 6.1.1). The four graphs depicting the predicted accuracy resulting from training the three different neural network architectures are shown in Figures 6.5, 6.6, 6.7, and 6.8.

Figure 6.5: Predicted Accuracy on AES SubBytes Across Different Network Architectures

For the SubBytes step (Figure 6.5), fat/shallow performed relatively best (51.75% cipher match rate), followed by cascade (51.5% cipher match rate) and then deep/thin (50% cipher match rate). Note from the shape of the graph that the fat and shallow network learned information over time, following a steady incline in prediction accuracy. Conversely, the deep and thin network maintained relatively the same prediction accuracy within a margin of about 0.5%, while the cascade network oscillated with growing amplitude, converging around relatively the same cipher match rate between 51% and 51.5%.

Figure 6.6: Predicted Accuracy on AES ShiftRows Across Different Network Architectures

For ShiftRows (Figure 6.6), fat/shallow and cascade both perform well (converging to near 100% cipher match rate) while deep/thin lags in accuracy (at around 57% cipher match rate). For this cipher component, the fat/shallow and cascade networks both significantly outperform the deep/thin network in prediction accuracy.

Figure 6.7: Predicted Accuracy on AES MixColumns Across Different Network Architectures

For MixColumns (Figure 6.7), the deep/thin and cascade networks mimic with approximately equivalent cipher match rates (50%), followed by fat/shallow (49.95% cipher match rate). There is no notable correlation among any of these networks. Both the fat/shallow and cascade networks oscillate in cipher match rate while the deep/thin network converges to just above the base match rate.

For AddRoundKey (Figure 6.8), fat/shallow has the highest cipher match rate at 58%, followed by cascade at 53.5% and deep/thin at 51%. The prediction accuracy curves for this cipher component are clearly distinguishable. The fat/shallow network follows logistic growth in prediction accuracy, converging to 58% as noted. The deep/thin network does not learn much information over time, following a step curve rather than linear growth in cipher match rate. The cascade network exhibits oscillations rather than a positive trend in prediction accuracy.

Figure 6.8: Predicted Accuracy on AES RoundKey Across Different Network Architectures

Ultimately, given these results, no single network performed conclusively best across all of these transformations. This further supports the conclusion from [1] that the most powerful mimicking network may differ depending on the cipher algorithm. Considering the network that achieved the highest cipher match rate across all transformations, the fat/shallow network architecture is selected as the best architecture to choose when comparing the mimicking accuracy of AES cipher components.

To compare the relative security of the four different transformations composing an AES round, the cipher match rate will be used as an indicative measure. Note that the base match rate for SubBytes is 50% given the obfuscation of the S-box independently operating on each byte. The base match rate for ShiftRows is set to 75% since this operation is mostly dependent on a cyclical shift based on the row of the state, a relatively straightforward algorithm that the neural network can predict to some degree. The base match rate for MixColumns is 50% since the polynomial multiplication method is not easy to mimic or predict. The base match rate for AddRoundKey is 50% given the complexity of the key scheduling algorithm. Recall that an attack is deemed successful if the resulting cipher match rate is greater than the established base match rate. The prediction accuracy for these cipher components is shown in Figure 6.9, with a scaled graphic included in Figure 6.10, without the ShiftRows step, for improved visibility.
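The cipher match rate used throughout is a bitwise prediction accuracy over the test pairs. A minimal sketch of how it might be computed over byte-string pairs follows; the function name is this sketch's own, not taken from the thesis codebase.

```python
def cipher_match_rate(predicted, actual):
    """Fraction of output bits predicted correctly, averaged over all pairs.
    Random guessing on a balanced bitstream yields roughly 0.5 (the usual
    base match rate); an attack succeeds when this rate exceeds the base."""
    correct = total = 0
    for pred, act in zip(predicted, actual):
        for pb, ab in zip(pred, act):
            wrong = bin(pb ^ ab).count("1")  # differing bits in this byte
            correct += 8 - wrong
            total += 8
    return correct / total
```

For example, a prediction differing from the true ciphertext in one bit of an 8-bit output scores 7/8, while a perfect mimic scores 1.0.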

Figure 6.9: Predicted Accuracy of AES Components

Figure 6.10: Predicted Accuracy of AES Components (sans RoundKey)

Looking at the results, the fat/shallow network was able to mimic the ShiftRows transformation with a nearly 100% cipher match rate, indicating that this individual component of the algorithm is very insecure. The RoundKey step was also mimicked with fairly high accuracy at a 58% cipher match rate, higher than the base match rate of 50% and thus a successful attack. The SubBytes component achieved only a slight increase in accuracy over the base match rate at 51.75%. The MixColumns transformation converged to the same accuracy as the base match rate at 50%, so no useful information was extracted to successfully mimic this component.

Given these cipher match rate results, the relative security of the cipher components with respect to neural cryptanalysis can be ranked as follows, 1 being most secure and 4 being least secure:

1. MixColumns

2. SubBytes

3. RoundKey

4. ShiftRows

Note the key caveat that this relative security ranking is constructed under the current capabilities of the neural cryptanalysis method, specifically the 1000-neuron fat/shallow network architecture. This ranking does not attempt to represent a conclusive security ranking across attack models.

That stated, the application of ranking the relative security of cipher building blocks with neural cryptanalysis presents a promising quantitative application for future cryptanalysis and cipher design.

Chapter 7

Discussion

Over the course of this work, many neural cryptanalysis experiments were conducted that indicate the strength of the ciphers studied, and of their round-reduced or modularized versions, relative to one another (given this attack model). This thesis serves as an extension of [1], determining the applicability of the neural cryptanalysis method over various configurations and ciphers with an application goal of CPS. The sections below provide a holistic discussion of all the results as well as of the use of neural cryptanalysis for CPS.

7.1 Fat/Shallow Network Architecture

Looking across the various neural cryptanalysis experiments run on round-reduced and modularized ciphers, the fat/shallow network architecture performed best on most configurations, as opposed to the other two simple networks tested. This result is likely due to the sheer number of neurons involved: there are 1000 neurons in the first (and only) layer of fat/shallow, while there are 128 (for 64-bit) or 256 (for 128-bit) neurons in the first layers of the deep/thin and cascade architectures. This suggests that the number of neurons may outweigh the structure of the network for this set of experiments. An area of future work (as discussed in Section 8.1.1) is fine-tuning neural network architectures, increasing their ability to attack more complex ciphers. One initial starting point may be to significantly increase the number of neurons in the already existing fully-connected models presented in this work to determine the attack success.
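The neuron-count argument can be made concrete with a rough parameter count. The deep/thin layer stack below is an assumed layout for illustration (the text specifies only a 128-neuron first layer for 64-bit ciphers), so the exact totals are illustrative rather than transcribed from the thesis code.

```python
def dense_params(layer_sizes):
    """Total weights plus biases in a chain of fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 64-bit ciphers: fat/shallow has a single 1000-neuron hidden layer,
# while deep/thin (hypothetically four 128-neuron layers here) is narrower.
fat_shallow = dense_params([64, 1000, 64])            # 129064 parameters
deep_thin = dense_params([64, 128, 128, 128, 128, 64])  # 66112 parameters
```

Under these assumptions, fat/shallow carries roughly twice the trainable parameters of a four-layer deep/thin stack, consistent with the observation that raw capacity may matter more than depth here.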

7.2 Encryption vs. Decryption Mode

As predicted in [1], note that the symmetric nature of the DES algorithm yields approximately equivalent prediction accuracies when using either plaintexts or ciphertexts as input in Section 4.2.2. This was demonstrated for both 1-round and 2-round DES, though the difference between "encryption mode" and "decryption mode" mimicking for 2-round DES was greater than the relative difference for 1-round DES. Note that this experiment was conducted using one set of plaintexts and ciphertexts; had the cipher match rate been averaged over a number of runs with different datasets, the results would likely average to be more consistent between the encryption and decryption mimicking. This result could be used as an indicator of encryption/decryption algorithm symmetry and should be further explored for other cipher structures.
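Switching to "decryption mode" amounts to swapping the roles of the two halves of each pair before training. A trivial sketch, with an illustrative helper name of this sketch's own:

```python
def to_decryption_mode(pairs):
    """Swap each (plaintext, ciphertext) pair so the network is trained
    to predict plaintext from ciphertext instead of the reverse."""
    return [(ct, pt) for pt, ct in pairs]
```

Since the swap is an involution, applying it twice recovers the original encryption-mode dataset.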

7.3 Relative Security of Cipher Rounds

The round-reduced ciphers yielded different prediction accuracies, which can be used for security comparison under the neural cryptanalysis method. A key clarification is that these securities are ranked given the neural cryptanalysis methodology used in this work, not as any final representative determination.

In Chapter 4, attacks on 1-round and 2-round DES were successful, but not on 3-round. In Section 5.3, attacks on 1-, 2-, and 3-round SIMON64/96 were all successful. In Section 5.4, attacks on 1-round SPECK64/96 were successful, but not on 2-round or 3-round. In Chapter 6, no attacks on round-reduced AES were successful. Given these results, we can rank the strength of these symmetric cipher rounds from most to least secure (under the given neural cryptanalysis model):

1. AES-128 (10 rounds)

2. SPECK64/96 (26 rounds)

3. DES (16 rounds)

4. SIMON64/96 (42 rounds)

One related characteristic of these ciphers to note is the number of rounds required for their full encryption versions (shown in parentheses next to each cipher listed above). As shown, AES-128, whose round structure was determined most secure by this neural cryptanalysis testing suite, requires the smallest number of rounds (only 10). This is followed by SPECK64/96 at 26 rounds, DES at 16 rounds, and SIMON64/96 at 42 rounds. Note the increase in the number of rounds as the round algorithm grows less secure (as measured by neural cryptanalysis), with the exception of DES, which has been withdrawn. This result suggests that neural cryptanalysis provides an accurate relative cipher strength measure.

7.4 Security of Algorithmic Components

Recall from Section 6.2.3 the relative ranking of AES algorithmic components. Note that this is not a final ranking of these algorithmic components’ securities (as there exists literature studying the impact of each of these transformations on the overall AES algorithmic security) but instead represents a relative security metric under the neural cryptanalysis model. Ultimately, we can relatively rank the strength of the four AES algorithmic components based on the current extent of neural cryptanalysis in Figure 6.9, from most to least secure:

1. MixColumns

2. SubBytes

3. RoundKey

4. ShiftRows

Also recall the algorithmic structure of each of these cipher components. Given the relative security rankings above, this result implies, at least in the context of AES, that neural cryptanalysis using the fat/shallow network architecture best attacks operations based on XORing with a round key (RoundKey) or applying simple bit-shifting permutations (ShiftRows). Meanwhile, more complex polynomial operations (MixColumns) and S-boxes (SubBytes) are more resistant to fat/shallow network neural cryptanalysis.
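One way to see why XOR-based steps are comparatively easy targets: with a fixed round key, AddRoundKey either passes or inverts each bit independently, so each output bit depends on exactly one input bit. A small sketch demonstrating this locality (the key value is arbitrary, and the helper names are this sketch's own):

```python
def add_round_key(block: bytes, key: bytes) -> bytes:
    """AES AddRoundKey: bytewise XOR of the state with the round key."""
    return bytes(b ^ k for b, k in zip(block, key))

def flip_bit(block: bytes, i: int) -> bytes:
    """Flip bit i (byte i // 8, bit i % 8) of a byte string."""
    out = bytearray(block)
    out[i // 8] ^= 1 << (i % 8)
    return bytes(out)
```

Flipping any single input bit flips exactly the corresponding output bit, a per-bit relationship that a shallow network can capture; MixColumns and SubBytes, by contrast, diffuse or nonlinearly mix several input bits into each output bit.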

7.5 Neural Cryptanalysis on Full Cipher Algorithms

Looking at the full encryption algorithms, none of DES, SIMON/SPECK, or AES produced any conclusive results, all converging to a cipher match rate approximately equal to the base match rate. So, no useful information was extracted by the neural network suite after 350 training epochs to mimic these ciphers. This does not provide a representative relative security measure between these algorithms. As posed in [1], this implies that the current suite of neural network architectures used is not intensive or complex enough to handle such complicated cipher algorithms. This is suggested as a direction for future work in Section 8.1.1.

7.6 Application of Neural Cryptanalysis to CPS

Overall, the results of the experiments conducted in this work, given the current neural network architecture suite, present a promising application of neural cryptanalysis to lightweight symmetric block cipher components or round-reduced versions of more complex CPS ciphers. Recall from [1] that neural cryptanalysis successfully mimicked Hitag2, a CPS keyless entry cipher, with a cipher match rate between those of 1-round and 3-round DES. This lightweight cipher has less complexity and lower security, making it appropriate for the current suite of neural networks tested. To perform neural cryptanalysis on fully encrypted plaintext/ciphertext pairs in CPS applications, the current neural networks must be strengthened and improved.

Chapter 8

Conclusion & Future Work

8.1 Future Work

By no means is this thesis work an extensive exploration of the neural cryptanalysis methodology. There are a few limitations highlighted throughout that pave the way for future directions. Suggestions for related future work are covered in the following subsections.

8.1.1 Fine-Tuning Architectures

One area of improvement for future work is fine-tuning the neural network architectures trained on input/output pairs. Current neural cryptanalysis work, as inspired by [1] and done in this thesis, tests only simple multi-layer fully connected neural networks. This choice was made given the desired behavior of the network: each plaintext bit should impact any ciphertext bit, and the ciphertext should resemble a randomly generated bitstream. However, note from experimentation (particularly with AES components) that no single network design performs best for all algorithms. The number of neurons in these simple architectures is kept small; one suggestion is to increase the number of neurons and/or the number of layers in the network to determine the impacts on attack success. Other, more advanced neural network architectures may also yield more attack successes and higher cipher match rates. Possible network designs include convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and pre-existing models like the Transformer, Inception, and ResNet architectures, which have more capacity and likely more attack capability.

8.1.2 Testing/Training Split

The training/testing split was a variable factor throughout this work. For the DES and AES experiments, the same split was used: 2^16 input/output training pairs and 2^12 input/output testing pairs. For the SIMON and SPECK experiments, a different split was used: 2^16 input/output training pairs and 6554 (or 10% of 2^16) input/output testing pairs. For the original experiments in [1], both the training and testing sets were equal in size: for example, 2^16 input/output training pairs and 2^16 input/output testing pairs. This variance in the testing/training data split might have impacts not assessed in this work’s analysis, such as overfitting due to limited testing data. The prior work in [1] analyzed how the size of the training data impacts the security measurement but not how the training/testing split could impact prediction accuracy, which is left for future work.
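For reference, the split sizes above work out as follows (2^16 training pairs in every experiment, with either a fixed 2^12-pair or a 10% test set):

```python
train = 2 ** 16                # 65536 training pairs in every experiment
des_aes_test = 2 ** 12         # 4096 testing pairs for DES and AES
simon_speck_test = round(0.1 * train)  # 10% of training: 6554 pairs
```

The rounded 10% figure reproduces the 6554-pair test set quoted for SIMON and SPECK.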

8.1.3 Incorporating White-Box Knowledge

Recall that neural cryptanalysis has a black-box assumption, where no knowledge of the algorithm is required to perform the method. One interesting area of study may be to determine whether white-box knowledge would be a useful addition to the technique. Utilizing algorithm knowledge may assist in mimicking internal operations of the cipher scheme, which may in turn assist in learning some internal stages of the algorithm. This white-box area of research is left for future endeavors and consideration.

8.1.4 AI-Based Attack Capabilities

A fundamental question this work introduces is what capabilities AI-based attacks may have in the field of symmetric cryptanalysis. The security rankings covered in the discussion are all relative with respect to the specific machine learning attacks and neural networks presented in this work. By further dissecting cipher building blocks (e.g., the fundamental S-box and P-box components) and testing neural cryptanalysis against them, conclusions may be drawn about the relative success of the current neural network architecture suite in analyzing different algorithmic structures. Rather than just studying neural cryptanalysis as a relative measure of cipher security, future work can be directed towards assessing the relative strengths of neural cryptanalysis as an attack model for "breaking" various cipher operations and structures. The scalability of the neural cryptanalysis model should also be assessed, specifically measuring the efficiency of increasingly complex neural networks on more complex ciphers.

8.1.5 Comparative Metric to Traditional Cryptanalysis

One question arising from this work may be the lack of a comparative study between neural cryptanalysis and the other symmetric cryptanalysis methods discussed in the literature review. A key caveat here is that traditional cryptanalysis approaches have a different definition of attack success, where an attack achieving greater efficiency than a brute-force approach is deemed successful. Alternatively, for neural cryptanalysis, attack success is achieved when the cipher match rate (prediction accuracy) is greater than the base match rate (random guessing). Traditional cryptanalysis relies on efficiency measures while neural cryptanalysis focuses more on quantitative prediction accuracy. Thus, comparing results across cryptanalysis methods is a difficult challenge, left here as an avenue for future work.

One suggestion is to work from the Security Indicator metric presented by [1], which involves an efficiency measure.

8.1.6 NIST Lightweight Cryptography

One prime application of neural cryptanalysis is for lightweight ciphers, like Hitag2, or block cipher components. In 2015, NIST began the process of establishing a lightweight cryptography standard, so far involving two rounds of candidates and feedback [44]. Neural cryptanalysis was intended to evaluate symmetric ciphers, which many of these lightweight cryptography candidates are. Neural cryptanalysis poses a potential comparative evaluation metric for components of these lightweight ciphers and, if the neural network architectures could be strengthened (discussed in Section 8.1.1), could possibly rank the strength of fully encrypting plaintext with these ciphers. This application is left as a direction for future work.

8.2 Conclusions

The motivation driving this work was to affirm and expand the applicability of the neural cryptanalysis method for evaluating the strength of lightweight symmetric ciphers, particularly for cyber-physical systems. We reproduced the DES work from [1], adding additional tests for switching the input/output pairs. We also tested an additional lightweight block cipher family, SPECK/SIMON, in round-reduced and full versions. The original 64-bit neural network architecture suite was also expanded to train on and mimic 128-bit plaintext/ciphertext pairs generated from full encryption, round-reduced versions, and components of AES. Overall, these experiments capture the relative strengths of cipher components across bit lengths and algorithm types in the context of the neural cryptanalysis attack model. The results generated from the current simple neural network architectures pose a promising application for researchers evaluating the security of components of lightweight symmetric ciphers. If the neural network architectures were improved, this could even serve to generate a strength indicator for full cipher operations.

Bibliography

[1] Y. Xiao, . Hao, and D. D. Yao. Neural Cryptanalysis: Metrics, Methodology, and Applications in CPS Ciphers. In 2019 IEEE Conference on Dependable and Secure Computing (DSC), pages 1–8, 2019.

[2] Behrouz A. Forouzan. Cryptography and Network Security, chapter 6. McGraw Hill Higher Education, 2007.

[3] Ray Beaulieu, Douglas Shors, Jason Smith, Stefan Treatman-Clark, Bryan Weeks, and Louis Wingers. The SIMON and SPECK Families of Lightweight Block Ciphers. Cryptology ePrint Archive, Report 2013/404, 2013. https://eprint.iacr.org/2013/404.

[4] National Institute of Standards and Technology. Advanced Encryption Standard. NIST FIPS PUB 197, 2001.

[5] Mitsuru Matsui. Linear Cryptanalysis Method for DES Cipher. In Tor Helleseth, editor, Advances in Cryptology — EUROCRYPT ’93, pages 386–397, Berlin, Heidelberg, 1994. Springer Berlin Heidelberg.

[6] Eli Biham and Adi Shamir. Differential cryptanalysis of the full 16-round des. In Ernest F. Brickell, editor, Advances in Cryptology — CRYPTO’ 92, pages 487–496, Berlin, Heidelberg, 1993. Springer Berlin Heidelberg.

[7] Raghvendra Rohit. Design and Cryptanalysis of Lightweight Symmetric Key Primitives, 2020. http://hdl.handle.net/10012/15556.

[8] V. Jha. Cryptanalysis of lightweight block ciphers. 2012.


[9] Kitae Jeong, HyungChul Kang, C. Lee, Jaechul Sung, and Seokhie Hong. Biclique cryptanalysis of lightweight block ciphers present, piccolo and led. IACR Cryptol. ePrint Arch., 2012:621, 2012.

[10] Gabriel Hospodar, Benedikt Gierlichs, Elke Mulder, Ingrid Verbauwhede, and Joos Vandewalle. Machine learning in side-channel analysis: A first study. J. Cryptographic Engineering, 1:293–302, 12 2011.

[11] A. M. B. Albassal and A. . A. Wahdan. Neural network based cryptanalysis of a feistel type block cipher. In International Conference on Electrical, Electronic and Computer Engineering, 2004. ICEEC ’04., pages 231–237, 2004.

[12] Mohammed Alani. Neuro-cryptanalysis of DES and Triple-DES. volume 7667, pages 637–646, 11 2012.

[13] Jaewoo So. Deep learning-based cryptanalysis of lightweight block ciphers. Security and Communication Networks, 2020:1–11, 07 2020.

[14] Xinyi Hu, Yaqun Zhao, and Ching-Nung Yang. Research on plaintext restoration of aes based on neural network. Sec. and Commun. Netw., 2018, January 2018.

[15] Sijie Fan and Yaqun Zhao. Analysis of des plaintext recovery based on bp neural network. Security and Communication Networks, 2019:1–5, 11 2019.

[16] Ray Beaulieu, Douglas Shors, J. Smith, Stefan Treatman-Clark, B. Weeks, and Louis Wingers. SIMON and SPECK Implementation Guide. January 2019.

[17] Kokke. kokke/tiny-aes-c, Jan 2021. https://github.com/kokke/tiny-AES-c/blob/master/aes.c.

[18] M. Smid and D. Branstad. Data encryption standard: past and future. 1988.

[19] Horst Feistel. Cryptography and computer privacy. Scientific American, 228(5):15–23, 1973.

[20] J. Smith. The design of Lucifer: a cryptographic device for data communications. IBM T.J. Watson Research Center, April 1971.

[21] Alex Biryukov and Christophe De Cannière. Data encryption standard (DES), pages 129–135. Springer US, Boston, MA, 2005.

[22] Dennis Branstad. Computer security and the data encryption standard: Proceedings of the conference on computer security and the data encryption standard. Special Publication (NIST SP), National Institute of Standards and Technology, Gaithersburg, MD, February 1978.

[23] National Institute of Standards and Technology. FIPS PUB 46-1 FEDERAL INFORMATION PROCESSING STANDARDS PUBLICATION Data Encryption Standard, January 1988.

[24] National Institute of Standards and Technology. FIPS PUB 46-2 FEDERAL INFORMATION PROCESSING STANDARDS PUBLICATION Data Encryption Standard, December 1993.

[25] National Institute of Standards and Technology. FIPS PUB 46-3 FEDERAL INFORMATION PROCESSING STANDARDS PUBLICATION Data Encryption Standard, October 1999.

[26] William Barker. Announcing Approval of the Withdrawal of Federal Information Processing Standard (FIPS) 46-3, Data Encryption Standard (DES); FIPS 74, Guidelines for Implementing and Using the NBS Data Encryption Standard; and FIPS 81, DES Modes of Operation. Federal Register, 70:28907–28908, May 2005.

[27] National Institute of Standards and Technology. FIPS PUB 74 FEDERAL INFORMATION PROCESSING STANDARDS PUBLICATION Guidelines for Implementing and Using the NBS Data Encryption Standard, April 1981.

[28] National Institute of Standards and Technology. FIPS PUB 81 FEDERAL INFORMATION PROCESSING STANDARDS PUBLICATION DES Modes of Operation, December 1980.

[29] National Institute of Standards and Technology. Update to Current Use and Deprecation of TDEA, July 2017. https://csrc.nist.gov/News/2017/Update-to-Current-Use-and-Deprecation-of-TDEA.

[30] Eli Biham and Adi Shamir. Differential Cryptanalysis of the Data Encryption Standard. Springer-Verlag, Berlin, Heidelberg, 1993.

[31] Mitsuru Matsui. Linear Cryptanalysis Method for DES Cipher. In Tor Helleseth, editor, Advances in Cryptology — EUROCRYPT ’93, pages 386–397, Berlin, Heidelberg, 1994. Springer Berlin Heidelberg.

[32] Eli Biham and Alex Biryukov. An Improvement of Davies’ Attack on DES. J. Cryptol., 10(3):195–205, June 1997.

[33] D. Davies and S. Murphy. Pairs and Triplets of DES S-Boxes. J. Cryptol., 8(1):1–25, December 1995.

[34] Ray Beaulieu, Douglas Shors, Jason Smith, Stefan Treatman-Clark, B. Weeks, and Louis Wingers. Notes on the design and analysis of SIMON and SPECK. IACR Cryptol. ePrint Arch., 2018.

[35] Tomer Ashur and Atul Luykx. An Account of the ISO/IEC Standardization of the Simon and Speck Block Cipher Families, pages 63–78. Springer International Publishing, Cham, 2021.

[36] Reham Almukhlifi and Poorvi Vora. Linear Cryptanalysis of Reduced-Round Simon Using Super Rounds. Cryptography, 4:9, 03 2020.

[37] Farzaneh Abed, Eik List, Stefan Lucks, and Jakob Wenzel. Differential cryptanalysis of round-reduced simon and speck. In Carlos Cid and Christian Rechberger, editors, Fast Software Encryption, pages 525–545, Berlin, Heidelberg, 2015. Springer Berlin Heidelberg.

[38] Zahra Ahmadian, Shahram Rasoolzadeh, Mahmoud Salmasizadeh, and Moham- mad Reza Aref. Automated Dynamic Cube Attack on Block Ciphers: Cryptanalysis of SIMON and KATAN. IACR Cryptology ePrint Archive, 2015:40, 2015.

[39] Yanqin Chen and Wenying Zhang. Differential-linear cryptanalysis of SIMON32/64. International Journal of Embedded Systems, 10:196, 01 2018.

[40] National Security Agency’s Research Directorate. Bibliography, 2019. https://nsacyber.github.io/simon-speck/bibliography/.

[41] Information Technology Laboratory Computer Security Division. AES Development - Cryptographic Standards and Guidelines: CSRC, Jan 2021.

[42] Alan Kaminsky, Michael Kurdziel, and Stanislaw Radziszowski. An overview of cryptanalysis research for the advanced encryption standard. 09 2010.

[43] Rashmi Mishra Km. Amrita, Neha Gupta. An Overview of Cryptanalysis on AES. International Journal of Advance Research in Science and Engineering (IJARSE), 07, April 2018.

[44] Information Technology Laboratory Computer Security Division. Lightweight Cryptography: CSRC, Mar 2021.

Appendices

Appendix A

Neural Network Code Implementation

The figures contained in this appendix display the Python source code used to build the neural network architectures for 64-bit ciphers in this work. Credit must be given to Ya Xiao, the original author of [1], for building this neural network code for cryptanalysis of DES and Hitag2. For this experimentation, these networks were also used for 64-bit versions of SIMON/SPECK. The dimensions for the 128-bit versions differ slightly and are discussed in Section 3.2.
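Since the figures below render the source code as images, a hypothetical NumPy reconstruction of the fat/shallow forward pass for 64-bit ciphers is sketched here. The layer sizes follow the text (a single 1000-neuron hidden layer mapping 64 input bits to 64 output-bit probabilities), but the activation choices and initialization are assumptions of this sketch, not a transcription of Figure A.1.

```python
import numpy as np

def init_fat_shallow(n_bits=64, hidden=1000, seed=0):
    """One fully connected hidden layer of 1000 neurons (per the text)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.05, (n_bits, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.05, (hidden, n_bits)),
        "b2": np.zeros(n_bits),
    }

def forward(p, x):
    """Map a batch of input bit vectors to per-bit output probabilities."""
    h = np.maximum(0.0, x @ p["W1"] + p["b1"])  # ReLU hidden layer (assumed)
    return 1.0 / (1.0 + np.exp(-(h @ p["W2"] + p["b2"])))  # sigmoid outputs
```

Thresholding each output probability at 0.5 yields predicted ciphertext bits, which can then be scored against the true bits to produce a cipher match rate.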

Figure A.1: Fat/Shallow Network Source Code for 64-bit ciphers


Figure A.2: Deep/Thin Network Source Code for 64-bit ciphers

Figure A.3: Cascade Network Source Code for 64-bit ciphers