<<

The 2019 IEEE Conference on Dependable and Secure Computing

arXiv:1911.04020v3 [cs.CR] 26 Nov 2019

Neural Cryptanalysis: Metrics, Methodology, and Applications in CPS Ciphers

Ya Xiao, Qingying Hao, and Danfeng (Daphne) Yao
Computer Science, Virginia Tech
Blacksburg, VA
{yax99, qhao2018, danfeng}@vt.edu

Abstract—Many real-world cyber-physical systems (CPS) use proprietary cipher algorithms. In this work, we describe an easy-to-use black-box security evaluation approach to measure the strength of proprietary ciphers without having to know the cipher algorithms. We quantify the strength of a cipher by measuring how difficult it is for a neural network to mimic the cipher algorithm. We define new metrics (e.g., cipher match rate, training data complexity, and training time complexity) computed from neural networks to quantitatively represent cipher strengths. This measurement approach allows us to directly compare the security of ciphers. Our experimental demonstration utilizes fully connected neural networks with multiple parallel binary classifiers at the output layer. The results show that the security strength of Hitag2 (a popular stream cipher used in the keyless entry of modern cars) is weaker than that of 3-round DES, when compared with round-reduced DES.

Index Terms—black-box evaluation, cryptanalysis, neural networks, CPS ciphers

I. INTRODUCTION

Many real-world cyber-physical systems (CPS) use lightweight symmetric ciphers, e.g., in the keyless entry of modern vehicles (Figure 1). However, unlike asymmetric ciphers, whose security can be reduced to the hardness of common mathematical problems, the security of symmetric ciphers (e.g., [1]–[4]) is evaluated empirically. Cryptanalysis refers to the line of work that systematically measures the strength of symmetric ciphers by launching various attacks and assessing whether or not a cryptographic primitive is resistant to these attacks. An attack is regarded as successful if the key can be recovered with a complexity less than the brute-force search. There are several existing cryptanalysis approaches (e.g., differential cryptanalysis [1] and linear cryptanalysis [2], [5]). A cryptanalyst has to evaluate a cipher by launching them one by one. The security of a cipher is determined by the best-effort attack.

For conventional cryptanalysis, human intervention and manual effort play the central role – an intrinsic limitation. Because a feasible attack is enabled by a certain unbalanced mathematical relationship (e.g., a high-bias linear approximation path), these unbalanced mathematical relationships need to be manually identified. Due to the huge search space, one cannot automatically exhaust all possible paths to identify the high-probability ones. Thus, conventional cryptanalysis has limited scalability. In traditional symmetric cryptanalysis, an attack is regarded as a cryptanalysis approach that leverages the knowledge of the cipher algorithm; traditional cryptanalysis regards key extraction as the success of the attack.

For many real-world cyber-physical systems (e.g., keyless entry), ciphers are usually proprietary, e.g., Hitag2 and Megamos Crypto [6]. Traditional cryptanalysis approaches cannot be directly applied. Fully recovering the cipher algorithm through reverse engineering may not always be possible.

In this paper, we address the problem of black-box and scalable cryptanalysis for symmetric ciphers, specifically how to automatically measure cipher strengths without the knowledge of the cipher algorithms. This problem has not been addressed in the cryptanalysis literature. We define the learning ability of neural networks to measure the strength of ciphers.

We train neural networks to mimic cipher algorithms. The stronger the cipher is, the more difficult it is for the cipher to be mimicked. The training data is a collection of plaintext-ciphertext pairs. The task is to predict the ciphertexts of plaintexts. This task is equivalent to its opposite version, predicting the plaintexts from ciphertexts, due to the symmetry of encryption and decryption. The neural cryptanalysis performs the ciphertext prediction attack without knowing the key, while traditional cryptanalysis aims to predict the key. The mimicking success breaks the cipher by uncovering the mapping between plaintexts and ciphertexts. We represent the mimic difficulty by the prediction accuracy and the corresponding required training data and time.

Besides, traditional cryptanalysis methods require delicate manual mathematical calculation to predict the key value or to identify the influence of unbalanced statistics. Our approach, in contrast, is automatic and scalable. We further compare them in Table I.

The closest related work on neural cryptanalysis is done by Alani [9], [10]. The author claims to be able to successfully predict plaintexts from ciphertexts of DES and 3-DES by learning from around 2^11 and 2^12 plaintext-ciphertext pairs, respectively. The work adopts a cascade neural network architecture. Alani [9], [10] reported average training times of 51 minutes and 72 minutes using MATLAB on a single computer for DES and 3-DES, respectively. However, our experimental findings suggest that the claims in [9], [10] cannot be reproduced on full-round DES and 3-DES (further discussed in Section IV-A).
Approaches | Required Data | Required Knowledge | Attack Goal | Attack Success Condition | Attack Enabler
Cryptanalysis | Plaintext-ciphertext pairs | Cipher algorithm | Key recovery | Attack complexity < exhaustive key search complexity | Unbalanced statistical property
Learning-aided cryptanalysis (e.g., [7], [8]) | Plaintext-ciphertext pairs | Cipher algorithm | Key recovery | Attack complexity < exhaustive key search complexity | Unbalanced statistical property
Neural cryptanalysis | Plaintext-ciphertext pairs | No further knowledge | Ciphertext prediction | Cipher match rate > base match rate | Predictability by neural network

TABLE I: Comparison between differential cryptanalysis and neural-cryptanalysis.
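The "unbalanced statistical property" enabler in Table I can be illustrated with a toy sketch (the 8-bit cipher and all names below are hypothetical, not from the paper): for a weak round function, one output difference occurs far more often than the roughly 2^-n frequency expected of an ideal cipher.

```python
# Toy illustration (not the paper's code): measure how unevenly output
# differences are distributed for a deliberately weak 8-bit "cipher".
from collections import Counter

def weak_toy_cipher(p, k=0b10110101):
    # XOR a fixed key, then rotate left by 1 bit (8-bit block).
    x = p ^ k
    return ((x << 1) | (x >> 7)) & 0xFF

IN_DIFF = 0x01  # attacker-chosen input difference
counts = Counter(weak_toy_cipher(p) ^ weak_toy_cipher(p ^ IN_DIFF)
                 for p in range(256))

best_diff, best_count = counts.most_common(1)[0]
# Both operations are linear over XOR, so one output difference occurs with
# probability 1, instead of the ~2^-8 an ideal 8-bit cipher would give.
print(hex(best_diff), best_count / 256)
```

Because XOR-with-key and bit rotation are both linear over XOR, the input difference propagates deterministically; a real attack exploits weaker but still significant biases of this kind.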

Our neural cryptanalysis approach should not be confused with learning-aided cryptanalysis that applies neural networks to improve the statistical profiling in conventional cryptanalysis (e.g., as in [7], [8]). These solutions are still based on traditional cryptanalysis, e.g., requiring the knowledge of cipher algorithms. Our neural cryptanalysis has completely different attack goals and methodology.

Fig. 1: Car keys with Hitag2 transponders shown in [11]

Our contributions are summarized as follows.
• We propose a new methodology to evaluate the strength of cipher primitives with neural networks. Compared with traditional cryptanalysis, it does not rely on the knowledge of cipher algorithms. We define the difficulty of mimicking a cipher by three metrics, cipher match rate, training data, and time complexity, to measure the strength of a cipher.
• We show the effectiveness of neural cryptanalysis on round-reduced DES and the proprietary cipher Hitag2. The experiments show that the strength of Hitag2 is weaker than the 3-round DES in neural cryptanalysis. We discuss the network architecture by applying three different networks. Experiments show that the most powerful attack neural network varies from cipher to cipher. While a fat-and-shallow shaped fully connected network is the best to attack the round-reduced DES, a deep-and-thin shaped fully connected network works best on Hitag2.
• We compare three common activation functions (e.g., sigmoid, relu, tanh) in neural cryptanalysis. Experiments show that sigmoid converges more quickly than the other two activation functions, which means it can reach a certain accuracy with the minimum training time. There is no substantial difference in the converged cipher match rate.
• We explore the impact of the training data volume on the converged cipher match rate. It shows that more training data significantly improves the converged cipher match rate. With 2^16 Hitag2 training pairs, the neural network achieves a cipher match rate of around 66%. When training with 2^20 pairs, the cipher match rate reaches 98%.

Neural cryptanalysis enables the automatic black-box evaluation of cipher strengths. This evaluation methodology appears quite powerful and exciting, potentially applicable to all ciphers, enabling researchers to compare cipher strengths in a unified framework.

II. BACKGROUND AND CIPHER MIMICKING

Conventional cryptanalysis heavily relies on unbalanced statistical properties. According to the different ways to find the unbalanced properties, cryptanalysis is categorized into different approaches. Differential cryptanalysis is one of the most common approaches among them. We use it as the representative to introduce conventional cryptanalysis.

Definition 1 (Difference). The difference ∆ of two equal-length bit streams P1 and P2 is defined as ∆ = P1 ⊕ P2.

Definition 2 (Differential Path). When two plaintexts are encrypted by a multi-round cipher, a k-round differential path ∆i → ∆i+k is a path where their difference is ∆i at the i-th round and ∆i+k at the (i+k)-th round, respectively.

Definition 3 (Probability of Differential Path ∆i → ∆i+k). Given a fixed path, its probability is the fraction of plaintext pairs that have the difference ∆i+k at the (i+k)-th round among all the pairs having difference ∆i at the i-th round.

A conventional differential cryptanalysis approach (e.g., [1], [5]) works as follows.
1) ATTACK PATH CONSTRUCTION PHASE: There are two steps, covering the first p rounds and the last q rounds. The attacker's goal is to construct a p-round differential path with a sufficiently high probability. Given an attacker-chosen plaintext difference ∆0, each possible value of the difference ∆p at the p-th round should occur with approximately equal probability 2^−n in a perfect information-theoretically secure cipher, where n is the length of the plaintext/ciphertext. In reality, when there is a path with a much higher probability than 2^−n, an attacker may leverage it to infer the subkey involved in the last q rounds. q is usually a small value.
2) SUBKEY EXTRACTION PHASE: The attacker is assumed to know many plaintext pairs satisfying the above ∆0

and their corresponding ciphertext pairs. The high probability of the differential path ∆0 → ∆p guarantees that there are many more pairs having the given difference ∆p at the p-th round than an arbitrary value. All of these ciphertext pairs are partially decrypted to the p-th round by a guessed subkey in the last q rounds.
• If the guessed subkey is correct, then the calculated differences at the p-th round would match ∆p with the expected high probability.
• If the guessed subkey is wrong, then the partial decryption is meaningless. The attacker picks a new guess and repeats.
3) EXHAUSTIVE SEARCH PHASE: With the known subkey bits identified in the above phase, the attacker aims to extract all the unknown key bits by trying all the possible key combinations, a partial brute-force approach. The attack complexity required to crack the secret key represents the strength indicator.

Differential cryptanalysis relies on the identified multi-round high-probability differential paths. The entire path space is too large to search exhaustively. Thus, human intuition and experience are important.

Mimicking ciphers by neural networks. In neural cryptanalysis, we assess the strength of a cipher by measuring how easy it is for a neural network to mimic the cipher. Intuitively, an attacker should not be able to compute ciphertexts from plaintexts, or vice versa. Thus, a neural network (without the key) should not be able to compute ciphertexts, either. The cipher's strength is reflected by how well an entity (a neural network) mimics the cipher's operations. We test whether or not a neural network trained on plaintext-ciphertext pairs can output the correct ciphertexts on new plaintexts. Intuitively, in our model, the easier a cipher algorithm can be mimicked by a well-trained neural network, the less secure this cipher is.

Choice of neural network architectures. Different network architectures have different advantages. Long short-term memory (LSTM) is good at predicting sequence dependency relationships, while a convolutional neural network (CNN) is good at extracting features within a small local area. Neither of them fits the cipher algorithm case, as symmetric ciphers usually apply substitutions and permutations among bits. Full connectivity is able to represent these relationships between bits. Thus, we choose fully connected neural networks.

Ideally, the relationship between the ciphertext and plaintext should achieve: 1) each bit in the plaintext can have influences on all of the bits in the ciphertext and vice versa; 2) the ciphertext is designed to be close to a randomly generated binary stream. Therefore, we decide to choose the multi-layer fully connected neural network as the architecture and the softmax classifier for each bit of the ciphertext stream. The full connection ensures that each bit in the plaintext stream can have an impact on every bit in the ciphertext.

Besides the regular fully connected neural network, we also evaluate the cascade fully connected neural network mentioned in [9]. The ordinary one only fully connects adjacent layers. The cascade network includes extra full connections between interval layers. We compare this cascade architecture with the ordinary fully connected architecture.

The ability of neural networks also varies with their scale and shape. Intuitively, a larger neural network with more parameters has a more powerful mimic capability. Is a fat but shallow neural network better, or a deep but thin neural network better for the cipher mimic task? To answer this question, we also perform experiments with two neural network models under opposite shape settings.

Choice of metrics. Given an n-bit ciphertext, we count the correct prediction for every single bit independently instead of the entire ciphertext. Therefore, we count the bitwise cipher match rate as the neural network's prediction accuracy. The converged training time and the required minimum training dataset are also important indicators of the mimic difficulty. According to our measurement, this training complexity varies across different neural network architectures. However, there is no single neural network architecture that always outperforms the others across all objectives. Therefore, we decide to quantify the cipher strength as its optimal attack complexity ever achieved.

Best-effort strength evaluation. For each cipher, we apply multiple neural networks in a network suite to mimic it. The best-effort mimicking metrics (i.e., highest accuracy and lowest complexity) are identified as its strength. Similar to the traditional case, the strength is a relative notion with respect to the attacking capability – the neural network suite. We can never conclude that a cipher is absolutely strong, since a more powerful attack may appear in the future. Similar limitations also exist for conventional cryptanalysis. The neural network suite indicates the current attacking capability.

III. DEFINITIONS, METRICS, AND COMPUTATION

Symmetric encryption is generally described as the relationship C = E(P, k), where C denotes the ciphertext space, P denotes the plaintext space, k is the secret key, and E(·) is the encryption algorithm. Traditional cryptanalysis evaluates ciphers by the attack complexity for cracking the key k with the knowledge of P, C, and E(·).

In neural cryptanalysis, we assume that E(·) is unknown. Therefore, we transfer the goal to mimicking the operation E(·, k), instead of cracking the key k. The difficulty of mimicking the algorithm E(·, k) is our security indicator. We choose to represent E(·, k) as a trainable neural network. Like an encryption algorithm, this neural network takes plaintexts as input and outputs ciphertexts, without the knowledge of the key k. Intuitively, stronger ciphers are more difficult for neural networks to mimic. We formalize neural cryptanalysis as a 4-element tuple (M1, M2, N, S) in Definition 4.

Definition 4 (Neural-cryptanalysis System). A neural cryptanalysis system can be described as a tuple (M1, M2, N, S), where M1 and M2 are two mutually exclusive finite sets of plaintext-ciphertext pairs (p, c) that satisfy c = E(p, k), p ∈ Z2^m, and c ∈ Z2^n. Z2 denotes the binary value space

{0, 1} and Z2^t represents a binary stream of length t. N is a set of neural networks trained by M1 and tested by M2. S is the strength indicator generated by training and testing the neural networks in N.

Our cryptanalysis consists of three operations, CIPHER DATA COLLECTION, MIMIC MODEL TRAINING, and SECURITY INDICATOR GENERATION. We briefly describe the first two operations, as they are straightforward. We present the SECURITY INDICATOR GENERATION (CIPHER STRENGTH QUANTIFICATION) operation in more detail.

CIPHER DATA COLLECTION collects the required plaintext-ciphertext pairs to train and test the neural networks using the black-box cipher. We collect two sets of plaintext/ciphertext pairs, M1 for training and M2 for testing.

MIMIC MODEL TRAINING trains multiple neural networks to mimic the target cipher and to identify the one with the best performance. We train a neural network suite N. Each candidate neural network is trained with the pairs M1. The neural network suite is the key to the maximum attack capability. We include three multi-binary classifier neural networks in the suite.

Given m plaintexts whose ciphertexts are generated by the neural network, the cipher match rate is

Cmr = ( Σ_{i ∈ [0, m−1]} |{ j ∈ [0, n−1] : c_ij = c′_ij }| ) / (m × n)

where c_ij and c′_ij denote the correct and predicted j-th bit of the i-th ciphertext, respectively.

We represent the entire evaluation process of neural cryptanalysis in Algorithm 1.

Algorithm 1: Pseudocode for neural cryptanalysis evaluation. Each plaintext is an m-bit binary stream.
Result: S = (Cmr, Comp_data, Comp_time)
Initialization: set hyperparameters m1, m2, Cmr_base (2^m1 is the number of plaintext-ciphertext pairs in M1; 2^m2 is the number of pairs in M2; Cmr_base is the base match rate);
Cmr = 0, Cmr_improve = 0, Comp_data = m1 − 1;
Generating M2: randomly select 2^m2 plaintexts and obtain corresponding ciphertexts;
while (Cmr ≤ Cmr_base or Cmr_improve > 0) and Comp_data
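The bitwise cipher match rate defined above is straightforward to compute; a minimal sketch (the function and variable names are ours, not the paper's):

```python
def cipher_match_rate(true_cts, pred_cts):
    # Cmr = (number of bit positions where the predicted ciphertext agrees
    # with the true ciphertext) / (m * n), counted bitwise over all m
    # ciphertexts of n bits each (given here as '0'/'1' strings).
    m, n = len(true_cts), len(true_cts[0])
    matching = sum(1 for c, cp in zip(true_cts, pred_cts)
                   for j in range(n) if c[j] == cp[j])
    return matching / (m * n)

# Two 4-bit ciphertexts; the prediction gets 7 of the 8 bits right.
print(cipher_match_rate(["1010", "1111"], ["1010", "0111"]))  # 0.875
```

Counting per bit rather than per whole ciphertext is what lets the metric move smoothly between the base match rate and 100%.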

(a) Deep and thin network (b) Fat and shallow network (c) Cascade network
Fig. 2: Three different neural network architectures applied in experiments
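The fat-versus-thin question behind Fig. 2 comes down to where the parameter budget goes. The layer widths below are hypothetical (the paper does not specify exact sizes); the sketch only shows how to count the trainable parameters of fully connected shapes like those in Fig. 2:

```python
def fc_params(layer_sizes):
    # Weights (a * b) plus biases (b) for each pair of adjacent,
    # fully connected layers.
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical 64-bit-in / 64-bit-out mimic networks:
fat_shallow = [64, 2048, 64]              # one wide hidden layer
deep_thin = [64, 128, 128, 128, 128, 64]  # four narrow hidden layers

print(fc_params(fat_shallow), fc_params(deep_thin))
```

At these (assumed) sizes the fat-and-shallow shape concentrates far more parameters in a single layer than the deep-and-thin shape spreads across four, which is one way the two shapes can trade capacity per layer against depth.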

1) How well do the three neural networks mimic cipher algorithms?
2) How do the various factors (e.g., network shapes, connections, activation functions, and training data volume) impact the mimicking capabilities?
3) How strong is the cipher Hitag2 compared with round-reduced DES?

The three neural networks are implemented with Tensorflow. We use the sigmoid activation function as the default choice except for the comparative experiments on activation functions. We choose the softmax classifier and the cross-entropy loss function for backward propagation optimization. The maximum training epoch is set to 350. The batch size is 1000. We use a personal laptop to run the training tasks. We compare the attack capability between the three different neural networks: i) a fat and shallow neural network, ii) a thin and deep neural network, and iii) a cascade neural network from [9].

A. Impact of Different Networks on DES

To decide whether the attack succeeds, we use a base match rate as the baseline accuracy. In our experiment, when and only when the cipher match rate is higher than the base match rate, the attack is regarded as successful.

Base Match Rate of 1-round DES. The base match rate for most ciphers is 50%, which can be achieved statistically by random guessing. However, we set the base match rate to 75% for 1-round DES. The reason is explained next. The DES algorithm follows the Feistel structure. In such a structure, the cipher bits are separated into two equal-length parts. In each round, only one part (half block) is updated by the cryptographic round function. The other half is only permuted by bit shifting. Therefore, we assume the half-block bits updated by the cryptographic function are hard to guess while the other half is easy to guess. In that case, the base match rate for the difficult half is still 50%; however, the base match rate for the simple shifting half is 100%. Therefore, for the entire 64 ciphertext bits, the base match rate is 75%.

We implement the round-reduced DES and collect 2^17 plaintext/ciphertext pairs. Half of them are used to train the three neural networks. The other half is used to test the well-trained models. As shown in Fig. 3, the fat and shallow network achieves the highest accuracy. It almost perfectly predicts the ciphertext bits with a cipher match rate of 99.7%. The other two neural networks only achieve a 75% cipher match rate, which is no better than the base match rate for one-round DES. It means that the two networks are no better than random guessing for the right 32 bits of cryptographic function output. Therefore, we conclude that when the attack object is a DES-like algorithm, the fat and shallow shaped network is the best choice among the three neural networks.

Fig. 3: Predicted accuracy on 1-round DES

B. Impact of Different Activation Functions on DES

Activation functions are the only non-linear components of the neural networks. There are three common activation functions: sigmoid, tanh, and relu. We conduct comparative experiments on the three activation functions. We use the fat and shallow network in the comparative experiments for activation functions. The result is shown in Fig. 4. They reach similar cipher match rates. However, the sigmoid function learns much faster than the tanh and relu functions.

C. DES Measurement Results

Based on the former experiments, we identify the fat and shallow network with sigmoid activation as the optimal choice among these settings for DES. We test higher-round DES to observe the mimicking capability of this neural network. The mimic results for 1-round, 2-round and 3-round DES are


shown in Fig. 5. (Only 1-round DES holds the base match rate of 75%; 2-round and 3-round DES have the base match rate of 50%.) The fat and shallow neural network successfully attacked 1-round DES and 2-round DES, with the cipher match rate higher than their base match rates. However, this network cannot mimic 3-round DES successfully. The cipher match rate stays at 50%. Therefore, this model cannot evaluate ciphers stronger than 3-round DES. The reason is likely the simplicity of the network used in these experiments. To evaluate more complex cipher algorithms, more sophisticated architectures or larger-scale neural networks with more parameters are required. We leave it as future work.

Fig. 4: Influence of activation functions

Fig. 5: Attack capacity summary for round-reduced DES

D. Hitag2 Measurement Results

Besides the round-reduced DES, we also conduct experiments on Hitag2, which is a widely deployed stream cipher used in modern car key systems. Stream ciphers differ from block ciphers. Block ciphers (e.g., DES) encrypt n bits of plaintext into n bits of ciphertext with a key. Stream ciphers consistently output an unlimited secret stream bit by bit. The single output bit is calculated based on an m-bit binary stream held in an LFSR. Then, the state of the LFSR is automatically updated and another secret bit is output based on the updated binary stream in the LFSR. We regard the block cipher algorithm as an n-bits-to-n-bits mapping and the stream cipher algorithm as an m-bits-to-1-bit mapping. Neural cryptanalysis treats the two cipher mimicking tasks the same. Hitag2 only outputs 1 bit at a time. Our neural networks only have one softmax binary classifier rather than 64 classifiers as in the DES cases.

We implemented Hitag2 and collected 2^17 input/output pairs. 2^16 pairs are used as training data and 2^16 pairs are used in the evaluation. We apply the fat and shallow network and the deep and thin network to mimic the Hitag2 cipher. The cascade connection architecture is discarded because it does not show any advantage over the ordinary full-connectivity architecture in the DES experiments.

Figure 6 displays the increasing cipher match rate along the training epochs for Hitag2. With the cipher match rates being higher than 50%, both networks mimic Hitag2 successfully. The deep and thin network outperforms the fat and shallow one. However, this result is contrary to the DES cases in Section IV-A. The fat and shallow network shows more power to attack round-reduced DES than the deep and thin network. This observation suggests that there is no single optimal choice for all ciphers.

Fig. 6: The predicted accuracy of Hitag2 on two models

E. Impact of Training Data Size

We measure whether or not a larger training data set can substantially improve the converged cipher match rate. In Section IV-D, with 2^16 pairs of training data, the maximum cipher match rate of the fat and shallow neural network is around 66%. We increase the available training data to 2^20 pairs and test the converged model on the same test set. Fig. 7 shows that when the training set increases to 2^20 from 2^16, the cipher match rate rises to around 98% from 66%.

To compare the cipher strengths between two cipher primitives, we maximize their cipher match rates by increasing the training data. The maximum cipher match rate with its data and time complexity are used for the comparison.

F. Security Comparison w.r.t. Round-reduced DES

We compare the strength of Hitag2 with respect to various round-reduced DES algorithms. As visualized in Fig. 8,

the optimal attack result for 1-round DES is (99.7%, 2^16, 2^19). The optimal attack result for Hitag2 is (98%, 2^20, 2^22). The 3-round DES is not attacked successfully with the 2^20 training data and 2^30 iterations. Therefore, the strength of Hitag2 is located between the 1-round DES and the 3-round DES, indicating very weak security.

Fig. 7: Influence on predicted accuracy by training size

Fig. 8: Comparison of Hitag2 and round-reduced DES. The left y-axis shows the cipher match rate. The right y-axis shows the minimum training data and time complexity required to achieve such a cipher match rate. The data and time complexity is shown in log scale.

Summary of findings. From these experiments, we observe the following findings.
• Fully connected neural networks successfully mimic the 1,2-round DES and Hitag2.
• For different ciphers, the most powerful mimicking neural networks may be different. This result indicates that one needs to try different neural networks when evaluating a new unknown cipher.
• The network with the sigmoid activation function learns faster than tanh and relu. However, there is no substantial difference in their converged cipher match rates.
• More training pairs significantly increase the converged cipher match rate, as expected.

V. DISCUSSION

We train neural networks to predict ciphertexts from plaintexts. One can use the approach to predict plaintexts from ciphertexts. It is equivalent to mimicking the reverse function of a cipher algorithm.

CHOICE OF BASE MATCH RATE. The base match rate is a manually defined baseline of the cipher match rate. When the network fails to learn any useful information from the plaintext-ciphertext pairs, the cipher match rate is always around 50% for a large test set. A higher cipher match rate indicates that useful information to predict the cipher bits is learned from the plaintext-ciphertext pairs. In the one-round DES case, half of the cipher bits are updated by the cryptographic function while the other half are shifted from the plaintext bits. Thus, we set the base match rate of 1-round DES to be 75%, instead of 50%.

LIMITATIONS AND FUTURE WORK. Although successfully mimicking Hitag2, the neural networks we use are simple. These simple neural networks can only evaluate ciphers weaker than 3-round DES. To evaluate more complex or non-deprecated ciphers in the Federal Information Processing Standard (FIPS) Publication 140-2, more powerful neural networks are required. We plan to improve the attack capability in future work. It involves the following open problems.

HOW TO DESIGN THE NEURAL NETWORK TO FIT CIPHER MIMICKING TASKS? Cipher algorithms are composed of iterated arithmetical and logical operations, which are not in the traditional scope of neural networks. Current neural network architectures are not designed for cipher relations. The fully connected architecture, although capable of mimicking the complicated dependency between ciphertext bits and plaintext bits, might not be the most efficient architecture. Customizing neural networks with fewer parameters to mimic the cipher function is interesting future work.

HOW TO TRAIN THE NEURAL NETWORK EFFECTIVELY AND EFFICIENTLY? Symmetric ciphers are complex functions. To mimic them, the neural network must include many parameters. However, networks with too many parameters are difficult to train. Understanding the impact of loss functions and optimization algorithms is important. Which is the best combination for cipher mimic tasks? Can we customize the loss function and optimization algorithm for this specific task? These questions need to be answered by future work.

VI. RELATED WORK

The related work can be summarized into two categories, cryptanalysis solutions and neural networks for cryptanalysis research.

A. Cryptanalysis for Symmetric Ciphers

The specific approaches to evaluate cipher strengths vary dramatically case by case. The representative methods include differential cryptanalysis [5], linear cryptanalysis [2], and their variants. The differential approach attacks DES with 2^37 and the linear approach attacks DES with 2^47 known-plaintexts complexity, respectively.

The 2019 IEEE Conference on Dependable and Secure Computing

In cyber-physical systems, the evaluation can only be done with the help of reverse-engineering [13]. Many real-world CPS ciphers have been shown broken. The widespread DST40 cipher, which is often used in immobilizer transponders, was reverse-engineered and found broken [14]. Another widely used CPS cipher, Hitag2, was also cracked after reverse-engineering [11], [15], [16]. The CPS cipher Megamos Crypto was found to be vulnerable after reverse-engineering and cryptanalysis [17], [18]. The analysis found a flawed key generation procedure which reduces the exhaustive search complexity from 2^96 to 2^57. Those flaws in CPS ciphers have affected many real-world systems [16].

B. Neural Networks in Cryptanalysis

The applications of deep learning in cryptanalysis are limited. One major direction is learning-aided side-channel attacks [8], [19]–[21]. Researchers apply learning approaches to replace the traditional statistical profiling phase in side-channel attacks. This approach differs greatly from ours: the deep learning model only acts as one component of the entire side-channel attack, so the analysis still requires knowing the cipher algorithm.

There are also a few applications in traditional cryptanalysis which do not rely on side-channel information. [22] leverages neural networks to classify ciphertexts generated by different cipher algorithms, successfully distinguishing the ciphertexts of Enhanced RC6 and SEAL. [7] uses neural networks to determine the correct key in the key-recovery attack of traditional cryptanalysis. Their attack succeeds on the 2-round and 3-round HypCipher. However, similar to the learning-aided side-channel attacks, it does not change the white-box algorithm assumption that the traditional attack relies on.

The work most relevant to ours is [9], [10], which treats DES as a black box. A cascade neural network structure is applied to directly take plaintexts as input and is trained to output the corresponding ciphertexts. The author claimed to have successfully attacked DES and 3-DES. However, the results are not reproducible by us. In addition, the parameters of their networks are too few to approximate the full DES and 3-DES. Our work presents a universal metric to enable the direct comparison between ciphers, which is new. Applying neural cryptanalysis to CPS ciphers is also new.

VII. CONCLUSIONS

In this paper, we proposed a new approach to evaluate the strength of symmetric ciphers in a black-box fashion without knowing the specific algorithm. We defined quantitative metrics to capture a cipher's strength. These metrics are calculated from the difficulty of attacking the cipher via neural networks. Using Hitag2 and round-reduced DES algorithms, we experimentally demonstrated that our metric and methodology are practical and useful to quantify cipher strengths and allow one to directly compare between ciphers. We showed how various factors associated with the neural networks can impact the analysis outcomes. This new neural cryptanalysis approach has the potential to automate the security evaluation for ciphers in many real-world systems.

REFERENCES

[1] E. Biham and A. Shamir, "Differential cryptanalysis of DES-like cryptosystems," Journal of Cryptology, vol. 4, no. 1, pp. 3–72, 1991.
[2] M. Matsui, "Linear cryptanalysis method for DES cipher," in Workshop on the Theory and Application of Cryptographic Techniques. Springer, 1993, pp. 386–397.
[3] R. C.-W. Phan, "Impossible differential cryptanalysis of 7-round advanced encryption standard (AES)," Information Processing Letters, vol. 91, no. 1, pp. 33–38, 2004.
[4] A. Biryukov and D. Khovratovich, "Related-key cryptanalysis of the full AES-192 and AES-256," in International Conference on the Theory and Application of Cryptology and Information Security. Springer, 2009, pp. 1–18.
[5] E. Biham and A. Shamir, "Differential cryptanalysis of the full 16-round DES," in Annual International Cryptology Conference. Springer, 1992, pp. 487–496.
[6] EM Microelectronic-Marin, "125 kHz crypto read/write contactless identification device," EM 4170, pp. 1–13.
[7] A. M. Albassal and A.-M. Wahdan, "Neural network based cryptanalysis of a Feistel type block cipher," in International Conference on Electrical, Electronic and Computer Engineering (ICEEC'04). IEEE, 2004, pp. 231–237.
[8] Z. Martinasek, P. Dzurenda, and L. Malina, "Profiling attack based on MLP in DPA contest v4.2," in 2016 39th International Conference on Telecommunications and Signal Processing (TSP). IEEE, 2016, pp. 223–226.
[9] M. M. Alani, "Neuro-cryptanalysis of DES and triple-DES," in International Conference on Neural Information Processing. Springer, 2012, pp. 637–646.
[10] ——, "Neuro-cryptanalysis of DES," in World Congress on Internet Security (WorldCIS-2012). IEEE, 2012, pp. 23–27.
[11] R. Verdult, F. D. Garcia, and J. Balasch, "Gone in 360 seconds: Hijacking with Hitag2," in 21st USENIX Security Symposium (USENIX Security 12), 2012, pp. 237–252.
[12] I. Wiener, "Hitag2 PCF7936/46/47/52 stream cipher reference implementation," August 2007.
[13] K. Nohl, D. Evans, S. Starbug, and H. Plötz, "Reverse-engineering a cryptographic RFID tag," in USENIX Security Symposium, vol. 28, 2008.
[14] S. Bono, M. Green, A. Stubblefield, A. Juels, A. D. Rubin, and M. Szydlo, "Security analysis of a cryptographically-enabled RFID device," in USENIX Security Symposium, vol. 31, 2005, pp. 1–16.
[15] A. Tekian, "Doctoral programs in health professions education," Medical Teacher, vol. 36, no. 1, pp. 73–81, 2014.
[16] F. D. Garcia, D. Oswald, T. Kasper, and P. Pavlidès, "Lock it and still lose it: on the (in)security of automotive remote keyless entry systems," in 25th USENIX Security Symposium (USENIX Security 16), 2016.
[17] R. Verdult, F. D. Garcia, and B. Ege, "Dismantling Megamos Crypto: Wirelessly lockpicking a vehicle immobilizer," in Supplement to the Proceedings of the 22nd USENIX Security Symposium (Supplement to USENIX Security 15), 2015, pp. 703–718.
[18] R. Verdult and F. D. Garcia, "Cryptanalysis of the Megamos Crypto automotive immobilizer," USENIX ;login:, vol. 40, no. 6, pp. 17–22, 2015.
[19] G. Hospodar, B. Gierlichs, E. De Mulder, I. Verbauwhede, and J. Vandewalle, "Machine learning in side-channel analysis: a first study," Journal of Cryptographic Engineering, vol. 1, no. 4, p. 293, 2011.
[20] L. Lerman, G. Bontempi, O. Markowitch et al., "Power analysis attack: an approach based on machine learning," IJACT, vol. 3, no. 2, pp. 97–115, 2014.
[21] E. Cagli, C. Dumas, and E. Prouff, "Convolutional neural networks with data augmentation against jitter-based countermeasures," in International Conference on Cryptographic Hardware and Embedded Systems. Springer, 2017, pp. 45–68.
[22] B. Chandra and P. P. Varghese, "Applications of cascade correlation neural networks for cipher system identification," World Academy of Science, Engineering and Technology, vol. 26, pp. 312–314, 2007.
