Neural Aided Statistical Attack for Cryptanalysis Yi Chen, Yantian Shen, Hongbo Yu
Abstract: In CRYPTO’19, Gohr proposed the first key recovery attack based on deep learning, which opens the direction of neural aided cryptanalysis. To date, neural aided cryptanalysis still faces two problems: (1) its attack complexity estimations rely purely on practical experiments, and there is no theoretical framework for estimating theoretical complexity; (2) it cannot work when there are not enough neutral bits in the prepended differential. To the best of our knowledge, we are the first to solve these two problems. In this paper, our work contains three aspects: (1) we propose a Neural Aided Statistical Attack (NASA) that is a generic neural aided cryptanalysis technique. NASA allows us to estimate the theoretical complexities without performing practical experiments. Moreover, NASA does not rely on any special properties including the neutral bit. (2) We propose three methods for reducing the attack complexities of NASA. One of the methods is based on a newly proposed concept named Informative Bit that reveals an important phenomenon related to neural distinguishers. (3) We demonstrate the advantages of NASA based on applications to round reduced DES and Speck32/64. Our work arguably raises a new direction for neural aided cryptanalysis.

Index Terms: Neural Aided Statistical Attack, Deep Learning, Cryptanalysis, Informative Bit, DES, Speck

I. INTRODUCTION

Deep learning has attracted considerable interest in the cryptography community since the last century. Rivest in [1] reviewed various connections between machine learning and cryptography. Some possible directions of research in cryptanalytic applications of machine learning were also suggested. Greydanus showed that a simplified version of Enigma can be simulated by recurrent neural networks [2].

Although deep learning has shown its advantages in various fields such as computer vision [3], natural language processing [4], and smart healthcare [5], its application in the field of conventional cryptanalysis has been stagnant. The few valuable applications are concentrated in side-channel analysis [6]–[8].

In CRYPTO’19, Gohr proposed a deep learning-based distinguisher [9] that is also called a neural distinguisher (ND). By placing a differential before ND, Gohr developed a key recovery attack on 11-round Speck32/64. Gohr’s attack shows considerable advantages in terms of attack complexities over the traditional differential attack, which opens the direction of neural aided cryptanalysis.

However, Gohr’s attack faces two problems. First, it cannot be applied to the theoretical security analysis of a cipher. More exactly, we do not know what data complexity is required to attack a specific cipher. We can estimate the attack complexities and success rate empirically only if the attack finishes within an acceptable running time. Second, when we add a differential before ND for attacking a cipher covering more rounds, enough neutral bits [10] must exist in the prepended differential. To date, there are no solutions for these two problems.

In this paper, we have explored neural aided cryptanalysis and made contributions as follows.

• We figure out the reason why Gohr’s attack does not allow the adversary to estimate the theoretical complexities. In [9], each key guess corresponds to a key rank score that is directly determined by the output of ND. A key guess is returned as a candidate when its key rank score exceeds a threshold. Since the output of ND is unpredictable, the threshold is set without any theoretical basis. It is then impossible to estimate the attack success rate and theoretical complexities unless practical experiments are executed.

• We propose a Neural Aided Statistical Attack (NASA), which supports theoretical complexity estimation and does not rely on neutral bits. The theoretical basis of NASA is as follows. By analyzing the key recovery process, we find that there are only four different scenarios. A statistic designed in this article obeys a normal distribution in each scenario. Based on this statistic, the key recovery is transformed into distinguishing between two normal distributions, which tells the required data complexity of NASA.

• We propose three methods to reduce the attack complexities of NASA. The first one is reducing the key space by building ND on partial ciphertext bits. The initial ND proposed by Gohr takes the complete ciphertext pair as input, which forces the adversary to guess all the key bits simultaneously. Adopting the new ND, the adversary guesses partial key bits at a time. The second one is a highly selective Bayesian key search algorithm. It allows the adversary to search the key guesses that are most likely to be the right key instead of traversing all the key guesses. The third one is reducing the data complexity by exploiting neutral bits. When neutral bits are available, the data complexity of NASA can also be reduced.

• We perform experiments on DES and Speck32/64 to fully analyze NASA. First, experiments on DES show that NASA can attack a cipher covering more rounds than Gohr’s attack when there are not enough neutral bits. Second, the experiment on Speck32/64 shows that NASA can achieve performance comparable to Gohr’s attack when the three proposed optimization methods are available.

Organization. Section III presents the neural aided statistical distinguisher. Section IV summarizes NASA and provides the correctness verification. The three optimization methods are introduced in Sections V, VI, and VII. Applications to DES and Speck32/64 are presented in Sections VIII and IX.

II. RELATED WORK

Let $(P_0, P_1)$ denote a plaintext pair with difference $\Delta P$. The corresponding intermediate states and ciphertexts are $(S_0, S_1)$ and $(C_0, C_1)$.

A. Neutral Bit

Consider a differential $\Delta P \rightarrow \Delta S$. Let $E$ denote the encryption function covering the differential. For any plaintext pair $(P_0, P_1)$ conforming to the differential, if the following condition always holds

$$E(P_0 \oplus e_j) \oplus E(P_1 \oplus e_j) = \Delta S, \quad e_j = 1 \ll j,$$

the $j$-th bit is called a neutral bit [10].

Based on $k$ neutral bits $\{j_1, \cdots, j_k\}$ and a plaintext pair $(P_0, P_1)$ with $P_0 \oplus P_1 = \Delta P$, we can generate a plaintext structure consisting of $2^k$ plaintext pairs. Once $(P_0, P_1)$ satisfies the differential, the remaining $2^k - 1$ plaintext pairs also conform to the differential.

B. Neural Distinguisher

The target of ND [9] is to distinguish two classes of ciphertext pairs

$$Y(C_0, C_1) = \begin{cases} 1, & \text{if } S_0 \oplus S_1 = \Delta S \\ 0, & \text{if } S_0 \oplus S_1 \neq \Delta S \end{cases} \tag{1}$$

where $Y = 1$ or $Y = 0$ is the label of $(C_0, C_1)$. If the difference between $S_0$ and $S_1$ is the target difference $\Delta S$, the pair $(C_0, C_1)$ is regarded as a positive sample drawn from the target distribution. Otherwise, $(C_0, C_1)$ is regarded as a negative sample that comes from a uniform distribution.

A neural network is trained over $N/2$ positive samples and $N/2$ negative samples. The neural network can be used as an ND if its distinguishing accuracy over a testing database is higher than 0.5.

Let $ND_h$ denote a neural distinguisher against the cipher reduced to $h$ rounds. Given a sample $(C_0, C_1)$, ND outputs a score $Z$ which is used as the posterior probability

$$\Pr(Y = 1 \mid (C_0, C_1)) = Z = ND(C_0, C_1), \quad 0 \leq Z \leq 1. \tag{2}$$

When $Z > 0.5$, the predicted label of $(C_0, C_1)$ is 1 [9]. In [9], a Residual Network (ResNet) is adopted by Gohr. The training pipeline is described in [9].

C. Gohr’s Key Recovery Attack

Algorithm 1 summarizes the core idea of the basic (unaccelerated) version of Gohr’s key recovery attack [9].

Algorithm 1 Basic version of Gohr’s key recovery attack
Require: $k$ neutral bits that exist in $\Delta P \rightarrow \Delta S$;
         An ND built over $\Delta S$;
         A key rank score threshold, $c_1$;
         A maximum number of iterations.
Ensure: A possible key candidate.
 1: repeat
 2:   Randomly generate a plaintext pair $(P_0^1, P_1^1)$ with $P_0^1 \oplus P_1^1 = \Delta P$;
 3:   Create a plaintext structure consisting of $2^k$ plaintext pairs by $k$ neutral bits;
 4:   Collect corresponding ciphertext pairs, $(C_0^i, C_1^i)$, $i \in \{1, \cdots, 2^k\}$;
 5:   for each key guess $kg$ do
 6:     Partially decrypt $2^k$ ciphertext pairs with $kg$;
 7:     Feed decrypted ciphertext pairs to ND and collect the outputs;
 8:     Calculate the key rank score $v_{kg}$ based on collected outputs;
 9:     if $v_{kg} > c_1$ then
10:       stop the key search and return $kg$ as the key candidate;
11:     end if
12:   end for
13: until a key candidate is returned or the maximum number of iterations is reached.

Decrypting $2^k$ ciphertext pairs drawn from the target or uniform distribution with a key guess $kg$, the adversary combines the outputs of ND into a key rank score $v_{kg}$ (steps 7 and 8 of Algorithm 1).

The rank score is likely to exceed $c_1$ only when the plaintext structure passes the prepended differential and $kg$ is the right key. If the plaintext structure does not pass the differential or the key guess is wrong, the rank score should be very low. Thus, the right key can be identified by comparing the rank score with a threshold. When the performance of ND is weak, $2^k$ needs to be large, so more neutral bits are required.

The application of Gohr’s attack is limited by two aspects:

• The adversary cannot estimate the required attack complexities and success rate theoretically. Since the output $Z$ of ND is unpredictable, the threshold $c_1$ is set without any clear theoretical basis. It is then unknown how many plaintext pairs a plaintext structure should contain. As a result, the success rate under an attack setting is also unknown.

• If the number of neutral bits is not large enough, Gohr’s attack does not work.

D. Distinguishing between Two Normal Distributions

Here, we present the details of the distinguishing between two normal distributions.
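As a generic illustration of the task named in this subsection, the following minimal sketch shows a threshold test between two normal distributions with known parameters. It is a textbook construction, not the paper's own derivation: the function names, the convention that the second distribution has the larger mean, and the numeric values are all assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def threshold_and_miss_rate(mu0, sigma0, mu1, sigma1, beta0):
    """Threshold test between N(mu0, sigma0) and N(mu1, sigma1), assuming mu1 > mu0.

    A sample x is attributed to distribution 1 when x > t. The threshold t is
    chosen so that a sample from distribution 0 exceeds it with probability
    beta0. Returns t and beta1, the probability that a sample from
    distribution 1 falls at or below t.
    """
    t = mu0 + STD_NORMAL.inv_cdf(1.0 - beta0) * sigma0
    beta1 = NormalDist(mu1, sigma1).cdf(t)
    return t, beta1


def separable(mu0, sigma0, mu1, sigma1, beta0, beta1):
    """Check whether both error targets can be met by a single threshold.

    Equivalent condition (assuming mu1 > mu0):
        z_{1-beta0} * sigma0 + z_{1-beta1} * sigma1 <= mu1 - mu0
    where z_q is the q-quantile of the standard normal distribution.
    """
    z0 = STD_NORMAL.inv_cdf(1.0 - beta0)
    z1 = STD_NORMAL.inv_cdf(1.0 - beta1)
    return z0 * sigma0 + z1 * sigma1 <= mu1 - mu0


if __name__ == "__main__":
    # Illustrative values only: unit-variance normals whose means differ by 4.
    t, beta1 = threshold_and_miss_rate(0.0, 1.0, 4.0, 1.0, beta0=0.01)
    print(f"threshold = {t:.4f}, miss rate = {beta1:.4f}")
    print(separable(0.0, 1.0, 4.0, 1.0, beta0=0.01, beta1=0.05))
```

The second function makes the trade-off explicit: for fixed error targets, the two distributions can be told apart by one sample only if the gap between the means is large enough relative to the two standard deviations.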