
Communications in Computational and Applied Mathematics, Vol. 1 No. 1 (2019) p. 1-7

CCAM Communications in Computational and Applied Mathematics

Journal homepage : www.fazpublishing.com/ccam

e-ISSN : 2682-7468

An Intelligence Brute Force Attack on RSA

Chu Jiann Mok1, Chai Wen Chuah1,*

1Information Security Interest Group (ISIG), Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia 86400 Parit Raja, Batu Pahat, MALAYSIA

*Corresponding Author

Received 27 November 2018; Accepted 01 March 2019; Available online 07 March 2019

Abstract: The RSA cryptosystem is one of the public key cryptosystems used for secure data transmission. In this research, an intelligence brute force attack is proposed for cryptanalysis of the RSA cryptosystem. An experimental analysis of the proposed approach is carried out in simulation to evaluate its effectiveness in terms of the time used to recover an RSA key. The performance of the proposed algorithm is compared with prime factorization and brute force attacks on an RSA toy box. The results are used to estimate the time required for a real-life RSA cryptosystem in ideal conditions.

Keywords: RSA cryptosystem, public key cryptosystem, brute force, cryptanalysis

1. Introduction

RSA was invented by Ron Rivest, Adi Shamir and Leonard Adleman in 1977. RSA is a public key cryptosystem for securing data transmission. It is an asymmetric cipher that uses two keys: a public key $(n, e)$ and a private key $(p, q, d)$. The value $n$ is the product of $p$ and $q$, so for a fixed $n$ the two factors are inversely related and at least one of them is no greater than $\sqrt{n}$. An adversary can therefore use prime factorization to obtain the smaller of $p$ and $q$. During this process, the adversary can filter out most composite numbers to speed up cryptanalysis. However, the ratio of prime numbers decreases slightly as the number of bits increases. Table 1 shows the relationship between the number of bits and the prime-to-natural numbers ratio.

Table 1 - Relationship between number of bits used in RSA cryptosystem and prime-to-natural numbers ratio

| Maximum natural number | No. of prime numbers | Prime-to-natural numbers ratio (%) |
|---|---|---|
| 16 | 6 | 37.50 |
| 32 | 11 | 34.38 |
| 64 | 18 | 28.13 |
| 128 | 31 | 24.22 |
| 256 | 54 | 21.09 |
| 512 | 97 | 18.94 |
| 1024 | 172 | 16.80 |
| 2048 | 309 | 15.09 |
| 4096 | 564 | 13.77 |
| 8192 | 1028 | 12.55 |
| 16384 | 1900 | 11.60 |
| 32768 | 3512 | 10.72 |
| 65536 | 6542 | 9.98 |
| 131072 | 12251 | 9.35 |
| 262144 | 23000 | 8.77 |
| 524288 | 43390 | 8.28 |
| 1048576 | 82085 | 7.82 |
| 2097152 | 155611 | 7.42 |
| 4194304 | 295947 | 7.06 |
| 8388608 | 564163 | 6.73 |
| 16777216 | 1077871 | 6.42 |
| 33554432 | 2062993 | 6.15 |
| 67108864 | 3956855 | 5.90 |
| 100000000 | 5761455 | 5.76 |
| 110000000 | 6308388 | 5.73 |

In Table 1, the probability of getting a prime number decreases from 0.375 (4 bits) to 0.0590 (26 bits). Getting a random prime number from a large pool of natural numbers therefore becomes more difficult, but the difficulty levels off as the pool grows.

Brute force attack is a generic cryptanalysis that breaks a cryptosystem by testing all of the possible keys, which is typically time consuming. This research therefore proposes an intelligence brute force attack, which may provide a faster way to perform cryptanalysis on the RSA cryptosystem than the traditional brute force attack. The research simulates cryptanalysis of an RSA toy box that varies from 8-bit to 32-bit, and uses the results to predict the time required for larger RSA cryptosystems. The main programming languages used to implement the intelligence brute force attack algorithms are C and C++.
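The prime counts in Table 1 can be checked with a short program. The following is a minimal sketch using a sieve of Eratosthenes; the function names `count_primes` and `prime_ratio_percent` are illustrative and are not taken from the authors' simulation code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count the primes p with p <= limit using a sieve of Eratosthenes.
std::uint64_t count_primes(std::uint32_t limit) {
    if (limit < 2) return 0;
    std::vector<bool> composite(static_cast<std::size_t>(limit) + 1, false);
    std::uint64_t count = 0;
    for (std::uint64_t i = 2; i <= limit; ++i) {
        if (composite[i]) continue;  // i was crossed out by a smaller prime
        ++count;
        for (std::uint64_t j = i * i; j <= limit; j += i) composite[j] = true;
    }
    return count;
}

// Percentage of primes among the natural numbers 1..limit (third column of Table 1).
double prime_ratio_percent(std::uint32_t limit) {
    assert(limit > 0);
    return 100.0 * static_cast<double>(count_primes(limit)) / limit;
}
```

Calling `count_primes(16)` and `prime_ratio_percent(16)` corresponds to the first row of the table; larger limits take proportionally more memory, since the sieve stores one flag per natural number.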

*Corresponding author: [email protected]
2019 FAZ Publishing. All rights reserved.

2. Literature Review

Cryptography is the discipline of writing a message in ciphertext, usually by a translation from plaintext according to some frequently changing key text, with the aim of protecting a secret from adversaries, interceptors, intruders, interlopers, eavesdroppers, opponents, and enemies [1, 2]. Professional cryptography protects not only the plaintext but also the key and, more generally, tries to protect the whole cryptosystem. Before the modern era, cryptographic designs took advantage of limited literacy, as in the Caesar cipher or the "Yin" book. As public literacy increased, the risk of ciphertext disclosure increased. In the computer era, the internet has made the spread of information more efficient. Kerckhoffs's principle, proposed in the 19th century, addresses this risk: a cryptosystem should remain secure when everything about it is publicly accessible except the encryption and decryption keys [3].

2.1 RSA Cryptosystem

RSA encryption is the first successful algorithm applied in public key encryption. It was invented by Rivest et al. at MIT and published in 1978 [4]. The RSA encryption process is as follows:

• Public key generation and private key generation
Public and private keys are central to the RSA cryptosystem. The public key has the form $(n, e)$, where $n$ is the semiprime product of the two prime numbers used in the private key and $e$ is the encryption exponent, which must be coprime with $\varphi(n)$. The private key has the form $(p, q, d)$, where $p$ and $q$ are prime numbers and $d$ is the decryption exponent. The relationship between $e$ and $d$ is shown in Equation (1).

$$ed \bmod \varphi(n) = 1 \;\Rightarrow\; d \equiv e^{-1} \pmod{\varphi(n)} \qquad (1)$$

Equation (1) shows that $d$ is the modular inverse of $e$ modulo $\varphi(n)$.

• RSA encryption algorithm
To encrypt a message with RSA, the sender turns the message $M$ into an integer $m$ with $0 \le m < n$ by using a padding scheme. The ciphertext $c$ is then generated by modular exponentiation as in Equation (2).

$$c \equiv m^{e} \pmod{n} \qquad (2)$$

Equation (2) shows that $c$ is congruent to $m^{e}$ modulo $n$.

• RSA decryption algorithm
Once the recipient receives the ciphertext, the recipient can decrypt it by modular exponentiation with his/her private key, as shown in Equation (3).

$$m \equiv c^{d} \pmod{n} \qquad (3)$$

2.2 Restriction in Implementation

The RSA cryptosystem uses modular exponentiation to reduce the probability that a confidential message is disclosed. However, a general-purpose computer struggles to supply the high computational power required for key generation and for modular exponentiation during encryption and decryption. This makes the RSA cryptosystem impractical for general-purpose use or for transmitting large messages. Besides that, the RSA cryptosystem is vulnerable to chosen plaintext attack, which increases the risk of the cryptosystem being cracked [5]. The padding scheme is another possible restriction on implementing the RSA cryptosystem widely.

To enhance the security strength of the RSA cryptosystem, $d$ is required to satisfy $d > n^{0.292}$ [6]. For a typical 1024-bit RSA cryptosystem, $0.292 \times 1024 \approx 299$, so this lower bound is an extremely large number (approximately $1.018 \times 10^{90}$). For a message length of 1024 bits, the minimum number of bits required to perform modular exponentiation is also extremely large (≈306176 bits), as calculated by Equation (4):

$$\log_2 2^{(1024 \times 299)} = 1024 \times 299 \times \log_2 2 = 306176 \qquad (4)$$

To enhance the security strength of RSA encryption, a padding scheme is deployed. The padding scheme must be a one-to-one function to avoid ambiguity in the message. If the padding scheme is formula-based, then once the adversary obtains the padding algorithm, a new one must be generated to replace it [7, 8]. If the padding scheme is substitution-based, it requires a huge amount of memory to store all $2^{1024}$ possible input values. Although a substitution-based padding scheme increases the difficulty for adversaries, it greatly reduces efficiency for the sender and receiver, who need to search the padding table before performing encryption or after performing decryption. The search difficulty is proportional to the number of possible inputs.

The RSA cryptosystem is a public key cryptosystem, so the public key is available to both legitimate and illegitimate users. If an adversary is able to encrypt every possible message value, the adversary can recover the plaintext of an intercepted ciphertext by comparing ciphertexts. This makes RSA vulnerable to chosen plaintext attack. Thus, in real-life applications, the RSA cryptosystem is used to encrypt the keys of symmetric ciphers such as AES, or to generate digital signatures, while the symmetric ciphers, being more efficient, are used to encrypt large data [9].

2.3 Proof of Correctness

In this section, Fermat's little theorem and Euler's theorem are used to describe the RSA cryptosystem's functionality. Fermat's little theorem was used to prove the correctness of the RSA cryptosystem by Rivest et al. [10].

Theorem 1 (Fermat's little theorem). If $p$ is a prime number, then for any integer $a$, the number $a^{p} - a$ is an integer multiple of $p$. In the notation of modular arithmetic, this is expressed as $a^{p} \equiv a \pmod{p}$.

Proof: From Equation (1), it can be concluded that:

$$ed \equiv 1 \pmod{(p-1)(q-1)} \qquad (5)$$

Rewriting Equation (5), we obtain

$$ed - 1 = k(p-1)(q-1), \quad k \ge 0,\ k \in \mathbb{Z} \qquad (6)$$

Checking that $m^{ed}$ and $m$ are congruent mod $pq$ is equivalent to checking that they are congruent mod $p$ and mod $q$ separately. Two cases are considered, which are shown in Equations (7) and (8), respectively.

$$m \equiv 0 \pmod{p} \qquad (7)$$

$$m \not\equiv 0 \pmod{p} \qquad (8)$$

For case 1,

$$m^{ed} \equiv 0 \equiv m \pmod{p} \qquad (9)$$

For case 2,

$$m^{ed} = m^{ed-1}\, m = m^{k(p-1)(q-1)}\, m = \left(m^{p-1}\right)^{k(q-1)} m \equiv 1^{k(q-1)}\, m \equiv m \pmod{p} \qquad (10)$$

Substituting $q$ for $p$ completes the proof that, for any integer $m$, $(m^{e})^{d} \equiv m \pmod{pq}$.
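The identity $(m^{e})^{d} \equiv m \pmod{pq}$ can be checked exhaustively on a toy key. The sketch below is an illustration only: the helper names `mod_pow`, `mod_inverse` and `rsa_round_trip_holds` are hypothetical, and the code assumes $n < 2^{32}$ so that every intermediate product fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Square-and-multiply modular exponentiation; products stay below 2^64 while mod < 2^32.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    assert(mod > 0 && mod < (1ULL << 32));
    std::uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// Smallest d > 0 with e*d = 1 (mod phi), via the extended Euclidean algorithm; 0 if none exists.
std::uint64_t mod_inverse(std::uint64_t e, std::uint64_t phi) {
    assert(phi > 1);
    std::int64_t r0 = static_cast<std::int64_t>(phi);
    std::int64_t r1 = static_cast<std::int64_t>(e % phi);
    std::int64_t t0 = 0, t1 = 1;
    while (r1 != 0) {
        const std::int64_t q = r0 / r1;
        const std::int64_t r2 = r0 - q * r1; r0 = r1; r1 = r2;
        const std::int64_t t2 = t0 - q * t1; t0 = t1; t1 = t2;
    }
    if (r0 != 1) return 0;  // e and phi are not coprime
    return static_cast<std::uint64_t>(t0 < 0 ? t0 + static_cast<std::int64_t>(phi) : t0);
}

// True when (m^e)^d = m (mod p*q) for every m in [0, p*q), including m not coprime to n.
bool rsa_round_trip_holds(std::uint64_t p, std::uint64_t q, std::uint64_t e) {
    const std::uint64_t n = p * q;
    const std::uint64_t phi = (p - 1) * (q - 1);
    const std::uint64_t d = mod_inverse(e, phi);
    if (d == 0) return false;
    for (std::uint64_t m = 0; m < n; ++m)
        if (mod_pow(mod_pow(m, e, n), d, n) != m) return false;
    return true;
}
```

For the toy key $p = 61$, $q = 53$, $e = 17$, `mod_inverse(17, 3120)` gives the decryption exponent and `rsa_round_trip_holds(61, 53, 17)` covers both cases of the proof, since the loop includes the multiples of $p$ and $q$.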

Theorem 2 (Euler's theorem) [1, 2]. If $n$ and $a$ are coprime positive integers, then $a^{\varphi(n)} \equiv 1 \pmod{n}$.

Proof: Since $e$ and $d$ are positive,

$$ed = 1 + k\varphi(n), \quad k \in \mathbb{Z} \qquad (11)$$

Assuming $m$ is relatively prime to $n$, Equation (11) gives:

$$m^{ed} = m^{1 + k\varphi(n)} = m \left(m^{\varphi(n)}\right)^{k} \equiv m\,(1)^{k} \equiv m \pmod{n} \qquad (12)$$

If $m$ is not relatively prime to $n$, this argument does not apply. However, the probability that $m$ is not coprime with $n$ is very low, and the congruence still holds in that case by the case analysis in Equations (7)-(10).

2.4 General Brute Force Attack on RSA Cryptosystem

Brute force attack is a generic attack that can be performed on the RSA cryptosystem to find the private key $(p, q, d)$. It is time-consuming because of the large space of possible keys that has to be searched. Brute force attacks on RSA can be classified into two categories: padding scheme based and private key based. A padding scheme based brute force attack requires a large amount of memory to build a table listing every possible plaintext, its message value, and its corresponding ciphertext. Private key based brute force attacks can be further classified into two types: p, q-oriented and d-oriented. In a d-oriented attack, the major problem is identifying whether the decrypted message is exactly the message sent by the sender, since the RSA cryptosystem is mainly used to encrypt the key of another cryptosystem, such as AES. A p, q-oriented brute force attack focuses on integer factorization of the n value in the public key. When one of p or q is found, the whole private key is revealed.

3. Research Design

This section describes the methods used to compare generic brute force attacks, prime factorization attacks, and the proposed attack on the RSA cryptosystem.

3.1 Key Generation

In the simulation, the program generates two random numbers, set as p and q. If p or q is not a prime number, the program replaces it with the smallest prime number greater than it. After p and q are generated, a random prime number is generated as e, to ensure that e and n are coprime. If e equals p or q, a new random prime number is generated to replace it.

3.2 Random Number Generation

The simulation program's random number generator is seeded with the current time. However, because the program runs much faster than the clock advances, the generator may produce a value that is the same as the previously generated number. The pseudocode and flow chart for random number generation are shown in Fig. 1.

Fig. 1 - Pseudocode for random number generation

3.3 Brute Force Attack

A brute force attack tries all possible private keys. In the simulation, the adversary is given the public key and the ciphertext, and tries every possible key to decrypt the ciphertext. At the beginning, the simulation generates a random public key. Next, the simulation generates a random M value and encrypts it with toy box RSA cryptosystems of different sizes. The encrypted M value is decrypted with every possible private key until the decrypted value equals the original M value. The pseudocode for the brute force algorithm is shown in Fig. 2.

Fig. 2 - Pseudocode for brute force algorithm

3.4 Prime Factorization Algorithm

A prime factorization attack attempts to recover the p or q value of the private key. This method focuses on factorizing the n value of the public key. In the simulation, the adversary is given the public key only. At the beginning, the simulation generates a random public key. Next, the simulation program factorizes n. Once one factor of n is found, the simulation timer stops, ready for the next simulation. The pseudocode for the prime factorization algorithm is shown in Fig. 3.
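The two baseline attacks described in Sections 3.3 and 3.4 can be sketched as follows. This is not the authors' pseudocode from Fig. 2 and Fig. 3, which is not reproduced here: `brute_force_d` and `smallest_factor` are hypothetical names, the factorization uses plain trial division, and the code assumes $n < 2^{32}$.

```cpp
#include <cassert>
#include <cstdint>

// Square-and-multiply modular exponentiation; products stay below 2^64 while mod < 2^32.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    assert(mod > 0 && mod < (1ULL << 32));
    std::uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

// d-oriented brute force: try d = 1, 2, ... and decrypt c with each candidate until the
// known message m comes back. Returns the first exponent that works, or 0 if none below n.
std::uint64_t brute_force_d(std::uint64_t n, std::uint64_t c, std::uint64_t m) {
    for (std::uint64_t d = 1; d < n; ++d)
        if (mod_pow(c, d, n) == m % n) return d;
    return 0;
}

// p, q-oriented attack by trial division: smallest factor f > 1 of n with f*f <= n,
// or 0 when n has no such factor (n is prime).
std::uint64_t smallest_factor(std::uint64_t n) {
    assert(n >= 2);
    if (n % 2 == 0) return n == 2 ? 0 : 2;
    for (std::uint64_t f = 3; f * f <= n; f += 2)
        if (n % f == 0) return f;
    return 0;
}
```

Note that `brute_force_d` stops at the first exponent that decrypts this particular ciphertext, which may be smaller than the $d$ produced during key generation. The trial division loop runs at most about $\sqrt{n}$ steps, whereas the exponent search may run up to $n$ steps.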

Fig. 3 - Pseudocode for prime factorization

3.5 The Proposed Algorithm

In the proposed algorithm, the simulation program tries to find the minimum value that acts as d when m = 2, and sets it as s. After s is found, the values $s_k = k(s-1)+1$, $k \ge 1$, $k \in \mathbb{Z}$, are divided by e. If $s_k$ is divisible by e, the d value is found. The pseudocode is shown in Fig. 4.

Fig. 4 - Pseudocode for the proposed algorithm

To prove the correctness of the proposed algorithm, consider the key generation, where

$$ed \equiv 1 \pmod{\varphi(n)} \qquad (13)$$

That means that

$$ed = k\varphi(n) + 1, \quad k \ge 0,\ k \in \mathbb{Z} \qquad (14)$$

When performing encryption and decryption, we found that

$$m^{ed} \equiv m^{s} \pmod{n} \qquad (15)$$

and $m$ satisfies $m^{s} \equiv m \pmod{n}$. Thus, it can be concluded that:

$$m^{s} \equiv m \pmod{n}$$
$$m \cdot m^{s-1} \equiv m \cdot 1 \pmod{n}$$
$$m^{s-1} \equiv 1 \pmod{n} \qquad (16)$$

The values of e and d must be integers. Thus, one of s and its successive values $(s, 2s-1, 3s-2, \ldots)$ must be divisible by e. For the first such value that is divisible by e, the quotient is the decryption key d.

3.6 Simulation

The simulation generates a public key from $(p, q, e)$. The simulation tests 25 public keys for each toy box RSA size from 4 to 20 bits. Toy box RSA larger than 32 bits cannot be implemented because of the maximum 64-bit bus width: if toy box RSA is implemented with a key of more than 32 bits, overflow may occur. For all simulations, the maximum recorded time is 30 minutes. The simulations for each attack have their own goals, which are shown in Table 2. If the simulation for 25 public keys consumes more than 20 minutes, the simulation program is terminated manually to stop a possible infinite loop. The recorded data are then analysed.

Table 2 – The goal for each attack simulation

| Type of attack | Information provided | Goal to stop the timer |
|---|---|---|
| Brute force | n, e, C | When the correct d is found |
| Prime factorization | n | When p or q is revealed |
| Proposed method | n | When d is found within 1 iteration |

3.7 Result Analysis

During result analysis, various calculations are applied depending on the characteristics of the data points. Logarithms are used to speed up the calculation.

4. Results and Discussion

During the simulation, some assumptions are made:
1. The relationship between the time used for cryptanalysis and the size of the toy box RSA is assumed to be continuous, so that extrapolation can be performed.
2. Some values (outliers) in the results may distort the graph. Outliers are kept in the results but are not used in the result analysis.
3. Every key pair generated in a simulation counts as 1 iteration, even if the key pair is a repeat.

4.1 Brute Force Attack

For the brute force attack, the simulation program tries to recover the decryption exponent d. Once the decryption exponent is recovered, a new key is generated for cryptanalysis. The simulation timer stops when 25 decryption exponents have been recovered or 30 minutes have elapsed. Table 3 shows the results of the time test from Test 1 to Test 5. The time used for cryptanalysis per key recovery is illustrated in Fig. 5, which shows that the time increases exponentially as the number of bits increases. Fig. 6 presents the same results in log base 10. The blue line shows the change in time in log base 10. Compared with the red line, the blue line can be considered linear.
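The stopping rule used in the time tests (a fixed number of keys or a time budget, whichever comes first) can be sketched as a small harness. This is an assumption about how such a timer could be organised, not the authors' program: `TimedRun` and `run_with_budget` are hypothetical names, and the budget is only checked between keys, so a single long-running attack is not interrupted.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

struct TimedRun {
    int completed;   // number of attacks that finished
    double seconds;  // wall-clock time spent in total
    bool aborted;    // true if the budget ran out before all keys were attacked
};

// Run attack_one_key(i) for i = 0..key_count-1, stopping early once the budget is spent.
TimedRun run_with_budget(const std::function<void(int)>& attack_one_key,
                         int key_count,
                         std::chrono::duration<double> budget) {
    assert(key_count >= 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    TimedRun run{0, 0.0, false};
    for (int i = 0; i < key_count; ++i) {
        if (Clock::now() - start >= budget) {
            run.aborted = true;
            break;
        }
        attack_one_key(i);
        ++run.completed;
    }
    run.seconds = std::chrono::duration<double>(Clock::now() - start).count();
    return run;
}
```

A call such as `run_with_budget(attack, 25, std::chrono::minutes(30))` mirrors the 25-key, 30-minute setting of this section, where `attack` is whatever callable performs one key recovery.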

Table 3 – Time required to perform 25 times of brute force attack (time used in seconds)

| Number of bits | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Average |
|---|---|---|---|---|---|---|
| 4 | 0.003 | 0.003 | 0.003 | 0.003 | 0.004 | 0.0032 |
| 5 | 0.013 | 0.003 | 0.007 | 0.018 | 0.022 | 0.0132 |
| 6 | 0.116 | 0.197 | 0.105 | 0.108 | 0.094 | 0.124 |
| 7 | 0.287 | 0.3 | 1.945 | 0.451 | 0.336 | 0.6638 |
| 8 | 3.264 | 17.289 | 9.263 | 9.996 | 6.658 | 9.294 |
| 9 | 165.126 | 375.255 | 237.083 | 129.82 | 128.547 | 207.1662 |
| 10 | 1104.596 | 2257.753 | Abort | Abort | Abort | 1688.013 |

Fig. 5 – Time used per key recovery for brute force attack

Fig. 6 – Time used per key recovery for brute force attack (log base 10)

To calculate the time used for a real-life RSA cryptosystem, we use (4, -2.49485) and (10, 3.227376) as reference points for the linear equation. Hence, from $y = mx + c$,

$$m = \frac{y_1 - y_0}{x_1 - x_0} = \frac{3.227376 - (-2.49485)}{10 - 4} = 0.95370 \qquad (17)$$

Substituting $x = 4$ and $y = -2.49435$ into the linear equation, we obtain $c = -6.30917$ and hence $y = 0.95370x - 6.30917$. To obtain the time used for an x-bit RSA cryptosystem, let

$$y = \log_{10} t \;\Rightarrow\; t = 10^{y} = 10^{0.95370x - 6.30917} \qquad (18)$$

If t is measured in years,

$$t = \frac{10^{0.95370x - 6.30917}}{3153600} = \frac{10^{0.95370x - 6.30917}}{10^{6.498807}} = 10^{0.95370x - 12.808} \qquad (19)$$

Equation (19) is the time used for brute forcing one RSA private key.

4.2 Prime Factorization

For the prime factorization attack, once the simulation program obtains the smaller factor of the semiprime n, it generates a new public key for cryptanalysis. The simulation timer stops when 100 smaller factors of semiprime n have been found or more than 30 minutes have elapsed. The 30-minute time limit was set to prevent a possible infinite loop during the simulation. Table 4 shows the results of the time test for prime factorization from Test 1 to Test 5. In all simulations, every RSA toy box with a bit size of more than 20 returned no result because it exceeded the 30-minute time limit. Fig. 7 presents the time used for cryptanalysis per key recovery using prime factorization, where the time used increases exponentially with the number of bits. The same results in log base 10 are illustrated in Fig. 8.

Fig. 7 – Time used per key recovery for prime factorization

Fig. 8 – Time used per key recovery for prime factorization (log base 10)
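The two-point extrapolation of Equations (17) and (18) is a straight line through two (bits, $\log_{10}$ time) reference points. A minimal sketch follows, assuming the names `LogLinearFit`, `fit_through`, `log10_seconds` and `seconds_at`, none of which come from the paper. It works in seconds and leaves out the conversion to years.

```cpp
#include <cassert>
#include <cmath>

struct LogLinearFit {
    double slope;      // m in y = m*x + c, where y = log10(time in seconds)
    double intercept;  // c in y = m*x + c
};

// Straight line through two reference points (x0, y0) and (x1, y1).
LogLinearFit fit_through(double x0, double y0, double x1, double y1) {
    assert(x0 != x1);
    const double slope = (y1 - y0) / (x1 - x0);
    return {slope, y0 - slope * x0};
}

// Predicted log10 of the time in seconds for a key of the given size in bits.
double log10_seconds(const LogLinearFit& fit, double bits) {
    return fit.slope * bits + fit.intercept;
}

// Predicted time in seconds; overflows to infinity for large key sizes,
// so the log10 form above is the one to use for extrapolation.
double seconds_at(const LogLinearFit& fit, double bits) {
    return std::pow(10.0, log10_seconds(fit, bits));
}
```

With the brute force reference points, `fit_through(4, -2.49485, 10, 3.227376)` gives a slope that agrees with Equation (17) to the precision shown there. Keeping the result as an exponent avoids floating-point overflow when the line is evaluated far outside the measured range.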

Table 4 – Time required to perform 25 times of prime factorization attack on toy box RSA (time used in seconds)

| Number of bits | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Average |
|---|---|---|---|---|---|---|
| 4 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | 0.0032 |
| 8 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| 9 | 0.015 | 0.015 | <0.001 | 0.015 | <0.001 | 0.009 |
| 10 | 0.016 | 0.016 | 0.016 | <0.001 | 0.016 | 0.0128 |
| 11 | 0.031 | 0.047 | 0.031 | 0.031 | 0.032 | 0.0344 |
| 12 | 0.094 | 0.078 | 0.093 | 0.093 | 0.079 | 0.0874 |
| 13 | 0.25 | 0.282 | 0.282 | 0.265 | 0.219 | 0.2596 |
| 14 | 0.625 | 0.687 | 0.703 | 0.641 | 0.797 | 0.6906 |
| 15 | 2.172 | 2.016 | 2.093 | 2.36 | 2.031 | 2.1344 |
| 16 | 5.844 | 6.09 | 5.704 | 5.953 | 6.407 | 6.0004 |
| 17 | 17.532 | 18.235 | 17.501 | 17.361 | 18.282 | 17.7822 |
| 18 | 51.065 | 56.299 | 55.284 | 56.878 | 52.659 | 54.437 |
| 19 | 181.463 | 179.431 | 185.836 | 182.493 | 170.852 | 180.015 |
| 20 | 528.979 | 499.149 | 539.51 | 514.103 | 543.963 | 525.1408 |

Table 5 – Time required to perform 25 times of proposed algorithm attack on toy box RSA (time used in seconds)

| Number of bits | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Average |
|---|---|---|---|---|---|---|
| 4 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| 8 | 0.016 | <0.001 | 0.016 | <0.001 | 0.016 | 0.000384 |
| 9 | 0.0154 | 0.0024 | 0.014 | 0.024 | 0.018 | 0.0188 |
| 10 | 0.081 | 0.081 | 0.061 | 0.067 | 0.085 | 0.075 |
| 11 | 0.146 | 0.235 | 0.268 | 0.257 | 0.205 | 0.2222 |
| 12 | 1.031 | 0.812 | 0.937 | 0.89 | 1.047 | 0.9434 |
| 13 | 2.891 | 3.047 | 3.812 | 2.829 | 4.828 | 3.4814 |
| 14 | 15.485 | 13.219 | 16.251 | 15.642 | 13.063 | 14.732 |
| 15 | 47.862 | 62.409 | 53.331 | 44.987 | 42.909 | 50.2996 |
| 16 | 208.775 | 196.681 | 171.086 | 187.243 | 238.715 | 200.5 |
| 17 | 1121.773 | 1842.823 | 1773.799 | 1240.897 | 1601.252 | 1516.109 |

In Fig. 8, the blue line shows the change in time in log base 10. Compared with the red line, the trend can be considered linear. However, the data from 4 bits to 10 bits do not follow the trend of most of the points, so these points are treated as outliers. To calculate the time used for a real-life RSA cryptosystem, we use (11, -2.86138) and (20, 1.322336) as reference points for the linear equation. Repeating the calculation in Equations (17)-(19), the time used to recover one RSA private key by prime factorization is given by

$$t = 10^{0.122088x - 14.47365} \qquad (20)$$

4.3 The Proposed Algorithm

The purpose of the time test is to determine the time consumed when performing the proposed algorithm on toy box RSA with a single iteration and input parameter i = 2. Table 5 shows the results of the proposed algorithm time test from Test 1 to Test 5. Fig. 9 shows the time used to recover one RSA private key using the proposed method, which is an intelligent brute force attack. From this figure, it can be seen that the time used for cryptanalysis increases exponentially as the number of bits increases. Thus, we present the results in log base 10, as shown in Fig. 10.

Fig. 9 – Time used per key recovery for the proposed algorithm
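The attack timed in this section can be sketched from the description in Section 3.5: find the smallest $s > 1$ with $2^{s} \equiv 2 \pmod{n}$, then walk the sequence $s, 2s-1, 3s-2, \ldots$ until a term is divisible by $e$. The code below is one reading of that description, not the authors' Fig. 4 pseudocode. The names `find_s` and `recover_d` are hypothetical, and $n < 2^{32}$ is assumed so that products fit in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Smallest s > 1 with m^s = m (mod n), or 0 if none is found for s <= n.
std::uint64_t find_s(std::uint64_t n, std::uint64_t m = 2) {
    assert(n > 2 && n < (1ULL << 32));
    const std::uint64_t target = m % n;
    std::uint64_t power = target;        // holds m^1 mod n
    for (std::uint64_t s = 2; s <= n; ++s) {
        power = power * target % n;      // now holds m^s mod n
        if (power == target) return s;
    }
    return 0;
}

// Walk s, 2s-1, 3s-2, ... (that is, k*(s-1)+1 for k = 1, 2, ...) and return the
// quotient of the first term divisible by e. Returns 0 if the first e terms all fail.
std::uint64_t recover_d(std::uint64_t n, std::uint64_t e) {
    assert(e > 0 && e < (1ULL << 32));
    const std::uint64_t s = find_s(n);
    if (s == 0) return 0;
    for (std::uint64_t k = 1; k <= e; ++k) {
        const std::uint64_t candidate = k * (s - 1) + 1;
        if (candidate % e == 0) return candidate / e;
    }
    return 0;
}
```

The exponent returned is guaranteed to decrypt ciphertexts of the base message $m = 2$. Whether it also decrypts every other message depends on how $s - 1$ relates to the multiplicative orders of those messages modulo $n$, which this sketch does not check.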

Fig. 10 – Time used per key recovery for the proposed algorithm (log base 10)

Similarly, in comparison with the red line, the change of the blue line in Fig. 10 is considered linear, with the points at 8 bits and 17 bits treated as outliers. To calculate the time used for a real-life RSA cryptosystem, we use (9, -3.12378) and (16, 0.904174) as reference points for the linear equation. From the calculation, the time used to recover one RSA private key using the proposed algorithm is given by

$$t = 10^{0.575422x - 14.801385} \qquad (21)$$

4.4 Estimation for Real Life RSA Cryptosystem

For the estimation for a real-life RSA cryptosystem, the following assumptions are made:
1. The bus size used in the RSA cryptosystem is at least 2048 bits.
2. The Arithmetic Logic Units (ALU) deployed in the real RSA cryptosystem have exactly the same architecture as a 64-bit ALU.

By substituting $x = 1024$ in Equations (19)-(21), the times used for brute force attack, prime factorization and the proposed algorithm on a 1024-bit RSA cryptosystem in ideal conditions are approximately $10^{963.7808}$ years, $10^{110.54446}$ years and $10^{574.4307}$ years, respectively.

5. Conclusions

In this research, one may conclude that the prime factorization attack is the most efficient approach to RSA cryptanalysis. Of the remaining two algorithms, the proposed algorithm is faster than the generic brute force attack. In ideal conditions, the prime factorization attack shows the highest efficiency in cryptanalysis. However, the time recorded for prime factorization does not include the time needed to analyse e and d. In this research, the number of bits used for cryptanalysis was limited to 32 bits, but the data obtained are limited to a maximum of 20-bit RSA toy box because of the huge time consumption of cryptanalysis. Due to limitations in system bus architecture, data collection is only possible for RSA toy box of less than 32 bits, to prevent possible overflow. The simulation was limited to a maximum of 40 minutes to increase efficiency for further calculation.

Acknowledgement

This research was supported by Tier 1 H082 RMC UTHM and Gates IT Solution Sdn. Bhd.

References

[1] Stallings, W. (2014). Cryptography and Network Security: Principles and Practice (7th ed.). Pearson.
[2] Schneier, B. (2017). Applied Cryptography: Protocols, Algorithms and Source Code in C ed.). Wiley.
[3] Shannon, C.E. (1949). Communication theory of secrecy systems. The Bell System Technical Journal, 28(4), 656-715.
[4] Rivest, R.L., A. Shamir, & L. Adleman (1978). A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2), 120-126.
[5] 9791-1:2002, I.I. (2002). Information technology - security techniques, message authentication codes - Part 1: Mechanisms using a block cipher algorithm.
[6] Boneh, D., & G. Durfee (2000). Cryptanalysis of RSA with private key d less than N^0.292. Proceedings of the 17th International Conference on Theory and Application of Cryptographic Techniques, 1349.
[7] Kocher, P.C. (1996). Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems, Annual International Cryptology Conference. 104-113.
[8] Finke, T., M. Gebhardt, & W. Schindler (2009). A new side-channel attack on RSA prime generation, International Workshop on Cryptographic Hardware and Embedded Systems. 141-155.
[9] Coppersmith, D. (1997). Small solutions to polynomial equations, and low exponent RSA vulnerabilities. Journal of Cryptology, 10(4), 233-260.
[10] Rivest, R.L., A. Shamir, & L. Adleman (1977). On Digital Signatures and Public-Key Cryptosystems, Massachusetts Institute of Technology.