
On Bugs and Ciphers: New Techniques in Cryptanalysis

Yaniv Carmeli

Technion - Computer Science Department - Ph.D. Thesis PHD-2015-01 - 2015

On Bugs and Ciphers: New Techniques in Cryptanalysis

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Yaniv Carmeli

Submitted to the Senate of the Technion — Israel Institute of Technology Adar 5775 Haifa March 2015

The research thesis was done under the supervision of Prof. in the Computer Science Department.

The generous financial support of the Technion is gratefully acknowledged.

Contents

Abstract

1 Introduction

2 Bug Attacks
  2.1 Introduction
    2.1.1 Introduction to Side Channel Attacks (SCA)
    2.1.2 Fault Attacks
  2.2 Overview of Our Methods and Notations
    2.2.1 Multiplication of Big Numbers
    2.2.2 Notations
    2.2.3 Methods
    2.2.4 Complexity Analysis
    2.2.5 Exponentiation Algorithms
    2.2.6 Remarks
  2.3 Bug Attack on CRT-RSA with One Chosen Ciphertext
  2.4 Bug Attacks on LTOR Exponentiations
    2.4.1 Bug Attacks on Pohlig-Hellman
    2.4.2 Bug Attacks on RSA
    2.4.3 Bug Attacks on OAEP
  2.5 Bug Attacks on RTOL Exponentiations
    2.5.1 Bug Attacks on Pohlig-Hellman
    2.5.2 Bug Attacks on RSA
    2.5.3 Bug Attacks on OAEP Implementations that use RTOL
  2.6 Bug Attacks Using the Legendre Symbol and Roots
    2.6.1 Bug Attacks on Pohlig-Hellman Implementations that use RTOL
    2.6.2 Bug Attacks on Pohlig-Hellman Implementations that use LTOR
  2.7 Vulnerabilities of Other Kind of Schemes
    2.7.1 Elliptic Curve Schemes
    2.7.2 Bug Attacks on Symmetric Primitives
  2.8 Summary and Countermeasures
  2.A Brief Descriptions of Several Cryptosystems
    2.A.1 The Pohlig-Hellman and Pohlig-Hellman-Shamir Protocol
    2.A.2 The RSA Cryptosystem
    2.A.3 RSA Decryption Using CRT
    2.A.4 OAEP
  2.B Known Hardware Bugs

3 Efficient Reconstruction of RC4 Keys from Internal States
  3.1 Introduction
    3.1.1 Previous Attacks
    3.1.2 Outline of Our Contribution
    3.1.3 Organization of the Chapter
  3.2 The RC4
    3.2.1 Properties of RC4 Keys
    3.2.2 Notations
  3.3 Previous Techniques
  3.4 Our Observations
    3.4.1 Subtracting Equations
    3.4.2 Using Counting Methods
    3.4.3 The Sum of the Bytes
    3.4.4 Adjusting Weights and Correcting Equations
    3.4.5 Refining the Set of Equations
    3.4.6 Heuristic Pruning of the Search
  3.5 The Algorithm
  3.6 Efficient Implementation
  3.7 Discussion

4 An Improvement of Linear Cryptanalysis with Addition Operations with Applications to FEAL-8X
  4.1 Introduction
  4.2 The Cipher FEAL-8X
    4.2.1 An Equivalent Description of FEAL-8X
  4.3 First Attack – Finding the Key Using $2^{15}$ Known Plaintexts
    4.3.1 The Linear Approximations
    4.3.2 The Basic Attack
    4.3.3 Matching Subkeys from the Backward and Forward Directions
    4.3.4 Retrieving the Rest of the Subkeys
  4.4 Our Partitioning Technique – Finding The Key Using $2^{14}$ Known Plaintexts
    4.4.1 A Simplified Example
    4.4.2 The Attack
  4.5 Attacking FEAL-8X Using $2^{10}$ Known Plaintexts with Complexity $2^{62}$
  4.6 Attacks with a Few Known or Chosen Plaintexts
    4.6.1 Differential and Linear Exhaustive Search Attacks
    4.6.2 Meet in the Middle Attacks
  4.7 Summary
  4.A Retrieving The FEAL-8X Key from the Actual Subkeys
    4.A.1 The Key Processing Algorithm
    4.A.2 Finding the Key
  4.B Efficient Implementation

Abstract in Hebrew

List of Figures

3.1 Two Probable Alternatives to the Positions of the Indices i and j Right Before the Assignment S[i′] ← x Occurred
4.1 The outline of FEAL-8 and of the F-function
4.2 Equivalent Description of FEAL-8X Without Whitening at the End
4.3 Approximation 1 – A six-round approximation with bias $2^{-6}$
4.4 Approximation 2 – A six-round approximation with bias $2^{-6}$
4.5 The Approximation of the Seventh Round
4.6 The Key Processing Algorithm and the $F_k$ Function

List of Algorithms

2.1 The Two Basic Exponentiation Algorithms
2.2 Basic Adaptive Chosen-Ciphertext Attack Against Pohlig-Hellman with LTOR
2.3 Improved Adaptive Chosen-Ciphertext Attack Against Pohlig-Hellman with LTOR
2.4 Adaptive Chosen-Ciphertext Attack Against RSA with LTOR
2.5 Chosen-Ciphertext Attack Against RSA with LTOR
2.6 Chosen-Ciphertext Attack Against Pohlig-Hellman with RTOL
2.7 Chosen-Ciphertext attack against RSA with RTOL
2.8 Adaptive Chosen-Ciphertext Attack Against RSA-OAEP with RTOL
2.9 Chosen-Ciphertext Attack Against Pohlig-Hellman with RTOL
2.10 Known-Plaintext Attack Against Pohlig-Hellman with RTOL
2.11 Chosen-Ciphertext Attack Against Pohlig-Hellman with LTOR
2.12 Known-Plaintext Attack Against Pohlig-Hellman with LTOR
3.1 The RC4 Algorithms
3.2 The FIND KEY Algorithm
3.3 The Recursive REC SUBROUTINE Algorithm
4.1 Basic Attack on FEAL-NX with $2^{15}$ messages
4.2 Breaking FEAL in $2^{112}$ Time and Only 5 Known Plaintexts
4.3 Efficient Implementation of the Attack in Algorithm 4.1

List of Tables

2.1 Summary of the Presented Bug Attacks
3.1 The Probabilities Given by Theorem 3.2
3.2 Success Probabilities and Running Time of the RecoverKey Algorithm of [65]
3.3 Probabilities that s is Among the Four Highest Counters
3.4 Empirical Results of The Proposed Attack
4.1 The Subkeys of FEAL-8X and the Actual Subkeys of the Equivalent Descriptions
4.2 A mapping between the standard notation for FEAL subkeys and the notation used in this appendix
4.3 Relation Between the Bytes of the Decryption Actual Subkeys and the Subkeys of the Cipher
4.4 Relation Between the Bytes of the Actual Subkeys and the Subkeys of the Cipher

Abstract

This research thesis presents three independent contributions to cryptanalysis.

The first contribution is Bug Attacks, a new type of side-channel attack that takes advantage of bugs in hardware or software. The best known example of a bug in hardware is the Intel division bug, which resulted in slightly inaccurate results for extremely rare inputs. Whereas in most applications such bugs can be viewed as a minor nuisance, we show that in the case of RSA (even when protected by OAEP), Pohlig-Hellman, elliptic curve cryptography, and several other schemes, such bugs can be a security disaster: decrypting on any computer which multiplies even one pair of numbers incorrectly can lead to full leakage of the secret key, sometimes with a single well-chosen ciphertext. Bugs may also be planted by tampering with otherwise bug-free hardware. Recent documents leaked by Edward Snowden show that this is a method which is used by the US against intelligence targets.

The second contribution is an efficient algorithm for the retrieval of the RC4 secret key, given an internal state. Our algorithm is several orders of magnitude faster than previously published algorithms. In the case of a 40-bit key, it takes only about 0.02 seconds to retrieve the key, with a success probability of 86.4%. Even in cases where our algorithm cannot retrieve the entire key, it can retrieve partial information about the key. The key can also be retrieved if some of the bytes of the initial permutation are incorrect or missing.

The third contribution is an improvement to the linear cryptanalysis of ciphers that use addition operations, which we demonstrate on the block cipher FEAL-8X. Since its introduction 27 years ago, FEAL has played a key role in the development of many cryptanalytic techniques, including differential and linear cryptanalysis. For its 25th anniversary Mitsuru Matsui announced a challenge for an improved known plaintext attack on FEAL-8X. We describe our attack as part of this challenge and introduce improvements to linear cryptanalysis that allow us to recover the key given $2^{14}$ known plaintexts in about 14 hours of computation, and led us to win the challenge. An especially interesting improvement considers the approximation of addition-based S-boxes by partitioning into several sets in a way that amplifies the bias, and therefore allows for a reduction in the number of required known plaintexts as well as saving computation time. We also describe attacks that require only a few (even 2 or 3) known plaintexts and recover the key much faster than exhaustive search.

Chapter 1

Introduction

For many years the use of cryptography was limited to the practical use of ciphers, mainly in military settings. The main reason for encrypting data with ciphers is to provide secure communication such that if an eavesdropper intercepts the messages being sent she cannot learn anything about the content of those messages. Ciphers are also often used to encrypt information before storing it, in order to prevent unauthorized retrieval of the data.

Ciphers have proved to be especially important in military applications: history shows that breaking the enemy's ciphers may give a crucial advantage in wars. There are numerous examples where the fate of a war was decided by breaking the enemy's ciphers, the most famous one being the breaking of the German Enigma by the Allied forces in World War II.

There are two faces to the field of cryptology: cryptography and cryptanalysis, which have always been competing in a constant race. While cryptography studies the design of new ciphers and protocols, cryptanalysis studies ways to break them. Over the years many ciphers have been introduced, but many creative ways to break them have been suggested as well. Until the early 20th century ciphers had been fairly simple. Classical ciphers like substitution ciphers (where each letter is replaced by a different symbol) or transposition ciphers (where the order of the letters is changed) were simple enough that encryption and decryption could be done with pen and paper, but complex enough that it was hard to break them with the knowledge and technology of those times.

Advances in computing over the last several decades have meant a giant leap for cryptology as well. Since encryption and decryption can now be performed by computers or dedicated hardware, modern ciphers are much more complex. On the other hand, the attacks on those ciphers are also performed by computers and therefore are potentially a lot more dangerous.

Today, in the information age, ciphers are all around us: they are essential for private network communications, cellular conversations, satellite broadcasts and many more applications. Modern ciphers can be divided into two groups: symmetric cryptosystems and public key (asymmetric) cryptosystems [69]. Symmetric cryptosystems use the same shared keys for encryption and decryption, while in public key cryptosystems the sender and the receiver use different keys. The sender uses a public encryption key and the receiver uses a secret decryption key, thus no prior secure key-exchange is needed. Due to this property, public key cryptosystems are suitable for many applications in which symmetric ciphers cannot be used. In addition, public-key cryptosystems are more efficient in settings where pairwise-secure communication is needed between a large number of participants: public-key cryptosystems require that each party publish only a single public key which is used by all others to send messages to him (he also keeps a single private key to himself). For comparison, in symmetric ciphers every pair of parties must have a different shared key, and thus the overall number of keys in the system may be quadratic in the number of participants. Public-key cryptosystems, however, are usually several orders of magnitude slower than symmetric ciphers.

Symmetric cryptosystems also have two main flavors: block ciphers and stream ciphers. In block ciphers (like DES [60], AES [24], FEAL [76]) one block of the message is handled at a time, such that each block is independently encrypted under the same secret key.
In stream ciphers (like RC4 [3] or A5/1 [18]) smaller blocks of plaintext may be encrypted (typically as small as a single bit or a single byte), and a secret state is kept in memory, which is updated according to the previous states and previous input blocks.

The only cipher which is secure in an information-theoretic sense is the One Time Pad (OTP). In OTP, encryption is performed by XORing a random secret key with the message, where the parties share the same random secret key, and use it only once.¹ OTP is only secure if the key is random and is at least as long as the message. The obvious disadvantage of OTP is that in order to encrypt very long messages, very long keys must be exchanged and stored in advance. In order to overcome this problem, stream ciphers were designed to create long pseudorandom strings from a short key, such that a computationally bounded attacker cannot distinguish between strings generated by the stream cipher and random strings. The pseudorandom strings are called keystreams, and they are used instead of the random key of OTP (they are XORed with the message). This design of stream ciphers gives up the information-theoretic security.

¹ We describe OTP over the binary field, but it is theoretically secure over any other field (the XOR is replaced by the addition operation of that field).

Nowadays there are many more applications of cryptography besides encryption, many of them inseparable parts of modern everyday life (hash functions, digital signatures and message authentication codes (MACs) are just a few examples). Modern cryptography also offers a wide variety of primitives that can be used as building blocks for more complex protocols and applications, for example: zero-knowledge proofs, commitment schemes and oblivious transfer.

In contrast to standard cryptanalytic attacks that target the weaknesses of the cryptographic algorithms, there are also side channel attacks (SCA) that target a specific implementation and take into account the physical settings in which the cryptographic algorithm is executed. Examples of side channel attacks include acoustic attacks (where secret information is retrieved by analyzing the sounds emitted by a processor), power attacks (that measure the power consumption of the processors), and many more. More details on side channel attacks are provided in Section 2.1.1.

This dissertation presents three independent contributions to the cryptanalysis of schemes and ciphers that were done as part of my PhD studies.
The first one is a new type of side-channel attack, while the second and third are improvements to the cryptanalysis of the stream cipher RC4 and the block cipher FEAL, respectively.

Chapter 2 describes the first contribution: a new type of side-channel attack, which we named Bug Attacks. We investigate the security implications of using a buggy processor to perform cryptographic computations and show that if a processor multiplies even one pair of inputs a and b incorrectly (even in a single low order bit), then secret information may be leaked. Bug attacks are related to the notion of fault attacks [20], but seem to be more dangerous in their implications. Fault attacks concentrate on soft errors that yield random results when induced at a particular point of time by the attacker. They require the attacker to be in physical possession of the computing device in order to deliberately inject a transient fault. In contrast, with bug attacks millions of buggy processors may be remotely attacked over the internet. Unlike the case of fault attacks, in bug attacks the error is deterministic and is triggered whenever a particular computation is carried out; the attacker does not choose the timing or nature of the error, except for choosing the inputs of the computation.

However, the attacker does not have to wait for an "innocent" bug to be discovered; she may plant the bug herself by tampering with the hardware. Even commercially sold bug-free processors can be made buggy by anyone along the supply chain who modifies them. In February 2005 the matter was addressed in a US Department of Defense (DoD) report [80], which warned about the risks of importing hardware from foreign countries to the US. NSA documents leaked by Edward Snowden provide evidence that the NSA does exactly that. In December 2013 Der Spiegel ran a story [78] based on the leaked documents, revealing the method referred to by the NSA as interdiction: when an NSA target orders new electronic equipment, the NSA diverts the shipments to pass through its own workshops, where the packages are opened and the equipment is tampered with. Other leaked documents [5] provide insight into some of the ways used by the NSA in order to tamper with the equipment, including adding their own software, replacing the existing firmware and even completely replacing the hardware.

We present bug attacks against several widely deployed cryptosystems (such as Pohlig-Hellman [66], RSA [69], elliptic curve schemes, and some symmetric primitives), and against several implementations of those schemes.
For all the discussed schemes, we show that the secret exponent can be retrieved by a chosen ciphertext attack, and in the case of Pohlig-Hellman, the secret exponent can also be retrieved by a chosen plaintext attack. In the case of RSA, we show that if decryption is performed using the Chinese remainder theorem (CRT) [56, Note 14.70] the public modulus n can be factored using a single chosen ciphertext. A particularly interesting observation is that even though RSA-OAEP [7] was designed to prevent chosen ciphertext attacks, we can actually use this protective mechanism as part of our bug attack in order to learn whether a bug was or was not encountered during the exponentiation process.
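
To make the premise of a deterministic multiplication bug concrete, the following Python sketch simulates a word multiplier that is wrong for exactly one pair of inputs, and shows that an input chosen by the attacker triggers the bug inside a big-number squaring while an unrelated input does not. The 16-bit word size, the buggy pair (BUG_A, BUG_B) and all function names are assumptions made purely for illustration; they do not describe any real processor or the implementations analyzed in Chapter 2.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1

# Hypothetical buggy pair: the only ordered inputs the simulated multiplier gets wrong.
BUG_A, BUG_B = 0x1234, 0xBEEF


def buggy_word_mul(x, y):
    """Multiply two words; the lowest bit is wrong only for (BUG_A, BUG_B)."""
    product = x * y
    if (x, y) == (BUG_A, BUG_B):
        product ^= 1  # a single erroneous low-order bit
    return product


def to_words(n):
    """Split a non-negative integer into little-endian words."""
    words = []
    while n:
        words.append(n & WORD_MASK)
        n >>= WORD_BITS
    return words or [0]


def bignum_mul(x, y, word_mul):
    """Schoolbook big-number multiplication built on a word-level multiplier."""
    result = 0
    for i, xi in enumerate(to_words(x)):
        for j, yj in enumerate(to_words(y)):
            result += word_mul(xi, yj) << (WORD_BITS * (i + j))
    return result


# An attacker-chosen value whose two words are BUG_A and BUG_B.
crafted = (BUG_A << WORD_BITS) | BUG_B
error = bignum_mul(crafted, crafted, buggy_word_mul) - crafted * crafted
print(hex(error))  # 0x10000

# A value that does not contain the word BUG_A is squared correctly.
innocent = 0xDEADBEEFCAFE
print(bignum_mul(innocent, innocent, buggy_word_mul) == innocent * innocent)  # True
```

The sketch only models the trigger condition: the error appears exactly when the computation multiplies the two specific words, which is something the attacker can steer through the choice of input but cannot otherwise time.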

The second contribution (Chapter 3) is a statistical algorithm which can be used to retrieve the key of the stream cipher RC4 from the initial internal state. RC4 was first introduced in 1987 by Ron Rivest. More than twenty-five years after its release RC4 is still the most widely used software stream cipher in the world. Among other uses, it is used to protect internet traffic as part of the SSL (Secure Socket Layer) and TLS (Transport Layer Security [25]) protocols, and to protect wireless networks as part of the WEP (Wired Equivalent Privacy) and WPA (Wi-Fi Protected Access) protocols.

The internal state of RC4 consists of a permutation S of the numbers 0,...,N − 1, and two indices i, j ∈ {0,...,N − 1}, where N = 256. RC4 is comprised of two algorithms: the Key Scheduling Algorithm (KSA), which uses the secret key to create a pseudo-random initial state, and the Pseudo Random Generation Algorithm (PRGA), which generates the pseudo-random stream.

We present several observations about the KSA and the way the bytes of the secret key are used in it. We then use those observations to provide our algorithm for inverting the KSA. This algorithm recovers the secret key with a much higher success rate than previous results [65]. Our algorithm also works if some of the bytes of the initial permutation are missing or contain errors. Such scenarios are likely results of side channel attacks, as in [35]. In these cases, our algorithm can even be used to reconstruct the full correct initial permutation by finding the correct key and then using it to compute the missing values. Details of an efficient implementation of the data structures and internals of the algorithm are also discussed.

The third contribution (Chapter 4) is an improvement to the linear cryptanalysis of ciphers that use the addition operation, which we demonstrate on the block cipher FEAL-8X.
Like RC4, FEAL was introduced in 1987 [76]. Over the years FEAL inspired the development of many cryptanalytic techniques, including differential and linear cryptanalysis [14, 52]. The best known attacks on 8-round FEAL (before the attack we describe here) required a few hundred chosen plaintexts [15] or 16 million known plaintexts [8, 54].

At CRYPTO 2012 Mitsuru Matsui announced a challenge [54] for developing improved attacks on FEAL-8X [57], and an award to be given to the best attack capable of recovering the key of given sets of known plaintexts with various amounts of data. The attack recovering the key using the smallest number of known plaintexts would be declared the winner. In the course of that year we developed an improved attack capable of recovering the key of FEAL-8X, and three weeks before the deadline we submitted our solution for the challenge set with a million known plaintexts, and were the first to submit a correct solution. A few days later another group submitted a solution for a smaller set of $2^{15}$ known plaintexts. It took us another two weeks to finalize our program with all the additional tricks and to submit the solution for the set of $2^{14}$ known plaintexts, which became the winning solution.

We present the cryptanalytic attacks that we developed for this challenge and the techniques that we used to improve linear cryptanalysis. Our main contribution is a new partitioning method that can amplify the bias of a linear approximation of addition. The data is partitioned into two sets such that in one of the sets the bias of the linear approximation is higher than it is when all the messages are considered. Interestingly, we cannot tell in advance which of the two sets is the one with the increased bias, and therefore we try both of them. The amplified bias allows us to reduce the number of plaintexts needed for the attack while keeping the analysis time per plaintext the same. Due to the smaller number of required plaintexts, the attack time even decreases when using this method. Incorporating this technique with our previous methods allowed us to find the key given $2^{14}$ known plaintexts in about 14 hours.

In addition to the practical attacks on FEAL-8X we also discuss attacks that can find the key with fewer plaintexts faster than exhaustive search. We describe an attack that can recover the key given $2^{10}$ known plaintexts in time of $2^{62}$ FEAL-8X encryptions.
In addition, we describe attacks in which given only 11–21 known or chosen plaintexts the FEAL-8X key can be recovered with complexity of about $2^{80}$, and given 2 or 3 known plaintexts the FEAL-8X key can be recovered with complexity of about $2^{96}$. These attacks combine linear cryptanalysis and differential cryptanalysis with exhaustive search of many subkeys, as well as meet in the middle attacks. These attacks exploit the fact that the total size of the subkeys is not sufficiently larger than the size of the key.
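
The idea that partitioning the data can amplify the bias of a linear approximation of addition can be illustrated on a toy example. The Python sketch below is not the partitioning technique of Chapter 4; it is a minimal model under our own assumptions: an 8-bit addition z = ((x ⊕ key) + y) mod 256, the approximation z₁ = x₁ ⊕ y₁ on bit 1, a partition of the data according to bit 0 of x, and bias defined as |Pr[approximation holds] − 1/2|. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def bit(value, index):
    """Return bit number `index` of `value` (bit 0 is the least significant)."""
    return (value >> index) & 1


def approx_holds(x, y, key):
    """Check the approximation z1 = x1 xor y1 for z = ((x xor key) + y) mod 256."""
    z = ((x ^ key) + y) & 0xFF
    return bit(z, 1) == bit(x, 1) ^ bit(y, 1)


def bias(samples, key):
    """|Pr[approximation holds] - 1/2| over the given (x, y) samples."""
    hits = sum(approx_holds(x, y, key) for x, y in samples)
    return abs(Fraction(hits, len(samples)) - Fraction(1, 2))


def partitioned_biases(key):
    """Bias over all data, over the set with x0 = 0, and over the set with x0 = 1."""
    all_pairs = list(product(range(256), repeat=2))
    set0 = [(x, y) for x, y in all_pairs if bit(x, 0) == 0]
    set1 = [(x, y) for x, y in all_pairs if bit(x, 0) == 1]
    return bias(all_pairs, key), bias(set0, key), bias(set1, key)


for key in (0b00, 0b01):
    overall, b0, b1 = partitioned_biases(key)
    print(f"key bit 0 = {key & 1}: overall {overall}, x0=0 set {b0}, x0=1 set {b1}")
```

In this toy model the approximation fails only when a carry enters bit 1, and the carry is (x₀ ⊕ key₀) ∧ y₀. In the set where x₀ ⊕ key₀ = 0 there is never a carry, so the bias rises from 1/4 to 1/2, while in the other set it drops to 0. Which of the two sets is the good one depends on the unknown key bit, so an attacker working with this model would have to try both.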

Chapter 2

Bug Attacks

This chapter presents a new type of side-channel attack that takes advantage of bugs in hardware in order to recover secret information about the cryptographic computations it performs. We focus on the case of a processor with a bug in the multiplication instruction and present attacks against several ciphers, including Pohlig-Hellman, RSA and RSA protected with OAEP.

An extended abstract of this contribution was published in the proceedings of CRYPTO 2008 [10]. It is a joint work with Prof. Eli Biham and Prof. Adi Shamir.

2.1 Introduction

With the increasing word size and the sophisticated optimizations of multiplication units in modern microprocessors, it becomes increasingly likely that they contain undetected bugs. This was demonstrated by the accidental discovery of the Pentium division bug in the mid-1990s, by the less famous Intel 80286 popf bug (that set and then cleared the interrupt-enable bit during execution of the very simple popf instruction, when no change in the bit was necessary), by the recent discovery of a bug in the Intel Core 2 memory management unit (which allows memory corruptions outside the permitted range of writing for a process), etc. A non-exhaustive list of known hardware bugs is given in Appendix 2.B.

In this chapter we show that a bug in the microprocessor that is used to carry out cryptographic computations can be exploited by an attacker to learn secret information about the cryptographic keys. We show that if some intelligence organization discovers (or secretly plants) even one pair of single-word integers a and b whose product is computed incorrectly (even in a single low order bit) by a popular microprocessor, then any key in any RSA-based security program running on any one of the millions of computers that contain this microprocessor can be easily broken, unless appropriate countermeasures are taken. In some cases, the full key can be retrieved with a single chosen ciphertext, while in other cases (such as RSA protected by the popular OAEP technique), a larger number of ciphertexts is required. The attack is also applicable to other cryptographic schemes which are based on exponentiation modulo a prime or on point multiplication in elliptic curves, and thus almost all the presently deployed public key schemes are vulnerable to such an attack.

The new attack, which we call a Bug Attack, is related to the notion of fault attacks discovered by Boneh, DeMillo and Lipton in 1996 [20] but seems to be much more dangerous in its implications. The original fault attack concentrated on soft errors that yield random results when induced at a particular point of time by the attacker (latent faults were briefly mentioned, but were never studied). Such attacks require physical possession of the computing device by the attacker, and the deliberate injection of a transient fault by operating this device outside its operating envelope (temperature, voltage, frequency, etc.), or subjecting it to clock and voltage glitches or light and laser pulses. Such attacks are feasible against smart cards, but are much harder to carry out against PCs.
In the new bug attack, the target PCs can be located at secure locations halfway around the world, and millions of PCs can be attacked simultaneously over the Internet without having to manipulate the operating environment of each one of them individually. Unlike the case of fault attacks, in bug attacks the error is deterministic and is triggered whenever a particular computation is carried out; the attacker cannot choose the timing or nature of the error, except for choosing the inputs of the computation.

Since the design of modern microprocessors is usually kept as a trade secret, there is no efficient method for the user to verify that a single multiplication bug does not exist. For example, there are $2^{128}$ pairs of inputs in a 64 × 64-bit multiplier, and we cannot try them all by exhaustive search. We can even expect that most of the $2^{128}$ pairs of inputs will never be multiplied on any processor. Even if we assume that Intel had learned its lesson and meticulously verified the correctness of its multipliers, there are many smaller manufacturers of microprocessors who may be less careful with their design, and less careful in testing the quality of the chips they produce. In addition, many PCs are sold with overclocked processors which are more likely to err when performing complex instructions such as 64-bit integer multiplication. The problem is not limited to microprocessors: many cellular phones are running RSA or elliptic curve computations on signal processors made by TI and others. FPGA or ASIC devices may embed flawed multipliers from popular libraries of standard cell designs, and many security programs use optimized "bignum packages" written by others without being able to fully verify their correctness.

In addition to such innocent bugs, there is the issue of intentionally tampered hardware, which is a major security problem. Even commercially sold bug-free processors can be made buggy by anyone along the supply chain who modifies them. In February 2005 the matter was addressed in a US Department of Defense (DoD) report [80], which warned about the risks of importing hardware from foreign countries to the US. NSA documents leaked by Edward Snowden provide evidence that the NSA does exactly that. In December 2013 Der Spiegel ran a story [78] based on the leaked documents, revealing the method referred to by the NSA as interdiction: when an NSA target orders new electronic equipment, the NSA diverts the shipments to pass through its own workshops, where the packages are opened and the equipment is tampered with. Other leaked documents [5] provide insight into some of the ways used by the NSA in order to tamper with the equipment, including adding their own software, replacing the existing firmware and even completely replacing the hardware.

What we show in this chapter is that the innocent or intentional introduction of any bug into the multiplier of any processor (even when it affects only two specific inputs whose product contains a single erroneous low-order bit) can lead to a major security disaster, which can be secretly exploited in an essentially undetectable way by a sophisticated intelligence organization. Even though we are not aware of any such attacks being carried out in practice, hardware manufacturers and security experts should be aware of this possibility, and use appropriate countermeasures.

In this chapter we present bug attacks against several widely deployed cryptosystems (such as Pohlig-Hellman [66], RSA [69], elliptic curve schemes, and some symmetric primitives), and against several implementations of these schemes. For all the discussed schemes, we show that the secret exponent can be retrieved by a chosen ciphertext attack, and in the case of Pohlig-Hellman, the secret exponent can also be retrieved by a chosen plaintext attack. In the case of RSA, we show that if decryption is performed using the Chinese remainder theorem (CRT) [56, Note 14.70] the public modulus n can be factored using a single chosen ciphertext. A particularly interesting observation is that even though RSA-OAEP [7] was designed to prevent chosen ciphertext attacks, we can actually use this protective mechanism as part of our bug attack in order to learn whether a bug was or was not encountered during the exponentiation process. This demonstrates that in spite of the similarity between bug attacks and fault attacks, their countermeasures can be very different. For example, just stopping an erroneous computation or computing the result twice with a different exponentiation algorithm to verify the result may protect the scheme against fault attacks, but will leak the full key via a bug attack.
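
The reason a single erroneous CRT decryption is so damaging can be illustrated with a minimal Python sketch. It relies on the standard observation about faulty CRT outputs: if the returned value is correct modulo p but wrong modulo q, then raising it to the public exponent and subtracting the ciphertext gives a multiple of p that is not a multiple of q, so a gcd with n reveals p. The sketch makes several assumptions that are ours and not taken from the text above: the bug is simulated by directly corrupting the mod-q half (rather than by a crafted ciphertext hitting a buggy multiplier), the key is a toy textbook key, and the function names are hypothetical. It needs Python 3.8 or later for the modular inverse via pow.

```python
from math import gcd


def crt_decrypt(c, d, p, q, corrupt_q_half=False):
    """Textbook RSA decryption via CRT; optionally corrupt the mod-q half."""
    m_p = pow(c, d % (p - 1), p)
    m_q = pow(c, d % (q - 1), q)
    if corrupt_q_half:
        m_q = (m_q + 1) % q  # stand-in for a miscomputed multiplication mod q
    # Garner recombination: the result is m_p modulo p and m_q modulo q.
    q_inv = pow(q, -1, p)
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q


def factor_from_faulty_output(m_faulty, c, e, n):
    """Recover a prime factor of n from one output that is wrong modulo one prime."""
    return gcd((pow(m_faulty, e, n) - c) % n, n)


# Toy parameters (far too small for real use).
p, q, e, d = 61, 53, 17, 2753
n = p * q
c = pow(65, e, n)

print(crt_decrypt(c, d, p, q))  # 65
m_bad = crt_decrypt(c, d, p, q, corrupt_q_half=True)
print(factor_from_faulty_output(m_bad, c, e, n))  # 61
```

The same computation also hints at why simply detecting and withholding the erroneous result is not obviously safe against bug attacks: whether or not an error occurred is itself information that depends on the secret values.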

This chapter is organized as follows: Section 2.1.1 gives an introduction to side channel attacks and specifically fault attacks (as mentioned before, bug attacks are related to fault attacks). Section 2.2 gives an overview of the methods we use in most of our attacks, and describes the two most commonly used implementations of modular exponentiations: the left-to-right (LTOR) and right-to-left (RTOL) exponentiation algorithms. Section 2.3 presents the simplest bug attack on RSA when decryption is performed using the Chinese remainder theorem (CRT), using a single chosen ciphertext. Section 2.4 presents attacks on several cryptosystems when exponentiations are computed using the LTOR algorithm, and Section 2.5 presents attacks on the same schemes when the exponentiations are computed using the RTOL algorithm. Section 2.6 describes attacks which use the Legendre symbol to identify erroneous Pohlig-Hellman decryptions. In Section 2.7 we discuss bug attacks on elliptic curve schemes and some symmetric primitives. Section 2.8 summarizes the contributions of this chapter, and presents the time and data complexities of all our attacks. In Appendix 2.A we provide descriptions of the cryptosystems discussed in this chapter. Finally, Appendix 2.B includes a non-exhaustive list of known hardware bugs.

2.1.1 Introduction to Side Channel Attacks (SCA)

While standard attacks against cryptographic protocols use inherent weaknesses of the protocol in order to extract secret information, side channel attacks (SCA) use weaknesses of the physical environment which is used to implement the cryptographic algorithms. Therefore, SCAs usually require physical access to the cryptographic device and are thus much easier to apply to smart cards than to PCs. Examples of side channel attacks include:

• Timing attacks [44] – In some cryptographic protocols the time it takes to compute and return the result can leak secret information. For example, in RSA decryptions the ciphertexts are raised to the power of a secret exponent. In common implementations of exponentiation algorithms, every set bit of the secret exponent causes the execution of an additional multiplication instruction, which lengthens the computation. Thus, the total number of 1-bits in the exponent can be approximated given the computation time.

• Power attacks [45] – The power consumption of the encrypting device during its computation is strongly correlated to the instructions it executes and to the data stored in its internal registers. For example, multiplying two numbers requires more power than adding them, and changing the value of a register from 0 to 1 requires more power than changing from 1 to 0. By measuring the power consumption of the device, it is possible to analyze its operation at various stages and retrieve secret information.

• Acoustic attacks [29, 30, 75] – Sounds (not necessarily audible) which are produced during the computation are also strongly correlated to the operations being performed by the device. An attacker who can record and analyze those sounds can retrieve secret information about the computations performed by the device.

• Cache attacks [64] – The cache memory of the CPU stores recently accessed data rows from the computer’s memory in order to improve performance. Modern computers run several processes concurrently, possibly on different cores, such that each process has different security privileges and restrictions. Retrieving data stored in the cache takes much less time than retrieving data from the memory. In cases where processes use the same cache, it is possible for processes running on the CPU to learn about memory accesses of other processes (on the same or another core) which run concurrently. Ciphers (or implementations of ciphers) in which memory access patterns depend on secret information (e.g., to read S-box entries) are vulnerable to cache attacks.

• Fault attacks [16, 20] – Faults which occur during the computation cause the device to output incorrect results. An attacker can use the incorrect results to retrieve secret information about the computation and its secret inputs. Computation faults that happen randomly are very rare, and thus the attacker often needs to actively induce faults by manipulating the physical environment of the device (for example, operating it inside a microwave oven, causing a surge in the power supply of the device at a precise moment or operating it under extreme temperatures or high-frequency clocks). A more detailed description of fault attacks is given in the next subsection.

2.1.2 Fault Attacks

The notion of fault attacks was introduced in 1996 [20]. A simple and efficient fault attack against CRT-RSA was presented, along with attacks that use register faults (where one bit of an internal register may flip with non-negligible probability).

2.1.2.1 A Fault Attack Against CRT-RSA

The fault attack against CRT-RSA [20] requires only one faulty decryption. Let c be an RSA ciphertext, which is decrypted using CRT. Assume that the attacker is able to induce a fault while the device is performing the exponentiation modulo q (without loss of generality), while the exponentiation modulo p is performed correctly (i.e., in the notation of Section 2.A.3, the value $m_q$ is incorrect while the value of $m_p$ is correct). In this case, after applying the CRT, an incorrect decryption $\hat{m}$ is obtained, which is different from the correct message m. For this value it holds that:

\[
\begin{cases} m \equiv m_p \pmod{p} \\ m \equiv m_q \pmod{q} \end{cases}
\quad\text{and}\quad
\begin{cases} \hat{m} \equiv m_p \pmod{p} \\ \hat{m} \not\equiv m_q \pmod{q} \end{cases}
\]

An attacker who obtains both m and $\hat{m}$ can find the factors p and q by computing $\gcd(m - \hat{m}, n) = p$. Alternatively, if the attacker only has access to the faulty decryption $\hat{m}$, she can perform the attack by computing $\gcd(c - \hat{m}^e, n) = p$.
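
The two gcd computations above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the toy primes, the public exponent 65537, the helper name `crt_decrypt`, and the way the fault is simulated (flipping one bit of $m_q$). It is not taken from [20].

```python
from math import gcd

# Toy parameters (assumptions for illustration; real keys are far larger).
p, q = 2**31 - 1, 2**32 - 5          # two known primes, p < q
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent

def crt_decrypt(c, fault=False):
    """CRT-RSA decryption; fault=True corrupts the half computed modulo q."""
    m_p = pow(c, d % (p - 1), p)
    m_q = pow(c, d % (q - 1), q)
    if fault:
        m_q ^= 1                     # simulated fault: flip one bit of m_q
    # Recombination: the result is m_p modulo p and m_q modulo q.
    h = ((m_p - m_q) * pow(q, -1, p)) % p
    return m_q + q * h

c = pow(123456789, e, n)
m = crt_decrypt(c)                   # correct decryption
m_hat = crt_decrypt(c, fault=True)   # faulty decryption

factor_from_both = gcd(m - m_hat, n)               # uses m and m_hat
factor_from_faulty = gcd(c - pow(m_hat, e, n), n)  # uses only m_hat
```

Both gcd values equal the smaller prime here because the faulty result still agrees with the correct one modulo p but not modulo q.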

2.1.2.2 Countermeasures Against Fault Attacks

The simplest countermeasure against fault attacks verifies the correctness of the output before releasing it. This can be done easily in decryptions of public key cryptosystems, by re-encrypting the output under the public key and verifying that the input ciphertext is obtained. If an error is detected the device can withhold the result, or try to compute it again (assuming there is a low probability that the fault will occur again). When public key schemes are used for signatures, it is possible to protect against fault attacks by appending random padding to the message before signing it. This way, even if a fault occurs, the attacker does not know the full value of the signed plaintext, and therefore cannot apply this attack.
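
As a minimal sketch of the re-encryption check described above (the function name `checked_decrypt`, the textbook-sized key and the simulated faulty oracle are all assumptions for illustration):

```python
# Textbook-sized RSA key, used only to exercise the check.
P, Q = 61, 53
N, E = P * Q, 17
D = pow(E, -1, (P - 1) * (Q - 1))

def checked_decrypt(c, decrypt, e=E, n=N):
    """Return the plaintext only if re-encrypting it reproduces c."""
    m = decrypt(c)
    if pow(m, e, n) != c:
        return None          # withhold a result that fails verification
    return m

good = lambda c: pow(c, D, N)        # fault-free decryption
bad = lambda c: pow(c, D, N) ^ 1     # decryption with a simulated fault

ciphertext = pow(65, E, N)
released = checked_decrypt(ciphertext, good)
withheld = checked_decrypt(ciphertext, bad)
```

Note that, as stated in the introduction of this chapter, withholding an erroneous result protects against fault attacks but still reveals to a bug attacker whether the bug was triggered.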

2.2 Overview of Our Methods and Notations

We present several attacks which use multiplication bugs. We concentrate on multiplication since on one hand, it is a common operation in cryptographic computations, and on the other hand it is typically a complex operation and its implementations are aggressively optimized. Therefore, bugs are much more likely to exist in multiplication instructions than in simple operations like addition or XOR, and are less likely to be discovered by the manufacturers. Furthermore, these complex operations are more likely to fail when the processor operates under unusual circumstances, such as overclocking, even when no bugs exist under normal operating conditions.

2.2.1 Multiplication of Big Numbers

In cryptography, we are often required to perform arithmetic operations on big numbers, which must be represented using more than a single 32-bit or 64-bit word. Arithmetic operations on such values must be broken down into arithmetic operations on the different words which comprise them. For example, when multiplying two very long integers x and y, each represented by ten words, each of the ten words of x is multiplied by each of the ten words of y in some order, and the results are then summed up to the appropriate words of the product. If x contains a in the sense that one of the ten words of x is a, y contains b, and the processor produces an incorrect result when a and b are multiplied, then the result of multiplying x · y on that processor will typically be incorrect (unless there are multiple errors that exactly cancel each other during the computation, which is very unlikely when the other words in x and y are randomly chosen).
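
The following Python sketch models this word-by-word multiplication. The buggy pair `A`, `B`, the form of the bug (a flipped low-order bit of the single-word product) and all function names are assumptions chosen for illustration; real bignum packages order and accumulate the partial products differently.

```python
W = 32                           # word size in bits (w = 32)
MASK = (1 << W) - 1
A, B = 0xDEADBEEF, 0x0BADF00D    # hypothetical buggy operand pair

def words(x):
    """Little-endian list of the W-bit words of a non-negative integer."""
    out = []
    while x:
        out.append(x & MASK)
        x >>= W
    return out

def word_mul(u, v, buggy=False):
    """Single-word multiply; the modelled bug flips the low bit of A*B only."""
    r = u * v
    if buggy and (u, v) == (A, B):
        r ^= 1
    return r

def big_mul(x, y, buggy=False):
    """Schoolbook product: every word of x times every word of y."""
    total = 0
    for i, u in enumerate(words(x)):
        for j, v in enumerate(words(y)):
            total += word_mul(u, v, buggy) << (W * (i + j))
    return total

x = (7 << 64) | (A << 32) | 5    # x contains the word A
y = (B << 96) | 9                # y contains the word B
```

With these inputs `big_mul(x, y, buggy=True)` differs from `x * y`, while `big_mul(y, x, buggy=True)` is correct in this model, because only the ordered pair (A, B) is affected.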

2.2.2 Notations We use the notation x · y to denote the result of multiplying x by y on a bug-free processor, and x ⊙ y to denote the result of the same computation when performed on a faulty processor. Similarly, the notation xl denotes the value of x to the power l as computed on a bug-free processor, and x⟨l⟩ denotes the value of x to the power l as computed by a particular algorithm on a faulty processor (see Section 2.2.5 for details of popular exponentiation algorithms). Since we assume that faults are extremely rare, for most inputs we expect the result of the computation to be the same on both the faulty and the bug-free processors. When no errors occur we use the notations x·y and xl, even when referring to computations done on the faulty processor.

2.2.3 Methods Our attacks request the decryptions of ciphertexts which may or may not invoke the execution of the faulty multiplications, as determined by the bits of the secret exponent d. The results of those decryptions are used to retrieve the bits of d. We develop two methods for creating the conditions under which the buggy instructions are executed. The first method chooses a ciphertext C, such that some intermediate value x during the decryption process contains both a and b. If x is squared, then we expect that x2 ≠ x⟨2⟩,

16

Technion - Computer Science Department - Ph.D. Thesis PHD-2015-01 - 2015 and thus the result of the entire decryption process is also expected to be incorrect. But if x is first multiplied by a different value y (as controlled by d), which contains neither a nor b, then we expect that x · y = x ⊙ y, and the decryption result is expected to be correct. The second method chooses C such that during decryption one interme- diate value x contains a, while another value y contains b. If x and y are multiplied then it is expected that x · y ≠ x ⊙ y, and the result of decryp- tion on the faulty processor is expected to be incorrect. If x and y are not multiplied during the decryption process (due to the bits of d) we expect the decryption result to be correct.

2.2.4 Complexity Analysis

Let w be the length (in bits) of the words of the processor. In the analysis of the complexity of our attacks throughout this chapter we assume that numbers (both exponentiated values and exponents) are 1024 bits long, and that w = 32 (in the summary of the chapter we also quote the complexities for w = 64). The standard representation of 1024-bit numbers requires $\lceil 2^{10}/w \rceil$ words. Given a random 1024-bit value x and a w-bit value a, the probability that x contains a (in any of its $2^{10}/w$ words) is about $2^{-w} \cdot 2^{10}/w$. For w = 32 this probability is about $2^{-27}$, and for w = 64 it is about $2^{-60}$. Given two w-bit values a and b, the probability that x contains both a and b is about $(2^{-w} \cdot 2^{10}/w)^2$. For w = 32 this probability is about $2^{-54}$, and for w = 64 it is about $2^{-120}$.
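
These estimates are easy to reproduce. The short Python sketch below (function names are ours, chosen for illustration) works with base-2 logarithms of the approximations quoted above.

```python
from math import log2

def log2_prob_contains_one(w, bits=1024):
    """log2 of the approximate probability 2^-w * (bits / w) that a random
    `bits`-bit number has a given w-bit word among its words."""
    return log2(bits // w) - w

def log2_prob_contains_both(w, bits=1024):
    """log2 of the approximate probability that two given words both appear."""
    return 2 * log2_prob_contains_one(w, bits)
```

For example, `log2_prob_contains_one(32)` evaluates to -27.0 and `log2_prob_contains_both(64)` to -120.0, matching the figures in the text.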

2.2.5 Exponentiation Algorithms

Given a value x and a secret exponent $d = d_{\log n} d_{\log n - 1} \ldots d_1 d_0$ (where $d_i \in \{0, 1\}$ are the binary digits of d, i.e., $d = \sum_{i=0}^{\lceil \log n \rceil} d_i 2^i$), the exponentiation $x \mapsto x^d \bmod n$ can be efficiently computed by several exponentiation algorithms [56, Chapter 14.6]. In this chapter we present attacks against implementations that use the two basic exponentiation algorithms, LTOR (left-to-right) and RTOL (right-to-left), described in Algorithm 2.1. Our techniques can be easily adapted to attack implementations that use other exponentiation algorithms such as the sliding window algorithm and the k-ary exponentiation algorithm.

LTOR Exponentiation

    $z \leftarrow 1$
    For $k = \log n$ down to 0
        If $d_k = 1$ then $z \leftarrow z^2 \cdot x \bmod n$
        Otherwise, $z \leftarrow z^2 \bmod n$
    Output z

RTOL Exponentiation

    $y \leftarrow x$; $z \leftarrow 1$
    For $k = 0$ to $\log n$
        If $d_k = 1$ then $z \leftarrow z \cdot y \bmod n$
        $y \leftarrow y^2 \bmod n$
    Output z

Algorithm 2.1: The Two Basic Exponentiation Algorithms
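
As a bug-free reference, the two algorithms can be sketched in Python as follows. This is an illustrative transcription, not the thesis's code: the loops run over the bit length of d rather than over $\log n$ (leading zero bits do not change the result), and in LTOR the squaring and the conditional multiplication by x are written as two separate steps, which is how the attacks in this chapter treat them.

```python
def ltor_exp(x, d, n):
    """Left-to-right square-and-multiply: scan the bits of d from the top."""
    z = 1
    for k in range(d.bit_length() - 1, -1, -1):
        z = (z * z) % n              # z is squared in every iteration
        if (d >> k) & 1:
            z = (z * x) % n          # multiply by x only when d_k = 1
    return z

def rtol_exp(x, d, n):
    """Right-to-left square-and-multiply: scan the bits of d from the bottom."""
    y, z = x % n, 1
    for k in range(d.bit_length()):
        if (d >> k) & 1:
            z = (z * y) % n          # accumulate x^(2^k) when d_k = 1
        y = (y * y) % n              # y always holds x^(2^k) mod n
    return z
```

Both functions agree with Python's built-in `pow(x, d, n)`. The difference that matters for the attacks is which operands meet in a multiplication: in LTOR the running value z is multiplied by the fixed input x, while in RTOL it is multiplied by the repeatedly squared value y.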

2.2.6 Remarks

The following remarks apply to most of the attacks presented in this chapter.

1. Microprocessors usually perform different sequences of microcode instructions when computing a · b and b · a, and thus the bug is not expected to be symmetric: for a · b the processor may give an incorrect result, while for b · a the result may be correct. Therefore, the correctness of the result of multiplying two big numbers x and y, where x contains a and y contains b, depends on whether the implementation of x · y multiplies a · b or b · a. We assume that such implementation details are known to the attacker when she devises the attack.

2. Given a value n, the number of bits in the binary representation of n is $\lfloor \log_2 n \rfloor + 1$ (the indices of the bits of n are $0, \ldots, \lfloor \log n \rfloor$, where 0 is the index of the least significant bit, and $\lfloor \log n \rfloor$ is the index of the most significant bit). Throughout this chapter we use $\log n$ (without the floor operator) as a shorthand for the index of the most significant bit of n.

3. It may be the case that more than one pair of buggy inputs a and b exists. In such cases, if γ > 1 multiplication bugs are known to the attacker, the complexities of some of the attacks we present can be decreased. In attacks where the attacker can control only one of the operands of the multiplication, and the other operand is expected to appear randomly, the time complexity can be decreased by a factor of about $\min(\gamma, \lfloor (\log n)/w \rfloor)$. (It is enough that any one of the pairs of buggy inputs appears in order to cause an error in the computation. Still, we cannot use more input pairs than the number of words in the representation of the big integer.) For example, if two pairs of buggy inputs are known, (a, b) and (c, d), we can choose the operand we control to contain both a and c, and then we need either b or d to appear randomly in the other operand. If some of the buggy pairs of operands share the same value for one of the operands, this factor can be even better (but it cannot be higher than γ). In attacks where both operands are expected to appear randomly, the time complexity can be decreased by a factor of γ. Note that for this remark symmetric bugs, where the results of both a · b and b · a are incorrect, are counted as two bugs.

4. If both operands of the buggy instruction are equal (i.e., a = b), the complexity of some of our attacks can be greatly reduced, while other attacks become impossible. The former case happens when attacks rely on faults in the squaring of values X, where X happens by chance to contain both a and b. In this case only one word (a) needs to appear in X, which makes the probability of this event much higher. On the other hand, attacks which use the existence of a bug in order to decide whether a value was squared or two values x and y were multiplied together become impossible. When the attack requires that x contains a and that y contains b, our ability to distinguish between the two cases ($x^2$ or $x \cdot y$) relies on $a \neq b$.

2.3 Bug Attack on CRT-RSA with One Chosen Ciphertext

We now describe a simple attack on RSA implementations in which decryptions are performed using the Chinese remainder theorem (CRT). The attack is based on the attack of [20] which was described in Section 2.1.2.1. Let n = pq be the public modulus of RSA, where p and q are large primes, and assume without loss of generality that p < q. Knowing the target’s public key n (but not its secret factors p and q), the attacker can easily compute a half size integer which is guaranteed to be between the two secret factors p and q of n. For example, $\lfloor \sqrt{n} \rfloor$ always satisfies $p \leq \lfloor \sqrt{n} \rfloor < q$, and any integer close to $\sqrt{n}$ is also likely to satisfy this condition. The attacker now chooses a ciphertext C which is the closest integer to $\sqrt{n}$ such that both a and b appear as low order words in C, and submits this “poisonous input” to the target PC.

The first step in the CRT-RSA computation is to reduce the input C modulo p and modulo q. Due to its choice, $C_p = C \bmod p$ is randomized modulo the smaller factor p, but $C_q = C \bmod q = C$ remains unchanged modulo the larger factor q. The next step in CRT-RSA is always to square the reduced inputs $C_p$ and $C_q$, respectively. Since a and b are unlikely to remain in $C_p$, the computation mod p is likely to be correct. However, mod q the squaring operation will contain a step in which the word a is multiplied by the word b, and by our assumption the result will be incorrect. Assuming that the rest of the two computations mod p and mod q will be correct, the final results of the two exponentiations will be combined into a single output $\hat{M}$ which is likely to be correct mod p, but incorrect mod q. The attacker can then finish off the attack in the same way as in the original fault attack of [20], by computing the greatest common divisor (gcd) of n and $\hat{M}^e - C$, where e is the public exponent of the attacked RSA key. This gcd is the secret factor p of n.

Note that if such C ($p \leq C < q$) cannot be found, then $q - p < 2^{2w}$. In this case, n can be easily factored by other methods (e.g., Fermat’s factorization method, which will factor n in $2^w$ time without any calls to a decryption oracle).
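
Constructing the poisonous input needs only public information. The sketch below is one simple way to do it in Python, under our own assumptions: the helper name `poisonous_ciphertext` is hypothetical, and it returns an integer within $2^{2w}$ of $\sqrt{n}$ whose two lowest words are a and b, not necessarily the closest such integer.

```python
from math import isqrt

def poisonous_ciphertext(n, a, b, w):
    """An integer near sqrt(n) whose two low-order w-bit words are a and b."""
    root = isqrt(n)
    high = (root >> (2 * w)) << (2 * w)   # clear the two low-order words
    return high | (a << w) | b

# Toy check with an 8-bit word size and two known primes (assumptions).
p, q = 2**31 - 1, 2**32 - 5
a, b, w = 0x5A, 0xC3, 8
C = poisonous_ciphertext(p * q, a, b, w)
```

For these toy values C lies between p and q, so reduction modulo q leaves it (and the words a and b) intact, while reduction modulo p changes it.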

2.4 Bug Attacks on LTOR Exponentiations

In this section we present bug attacks against several cryptosystems, where exponentiations are performed using the LTOR exponentiation algorithm (rather than using CRT). We first present chosen plaintext (or chosen ciphertext) attacks against the Pohlig-Hellman scheme, then present chosen ciphertext attacks against RSA, and finally discuss how to adapt our attacks on RSA to the case of RSA-OAEP.

2.4.1 Bug Attacks on Pohlig-Hellman

The Pohlig-Hellman cipher uses two secret exponents e and d: the former is used for encryption, and the latter for decryption. Given one of the secret exponents, the other can be computed by $d \equiv e^{-1} \pmod{p - 1}$. We discuss adaptive and non-adaptive chosen ciphertext attacks which retrieve the bits of the decryption exponent d; similar chosen plaintext attacks can retrieve the encryption exponent e. We start by presenting a simple adaptive attack, which demonstrates the basic idea of our technique. We later improve this attack with additional ideas.

2.4.1.1 Basic Adaptive Chosen Ciphertext Attack.

In this section, an attack which requires the decryption of $\log p + 1$ chosen ciphertexts is presented. The attack retrieves the bits of the secret exponent one at a time, from $d_{\log p}$ to $d_1$ ($d_0$ is known to be one, as d is odd). Therefore, when the search for $d_i$ is performed, we can assume that the bits $d_{i+1}, \ldots, d_{\log p}$ are already known. Algorithm 2.2 describes the attack.

1. Choose a value X which contains the words a and b.

2. For $i = \log p$ down to 1 do

   (a) Denote the value of the known bits of d by $d' = \sum_{k=i+1}^{\log p} 2^{k-(i+1)} d_k$.
   (b) Compute $C = X^{1/(2d')} \bmod p$.
   (c) Ask for the decryption $\hat{M} = C^{\langle d \rangle} \bmod p$ on the faulty processor.
   (d) Obtain the correct decryption $M = C^d \bmod p$ (we later describe how this step can be performed even without access to a bug-free processor).
   (e) If $M = \hat{M}$ conclude that $d_i = 1$, otherwise conclude that $d_i = 0$.

3. Set $d_0 = 1$.

Algorithm 2.2: Basic Adaptive Chosen-Ciphertext Attack Against Pohlig-Hellman with LTOR

The attack is based on the following observations. Since p is a known prime, the attacker can compute arbitrary roots modulo p. The value of C is chosen such that when it is raised to the power d with LTOR, the intermediate value of the variable z after $\log p - i$ iterations is a modular square root of X. At the beginning of the next iteration z is squared (and thus its value becomes X). The next operation of the LTOR algorithm is either squaring z, or multiplying it by C, depending on the value of $d_i$. Since the intermediate value z = X contains both a and b, we expect an incorrect decryption if z is squared (i.e., when $d_i = 0$), and a correct decryption if z is first multiplied by C (i.e., when $d_i = 1$).

Note that the bug-free decryption in Step 2d may be computed on the buggy microprocessor by using the multiplicative property of modular exponentiation. The attacker may request the decryption $M'$ of $C' = C^3 \bmod p$ (or any other power of C which is not expected to cause the execution of the faulty instructions), and then check whether $\hat{M}^3 \equiv M' \pmod{p}$ to learn if an error has occurred. Thus, no calls to a bug-free decryption device that uses the same secret key (which is usually unavailable) are required. In fact, since the same value of X is used for each of the iterations, the correct decryption M can be computed from the value of the correct decryption in the previous iteration as $M = \bar{M}^{\bar{d}'/d'} \bmod p$ (this follows from $M^{2d'} \equiv X^d \equiv \bar{M}^{2\bar{d}'} \pmod{p}$), where $\bar{M}$ and $\bar{d}'$ are the values of the corresponding variables in the previous iteration. Therefore, no additional decryption requests (beyond the first one) are needed in order to obtain all the correct decryption results throughout the attack.

The attack requires buggy decryption of $\log p + 1$ chosen ciphertexts to retrieve d, or buggy encryption of $\log p + 1$ chosen plaintexts to retrieve e. Each one of these values makes it easy to compute the other value since p is a known prime.
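
The correctness check through the multiplicative property can be sketched as follows. This Python fragment is only a model under stated assumptions: the prime, the exponent, the oracle `decrypt` and the way it misbehaves (returning a wrong value for one designated ciphertext) are hypothetical stand-ins for the faulty device.

```python
p = 2**61 - 1                    # a known Mersenne prime (toy modulus)
e = 65537
d = pow(e, -1, p - 1)            # toy Pohlig-Hellman decryption exponent

def decrypt(c, faulty_on=None):
    """Model of the target: wrong output only for the ciphertext faulty_on."""
    m = pow(c, d, p)
    return (m + 1) % p if c == faulty_on else m

def decryption_is_correct(c, oracle):
    """Compare oracle(c)^3 with oracle(c^3), using only the oracle itself."""
    m_hat = oracle(c)
    m_blinded = oracle(pow(c, 3, p))
    return pow(m_hat, 3, p) == m_blinded

c = 987654321
ok = decryption_is_correct(c, decrypt)
detected = not decryption_is_correct(c, lambda x: decrypt(x, faulty_on=c))
```

The check works because $(C^3)^d = (C^d)^3 \bmod p$, so the cube of a correct decryption of C must equal the decryption of $C^3$.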

2.4.1.2 Improved Adaptive Chosen Ciphertext Attack.

We observe that X can be selected such that both X and $X^{\langle 2 \rangle}$ contain a and b (we later describe how to find such X). Using this observation we reduce the expected complexity of retrieving d by a constant factor. A further improvement uses values of X which contain a and b, such that when X is squared m times repeatedly on a faulty processor (for some m > 0), all the intermediate squares $X^{\langle 2^j \rangle}$ (for $0 \leq j \leq m$) contain a and b. The faulty squares of X as computed by the faulty processor are $X^{\langle 2 \rangle}, X^{\langle 2^2 \rangle}, \ldots, X^{\langle 2^m \rangle}$. Let $\beta_j = X^{2^j} / X^{\langle 2^j \rangle}$ for $0 \leq j \leq m$. Using such values for X, and assuming uniform and independent distribution of the bits of d, we can improve the expected complexity of the attack by a factor of $\alpha = 2 - 2^{-m}$. The trick is to identify the length of a subsequence of several consecutive zero bits of d using one chosen ciphertext.

In this improved attack the bits of d are retrieved starting from the most significant bit, as in the original attack. The attack is described in Algorithm 2.3. As in the basic attack, the correct decryption of C can be obtained by a blinded decryption query to the faulty processor, which is not likely to trigger the fault.

1. Choose X such that $X, X^{\langle 2 \rangle}, X^{\langle 2^2 \rangle}, \ldots, X^{\langle 2^m \rangle}$ all contain a and b (see below how to implement this step in the most efficient way).

2. While not all the bits $d_{\log p}, \ldots, d_2, d_1$ are known:

   (a) Let i be the smallest index such that all the bits $d_{i+1}, \ldots, d_{\log p}$ are already known.
   (b) Denote the value of the known bits of d by $d' = \sum_{k=i+1}^{\log p} 2^{k-(i+1)} d_k$.
   (c) Compute $C = X^{1/(2d')} \bmod p$.
   (d) Ask for the decryption $\hat{M} = C^{\langle d \rangle} \bmod p$ on the faulty processor.
   (e) Obtain the correct decryption $M = C^d \bmod p$.
   (f) Find j such that $M/\hat{M} = \beta_j^{2^{i-j}} \bmod p$ ($j \in \{0, \ldots, m\}$).
   (g) Conclude that the next j bits of d are zero (i.e., $d_i = d_{i-1} = \ldots = d_{i-(j-1)} = 0$), and if j < m, the following bit is one (i.e., $d_{i-j} = 1$).

3. Set $d_0 = 1$.

Algorithm 2.3: Improved Adaptive Chosen-Ciphertext Attack Against Pohlig-Hellman with LTOR

The attack chooses ciphertexts such that after $\log p - i$ iterations of the exponentiation (i.e., decryption) the intermediate value of z in the LTOR algorithm is a square root of X. At the beginning of the next iteration its value is squared (and thus it is now X). Then, if $d_i = 1$, X is multiplied by C, and therefore no errors are expected to occur during the algorithm (and we get j = 0 in Step 2f of the attack). Otherwise, $d_i = 0$, X is squared, and the result is $X^{\langle 2 \rangle}$ (due to the bug). If $d_{i-1} = 1$, then the intermediate result is multiplied by C, and therefore no additional computation errors are expected to occur. In such a case, the output of the exponentiation algorithm is $\hat{M} = (X^{\langle 2 \rangle})^{2^{i-1}} C^*$, where $C^*$ is a value which depends only on C and on the unknown bits of d. For similar reasons the correct exponentiation of C is $M = (X^2)^{2^{i-1}} C^*$, so we get $M/\hat{M} = \beta_1^{2^{i-1}}$. Moreover, if there are j ($j \leq m$) consecutive zero bits, they are successfully identified by the condition in Step 2f, as $M = (X^{2^j})^{2^{i-j}} C^*$ and $\hat{M} = (X^{\langle 2^j \rangle})^{2^{i-j}} C^*$ for some appropriate $C^*$, and we get $M/\hat{M} = \beta_j^{2^{i-j}}$. Note that the length of sequences of zeros we can identify in this way is bounded by m, and that if we identify a sequence of m zero bits, we cannot determine whether the following bit (after the sequence) is set or not.

Each iteration of the attack retrieves at least one bit of d, and may retrieve up to m bits of d. Assuming a uniform independent distribution of the bits of d, the expected number of bits retrieved in each iteration is $\alpha = \sum_{k=0}^{m} 2^{-k} = 2 - 2^{-m}$, which for $m \geq 1$ is in the range $\alpha \in [1.5, 2)$. Therefore, on average $\log p/(2 - 2^{-m}) + 1$ chosen ciphertexts are required for the attack, which is 1.5–2 times more efficient than the basic version.

Finding X. For a general m, finding such values of X as required for the attack may be hard, because the probability that the square of a random value which contains a and b also contains a and b is very low. However, for m = 2 and m = 3, when w = 32, we successfully found such values: When $p > 2^{256}$ we can use $X = 2^{127} + 2^{32}a + b$, for which both X and $X^{\langle 2 \rangle}$ contain a and b. For $p > 2^{893}$, we can use $X = 2^{223} + 2^{96}(2^{32}a + b) + 2^{10}$, for which all three values X, $X^{\langle 2 \rangle}$ and $X^{\langle 4 \rangle}$ contain both a and b.
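
Two of the claims above can be checked numerically. The Python sketch below computes α exactly and verifies that, for the first construction, X and its square both contain a and b as aligned 32-bit words. The pair `a`, `b` is a hypothetical example, and the check uses the bug-free square; it relies on the assumption that an error in the a · b partial product perturbs only the low-order part of the result, below the words where a and b reappear.

```python
from fractions import Fraction

W = 32
MASK = (1 << W) - 1

def contains(x, word):
    """True if `word` equals one of the aligned W-bit words of x."""
    while x:
        if (x & MASK) == word:
            return True
        x >>= W
    return False

def expected_bits(m):
    """alpha = sum_{k=0}^{m} 2^-k, the expected bits recovered per query."""
    return sum(Fraction(1, 2**k) for k in range(m + 1))

a, b = 0xDEADBEEF, 0x0BADF00D      # hypothetical buggy pair
X = 2**127 + 2**32 * a + b         # first construction, for p > 2^256
```

Here `expected_bits(m)` equals $2 - 2^{-m}$, and `X * X` equals $2^{254} + 2^{128}(2^{32}a + b) + (2^{32}a + b)^2$, so the words a and b reappear at word positions 5 and 4 of the square.
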
By investing a computation time of about $2^{54}$ in each iteration of the attack, we can reduce the data complexity by a factor of 2. We search for values X such that X, $X^{\langle 2 \rangle}$, and $(X \odot X^{1/(2d')})^{\langle 2 \rangle} = (X \odot C)^{\langle 2 \rangle}$ all contain a and b (we can choose values X that satisfy the first two conditions, while the appearance of a and b in $(X \odot C)^{\langle 2 \rangle}$ has a probability of about $2^{-54}$ to occur randomly). Using such values we can retrieve two bits of d in every iteration of the attack, reducing the number of calls to the faulty decryption oracle by a factor of 2. However, the total time complexity of the attack increases to about $2^{63}$.

2.4.1.3 Chosen Ciphertext Attack.

The (non-adaptive) chosen ciphertext attack presented later in Section 2.4.2.2 is also applicable in the case of Pohlig-Hellman. The attack requires the decryption of $2^{28}$ ciphertexts to retrieve the secret exponent d (the attack on RSA requires $2^{27}$ ciphertexts, but in the case of Pohlig-Hellman an additional decryption is required for each buggy decryption in order to verify the correctness of the decryption). As in the previous attacks on Pohlig-Hellman, a similar chosen plaintext attack can retrieve the secret exponent e.

2.4.2 Bug Attacks on RSA

In this section we describe several chosen ciphertext attacks on RSA, where the attacked implementation performs decryptions without using CRT. Instead, we assume that the decryption of a ciphertext C is performed by computing $C^d \bmod n$ using LTOR (where d is the secret exponent of RSA). We assume that the public exponent e and the public modulus n are known. The main difference between the case of RSA and the case of Pohlig-Hellman is that there is no known efficient algorithm to compute roots modulo a composite n when the factorization of n is unknown. In addition, unlike the case of Pohlig-Hellman, in RSA it is easy to check whether the decrypted message $\hat{M}$ is the correct decryption of a chosen ciphertext C by checking whether $\hat{M}^e \equiv C \pmod{n}$. Therefore, there is no need to request the decryptions of additional messages for this purpose.

2.4.2.1 Adaptive Chosen Ciphertext Attack.

We describe an adaptive chosen ciphertext attack which requires the decryption of $\log n$ chosen ciphertexts by the target computer. The generation of each of the ciphertexts requires $2^{27}$ time on the attacker’s (bug-free) computer, and thus the total time complexity of the attack is about $2^{37}$. The details are provided in Algorithm 2.4.

1. For $i = \log n$ down to 1 do

   (a) Denote the value of the known bits of d by $d' = \sum_{k=i+1}^{\log n} 2^{k-(i+1)} d_k$.
   (b) Repeatedly choose random values C which contain b, until $C^{2d'} \bmod n$ contains a.
   (c) Ask for the decryption $\hat{M} = C^{\langle d \rangle} \bmod n$ using the faulty processor.
   (d) Compute $\hat{C} = \hat{M}^e \bmod n$.
   (e) If $\hat{C} = C$ conclude that $d_i = 0$, otherwise conclude that $d_i = 1$.

2. Set $d_0 = 1$.

Algorithm 2.4: Adaptive Chosen-Ciphertext Attack Against RSA with LTOR

The attack is similar to the basic attack presented in Section 2.4.1.1, except that here only the word a is contained in the intermediate value of the exponentiation. The word b is contained in the ciphertext C, and therefore the roles of the correct and incorrect results are exchanged: now an incorrect result corresponds to $d_i = 1$ and a correct result corresponds to $d_i = 0$.

During the execution of the LTOR algorithm, the intermediate value of the variable z during iteration $\log n - i$ contains a (due to the selection of C in Step 1b of the attack). If $d_i = 0$ then z is squared, and no errors in the computation are expected to occur, leading to $\hat{C} = C$ in Step 1e. If $d_i = 1$, then z is multiplied by C, which contains the word b, and due to the bug, the result of the exponentiation is expected to be incorrect, leading to $\hat{C} \neq C$ in Step 1e.

As explained in Section 2.2, the probability that the random number $C^{2d'} \bmod n$ contains somewhere along it the word a is $2^{-27}$ (for our standard parameters). Therefore, Step 1b takes an average time of $2^{27}$ exponentiations on the attacker’s computer.
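
To make the flow of Algorithm 2.4 concrete, the following Python sketch runs it end to end against a software model of a faulty processor. All specifics are assumptions made so that the example finishes quickly: a 16-bit word size, a 63-bit modulus built from two known primes, a deliberately short 20-bit secret exponent, the buggy pair `A`, `B`, and a bug that flips the low bit of the product. The sketch also assumes that the attacker knows the bit length of d, and it adds one guard not present in the algorithm (rejecting intermediate values that also contain B) to keep the toy run free of accidental errors.

```python
import random
from math import gcd

W, MASK = 16, 0xFFFF             # toy word size (the analysis uses w = 32)
A, B = 0x1234, 0xABCD            # hypothetical buggy operand pair, A != B

def words(x):
    out = []
    while x:
        out.append(x & MASK)
        x >>= W
    return out

def buggy_mul(x, y):
    """Schoolbook product on the modelled faulty CPU: A*B is off by one bit."""
    total = 0
    for i, u in enumerate(words(x)):
        for j, v in enumerate(words(y)):
            r = u * v
            if (u, v) == (A, B):
                r ^= 1
            total += r << (W * (i + j))
    return total

def faulty_decrypt(c, d, n):
    """LTOR exponentiation of c to the power d using the buggy multiplier."""
    z = 1
    for k in range(d.bit_length() - 1, -1, -1):
        z = buggy_mul(z, z) % n
        if (d >> k) & 1:
            z = buggy_mul(z, c) % n
    return z

def recover_exponent(oracle, e, n, nbits, rng):
    """Adaptive attack; assumes d is odd and its bit length nbits is known."""
    known = 1                                    # d' starts as the top bit
    for _ in range(nbits - 2):                   # bits nbits-2 down to 1
        while True:                              # Step (b), bug-free search
            c = (rng.getrandbits(62) & ~MASK) | B
            ws = words(pow(c, 2 * known, n))
            if A in ws and B not in ws:
                break
        m_hat = oracle(c)                        # Step (c)
        bit = 0 if pow(m_hat, e, n) == c else 1  # Steps (d) and (e)
        known = 2 * known + bit
    return 2 * known + 1                         # d_0 = 1

p, q = 2**31 - 1, 2**32 - 5
n, phi = p * q, (p - 1) * (q - 1)
d = 0x9A5B7                                      # toy secret exponent
while gcd(d, phi) != 1:
    d += 2
e = pow(d, -1, phi)
```

Calling `recover_exponent(lambda c: faulty_decrypt(c, d, n), e, n, d.bit_length(), random.Random(2015))` is expected to return d, with each recovered bit costing about $2^{14}$ bug-free exponentiations in this toy setting (four 16-bit words per value) instead of the $2^{27}$ quoted above.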

26

Technion - Computer Science Department - Ph.D. Thesis PHD-2015-01 - 2015 2.4.2.2 Chosen Ciphertext Attack.

The previous adaptive attack on exponentiations using LTOR is the basis for the following non-adaptive chosen ciphertext attack. The attack requests the decryption of $2^{29}$ chosen ciphertexts, all of which contain the word b. It is expected that for every $0 \le i \le \log n$, there are about four ciphertexts (of the $2^{29}$) for which the intermediate value of z in round i of the exponentiation algorithm contains the word a. The value of $d_i$ can be determined by the correctness of the decryption of those ciphertexts, using considerations similar to the ones used in the attack of Section 2.4.2.1. If for some i there are no ciphertexts $C_j$ for which $X_j = C_j^{d'} \bmod n$ contains a, there is no choice but to continue the attack recursively for both $d_i = 0$ and $d_i = 1$. However, when the wrong value is chosen, a contradiction may be encountered before retrieving the rest of the bits (i.e., more than one ciphertext $C_j$ for which $X_j$ contains a is found, and the decryption of some, but not all, of them is incorrect). By using standard results from the theory of branching processes, $2^{29}$ ciphertexts suffice to ensure that recursive calls which represent wrong bit values are quickly aborted. The attack is presented in Algorithm 2.5.

We remark that there is a tradeoff between the number of ciphertexts and the time complexity of the attack. With more data there is a higher probability that there will be some ciphertext $C_j$ for which $X_j$ contains a, for some iteration i of the attack, and the time complexity decreases. On the other hand, with less data this probability is lower, and when there are no such ciphertexts we have to continue the attack both with $d_i = 0$ and $d_i = 1$ (Step 2(c)iii), thus increasing the attack time. If for every i there exists a j such that $C_j^{d'}$ contains b, the time complexity is equal to the data complexity (i.e., $2^{29}$).
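The expected-count reasoning behind the choice of $2^{29}$ ciphertexts can be checked numerically. The following is a minimal sketch, assuming that $2^{-27}$ is the probability that a single intermediate value contains the word a and that the ciphertexts behave independently; the variable names are illustrative only.

```python
import math

P_WORD = 2.0 ** -27      # probability that a value contains the word a (from the text)
N_CIPHERTEXTS = 2 ** 29  # chosen ciphertexts, all containing the word b

# Expected number of ciphertexts whose intermediate value contains a in a given round.
expected_hits = N_CIPHERTEXTS * P_WORD

# Probability that a given round has no such ciphertext, which forces the attack
# to branch on both d_i = 0 and d_i = 1 (independence is an assumption here).
p_no_hit = math.exp(N_CIPHERTEXTS * math.log1p(-P_WORD))

print(expected_hits)                    # 4.0
print(p_no_hit, math.exp(-4))           # the two values nearly coincide
```

Under these assumptions roughly one round in fifty has no usable ciphertext, which is why the recursion of Step 2(c)iii is needed but rarely goes deep.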

2.4.2.3 Known Plaintext Attack.

The chosen ciphertext attack from Section 2.4.2.2 can be easily transformed into a known plaintext attack which requires $2^{56}$ known plaintexts. Among the $2^{56}$ plaintexts, only $2^{56} \cdot 2^{-27} = 2^{29}$ are expected to contain b. We can discard all the plaintexts which do not contain b and use the rest as inputs for the attack of Algorithm 2.5 (Section 2.4.2.2).

Note that the known plaintexts must be the result of decrypting the corresponding ciphertexts on the faulty processor. The attack will not work if the given plaintext-ciphertext pairs are obtained by encrypting plaintexts (either on the attacker's computer or on the target computer).

1. Choose $2^{29}$ random ciphertexts $C_j$ ($1 \le j \le 2^{29}$) containing the word b, and ask for their decryptions $\hat{M}_j$ using the faulty processor.

2. For $i = \log n$ down to 1 do

   (a) Denote the value of the known bits of d by $d' = \sum_{k=i+1}^{\log n} 2^{k-(i+1)} d_k$.

   (b) For each ciphertext $C_j$ compute $X_j = C_j^{2d'} \bmod n$.

   (c) Consider all ciphertexts $C_j$ such that $X_j$ contains a:

       i. If for all such ciphertexts $C_j$ it holds that $\hat{M}_j^e \bmod n = C_j$ then set $d_i = 0$.

       ii. If for all such ciphertexts $C_j$ it holds that $\hat{M}_j^e \bmod n \neq C_j$ then set $d_i = 1$.

       iii. If there are no such ciphertexts try the rest of the attack for both $d_i = 0$ and $d_i = 1$.

       iv. If for some of these ciphertexts $\hat{M}_j^e \bmod n = C_j$ and for others $\hat{M}_j^e \bmod n \neq C_j$ (i.e., a previously set value of one of the bits is wrong) then backtrack.

3. Set $d_0 = 1$.

Algorithm 2.5: Chosen-Ciphertext Attack Against RSA with LTOR
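The four-way case analysis of Step 2(c) in Algorithm 2.5 can be sketched as a small decision function. This is an illustrative sketch only: the function name and the boolean encoding of the observations are assumptions, and the recursion and backtracking that surround the decision are not shown.

```python
def decide_bit(observations):
    """Decide d_i from the ciphertexts whose X_j contains the word a.

    observations holds one boolean per such ciphertext; True means the faulty
    decryption was correct (M_hat_j^e mod n == C_j).
    """
    if not observations:
        return "branch"       # Step 2(c)iii: try both d_i = 0 and d_i = 1
    if all(observations):
        return 0              # Step 2(c)i: no error, so z was not multiplied by C
    if not any(observations):
        return 1              # Step 2(c)ii: every decryption failed
    return "backtrack"        # Step 2(c)iv: an earlier guess must be wrong


print(decide_bit([True, True, True]))    # 0
print(decide_bit([False, False]))        # 1
print(decide_bit([]))                    # branch
print(decide_bit([True, False, True]))   # backtrack
```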

2.4.3 Bug Attacks on OAEP

Since RSA has many mathematical properties such as multiplicativity, it is often protected by padding schemes. The most popular scheme is OAEP [7], which provides provable security. We show here that although OAEP protects against “standard” attacks on RSA, it provides only limited protection against bug attacks, since it was not designed to cope with errors during the computation.

OAEP adds randomness and redundancy to messages before encrypting them with RSA, and rejects ciphertexts which do not display the expected redundancy when decrypted. Random ciphertexts are not expected to display such a redundancy, and are rejected by the receiver with overwhelming probability. To choose valid ciphertexts with certain desired characteristics (e.g., ciphertexts that contain the word a, or such that some intermediate value of the decryption contains a or b) we choose random plaintexts and encrypt them using the proper OAEP padding, until we get a ciphertext that has the desired structure by chance (since OAEP is a randomized cipher, we can also try to encrypt the same message with different random values, and thus can control the result of the decryption). Our main observation is that the structure we need in our attack (such as the existence of a certain word in the ciphertext) has a relatively high probability regardless of how much redundancy is added to the plaintext by OAEP, and the knowledge that a correctly constructed ciphertext was rejected suffices to conclude that some computational error occurred. We are thus exploiting the OAEP countermeasure itself in order to mount the new bug attack.

The attacks we present on RSA-OAEP are very similar to the attacks on RSA from Section 2.4.2, with some minor modifications. The same attacks are also applicable to OAEP+ [77].

2.4.3.1 Adaptive Chosen Ciphertext Attack.

Unlike the attack of Algorithm 2.4 (Section 2.4.2.1), OAEP stops us from directly choosing ciphertexts C which contain b, and thus in Step 1b we must choose random messages (on our own computer) until b “appears” in C at random. As explained in Section 2.2.4, the probability that this happens and also $C^{2d'} \bmod n$ contains a is $2^{-54}$. As mentioned above, computation errors are identified in Step 1e of the attack on OAEP by the mere rejection of the ciphertext, and there is no need to know the actual value which was rejected. The attack requires the decryption of $\log n$ chosen ciphertexts, and thus its total time complexity for 1024-bit n's is $2^{64}$.

2.4.3.2 Chosen Ciphertext Attack.

The (non-adaptive) chosen ciphertext attack on RSA from Section 2.4.2.2 (Algorithm 2.5) can also be used in the case of OAEP. For a random message, the probability that the ciphertext contains b is $2^{-27}$. In order to find $2^{29}$ messages with a ciphertext which contains b (as required by the attack), we have to try about $2^{56}$ random messages. Therefore, the attack requires the decryption of $2^{29}$ chosen ciphertexts, plus $2^{56}$ pre-computation time on the attacker's own computer. Once the decryptions of the chosen ciphertexts are available, the key can be retrieved in $2^{29}$ additional time.

2.5 Bug Attacks on RTOL Exponentiations

In this section we present attacks against Pohlig-Hellman, RSA, and RSA-OAEP, where exponentiations are performed using the RTOL exponentiation algorithm. In RTOL, the value of the variable y is squared in every iteration of the exponentiation algorithm, regardless of the bits of the secret exponent. Any error introduced into the value of y undergoes the squaring transformation in every subsequent iteration, and is propagated to the value of z if and only if the corresponding bit of the exponent is set. Consequently, every set bit in the binary representation of the exponent introduces a different error into the value of z, while zero bits do not introduce any errors. This allows us to mount efficient non-adaptive attacks, and to retrieve more than one bit from each chosen ciphertext, as described in the attacks presented in this section.

2.5.1 Bug Attacks on Pohlig-Hellman

We present a chosen ciphertext attack against a Pohlig-Hellman implementation in which exponentiations are performed using RTOL. The attack is aimed at retrieving the bits of the secret exponent d. As in Section 2.4.1, an identical chosen plaintext attack can retrieve the bits of the secret exponent e.

1. For $i = \log p - (\log p \bmod r)$ down to 0 step $-r$

   (a) Compute $C = X^{1/2^{i-1}} \bmod p$.

   (b) Denote the value of the known bits of d by $d' = \sum_{k=i+r}^{\log p} 2^{k-(i+r)} d_k$.

   (c) Ask for the decryption $\hat{M} = C^{\langle d \rangle} \bmod p$ on the faulty processor.

   (d) Obtain the correct decryption $M = C^d \bmod p$.

   (e) Find an r-bit value u such that $M/\hat{M} = \beta^{2^r d' + u} \bmod p$ ($0 \le u < 2^r$).

   (f) Denote the bits of u by $u_{r-1} u_{r-2} \ldots u_1 u_0$.

   (g) Conclude that $d_{i+k} = u_k$, $\forall\, 0 \le k < r$.

Algorithm 2.6: Chosen-Ciphertext Attack Against Pohlig-Hellman with RTOL

2.5.1.1 Chosen Ciphertext Attack.

We present a (non-adaptive) chosen ciphertext attack which retrieves the secret key when the exponentiation is performed using RTOL. Let X be a value which contains the words a and b, and let $\beta = X^2/X^{\langle 2 \rangle}$. Unlike the improved attack on Pohlig-Hellman of Section 2.4.1.2, it does not help if $X^{\langle 2 \rangle}$ also contains a and b (on the contrary, it makes the analysis slightly more complicated). Each chosen ciphertext is used to retrieve r bits of the secret exponent d, where r is a parameter of the attack. The reader is advised to consider first the simplest case of r = 1. The attack is presented in Algorithm 2.6.

Consider the decryption of C in Step 1c, for some i. Exponentiation by the RTOL algorithm sets y = C, and squares y repeatedly. After squaring it i − 1 times, the value of y becomes X, which contains both a and b. When y is squared again, a multiplicative error factor of β is introduced into its computed value (compared to its bug-free value). If $d_i = 1$ then z is multiplied by y, and thus the same multiplicative error factor of β is also propagated into the value of z. After the next squaring of y, it contains an error factor of $\beta^2$, which is propagated into the value of z only if $d_{i+1} = 1$. In each additional iteration of the exponentiation the previous error in y is squared, and the error affects the result if and only if the corresponding bit of d is set. At the end of the exponentiation, the error factor in the final result is:

$$\frac{M}{\hat{M}} \equiv \prod_{k=i}^{\log p} \left(\beta^{2^{k-i}}\right)^{d_k} \equiv \beta^{\sum_{k=i}^{\log p} 2^{k-i} d_k} \pmod{p}.$$

Since only r bits of the exponent are unknown, they can be easily retrieved by performing $2^r - 1$ modular multiplications.

As in the attacks of Section 2.4.1, all the error-free decryption queries in Step 1d can be replaced by the decryption of one additional ciphertext on the faulty processor: The attacker can request the decryption $M_3$ of $C^3 \bmod p$ (or any other power of C which is not expected to cause a decryption error), and then in Step 1e can find an r-bit value u such that

$$\frac{M_3}{\hat{M}^3} \equiv \left[\prod_{k=i}^{\log p} \left(\beta^{2^{k-i}}\right)^{d_k}\right]^3 \equiv \beta^{3(2^r d' + u)} \pmod{p}.$$

The attack requires $2\lceil(\log p + 1)/r\rceil$ decryptions of chosen ciphertexts, and all of them can be pre-computed by $\log p$ modular square roots (Step 1a of the attack). Once the decryptions are available, each execution of Step 1e finds r bits of d using $2^r - 1$ multiplications, which is equivalent to about $2^r/\log p$ modular exponentiations. Since Step 1e is executed $\lceil(\log p + 1)/r\rceil$ times, the total time complexity is about $2^r/r$. For small values of r this time complexity is negligible compared to the time of the pre-computation. For $r \ge 12$, however, this computation takes longer, and there is a tradeoff between the time complexity and the data complexity.
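The way the error factor accumulates in RTOL, and how r bits are read off from a single faulty decryption, can be illustrated with a toy simulation. The sketch below makes several assumptions that are not part of the attack as described: the fault is modelled abstractly by dividing one squaring result by β, the fault position is set directly instead of being arranged through a root of X, and the modulus, β, the exponent and all function names are toy choices. It requires Python 3.8 or later for modular inverses via `pow`.

```python
def rtol_exp(c, d, p, fault_at=None, beta=1):
    """Right-to-left square-and-multiply modulo p.

    If fault_at = i, the squaring that produces the y used for bit i is
    corrupted: the computed value is the correct one divided by beta
    (hypothetical fault model).
    """
    z, y = 1, c % p
    beta_inv = pow(beta, -1, p)
    for k in range(d.bit_length()):
        if (d >> k) & 1:
            z = z * y % p          # multiply step for bit k
        y = y * y % p              # squaring step, yields the y used for bit k + 1
        if fault_at is not None and k + 1 == fault_at:
            y = y * beta_inv % p
    return z


def recover_chunk(ratio, beta, d_high, r, p):
    """Find u in [0, 2^r) with ratio == beta^(2^r * d_high + u) mod p."""
    cand = pow(beta, d_high << r, p)
    for u in range(1 << r):
        if cand == ratio:
            return u
        cand = cand * beta % p
    return None


p = 2**61 - 1                 # toy prime modulus
beta = 3                      # toy multiplicative error factor
d = 0b1011001110001101        # toy 16-bit odd secret exponent
c = 123456789                 # toy ciphertext
r = 5                         # bits recovered per faulty decryption

d_high = 0                    # value of the already known high bits
for i in range(11, 0, -r):    # chunks start at bits 11, 6 and 1
    m = rtol_exp(c, d, p)
    m_hat = rtol_exp(c, d, p, fault_at=i, beta=beta)
    ratio = m * pow(m_hat, -1, p) % p          # equals beta^(d >> i)
    d_high = (d_high << r) | recover_chunk(ratio, beta, d_high, r, p)

recovered = (d_high << 1) | 1  # d_0 = 1 because d is odd
print(recovered == d)
```

The search in `recover_chunk` performs at most $2^r - 1$ multiplications per chunk, matching the cost stated for Step 1e.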

2.5.2 Bug Attacks on RSA

Unlike the case of Pohlig-Hellman, there is no known efficient algorithm for extracting roots modulo a composite n with unknown factors. The chosen ciphertext attack presented in this section circumvents this problem by choosing random ciphertexts until a suitable ciphertext is found.

1. Choose a random ciphertext $C_0$, and let t = 0.

2. While $t \le \log n$ and $C_t$ does not contain both a and b do:

   (a) t = t + 1.

   (b) Compute $C_t = C_{t-1}^2 \bmod n$.

3. Let $X = C_t$ and let $X^{\langle 2 \rangle}$ be the result of squaring X on a faulty processor.

4. Let $\beta = X^2/X^{\langle 2 \rangle} \bmod n$.

5. For $i = \log n - (\log n \bmod r)$ down to 0 step $-r$

   (a) Ask for the decryption $\hat{M}$ of $C = C_{t-i}$ using the faulty processor, $\hat{M} = C_{t-i}^{\langle d \rangle} \bmod n$.

   (b) Denote the value of the known bits of d by $d' = \sum_{k=i+r}^{\log n} 2^{k-(i+r)} d_k$.

   (c) Compute $\hat{C} = \hat{M}^e \bmod n$.

   (d) Find an r-bit value u such that $C/\hat{C} \equiv \left(\beta^{2^r d' + u}\right)^e \pmod{n}$.

   (e) Denote the bits of u by $u_{r-1} u_{r-2} \ldots u_1 u_0$.

   (f) Conclude that $d_{i+k} = u_k$, $\forall\, 0 \le k < r$.

Algorithm 2.7: Chosen-Ciphertext attack against RSA with RTOL

2.5.2.1 Chosen Ciphertext Attack.

The attack in this case is similar to the attack on RTOL modulo a prime p (Section 2.5.1.1, Algorithm 2.6), except for some necessary adaptations to the case of RSA. The attack requires a pre-computation to find a value X which contains both a and b, and such that all the values $X^{1/2^{i-1}}$ for $1 \le i \le \log n$ are known (Step 2 in Algorithm 2.7). The parameter r represents the number of bits retrieved in each iteration. Algorithm 2.7 describes the attack.

A random ciphertext contains a and b with probability $2^{-54}$, and therefore the pre-computation of Step 2 is expected to take time corresponding to $2^{54}$ modular multiplications (which is equivalent to $2^{44}$ modular exponentiations when $\log n = 1024$). In each iteration of the attack, r bits are retrieved by performing $2^r - 1$ modular multiplications, which are equivalent to about $(2^r - 1)/\log n$ modular exponentiations. Thus, once the decrypted ciphertexts are available, the attack requires a time equivalent to about

$$\left\lceil \frac{\log n}{r} \right\rceil \cdot \frac{2^r - 1}{\log n} \approx \frac{2^r - 1}{r}$$

modular exponentiations. As in the attack of Section 2.5.1, this attack requires $\lceil \log n / r \rceil$ decryptions of pre-computed chosen ciphertexts. Step 5d finds r bits of the secret exponent d using $2^r - 1$ multiplications, and thus (as in the attack from Section 2.5.1.1) for large values of r there is a tradeoff between the time complexity and the data complexity.

2.5.3 Bug Attacks on OAEP Implementations that use RTOL

2.5.3.1 Adaptive Chosen Ciphertext Attack.

We present an adaptive chosen ciphertext attack for the case of RSA-OAEP when exponentiations are performed using RTOL. The presented attack resembles the attack on RSA-OAEP from Section 2.4.3, but it identifies the bits of d starting from the least significant bit. The details are provided in Algorithm 2.8.

After i iterations of the decryption exponentiation algorithm, the value of the variable z is $C^{d'} \bmod n$, and the value of the variable y is $C^{2^i} \bmod n$. The ciphertext C is chosen such that one of these values contains a and the other contains b. Therefore, if these values are multiplied ($d_i = 1$), then the result of the decryption is expected to be wrong, and the ciphertext is rejected. Otherwise, no errors are expected to occur, and the decryption is expected to succeed ($d_i = 0$).

The complexity of finding the ciphertext in Step 2b is $2^{54}$, and the complexity of the entire attack for 1024-bit n's is $2^{64}$ exponentiations on the attacker's computer. The attack requires $\log n$ chosen ciphertexts, which are decrypted on the target machine.

1. Set $d_0 = 1$.

2. For i = 1 to $\log n$

   (a) Denote the value of the known bits of d by $d' = \sum_{k=0}^{i-1} 2^k d_k \bmod n$.

   (b) Repeatedly encrypt random messages M until $C = E(M) = (\mathrm{OAEP}(M))^e$ satisfies that $C^{2^i} \bmod n$ contains a and $C^{d'} \bmod n$ contains b.

   (c) Ask for the decryption of C using the faulty processor.

   (d) If the decryption succeeds conclude that $d_i = 0$, otherwise conclude that $d_i = 1$.

Algorithm 2.8: Adaptive Chosen-Ciphertext Attack Against RSA-OAEP with RTOL

2.6 Bug Attacks Using the Legendre Symbol and Square Roots

In this section we describe techniques which use the Legendre Symbol to identify incorrect decryptions (in the case of exponentiations with RTOL), and help identify the bits of the secret exponent (in the case of exponentiations with LTOR). Using these techniques, we are able to mount known plaintext bug attacks on the Pohlig-Hellman scheme, which were not possible with the techniques of Sections 2.4 and 2.5.

2.6.1 Bug Attacks on Pohlig-Hellman Implementations that use RTOL

Exponentiation by an odd exponent modulo a prime p preserves the Legendre symbol of the input. In the case of Pohlig-Hellman, since the decryption exponent d is odd, for every ciphertext C:

$$\left(\frac{C}{p}\right) = \left(\frac{C^d}{p}\right).$$

We observe that if a bug occurs when exponentiating with RTOL, the Legendre symbol of the (faulty) decrypted message may differ from the Legendre symbol of the ciphertext, and thus the Legendre symbol can be used to ascertain that a bug has occurred. Consider the exponentiation of a ciphertext C with RTOL: The least significant bit of d is one, thus after the first iteration of the exponentiation z = C, and $y = C^2 \bmod p$. From now on the Legendre symbol of y modulo p is always one (y is the result of a square operation, and thus is a quadratic residue), and since z can only be multiplied by y, the Legendre symbol of z modulo p does not change in the remaining iterations. However, if as a result of the execution of the buggy instruction the result of squaring y has a Legendre symbol −1, then the Legendre symbol of the decrypted message will be flipped.

We first present a chosen ciphertext (or chosen plaintext) attack against Pohlig-Hellman where exponentiations are performed with RTOL, and then extend it to a known plaintext attack.
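The invariance of the Legendre symbol under an odd exponent is easy to check numerically. The following minimal sketch uses Euler's criterion; the prime, the exponent and the function name are toy choices made for illustration.

```python
def legendre(a, p):
    """Legendre symbol of a modulo an odd prime p, via Euler's criterion."""
    s = pow(a % p, (p - 1) // 2, p)
    return -1 if s == p - 1 else s


p = 2**31 - 1      # toy prime modulus
d = 65537          # toy odd exponent

# An odd exponent preserves the symbol: (C^d / p) = (C / p)^d = (C / p).
preserved = all(legendre(c, p) == legendre(pow(c, d, p), p) for c in range(1, 500))
print(preserved)                   # True

# A correct square always has symbol 1, so a computed "square" with symbol -1
# can only be the result of a computation error.
c = 12345
print(legendre(c * c % p, p))      # 1
```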

2.6.1.1 Chosen Ciphertext Attack.

The chosen ciphertext attack presented in this section uses the technique described above to find the bits of the secret exponent d. As in the attacks on Pohlig-Hellman from Sections 2.4.1 and 2.5.1, a similar chosen plaintext attack can retrieve the bits of the secret exponent e. In the following attack, the bits of d are retrieved from the least significant bit to the most significant (note that in principle the bits of d may be retrieved in any order by this attack, but this order may be implemented more efficiently than others, with fewer square root calls).

The attack uses a value X which contains both a and b, such that the Legendre symbol of $X^{\langle 2 \rangle} \bmod p$ is −1. Such a value can be easily obtained by selecting random numbers which contain a and b and checking the Legendre symbol of $X^{\langle 2 \rangle} \bmod p$. Since half the numbers in $Z_p^*$ have a Legendre Symbol of −1, a suitable value is expected to be found after two attempts on average. The attack is presented in Algorithm 2.9. It requires the decryption of $\log p$ ciphertexts, $\log p$ extractions of modular roots and $\log p$ computations of the Legendre symbol.

1. Choose X such that X contains a and b, and $\left(\frac{X^{\langle 2 \rangle}}{p}\right) = -1$.

2. Set $d_0 = 1$.

3. For i = 1 to $\log p$

   (a) Compute $C = X^{1/2^i} \bmod p$ (any $2^i$'th root is accepted).

   (b) Ask for the decryption $\hat{M} = C^{\langle d \rangle} \bmod p$ on the faulty processor.

   (c) If $\left(\frac{C}{p}\right) = \left(\frac{\hat{M}}{p}\right)$ set $d_i = 0$, otherwise set $d_i = 1$.

Algorithm 2.9: Chosen-Ciphertext Attack Against Pohlig-Hellman with RTOL
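The bit-by-bit decision of Algorithm 2.9 can be illustrated with a toy simulation. This sketch is not the attack itself: the buggy squaring is replaced by a hypothetical fault model that multiplies one squaring result by a fixed quadratic non-residue g, the fault position is set directly instead of being arranged through a $2^i$-th root of X, and the prime, exponent and function names are illustrative.

```python
def legendre(a, p):
    """Legendre symbol of a modulo an odd prime p, via Euler's criterion."""
    s = pow(a % p, (p - 1) // 2, p)
    return -1 if s == p - 1 else s


def rtol_with_bad_square(c, d, p, i, g):
    """RTOL exponentiation in which the y used for bit i is off by the factor g.

    With g a quadratic non-residue, the corrupted y has Legendre symbol -1
    (a stand-in for the buggy squaring of X).
    """
    z, y = 1, c % p
    for k in range(d.bit_length()):
        if (d >> k) & 1:
            z = z * y % p
        y = y * y % p
        if k + 1 == i:
            y = y * g % p
    return z


p = 2**31 - 1                                             # toy prime modulus
g = next(x for x in range(2, p) if legendre(x, p) == -1)  # some non-residue
d = 0b110100101101                                        # toy odd secret exponent
c = 987654321                                             # toy ciphertext

recovered = 1                                             # d_0 = 1
for i in range(1, d.bit_length()):
    m_hat = rtol_with_bad_square(c, d, p, i, g)
    if legendre(c, p) != legendre(m_hat, p):
        recovered |= 1 << i    # symbol flipped, so z was multiplied by the bad y
print(recovered == d)
```

Only the iteration that uses the corrupted y can flip the symbol, because the next squaring turns y into a quadratic residue again; this is why each query isolates a single bit.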

2.6.1.2 Known Plaintext Attack.

Using the Legendre symbol to identify incorrect decryptions an attacker can retrieve the bits of the secret exponent d (if the plaintexts were obtained by decrypting the ciphertexts on a faulty processor) or the secret exponent e (in case the ciphertexts are the result of encrypting the plaintexts on the faulty processor). Without loss of generality we describe our attack against the former case.

The attack requires $2^{57}$ plaintexts and ciphertexts. For every $0 \le i \le \log p$ we expect that for about eight ciphertexts, the value $C^{2^i}$ contains both a and b, and that about half of them also have a Legendre symbol −1 when squared modulo p on the buggy processor. These ciphertexts can be used in an attack similar to the one described in Algorithm 2.9 (Section 2.6.1.1). The bits of d can be retrieved in any order, thus if there are no suitable ciphertexts to retrieve a specific bit, an attacker can continue to retrieve the rest of the bits, and guess the value of the missing bit at the end. More than 98% ($1 - e^{-4}$) of the bits are expected to be successfully retrieved by this method with this number of ciphertexts (e.g., for 512-bit moduli only about 10 bits remain to be tried at the end). The detailed attack is described in Algorithm 2.10.

1. Set $d_0 = 1$.

2. For each $(\hat{M}, C)$ of the $2^{57}$ message-ciphertext pairs

   (a) For i = 1 to $\log p$

       i. If $C^{2^i}$ contains a and b, and $\left(\frac{C^{2^{i+1}}}{p}\right) = -1$

          A. If $\left(\frac{C}{p}\right) = \left(\frac{\hat{M}}{p}\right)$ set $d_i = 0$.

          B. Otherwise set $d_i = 1$.

3. Exhaustively search for the values of all the bits of d that were not found in Step 2.

Algorithm 2.10: Known-Plaintext Attack Against Pohlig-Hellman with RTOL

In addition to the previous method, another method can be used to retrieve the bits of d, using the same data. Unlike the first method, this second method requires retrieving the bits of d in a specific order, from the least significant to the most significant. We expect that among the $2^{57}$ ciphertexts, for every $0 \le i \le \log p$ there are about eight ciphertexts such that after i iterations of exponentiation the value of the variable z (of the RTOL algorithm) contains a, and the value of the variable y contains b (as in the attack from Section 2.5.3.1). We expect that about half of them satisfy $\left(\frac{z \odot y}{p}\right) \neq \left(\frac{C}{p}\right)$, and thus also $\left(\frac{\hat{M}}{p}\right) \neq \left(\frac{C}{p}\right)$. When $d_0, d_1, \ldots, d_{i-1}$ are known, these ciphertexts can be easily identified. If at least one such ciphertext exists, the bit $d_i$ can also be identified by this observation.

Both described methods can be applied together using the same data in order to reduce the probability of an error (or slightly reduce the data complexity).

2.6.2 Bug Attacks on Pohlig-Hellman Implementations that use LTOR

In this section we describe techniques that use extraction of modular square roots in order to identify the bits of the secret exponent of Pohlig-Hellman. We open this section with a general discussion of long multiplications in the presence of a multiplication bug.

Let p be a large prime, and let $C \in Z_p^*$ be some number which contains b. Consider the functions $f, \hat{f} : Z_p^* \to Z_p^*$ defined by $f(X) = X \cdot C \bmod p$ and $\hat{f}(X) = X \odot C \bmod p$. While f is an automorphism, $\hat{f}$ is not. Due to the buggy multiplication there are some values $X, X^*$ such that $\hat{f}(X) = \hat{f}(X^*)$ (for example, since $X^*$ contains a and X does not). As a result, there are some values in $Z_p^*$ which cannot be the result of a buggy multiplication by C. We show that given a number V, it is possible to estimate how many preimages $\hat{f}$ has for V by inverting the buggy multiplication (assuming at most one occurrence of the bug).

Let $V = f(X) = X \cdot C$ and $\hat{V} = \hat{f}(X) = X \odot C$, for some number X which contains a, and recall that a multiplication of two big numbers is performed by multiplying every word of X by every word of C, and summing up the results with the appropriate left shifts.

Given some $\hat{V} \in Z_p^*$, it is easy to check whether $\hat{V}$ can be the result of a bug-free multiplication by C, simply by computing $X = (\hat{V} \cdot C^{-1}) \bmod p$. If X does not contain a, then no bugs are expected to occur when multiplying $X \cdot C \bmod p$, and $\hat{f}(X) = \hat{V}$ (and also $f(X) = \hat{V}$). An important conclusion of this discussion is that all but about $2^{-27}$ of the numbers in $Z_p^*$ can be images of $\hat{f}$ which are obtained with no executions of the bug (this value was computed using our standard parameters).

It is also possible to check whether $\hat{V}$ can be an image of $\hat{f}$ which is obtained by a multiplication with exactly one occurrence of the bug. The additive error $\delta = \hat{V} - V \bmod p$ introduced into the product is a function of $s, t \in \{0, 1, \ldots, \lfloor(\log p)/w\rfloor\}$, the word locations of the words a and b in X and C, respectively, where w is the size of the word and where the least significant word is considered as location 0. The additive error as a function of s, t is

$$\delta = \delta_{s,t} = (a \odot b - a \cdot b)\, 2^{w(s+t)} \bmod p. \qquad (2.1)$$

Furthermore, s + t is limited to the range $\{0, 1, \ldots, 2\lfloor(\log p)/w\rfloor\}$ (there are fewer possibilities if s and/or t are known). We conclude that there are at most $2\lfloor(\log p)/w\rfloor + 1$ possible values for δ. Given $\hat{V}$ and C, and assuming only one occurrence of the bug, there are $\lfloor(\log p)/w\rfloor + 1$ possible values of s (t is known because C is known). For each of the possible values of s the corresponding values of $\delta_{s,t}$, V and $X = V \cdot C^{-1} \bmod p$ can be easily computed. The correct values of s, $\delta_{s,t}$, V and X can now be easily identified, since only the correct value of X is expected to contain the word a in location s. It follows from this discussion that a fraction of $2^{-27}$ of the numbers in $Z_p^*$ can be results of multiplication by C with one execution of the buggy instruction. We denote the set of those numbers by W.

Let $\beta = \hat{V}/V \bmod p$ be the multiplicative error in the computation of V due to the bug. The relation between β and δ is given by:

$$\delta \equiv V\left(\frac{\hat{V}}{V} - 1\right) \equiv V(\beta - 1) \pmod{p}. \qquad (2.2)$$
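The inversion of a buggy multiplication with a single occurrence of the bug can be sketched in a few lines. The model below is an assumption made for illustration: the word product $a \odot b$ is taken to differ from $a \cdot b$ by a fixed value `err`, and the word size, modulus, operand values and function names are toy choices. Since a wrong guess of s can match by chance (with probability about $2^{-w}$), the sketch returns all consistent candidates.

```python
def words(x, w, n):
    """Split x into n words of w bits, least significant word first."""
    mask = (1 << w) - 1
    return [(x >> (w * k)) & mask for k in range(n)]


def buggy_mul(x, c, p, a, b, w, n, err):
    """Schoolbook product x*c mod p in which every word product a*b is off by err."""
    total = x * c
    for s, xs in enumerate(words(x, w, n)):
        for t, ct in enumerate(words(c, w, n)):
            if xs == a and ct == b:
                total += err << (w * (s + t))   # additive error of Equation (2.1)
    return total % p


def invert_single_bug(v_hat, c, p, a, b, w, n, err):
    """Return all (s, X) consistent with exactly one occurrence of the bug."""
    t = words(c, w, n).index(b)                 # location of b in the known C
    c_inv = pow(c, -1, p)
    found = []
    for s in range(n):
        delta = (err << (w * (s + t))) % p      # candidate delta_{s,t}
        x = (v_hat - delta) * c_inv % p         # candidate X = V * C^{-1}
        if words(x, w, n)[s] == a:
            found.append((s, x))
    return found


p, w, n = 2**61 - 1, 8, 8          # toy prime, 8-bit words, 8 words per number
a, b, err = 0xA7, 0x3C, 5          # toy buggy pair and its additive word error
x = 0x112233A7556677               # contains a at word location 3
c = 0x0102033C0506                 # contains b at word location 2
v_hat = buggy_mul(x, c, p, a, b, w, n, err)
print((3, x) in invert_single_bug(v_hat, c, p, a, b, w, n, err))
```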

We now use these observations to mount chosen plaintext (or chosen ciphertext) and known plaintext attacks on Pohlig-Hellman implementations that use the LTOR algorithm for decryption.

2.6.2.1 Chosen Ciphertext Attack.

The following chosen ciphertext attack has two parts. The first part uses the techniques described above to identify some of the 1's of the secret exponent d, while the second part searches for the values of the rest of the bits. A similar chosen plaintext attack can retrieve the bits of the secret exponent e. The attack requests the decryption of $2^{24}$ ciphertexts which contain b.

In the first part of the attack, each incorrect decryption reveals a bit $d_j$ of d with value $d_j = 1$. In the $(\log n - j)$-th iteration of the LTOR algorithm (the iteration that computes $z \leftarrow z^2 C^{d_j}$), if $d_j = 1$ and $z^2$ contains a, then we expect that $z^2 \cdot C \neq z^2 \odot C$. Let $V = z^2 \cdot C$ and $\hat{V} = z^2 \odot C$, and let $\beta = \hat{V}/V$ and $\delta = \hat{V} - V$ (all the computations are performed modulo p). The values of V, $\hat{V}$, β and δ are related by (2.2). The LTOR algorithm ends after j additional iterations. On a faulty processor its result is expected to be $\hat{M} = \hat{V}^{2^j} C^* \bmod p$, for some $C^*$ which depends on the value of C and on $d_{j-1}, \ldots, d_1, d_0$, while the correct result is $M = V^{2^j} C^* \bmod p$ (for the same value of $C^*$). Let $Q \equiv M/\hat{M} \equiv \beta^{2^j} \pmod{p}$.

Given any incorrect decryption $\hat{M} = C^{\langle d \rangle} \bmod p$ and the corresponding correct decryption $M = C^d \bmod p$ of a ciphertext C containing b in location t, this attack uses Q to search for the combination of j and δ which corresponds to the computation error. For each possible combination of j and s, we first extract the $2^j$-th roots of Q (there are $\gcd(2^j, p - 1)$ such roots) to determine possible values for β. For each candidate for β we use $\delta_{s,t}$ to compute the value of V according to (2.2):

$$V \equiv \delta_{s,t}(\beta - 1)^{-1} \pmod{p}.$$

Once we try the correct combination of β and s, we find that $z^2 = V C^{-1} \bmod p$ indeed contains a in location s. We can thus conclude that $d_j = 1$, and save the tuple $(C, z^2, j)$ for a later stage. The probability that $V C^{-1}$ will contain a in location s for an incorrect combination of β and s is $2^{-w}$. The probability that this occurs anywhere during the entire course of the attack is bounded from above by $\gcd(2^{\lceil \log p \rceil}, p - 1) \cdot \log^2 p / (w \cdot 2^w)$, which is usually small. For example, for $p \equiv 3 \pmod{4}$ and our standard parameters, the probability of an error in the course of the attack is bounded from above by $2^{-16}$.

In the second part of the attack, after identifying some of the 1's in d, we search for the values of the other bits, using the information gathered in the first part. For every $d_j = 1$ that was found in the first part of the attack, we also learnt the intermediate value of $z^2$ after j iterations of decrypting $C_j$ with the LTOR algorithm. We sort the tuples $(C, z^2, j)$ in descending order of j, and get intervals which start and end with bits of d with value 1 (with unknown bit values in between them). We then analyze those intervals of unknown bits in order from left to right. We recover the values of the bits in each interval by exhaustively searching for their value until the correct intermediate value of the exponentiation is received. For example, if n = 10 and in the first phase of the attack we identified that the value of bits $d_6$ and $d_3$ is 1, then in the second phase we first try all values of $d_9, d_8, d_7$ and find the correct ones, then search for the values of $d_5, d_4$ and finally the values of $d_2, d_1, d_0$.

The complexity of the search depends on the length of the intervals. Assuming that the indices of the k bits found in the first part of the attack are uniformly distributed, the average distance between them is $r = \log p/(k + 1)$ bits, so the search of each interval is expected to take about $2^{\log p/(k+1)}$ modular multiplications. There is a tradeoff between the data complexity of the attack and the time complexity of the second part. By increasing the data complexity, we expect to find more bits in the first part of the attack (larger k), which allows us to search for the values of fewer bits at a time in the second part, and vice versa. The attack is presented in Algorithm 2.11, where Steps 1–3 describe the first part of the attack and Step 4 describes the second part.

1. Choose $2^{24}$ random ciphertexts which contain b, and ask for their decryption on the buggy machine.

2. Obtain the correct decryptions of the chosen ciphertexts.

3. For every incorrectly decrypted ciphertext C do

   (a) Let t be the location of b in C.

   (b) Denote the correct decryption of C by M and the incorrect decryption by $\hat{M}$.

   (c) Set $Q = M/\hat{M}$.

   (d) For j = 0 to $\log p$ do

       i. For every $2^j$-th modular root β of Q modulo p and every $0 \le s \le \lfloor \log p / w \rfloor$ do

          A. Compute $X = \delta_{s,t}(\beta - 1)^{-1} C^{-1} \bmod p$.

          B. If X contains the word a in location s then set $d_j = 1$, save the tuple (C, X, j) and proceed to the next ciphertext.

4. For every saved tuple (C, X, j) in descending order of j do:

   (a) Complete the unknown values among $d_{\log p}, \ldots, d_{j+2}, d_{j+1}$, such that the intermediate value of $z^2$ after j iterations of exponentiating C is X.

Algorithm 2.11: Chosen-Ciphertext Attack Against Pohlig-Hellman with LTOR

Using our standard parameters, if we request the decryption of $2^{24}$ ciphertexts which contain b, then about $2^{24} \cdot 2^{10} \cdot 2^{-27} = 2^7$ of the intermediate values of z are expected to contain a. About half of them are expected to appear in an iteration for which the corresponding bit of d is 1, and therefore $k = 2^6$, and the average length r of the intervals is approximately $2^4$ bits. The time complexity of the second part of the attack is thus about $2^{16} \cdot 2^6 = 2^{22}$. An additional $2^{24}$ ciphertexts are required for Step 2 of the attack, and thus the total data complexity is $2^{25}$.
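The complexity figures of the second part follow from a short calculation, which the sketch below reproduces. It assumes $\log p = 1024$ (consistent with the factor $2^{10}$ used in the text) and rounds the average interval length to the nearest integer; the variable names are illustrative.

```python
import math

log_p = 1024                 # assumed exponent length in bits
n_ciphertexts = 2 ** 24      # chosen ciphertexts containing b
p_word = 2.0 ** -27          # probability that a value contains the word a

# Intermediate values (one per iteration and ciphertext) expected to contain a.
hits = n_ciphertexts * log_p * p_word          # 2^7
k = hits / 2                                   # about half fall on a bit d_j = 1
r = log_p / (k + 1)                            # average interval length in bits
search_cost = k * 2 ** round(r)                # multiplications over all intervals

print(hits, k, round(r), math.log2(search_cost))   # 128.0 64.0 16 22.0
```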

2.6.2.2 Known Plaintext Attack.

The following known plaintext bug attack is based on extracting square roots to reverse the LTOR algorithm, and find the bits of the secret exponent from the LSB to the MSB (this is the reverse order of the order in which they are used in the exponentiation algorithm).

The first step of the attack discards all ciphertexts which do not contain the word b, as they are less likely to cause the execution of the buggy instruction. Unlike the chosen ciphertext model, we cannot use the multiplicative properties of the cryptosystem in order to identify the incorrect decryptions, and thus we analyze all the remaining ciphertexts and use statistical methods to identify the bits of the secret exponent.

When the i least significant bits of the secret exponent d are already known ($i \in \{0, 1, \ldots, \log p\}$), we can reverse the last i iterations of the LTOR algorithm and compute the value of the variable z after $\log p - i$ iterations of the exponentiation (since this process involves extracting square roots, we get up to $r = \gcd(2^i, p - 1)$ candidates for the value of z). We expect that if $d_i = 1$, then for a fraction of $2^{-27}$ of the ciphertexts the value of z is a result of a buggy multiplication (they form a fraction of $2^{-27}/r$ of all candidates). Also, both in the case of $d_i = 0$ and in the case of $d_i = 1$, a fraction of $2^{-27}$ of the candidates are values which may be the results of buggy multiplications, with one execution of the buggy instruction. Therefore, if $d_i = 0$ then only a fraction of $2^{-27}$ of the candidates is expected to be in the set W, while if $d_i = 1$, a fraction of $(1 + \frac{1}{r})2^{-27}$ of the candidates is expected to be in W. In order to distinguish between these two distributions with high probability, $4r^2 2^{27}$ ciphertexts C that contain a are sufficient. Since only a fraction of $2^{-27}$ of all the ciphertexts in the available data are expected to contain a, we require a total of $4r^2 2^{54}$ known ciphertexts for the attack. The attack is presented in Algorithm 2.12.

1. Discard all the ciphertexts that do not contain the word b. They are not used in the attack.

2. Set $d_0 = 1$.

3. For $i = 1$ to $\log p$

   (a) Reverse the last $i$ iterations of the LTOR exponentiation algorithm for all the ciphertexts, and obtain $r = \gcd(2^i, p - 1)$ possible values from each ciphertext. Denote the set of all values retrieved by $Y$.

   (b) Compute $|Y \cap W|$ (using the method described in Section 2.6.2.1).

   (c) If $|Y \cap W| \simeq 2^{-27} |Y|$ set $d_i = 0$.

   (d) If $|Y \cap W| \simeq (1 + \frac{1}{r}) 2^{-27} |Y|$ set $d_i = 1$.

Algorithm 2.12: Known-Plaintext Attack Against Pohlig-Hellman with LTOR
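The decision made in steps (c) and (d) can be illustrated as a nearest-expectation test. The following is only a minimal sketch under stated assumptions: the helper name `decide_bit`, the parameter `p_bug` (standing for the $2^{-27}$ fraction quoted above), and the choice of thresholding at the midpoint of the two expected counts are illustrative and are not taken from the thesis.

```python
from fractions import Fraction

def decide_bit(hits, total, r, p_bug=Fraction(1, 2**27)):
    """Guess d_i from hits = |Y ∩ W| and total = |Y|.

    Hypothetical helper: returns the hypothesis whose expected
    count is closer to the observed number of hits.
    """
    expected_0 = p_bug * total                           # expected hits if d_i = 0
    expected_1 = (1 + Fraction(1, r)) * p_bug * total    # expected hits if d_i = 1
    if abs(hits - expected_0) <= abs(hits - expected_1):
        return 0
    return 1
```

With small $r$ the two expectations are well separated; as $r$ grows they approach each other, which is consistent with the $r^2$ factor in the data complexity.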

2.7 Vulnerabilities of Other Kinds of Schemes

In this section we consider other schemes that are likely to be vulnerable in the presence of a multiplication bug.

2.7.1 Elliptic Curve Schemes

In cryptosystems based on elliptic curves, exponentiations are replaced by multiplying a point by a constant. It should be noted that the implementations of point addition (corresponding to multiplication in modular groups) and of point doubling (corresponding to squaring in modular groups) are different, but both of them use multiplications of large integers. Our bug attacks can be easily adapted in such a way that the bug is invoked only if two points are added (or alternatively, only if a point is doubled). The correctness or incorrectness of the result reveals the bits of the exponent.
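The correspondence between key bits and point additions can be seen in a left-to-right double-and-add loop. The sketch below is illustrative only: the function name `ltor_scalar_mult` and the operation trace are assumptions, and the additive group of integers modulo a small number is used as a toy stand-in for an elliptic curve group.

```python
def ltor_scalar_mult(k, point, add, double, identity):
    """Left-to-right double-and-add; returns (k * point, trace of operations)."""
    acc = identity
    trace = []
    for bit in bin(k)[2:]:            # scan the scalar from its most significant bit
        acc = double(acc)
        trace.append("D")             # a doubling happens for every bit
        if bit == "1":
            acc = add(acc, point)
            trace.append("A")         # an addition happens only for 1-bits
    return acc, trace

# Toy stand-in for a curve group: integers modulo m under addition.
m = 1009
result, trace = ltor_scalar_mult(
    0b1011, 5,
    add=lambda a, b: (a + b) % m,
    double=lambda a: (2 * a) % m,
    identity=0,
)
```

Since an addition is performed only when the scanned bit is 1, a bug that is triggered only by the addition routine separates the two cases.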

2.7.2 Bug Attacks on Symmetric Primitives

Multiplication bugs can also be used to get information on the keys of symmetric ciphers which include multiplications, such as the block ciphers IDEA [47], MARS [21], DFC [31], MultiSwap [71], Nimbus [83] and RC6 [68], the stream cipher Rabbit [19], the message authentication code UMAC [17], etc. In IDEA, MARS, DFC, MultiSwap and Nimbus, subkeys are multiplied by intermediate values. If an encryption (or decryption) result is known to be incorrect, an attacker may assume that one of the subkeys used for these multiplications is a, and the corresponding intermediate value is b. For example, by selecting a plaintext which contains b in a word that is multiplied by a subkey, the attacker can easily check if the value of that subkey is a.

In Rabbit, a 32-bit value is squared to compute a 64-bit result, which is then used to update the internal state of the cipher. In faulty implementations with word size 8 or 16 (likely word sizes for smart card implementations), faults in the stream can give the attacker information about the internal state. Similarly, the block cipher RC6 uses multiplications of the form A·(2A+1) for 32-bit values A, and thus multiplication bugs may cause errors in faulty implementations with word size 8 or 16. This is, however, an unlikely scenario, since bugs in processors with small words are expected to cause frequent errors, and therefore can be easily discovered.

The MAC function UMAC uses multiplications of two words, both of which depend on the authenticated message. If an incorrect MAC is computed on a faulty processor, an attacker can gain information on the intermediate values of the computation.

2.8 Summary and Countermeasures

We presented several attacks against exponentiation-based public-key and secret-key cryptosystems, including Pohlig-Hellman and RSA. We described such attacks for the two most common implementations of exponentiation. We also discussed the applicability of these techniques to elliptic curve cryptosystems and symmetric ciphers. The attacks and their complexities are summarized in Table 2.1.

There are various countermeasures against bug attacks. Many protection techniques against fault attacks are also applicable to bug attacks, but we stress that due to the differences between the techniques, most of them have to be adapted to the new environment. As shown in Sections 2.4.3 and 2.5.3, and unlike the case of fault attacks, the mere knowledge that an error occurred suffices to mount an attack, even if the output of decryption is not available. Therefore, if a decryption is found to be incorrect, it can be dangerous to send out an error message, and the correct result must be computed by other means.

Possible ways to compute the correct result include using a different exponentiation algorithm, or relying on the multiplicative property of the discussed schemes to blind the computations (the techniques for blinding RSA are based on [22]). When blinding is used, an attacker has no control over the exponentiated values, and they are not made available to her. Thus, even if faults occur during the exponentiation, no information is leaked. However, this method renders the system vulnerable to timing attacks, as the decryption of ciphertexts which trigger the bug takes longer than decryptions which succeed in the first attempt. In order to protect the implementation from timing attacks, the original exponentiations must be blinded, so that no unblinded exponentiations are performed at all. Another alternative is to exponentiate modulo n·r, where r is a small (e.g., 32-bit) prime unknown to the attacker, and reduce the result mod n only at the last step [73].
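As an illustration of the blinding countermeasure, the sketch below applies multiplicative blinding to textbook RSA. It is a minimal example under stated assumptions and not the exact technique of [22]: the function name `blinded_decrypt`, the way the blinding factor is drawn, and the toy key are chosen only for the example.

```python
import secrets
from math import gcd

def blinded_decrypt(c, d, e, n):
    """Textbook RSA decryption with multiplicative blinding (no padding)."""
    while True:
        s = secrets.randbelow(n - 2) + 2      # random blinding factor in [2, n - 1]
        if gcd(s, n) == 1:
            break
    c_blind = (c * pow(s, e, n)) % n          # the exponentiated value is c * s^e
    m_blind = pow(c_blind, d, n)              # equals m * s (mod n)
    return (m_blind * pow(s, -1, n)) % n      # remove the blinding factor

# Toy parameters, far too small for real use: n = 61 * 53.
n, e, d = 3233, 17, 2753
c = pow(65, e, n)
m = blinded_decrypt(c, d, e, n)
```

The value that enters the exponentiation is `c_blind`, which the sender of `c` cannot predict, so she cannot arrange for chosen words to appear in the exponentiated values.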

2.A Brief Descriptions of Several Cryptosystems

2.A.1 The Pohlig-Hellman Cryptosystem and Pohlig-Hellman-Shamir Protocol

The Pohlig-Hellman cryptosystem [66] is a symmetric cryptosystem. Let $p$ be a large prime number. Alice and Bob share a secret key $e$, $1 \le e \le p - 2$, $\gcd(e, p - 1) = 1$. When Alice wants to encrypt a message $m$, she computes $c = m^e \bmod p$. Bob can decrypt $c$ by computing its $e$-th root modulo $p$. In practice, the decryption is performed by computing $c^d \bmod p$, where $d$ is a decryption exponent such that $d \cdot e \equiv 1 \pmod{p - 1}$. Note that given the encryption exponent $e$, the decryption exponent $d$ can be easily computed, and thus $e$ must be kept secret.
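A minimal sketch of this encryption and decryption follows. The concrete prime, exponent, and message are assumptions chosen for illustration only, and they are not sized or selected for security.

```python
from math import gcd

p = 2**127 - 1                 # a Mersenne prime, used here only as a toy modulus
e = 65537                      # example secret exponent; must be coprime to p - 1
assert gcd(e, p - 1) == 1
d = pow(e, -1, p - 1)          # decryption exponent: d * e = 1 (mod p - 1)

m = 123456789
c = pow(m, e, p)               # encryption: c = m^e mod p
recovered = pow(c, d, p)       # decryption: c^d mod p
```

The line computing `d` shows why $e$ must be secret: anyone who knows $e$ and $p$ can invert it modulo $p - 1$.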


Scheme Exp. Attack Sec. Data Time Complexity Alg. Pre- for 32-bit for 64-bit Comp. Attack Words♢ Words♢ ⌈ ⌉ log p ∗ log p+2r ∗ 6 10 27 6 10 27 Pohlig- RTOL CP/CC 2.5.1.1 2 r log p r 2 /2 /2 2 /2 /2 Hellman RTOL CP/CC 2.6.1.1 log p log p log p 210/210/210 210/210/210 8·22w·w2 4·22w·w2 57 57 123 123 RTOL KP 2.6.1.2 log2 p – log2 p 2 /–/2 2 /–/2 LTOR ACP/ACC 2.4.1.1 log p – log p 210/–/210 210/–/210 log p ∗∗ ∗∗ log p ∗∗ 9 9 9 9 LTOR ACP/ACC 2.4.1.2 α α 2 +/–/2 + 2 /–/2 2·2w·w 2·2w·w 28 28 61 61 LTOR CP/CC 2.4.1.3 log p – log p 2 /–/2 2 /–/2 4k·2w·w ♯ 2k·2w·w log p/(k+1)♯ 25 25 56 56 LTOR CP/CC 2.6.2.1 log p – log p + (k + 1)2 2 /–/2 2 /–/2 4r222ww2 † 4r222ww2 † 2 56 2 56 2 122 2 56 LTOR KP 2.6.2.2 log2 p – log2 p r 2 /–/r 2 r 2 /–/r 2 RSA CRT CC 2.3 ⌈ 1 ⌉ – 1 1/–/1 1/–/1 log n ∗ 22ww2 log n+2r ∗ 5 54 27 5 120 27 RTOL CC 2.5.2.1 r log2 n r 2 /2 /2 2 /2 /2 LTOR ACC 2.4.2.1 log n – 2w · w 210/–/237 210/–/270 w w 47 4·2 ·w 4·2 ·w 29 29 62 62 LTOR CC 2.4.2.2 log n – log n 2 /–/2 2 /–/2 4·22w·w2 4·2w·w 56 29 122 62 LTOR KP 2.4.2.3 log2 n – log n 2 /–/2 2 /–/2 22w·w2 10 64 10 130 RSA RTOL ACC 2.5.3.1 log n – log n 2 /–/2 2 /–/2 22w·w2 10 64 10 130 with LTOR ACC 2.4.3.1 log n – log n 2 /–/2 2 /–/2 4·2w·w 4·22w·w2 4·2w·w 29 56 29 62 122 62 OAEP LTOR CC 2.4.3.2 log n log2 n log n 2 /2 /2 2 /2 /2

KP – Known Plaintext; CP – Chosen Plaintext; ACP – Adaptive Chosen Plaintext; CC – Chosen Ciphertext; ACC – Adaptive Chosen Ciphertext.
w is the word size (in bits) of the faulty processor.
♢ Complexity is described in terms of data/pre-computation time/attack time, assuming log p = log n = 1024.
∗ r is a parameter of the attack. The presented numbers are for r = 25.
∗∗ α ∈ [1.5, 2] is a constant, which can be increased (within this range) by investing a longer pre-computation time.
♯ k is a parameter of the attack. The presented numbers are for k = 26.
† In the context of this attack $r = \gcd(2^{\lfloor \log p \rfloor + 1}, p - 1)$.

Table 2.1: Summary of the Presented Bug Attacks

The Pohlig-Hellman-Shamir [74] keyless protocol allows encrypted communication between two parties that do not have keys. The protocol is based on the commutative properties of the Pohlig-Hellman cipher. Let $p$ be a large prime number. Alice and Bob each have a secret encryption exponent ($e_A$ and $e_B$, respectively) and a secret decryption exponent ($d_A$ and $d_B$, respectively) such that $e_A \cdot d_A \equiv e_B \cdot d_B \equiv 1 \pmod{p - 1}$. When Alice wishes to send Bob an encrypted message $m$, she sends $c_1 = m^{e_A} \bmod p$. Bob then computes $c_2 = c_1^{e_B} \bmod p$ and sends it back to Alice. Alice decrypts $c_2$ and sends the decryption $c_3 = c_2^{d_A} \bmod p$ to Bob. Finally, Bob decrypts $c_3$ to get the message $m = c_3^{d_B} \bmod p$. The protocol is secure under standard computational assumptions (the Diffie-Hellman assumption), but not against man-in-the-middle attacks.
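The three passes of the protocol can be traced with a short sketch. Everything concrete here is an assumption made for illustration: the toy prime, the helper name `keypair`, and the way the exponents are sampled.

```python
from math import gcd
import secrets

p = 2**127 - 1                                  # toy prime modulus (assumption)

def keypair():
    """Pick a random exponent e coprime to p - 1, and its inverse d."""
    while True:
        e = secrets.randbelow(p - 3) + 2        # e in [2, p - 2]
        if gcd(e, p - 1) == 1:
            return e, pow(e, -1, p - 1)

e_a, d_a = keypair()                            # Alice's exponents
e_b, d_b = keypair()                            # Bob's exponents

m = 42
c1 = pow(m, e_a, p)                             # Alice -> Bob
c2 = pow(c1, e_b, p)                            # Bob -> Alice
c3 = pow(c2, d_a, p)                            # Alice -> Bob, her layer removed
recovered = pow(c3, d_b, p)                     # Bob removes his own layer
```

Because exponentiations modulo $p$ commute, $c_3$ equals $m^{e_B} \bmod p$, which only Bob can invert.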

2.A.2 The RSA Cryptosystem

RSA [69] is a public-key cryptosystem. Let $n = pq$ be a product of two large prime integers. Bob has a public key $(n, e)$ such that $\gcd(e, (p-1)(q-1)) = 1$, and a private key $(n, d)$ such that $d \cdot e \equiv 1 \pmod{(p-1)(q-1)}$. When Alice wants to send Bob an encrypted message $m$ she computes $c = m^e \bmod n$. When Bob wants to decrypt the ciphertext he computes $c^d \equiv m^{de} \equiv m \pmod n$. The security of RSA relies on the hardness of factoring $n$. If the factors of $n$ are known, RSA can be easily broken.

2.A.3 RSA Decryption Using CRT

The modular exponentiations required by RSA are computationally expensive. Some implementations of RSA perform the decryption modulo $p$ and $q$ separately, and then use the Chinese remainder theorem (CRT) to compute the decryption $c^d \bmod n$. Such an implementation speeds up the decryption by a factor of 4 compared to naive implementations.

Given a ciphertext $c$, it is first reduced modulo $p$ and modulo $q$. The two values are exponentiated modulo $p$ and $q$ separately: $m_p = c^{d_p} \bmod p$ and $m_q = c^{d_q} \bmod q$, where $d_p = d \bmod (p - 1)$ and $d_q = d \bmod (q - 1)$. Now $m$ is computed using CRT, such that $m \equiv m_p \pmod p$ and $m \equiv m_q \pmod q$. This is done by computing $m = (x m_p + y m_q) \bmod n$, where $x$ and $y$ are pre-computed integers that satisfy:

$$\begin{cases} x \equiv 1 \pmod p \\ x \equiv 0 \pmod q \end{cases} \quad \text{and} \quad \begin{cases} y \equiv 0 \pmod p \\ y \equiv 1 \pmod q. \end{cases}$$
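The CRT recombination can be sketched as follows. This is a toy illustration rather than a hardened implementation: the function name `crt_decrypt` and the small parameters are assumptions, and $x$ and $y$ are recomputed on every call here although in practice they would be pre-computed.

```python
def crt_decrypt(c, d, p, q):
    """Textbook RSA decryption via the Chinese remainder theorem (no padding)."""
    n = p * q
    d_p, d_q = d % (p - 1), d % (q - 1)
    m_p = pow(c % p, d_p, p)             # exponentiation modulo p
    m_q = pow(c % q, d_q, q)             # exponentiation modulo q
    x = q * pow(q, -1, p)                # x = 1 (mod p) and x = 0 (mod q)
    y = p * pow(p, -1, q)                # y = 0 (mod p) and y = 1 (mod q)
    return (x * m_p + y * m_q) % n

p, q, e, d = 61, 53, 17, 2753            # toy parameters, n = 3233
c = pow(65, e, p * q)
m = crt_decrypt(c, d, p, q)
```

If exactly one of `m_p` and `m_q` is computed incorrectly, the result is still correct modulo the other prime, which is the property exploited by the CRT attack of Section 2.3.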

2.A.4 OAEP

Optimal Asymmetric Encryption Padding (OAEP) [7] and OAEP+ [77] are methods of encoding a plaintext before its encryption, with three major goals: adding randomization to deterministic encryption schemes (e.g., RSA), preventing the ciphertext from leaking information about the plaintexts, and preventing chosen ciphertext attacks. OAEP is based on two one-way functions G and H, which are used to create a two-round Feistel network, while OAEP+ uses three one-way functions. Only OAEP is described here.

Let $G : \{0,1\}^{k_0} \to \{0,1\}^{l+k_1}$ and $H : \{0,1\}^{l+k_1} \to \{0,1\}^{k_0}$ be two one-way functions, where $l$ is the length of the plaintext, and $k_0$, $k_1$ are security parameters. When Alice wants to compute the encryption $C$ of a plaintext $M$, she chooses a random value $r \in \{0,1\}^{k_0}$ and computes

$$s = G(r) \oplus (M \| 0^{k_1}), \quad t = H(s) \oplus r, \quad w = s \| t, \quad C = E(w),$$

where $\|$ denotes concatenation of binary vectors, and $E$ denotes encryption with the underlying cipher. Decryption of $C$ is performed by:

$$\begin{aligned} w &= D(C), \\ s &= w[0 \ldots l + k_1 - 1], \\ t &= w[l + k_1 \ldots n - 1], \\ r &= H(s) \oplus t, \\ y &= G(r) \oplus s, \\ M &= y[0 \ldots l - 1], \\ z &= y[l \ldots l + k_1 - 1], \end{aligned}$$

where $D$ denotes decryption under the same cipher used in the encryption phase. If $z \ne 0^{k_1}$, then the ciphertext is rejected and no plaintext is provided. Otherwise, the decrypted plaintext is $M$.
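The encoding and decoding steps above can be sketched at byte granularity. This is an illustration only and not a standards-compliant OAEP: the use of SHAKE-256 for G and H, the one-byte domain-separation prefixes, the toy lengths, and the function names are all assumptions, and the underlying cipher E/D is left out.

```python
import hashlib
import secrets

L_BYTES, K0, K1 = 16, 16, 16     # toy lengths of M, r and the zero padding, in bytes

def G(r):
    """Maps k0 bytes to l + k1 bytes (SHAKE-256 is an assumption)."""
    return hashlib.shake_256(b"G" + r).digest(L_BYTES + K1)

def H(s):
    """Maps l + k1 bytes to k0 bytes."""
    return hashlib.shake_256(b"H" + s).digest(K0)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def oaep_encode(message):
    assert len(message) == L_BYTES
    r = secrets.token_bytes(K0)
    s = xor(G(r), message + bytes(K1))   # s = G(r) xor (M || 0^k1)
    t = xor(H(s), r)                     # t = H(s) xor r
    return s + t                         # w = s || t, the input to E

def oaep_decode(w):
    s, t = w[:L_BYTES + K1], w[L_BYTES + K1:]
    r = xor(H(s), t)
    y = xor(G(r), s)
    message, z = y[:L_BYTES], y[L_BYTES:]
    return message if z == bytes(K1) else None   # reject when the padding is not all zero
```

The `None` branch corresponds to the rejection of a ciphertext. As discussed in Sections 2.4.3 and 2.5.3, observing whether this rejection happens is already useful to an attacker.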

2.B Known Hardware Bugs

In this appendix we give a partial list of known hardware bugs. A quick Internet search yields many more bugs, some of which were never officially acknowledged by the manufacturers. It is safe to assume that hardware manufacturers are aware of many more bugs which were never made publicly known or which were later corrected by firmware updates, and that there are many more hardware bugs waiting to be discovered.

• Pentium FDIV bug [36, 38]: This well-known bug in the FDIV instruction of the Pentium processor was caused by missing entries in a lookup table. These entries were omitted due to a programming error. The bug caused inaccurate results in floating point division for some of the inputs. Byte magazine [36] assessed that the bug influenced about 1 in a billion floating point divisions (for random inputs).

• Intel Core 2 TLB bug [39]: Intel Core 2 memory management unit has a reported error in the TLB (Translation Lookaside Buffer – a unit responsible for translating virtual memory addresses to physical addresses). Global entries in the TLB may not be invalidated when the table is initialized, which may cause the processor to read data from incorrect memory addresses. This bug may cause the system to stop responding or crash.

• AMD Phenom 9700 / TLB system lockup bug [81]: Before its release, AMD found that the Phenom 9700 quad-core processor had a TLB bug which may cause the CPU to hang when all four cores are running at full load. The discovery of this bug caused AMD to delay the release of this model.

• Intel 80286 popf bug [62]: A bug in the popf instruction (which pops the flags off the stack) allowed interrupts to be executed, even when they were supposed to be disabled. This bug is an example of a very simple instruction which changes the state of the CPU even when no change is needed.

• Intel Pentium f00f bug [40]: Under certain conditions, when a program tried to execute a specific invalid opcode, the entire system would hang instead of generating an “invalid opcode exception” (which would terminate the errant program).

• AMD Athlon/Duron with AGP bug [1]: This memory management bug caused Linux systems to hang when the system displayed AGP graphics. The bug was caused by improper handling of extended paging (which supported large page sizes).

• MOS Technology 6502 bugs [84]: The 6502 model of the MOS processor introduced binary coded decimal (BCD) instructions for manipulating decimal numbers without first converting them to binary. If a hardware interrupt occurred when the processor was in BCD mode, it would not revert back to binary mode for the execution of the interrupt handler. Another bug in this processor caused the JMP instruction to read its destination address from the wrong memory address under certain conditions.

• Cyrix coma bug [6]: The bug in the Cyrix 6x86 series could cause the processor to stop responding to interrupts while executing an infinite loop. Because interrupts were ignored, there was no way to abort the loop, and the system would stop responding.

• Intel 80386 multiplication bug [59]: The first x86 with 32-bit architecture exhibited a bug in its 32-bit multiplication instruction. The bug could have caused the processor to stop responding. Even after the discovery of the bug, the buggy processors continued to be sold as “16 BIT S/W ONLY”.

• Intel Pentium Pro and Pentium II FPU bug [23]: This bug (regarded by Intel as the “flag erratum”) caused unexpected behavior when trying to convert from floating point to integer, if the result was too large to fit in an integer variable.

Chapter 3

Efficient Reconstruction of RC4 Keys from Internal States

In this chapter we present an algorithm that is able to efficiently invert the key-scheduling of the stream cipher RC4, i.e., given the initial internal state of the cipher, our algorithm can find the secret key. We offer some observations on the key scheduling of RC4 which we use as the basis for our algorithm. We also describe an efficient implementation of this algorithm and an empirical study of the success rate and running time of the algorithm for different key sizes. This work was published in the proceedings of FSE 2008 [11]. It is a joint work with Prof. Eli Biham.

3.1 Introduction

The stream cipher RC4 was designed by Ron Rivest, and was first introduced in 1987 as proprietary software of RSA DSI. The details remained secret until 1994, when they were anonymously published on an internet newsgroup [3]. RSA DSI did not confirm that the published algorithm is in fact the RC4 algorithm, but experimental tests showed that it produces the same outputs as the RC4 software.

More than twenty-five years after its release, RC4 is still the most widely used software stream cipher in the world. Among other uses, it is used to protect internet traffic as part of the SSL (Secure Socket Layer) and TLS (Transport Layer Security [25]) protocols, and to protect wireless networks as part of the WEP (Wired Equivalent Privacy) and WPA (Wi-Fi Protected Access) protocols.

The state of RC4 consists of a permutation S of the numbers 0, ..., N − 1, and two indices i, j ∈ {0, ..., N − 1}, where N = 256. RC4 is comprised of two algorithms: the Key Scheduling Algorithm (KSA), which uses the secret key to create a pseudo-random initial state, and the Pseudo Random Generation Algorithm (PRGA), which generates the pseudo-random stream.

3.1.1 Previous Attacks

Most attacks on RC4 can be categorized as distinguishing attacks or key-retrieval attacks. Distinguishing attacks try to distinguish between an output stream of RC4 and a random stream, and are usually based on weaknesses of the PRGA. Key recovery attacks recover the secret key, and are usually based on weaknesses of the KSA.

In 1994, immediately after the RC4 algorithm was leaked, Finney [26] showed a class of states that RC4 can never enter. This class consists of states satisfying j = i+1 and S[i+1] = 1. RC4 preserves the class of Finney states by transferring Finney states to Finney states, and non-Finney states to non-Finney states. Since the initial state (the output of the KSA) is not a Finney state (in the initial state i = j = 0), RC4 can never enter these states. Biham et al. [13] show how to use Finney states with fault analysis in order to attack RC4.

Knudsen et al. [46] use a backtracking algorithm to mount a known plaintext attack on RC4. They guess the values of the internal state, and simulate the generation process. Whenever the output does not agree with the real output, they backtrack and guess another value.

Golić [32] describes a linear statistical weakness of RC4 caused by a positive correlation between the second binary derivative of the least significant bit and 1, and uses it to mount a distinguishing attack.

Fluhrer and McGrew [28] show a correlation between consecutive output bytes, and introduce the notion of k-fortuitous states (classes of states defined by the values of i, j, and only k permutation values, which can predict the outputs of the next k iterations of the PRGA), and build a distinguisher based on that correlation. Mantin and Shamir generalize the notion of fortuitous states and define b-predictive k-states (states with k known permutation values which predict only b output words, for b ≤ k) and k-profitable states, which are classes of states in which the index j behaves in the same way for k steps of the PRGA. The predictive states cause certain output sequences to appear more often than expected in a random sequence, thus they are helpful in mounting a distinguishing attack on RC4.

Mantin and Shamir [50] also show that the second word of the output is slightly more probable to be 0 than any other value. Using this bias they are able to build a prefix distinguisher for RC4, based on only about N short streams.

In 2005 Mantin [49] observed that some fortuitous states return to their initial state after the index i leaves them. These states have a chance to remain the same even after a full cycle of N steps, and the same output of the state may be observed again. Mantin uses these states to predict, with high probability, future output bytes of the stream.

In practical applications stream ciphers are used with a session key which is derived from a shared secret key and an Initial Value (IV, which is transmitted unencrypted). The derivation of the session key can be done in various ways such as concatenating, XORing, or hashing (in WEP, for instance, the secret key is concatenated after the IV). Many works try to exploit weaknesses of a specific method for deriving the session key.

Fluhrer, Mantin, and Shamir [27] have shown a chosen IV attack on the case where the IV precedes the secret key. Using the first output bytes of 60l chosen IVs (l is the length of the secret key), they recover the secret key with high probability. They also describe an attack on the case where the IV follows the secret key, which reveals significant information about the internal state just after l steps of the KSA, thus reducing the cost of exhaustive search significantly.

In March 2007, Klein [43] (followed by Tews et al. [79]) showed a statistical correlation between any output byte and the value of S[j] at the time of the output generation. They use this correlation to retrieve the entire secret key using the first bytes of the output streams of about 40,000 known IVs (for the cases where the IV is concatenated either before or after the secret key).

Vaudenay and Vuagnoux [82] improve the attacks of [27, 43] on the case of WEP (where the IV is concatenated before the secret key). They present the VX attack, which uses the sum of the key bytes to reduce the dependency between the other bytes of the key, such that the attack may work even if the data is insufficient to retrieve one of the bytes.

Several recent works including Sepehrdad et al. [72] and Gupta et al. [34] present algorithmic methods to search for correlations in the output of RC4. They reveal correlations that have not been known before and use them to mount new attacks and to improve previous attacks on the cipher.

In June 2000 Grosul and Wallach [33] analyzed families of RC4 keys which differ by only one byte, and showed correlations in the produced keystreams. Biham and Dunkelman [12] investigated keys which differ in two consecutive bytes, such that the sum of the different key bytes in each key is the same. They show that in this case there is also a high correlation between the produced keystreams. In 2009 Matsui [53] showed a way to find collisions of short RC4 keys (as explained in Section 3.2.1, it is easy to find a collision if we allow one of the colliding keys to be 256 bytes long). Matsui’s technique uses a similar analysis to the one used by [12].

Paul and Maitra [65] use biases in the first entries of the initial permutation to recover the secret key from the initial permutation. They use the first entries of the permutation to create equations which hold with certain probability. They guess some of the bytes of the secret key, and use the equations to retrieve the rest of the bytes. The success of their algorithm relies on the existence of sufficiently many correct equations.

After the publication of the contributions in this chapter a few papers improved the results presented here. First, Akgün et al. [2] performed a more careful analysis of the KSA biases, which takes into account some of our remarks from Section 3.4.4 about the values of the index j during the KSA execution. They adapt our algorithm and the algorithm of [65] to use this new information and devise a more efficient state recovery algorithm. Khazaei and Meier [41] use the equations we show here but find the key bytes in a bit-by-bit approach (starting with the LSBs of the key bytes and ending with the MSBs).

3.1.2 Outline of Our Contribution

In this chapter we present methods that allow us to obtain significantly better results than the algorithm of [65]. A major observation considers the difference between pairs of equations instead of analyzing each equation separately. We show that the probability that the difference of a pair of equations is correct is much higher in most cases than the probabilities of each of the individual equations. Therefore, our algorithm can rely on many more equations and apply more thorough statistical techniques than the algorithm of [65]. We also show two filtering methods that allow us to identify that some of the individual equations (used in [65]) are probably incorrect by a simple comparison, and therefore, to discard these equations and all the differences derived from them. Similarly, we show filtering techniques that discard difference equations, and even correct some of these equations. We also show how to create alternative equations, which can replace the original equations in some cases and allow us to receive better statistical information when either the original equations are discarded or they lead to incorrect values.

We combine these observations (and other observations that we discuss in this chapter) into a statistical algorithm that recovers the key with a much higher success rate than the one of [65]. Our algorithm also works if some of the bytes of the initial permutation are missing or contain errors. Such scenarios are likely results of side channel attacks, as in [35]. In these cases, our algorithm can even be used to reconstruct the full correct initial permutation by finding the correct key and then using it to compute the correct values. Details of an efficient implementation of the data structures and internals of the algorithm are also discussed.

The algorithm we propose retrieves one linear combination of the key bytes at a time. In each step, the algorithm applies statistical considerations to choose the subset of key bytes participating in the linear combination and the value that have the highest probability to be correct. If this choice turns out to be incorrect, other probable choices may be considered. We propose ways to discover incorrect choices even before the entire key is recovered (i.e., before it can be tested by running the KSA), and thus we are able to save valuable computation time that does not lead to the correct key.

Our algorithm is much faster than the algorithm of [65], and has much better success rates for the same computation time. For example, for 40-bit keys and 86% success rate, our algorithm is about 10000 times faster than the algorithm of [65]. Additionally, even if the algorithm fails to retrieve the full key, it can retrieve partial information about the key. For example, for 128-bit keys it can give a suggestion for the sum of all the key bytes which has a probability of 23.09% to be correct, or give four suggestions such that with a probability of 41.44% the correct value of the sum of all the key bytes is one of the four.

3.1.3 Organization of the Chapter

This chapter is organized as follows: Section 3.2 describes the RC4 algorithms, gives several observations about the keys of RC4, and defines notations which will be used throughout this chapter. Section 3.3 presents the bias of the first bytes of the initial permutation, and describes the attack of [65], which uses these biases to retrieve the secret key. Section 3.4 gathers several observations which are the building blocks of our key retrieval algorithm, and have enabled us to improve the result of [65]. Section 3.5 takes these building blocks and uses them together to describe the detailed algorithm. In Section 3.6 we give some comments and observations about an efficient implementation of our algorithm. Finally, Section 3.7 summarizes the chapter, presents the performance of our algorithm and discusses its advantages over the algorithm of [65].

3.2 The RC4 Stream Cipher

The internal state of RC4 consists of a permutation S of the numbers 0, ..., N − 1, and two indices i, j ∈ {0, ..., N − 1}. The permutation S and the index j form the secret part of the state, while the index i is public and its value at any stage of the stream generation is widely known. In RC4 N = 256, and thus the secret internal state has $\log_2(2^8 \cdot 256!) \approx 1692$ bits of information. Together with the public value of i there are about 1700 bits of information in the internal state. Variants with other values of N have also been analyzed in the cryptographic literature.

RC4 consists of two algorithms: the Key Scheduling Algorithm (KSA) and the Pseudo Random Generation Algorithm (PRGA), both of which are presented in Algorithm 3.1. All additions in RC4 are performed modulo N. Therefore, in this chapter, additions are performed modulo 256, unless explicitly stated otherwise.

KSA(K)
  Initialization:
    For i = 0 to N − 1
      S[i] = i
    j ← 0
  Scrambling:
    For i = 0 to N − 1
      j ← j + S[i] + K[i mod l]
      Swap(S[i], S[j])

PRGA(S)
  Initialization:
    i ← 0
    j ← 0
  Generation loop:
    i ← i + 1
    j ← j + S[i]
    Swap(S[i], S[j])
    Output S[S[i] + S[j]]

Algorithm 3.1: The RC4 Algorithms
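Algorithm 3.1 translates almost line by line into code. The sketch below is an illustrative Python rendering: the function names `ksa` and `prga`, the `count` parameter, and the choice to copy the state inside `prga` are assumptions made for the example.

```python
N = 256

def ksa(key):
    """Key Scheduling Algorithm: returns the initial permutation S."""
    S = list(range(N))                            # identity permutation
    j = 0
    for i in range(N):
        j = (j + S[i] + key[i % len(key)]) % N    # the key repeats if shorter than N
        S[i], S[j] = S[j], S[i]
    return S

def prga(S, count):
    """Pseudo Random Generation Algorithm: returns `count` keystream bytes."""
    S = list(S)                                   # work on a copy of the state
    i = j = 0
    out = []
    for _ in range(count):
        i = (i + 1) % N
        j = (j + S[i]) % N
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % N])
    return bytes(out)

keystream = prga(ksa(b"Key"), 9)
```

All index arithmetic is reduced modulo `N`, matching the convention that additions in this chapter are performed modulo 256.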

The KSA takes an l-byte secret key, K, and generates a pseudo-random initial permutation S. The key length l is bounded by N bytes, but is usually in the range of 5–16 bytes (40–128 bits). The bytes of the secret key are denoted by K[0], ..., K[l − 1]. If l < N the key is repeated to form an N-byte key. The KSA initializes S to be the identity permutation, and then performs N swaps between the elements of S, which are determined by the secret key and the content of S. Note that because i is incremented by one at each step, each element of S is swapped at least once during the run of the KSA (possibly with itself). On average each element of S is swapped twice.

The PRGA generates the pseudo-random stream, and updates the internal state of the cipher. In each iteration of the PRGA, the values of the indices are updated, two elements of S are swapped, and a byte of output is generated. During the generation of N consecutive output bytes each element of S is swapped at least once (possibly with itself), and twice on average.

3.2.1 Properties of RC4 Keys

There are $2^{8 \cdot 256} = 2^{2048}$ possible keys (every key shorter than 256 bytes has an equivalent 256-byte key) but only about $2^{1684}$ possible initial states of RC4. Therefore, every initial permutation has on average about $2^{364}$ 256-byte keys which create it. Each initial permutation $\hat{S}$ has at least one, easy to find, 256-byte key: Since every byte of the key is used only once during the KSA, the key bytes are chosen one by one, where $K[i]$ is chosen to set $j$ to be the current location of $\hat{S}[i]$ (which satisfies, by this construction, $j \ge i$). Thus, the Swap(S[i], S[j]) operation on iteration $i$ swaps the value $\hat{S}[i] = S[j]$ with $S[i]$. The value $\hat{S}[i]$ does not participate in later swaps, and thus remains there until the end of the KSA.

The number of initial permutations which can be created by short keys, however, is much smaller. For example, the number of 16-byte keys is only $2^{128}$, and the total number of keys bounded by 210 bytes is about $2^{8 \cdot 210} = 2^{1680}$, which is smaller than the total number of permutations.
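The construction of a 256-byte key for an arbitrary initial permutation can be sketched directly. The code below is an illustration under stated assumptions: the function name `key_for_permutation`, the linear search with `list.index`, and the seeded shuffle used to produce a sample target are choices made for the example.

```python
import random

N = 256

def ksa(key):
    S, j = list(range(N)), 0
    for i in range(N):
        j = (j + S[i] + key[i % len(key)]) % N
        S[i], S[j] = S[j], S[i]
    return S

def key_for_permutation(target):
    """Build a 256-byte key whose KSA output is the permutation `target`."""
    S, j, key = list(range(N)), 0, []
    for i in range(N):
        pos = S.index(target[i])             # current location of target[i]; never below i
        key.append((pos - j - S[i]) % N)     # forces the KSA to set j = pos in iteration i
        j = pos
        S[i], S[j] = S[j], S[i]              # target[i] is now fixed at position i
    return bytes(key)

target = list(range(N))
random.Random(1).shuffle(target)             # an arbitrary sample permutation
key = key_for_permutation(target)
```

Each key byte is determined by the state reached so far, which is why the construction needs a full N-byte key and does not extend to short, repeated keys.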

3.2.2 Notations

We use the notation K[a, b] to denote the sum of the key bytes K[a] and K[b], i.e., K[a, b] , K[a mod l] + K[b mod l] mod N.

Similarly, K[a, b, c], K[a, b, c, d], etc., are the sums of the corresponding key bytes for any number of comma-separated arguments. We use the notation K[a . . . b] to denote the sum of the key bytes in the range a, a + 1, . . . , b, i.e.,

$$K[a \ldots b] \triangleq \sum_{r=a}^{b} K[r \bmod l] \bmod N.$$

We also use combinations of the above, for instance:

$$K[a, b \ldots c] \triangleq K[a \bmod l] + \sum_{r=b}^{c} K[r \bmod l],$$

$$K[a \ldots b, c \ldots d] \triangleq \sum_{r=a}^{b} K[r \bmod l] + \sum_{r=c}^{d} K[r \bmod l].$$

We use the notations $S_r$ and $j_r$ to denote the values of the permutation S and the index j after r iterations of the loop of the KSA have been executed. The initial value of j is $j_0 = 0$ and its value at the end of the KSA is $j_N$. $S_0$ is the identity permutation, and $S_N$ is the result of the KSA (i.e., the initial permutation that we study in this chapter). For clarity, from now on the notation S (without an index) denotes the initial permutation $S_N$.

3.3 Previous Techniques

In 1995 Roos [70] noticed that some of the bytes of the initial permutation have a bias towards a linear combination of the secret key bytes. Theorem 3.1 describes this bias (the theorem is taken from [70], but is adapted to our notations).

Theorem 3.1 The most likely value for S[i] at the end of the KSA is:

$$S[i] = K[0 \ldots i] + \frac{i(i+1)}{2} \bmod N. \tag{3.1}$$

Only experimental results for the probabilities of the biases in Theorem 3.1 are provided in [70]. In 2007 Paul and Maitra [65] supplied an analytic formula for this probability, which corroborated the results given by [70]. Theorem 3.2 presents their result.

Theorem 3.2 (Corollary 2 of [65]) Assume that during the KSA the index j takes its values uniformly at random from {0, 1,...,N − 1}. Then,

$$P\left(S[i] = K[0 \ldots i] + \frac{i(i+1)}{2}\right) \ge \left(\frac{N-i}{N}\right) \cdot \left(\frac{N-1}{N}\right)^{\frac{i(i+1)}{2}+N} + \frac{1}{N}.$$

For any fixed value of i, the bias described by (3.1) is the result of a combination of three events that occur with high probability:

1. $S_r[r] = r$ for r ∈ {0, . . . , i} (i.e., the value of S[r] was not swapped before the r-th iteration).

2. $S_i[j_{i+1}] = j_{i+1}$.

3. $j_r \neq i$ for r ∈ {i + 1,...,N − 1}.

| i     | 0    | 1    | 2    | 3    | 4    | 5    | 6    | 7    |
|-------|------|------|------|------|------|------|------|------|
| Prob. | .371 | .368 | .364 | .358 | .351 | .343 | .334 | .324 |
| i     | 8    | 9    | 10   | 11   | 12   | 13   | 14   | 15   |
| Prob. | .313 | .301 | .288 | .275 | .262 | .248 | .234 | .220 |
| i     | 16   | 17   | 18   | 19   | 20   | 21   | 22   | 23   |
| Prob. | .206 | .192 | .179 | .165 | .153 | .140 | .129 | .117 |
| i     | 24   | 25   | 26   | 27   | 28   | 29   | 30   | 31   |
| Prob. | .107 | .097 | .087 | .079 | .071 | .063 | .056 | .050 |
| i     | 32   | 33   | 34   | 35   | 36   | 37   | 38   | 39   |
| Prob. | .045 | .039 | .035 | .031 | .027 | .024 | .021 | .019 |
| i     | 40   | 41   | 42   | 43   | 44   | 45   | 46   | 47   |
| Prob. | .016 | .015 | .013 | .011 | .010 | .009 | .008 | .008 |

Table 3.1: The Probabilities Given by Theorem 3.2
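The entries of Table 3.1 can be recomputed directly from the bound of Theorem 3.2. The sketch below is only a transcription of that formula; the function name `roos_bias_bound` is our own.

```python
N = 256

def roos_bias_bound(i, n=N):
    """Lower bound of Theorem 3.2 on P(S[i] = K[0...i] + i(i+1)/2)."""
    exponent = i * (i + 1) // 2 + n
    return ((n - i) / n) * ((n - 1) / n) ** exponent + 1 / n

if __name__ == "__main__":
    for i in range(48):
        print(i, f"{roos_bias_bound(i):.3f}")
```

For i = 0 the bound evaluates to about 0.371, in agreement with the first entry of the table, and it approaches the a-priori value 1/256 as i grows.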

If the first event occurs then the value $j_{i+1}$ is affected only by the key bytes and constant values:

$$j_{i+1} = \sum_{r=0}^{i} \left(K[r] + S_r[r]\right) = \sum_{r=0}^{i} \left(K[r] + r\right) = K[0 \ldots i] + \frac{i(i+1)}{2}.$$

If the second event occurs, then after i + 1 iterations of the KSA $S_{i+1}[i] = j_{i+1}$. The third event ensures that the index j does not point to S[i] again, and therefore S[i] is not swapped again in later iterations of the KSA. If all three events occur then (3.1) holds since

$$S_N[i] \overset{(3)}{=} S_{i+1}[i] \overset{(2)}{=} j_{i+1} = \sum_{r=0}^{i} \left(K[r] + S_r[r]\right) \overset{(1)}{=} K[0 \ldots i] + \frac{i(i+1)}{2},$$

where the number above an equality sign indicates the event that justifies it.

The probabilities derived from Theorem 3.2 for the biases of the first 48 entries of S (S[0] ... S[47]) are given in Table 3.1 (also taken from [65]). It can be seen that this probability is about 0.371 for i = 0, and it decreases as the value of i increases. For i = 47 this probability is only 0.008, and for further entries it becomes too low to be used by the algorithm (the a-priori probability that an entry equals any random value is 1/256 ≈ 0.0039). The cause for such a decrease in the bias is that the first of the aforementioned events is less likely to occur for high values of i, as there are more constraints on entries in S.

Given an initial permutation S (the result of the KSA), each of its entries can be used to derive a linear equation of the key bytes, which holds with the probability given by Theorem 3.2. Let $C_i$ be defined as

$$C_i = S[i] - \frac{i \cdot (i+1)}{2}.$$

Using (3.1) the i'th equation (derived from the entry S[i]) becomes:

$$K[0 \ldots i] = C_i. \tag{3.2}$$
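To illustrate equation (3.2), the following sketch computes the constants $C_i$ from an initial permutation and estimates empirically how often the i-th equation holds for random keys. The helper names (`roos_constants`, `key_sum`, `equation_hit_rate`), the 16-byte key length and the number of trials are assumptions made for this example only.

```python
import random

N = 256

def ksa(key):
    s, j = list(range(N)), 0
    for i in range(N):
        j = (j + s[i] + key[i % len(key)]) % N
        s[i], s[j] = s[j], s[i]
    return s

def roos_constants(s):
    """C_i = S[i] - i(i+1)/2 (mod N) for every index i."""
    return [(s[i] - i * (i + 1) // 2) % N for i in range(N)]

def key_sum(key, a, b):
    """K[a...b]: sum of key bytes a..b, indices taken modulo the key length."""
    return sum(key[r % len(key)] for r in range(a, b + 1)) % N

def equation_hit_rate(i, key_len=16, trials=2000, seed=0):
    """Fraction of random keys for which K[0...i] = C_i holds."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        key = [rng.randrange(N) for _ in range(key_len)]
        hits += key_sum(key, 0, i) == roos_constants(ksa(key))[i]
    return hits / trials

if __name__ == "__main__":
    for i in (0, 10, 47):
        print(i, equation_hit_rate(i))
```

The measured rates are sample estimates and fluctuate with the seed; they are expected to lie near the values of Table 3.1.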

The RecoverKey algorithm of [65] uses these equations in order to retrieve the secret key of RC4. Let n and m be parameters of the algorithm, and recall that l is the length of the secret key in bytes. For each combination of m independent equations out of the first n equations of (3.2), the algorithm exhaustively guesses the value of l − m key bytes, and solves the m equations to retrieve the rest of the key bytes. The success of the RecoverKey algorithm relies on the existence of m correct and linearly independent equations among the first n equations. The success probabilities and the running time of the RecoverKey algorithm for different key sizes and parameters, as given by [65], are presented in Table 3.2.¹

3.4 Our Observations

Several important observations allow us to suggest an improved algorithm for retrieving the key from the initial permutation.

¹ We observe that the formula for the complexity given in [65] is incorrect, and the actual values should be considerably higher than the ones cited in Table 3.2. We expect that the correct values are between $2^5$ and $2^8$ times higher. The source for the mistake is two-fold: the KSA is considered as taking one unit of time, and the complexity analysis is based on an inefficient implementation of their algorithm. Given a set of l equations, their implementation solves the set of equations separately for every guess of the remaining l − m variables, while a more efficient implementation would solve them only once, and only then guess the values of the remaining bytes. Our complexities are even lower than the complexities given in [65], and are much lower than the correct complexities. We also observe that the complexities given by [65] for the case of 16-byte keys do not match the formula they publish (marked by ∗ in Table 3.2). The values according to their formula should be $2^{82}$, $2^{79}$, $2^{73}$ and $2^{69}$ rather than $2^{60}$, $2^{63}$, $2^{64}$ and $2^{64}$, respectively. Their mistake is possibly due to an overflow in 64-bit variables.

| l  | n  | m  | Time         | P_Success |
|----|----|----|--------------|-----------|
| 5  | 16 | 5  | $2^{18}$     | 0.250     |
| 5  | 24 | 5  | $2^{21}$     | 0.385     |
| 8  | 16 | 6  | $2^{34}$     | 0.273     |
| 8  | 20 | 7  | $2^{29}$     | 0.158     |
| 8  | 40 | 8  | $2^{33}$     | 0.092     |
| 10 | 16 | 7  | $2^{43}$     | 0.166     |
| 10 | 24 | 8  | $2^{40}$     | 0.162     |
| 10 | 48 | 9  | $2^{43}$     | 0.107     |
| 12 | 24 | 8  | $2^{58}$     | 0.241     |
| 12 | 24 | 9  | $2^{50}$     | 0.116     |
| 16 | 24 | 9  | $2^{60}$ ∗   | 0.185     |
| 16 | 32 | 10 | $2^{63}$ ∗   | 0.160     |
| 16 | 32 | 11 | $2^{64}$ ∗   | 0.086     |
| 16 | 40 | 12 | $2^{64}$ ∗   | 0.050     |

∗ Incorrect entries; see footnote 1.

Table 3.2: Success Probabilities and Running Time of the RecoverKey Algorithm of [65]

3.4.1 Subtracting Equations

Let $i_2 > i_1$. As we expect that $K[0 \ldots i_1] = C_{i_1}$ and $K[0 \ldots i_2] = C_{i_2}$, we also expect that

$$K[0 \ldots i_2] - K[0 \ldots i_1] = K[i_1+1 \ldots i_2] = C_{i_2} - C_{i_1} \tag{3.3}$$

holds with the product of the probabilities of the two separate equations. However, we observe that this probability is in fact much higher. If the following three events occur then (3.3) holds (compare with the three events described in Section 3.3):

1. $S_r[r] = r$ for $r \in \{i_1+1, \ldots, i_2\}$ (i.e., the value of S[r] was not swapped before the r-th iteration).

2. $S_{i_1}[j_{i_1+1}] = j_{i_1+1}$ and $S_{i_2}[j_{i_2+1}] = j_{i_2+1}$.

3. $j_r \neq i_1$ for $r \in \{i_1+1, \ldots, N-1\}$, and $j_r \neq i_2$ for $r \in \{i_2+1, \ldots, N-1\}$.

If the first event occurs then the index j is affected in iterations $i_1 + 1$ through $i_2$ only by the key bytes and constant values:

$$j_{i_2+1} - j_{i_1+1} = \sum_{r=i_1+1}^{i_2} \left(K[r] + S_r[r]\right) = K[i_1+1 \ldots i_2] + \sum_{r=i_1+1}^{i_2} r.$$

If the second event occurs, then after $i_1 + 1$ iterations of the KSA $S_{i_1+1}[i_1] = j_{i_1+1}$, and after $i_2 + 1$ iterations $S_{i_2+1}[i_2] = j_{i_2+1}$. The third event ensures that the index j does not point to $S[i_1]$ or $S[i_2]$ again, and therefore $S[i_1]$ and $S[i_2]$ are not swapped again in later iterations. If all three events occur then (3.3) holds since

$$S_N[i_2] - S_N[i_1] \overset{(3)}{=} S_{i_2+1}[i_2] - S_{i_1+1}[i_1] \overset{(2)}{=} j_{i_2+1} - j_{i_1+1} = \sum_{r=i_1+1}^{i_2} \left(K[r] + S_r[r]\right) \overset{(1)}{=} K[i_1+1 \ldots i_2] + \sum_{r=i_1+1}^{i_2} r = K[i_1+1 \ldots i_2] + \frac{i_2(i_2+1)}{2} - \frac{i_1(i_1+1)}{2},$$

where the number above an equality sign indicates the event that justifies it, and therefore

$$K[i_1+1 \ldots i_2] = C_{i_2} - C_{i_1}.$$

Theorem 3.3 states the exact bias of such differences.

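Theorem 3.3 Assume that during the KSA the index j takes its values uniformly at random from {0, 1,...,N − 1}, and let $0 \le i_1 < i_2 < N$. Then,

$$P\left(C_{i_2} - C_{i_1} = K[i_1+1 \ldots i_2]\right) \ge \left[\left(1 - \frac{i_2}{N}\right)^{2} \cdot \left(1 - \frac{i_2 - i_1 + 2}{N}\right)^{i_1} \cdot \left(1 - \frac{2}{N}\right)^{N - i_2 - 1} \cdot \prod_{r=0}^{i_2 - i_1 - 1} \left(1 - \frac{r+2}{N}\right)\right] + \frac{1}{N}.$$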

The proof of Theorem 3.3 is based on the discussion which precedes it, and is similar to the proof of Theorem 3.2 given in [65]. The proof is based on the analysis of the probabilities that the values of j throughout the KSA are such that the three events described earlier hold.

As a result of Theorem 3.3 our algorithm has many more equations to rely on. We are able to use the difference equations which have high enough probability, and furthermore, we can now use data which was unusable by the algorithm of [65]. For instance, according to Theorem 3.2, the probability that $K[0 \ldots 50] = C_{50}$ is 0.0059, and the probability that $K[0 \ldots 52] = C_{52}$ is 0.0052. Both equations are practically useless by themselves, but according to Theorem 3.3 the probability that $K[51 \ldots 52] = C_{52} - C_{50}$ is 0.0624, which is more than ten times the probabilities of the individual equations.

Moreover, the biases given by Theorem 3.2 and used by the RecoverKey algorithm of [65] are dependent. If $S_r[r] \neq r$ for some r, then the first event described in Section 3.3 is not expected to hold for any i > r, and we do not expect equations of the form (3.2) to hold for such values of i. However, equations of the form (3.3) for $i_2 > i_1 > r$ may still hold under these conditions, allowing us to handle initial permutations which the RecoverKey algorithm cannot.
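The advantage of difference equations can be observed experimentally. The sketch below compares, over random keys, how often the two individual equations for i = 50 and i = 52 hold against how often their difference holds. The helper names, the 16-byte key length, the number of trials and the seed are assumptions of this example; the printed rates are sample estimates, not the theoretical values.

```python
import random

N = 256

def ksa(key):
    s, j = list(range(N)), 0
    for i in range(N):
        j = (j + s[i] + key[i % len(key)]) % N
        s[i], s[j] = s[j], s[i]
    return s

def c_const(s, i):
    """C_i = S[i] - i(i+1)/2 (mod N)."""
    return (s[i] - i * (i + 1) // 2) % N

def key_sum(key, a, b):
    """K[a...b] with key indices taken modulo the key length."""
    return sum(key[r % len(key)] for r in range(a, b + 1)) % N

def compare_rates(i1=50, i2=52, key_len=16, trials=3000, seed=1):
    """Return the hit rates of equation i1, equation i2, and their difference."""
    rng = random.Random(seed)
    single1 = single2 = diff = 0
    for _ in range(trials):
        key = [rng.randrange(N) for _ in range(key_len)]
        s = ksa(key)
        c1, c2 = c_const(s, i1), c_const(s, i2)
        single1 += key_sum(key, 0, i1) == c1
        single2 += key_sum(key, 0, i2) == c2
        diff += key_sum(key, i1 + 1, i2) == (c2 - c1) % N
    return single1 / trials, single2 / trials, diff / trials

if __name__ == "__main__":
    print(compare_rates())
```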

3.4.2 Using Counting Methods

Since every byte of the secret key is used more than once, we can obtain several equations for the same sum of key bytes. For example, all the following equations:

$$C_1 = K[0 \ldots 1]$$

$$C_{l+1} - C_{l-1} = K[l \ldots l+1] = K[0 \ldots 1]$$

$$C_{2l+1} - C_{2l-1} = K[2l \ldots 2l+1] = K[0 \ldots 1]$$

suggest values for K[0] + K[1]. If we have sufficiently many suggestions for the same sum of key bytes, the correct value of the sum is expected to appear more frequently than other values. We can assign a weight to each suggestion, use counters to sum the weights for each possible candidate, and select the candidate which has the highest weight. We may assign the same weight to all the suggestions (majority vote) or a different weight for each suggestion (e.g., according to its probability to hold, as given by Theorems 3.2 and 3.3).

We demonstrate the use of counters using the previous example. Assume that $C_1 = 178$, $C_{l+1} - C_{l-1} = 210$ and $C_{2l+1} - C_{2l-1} = 178$ are the only suggestions for the value of K[0 ... 1], and assume that all three suggestions have equal weights of one. Under these conditions the value of the counter of 178 will be two, the value of the counter of 210 will be one, and all other counters will have a value of zero. We guess that K[0 ... 1] = 178, since it has the highest total weight of suggestions.

A simple algorithm to retrieve the full key would be to look at all the suggestions for each of the key bytes, and choose the most frequent value for each one. Unfortunately, some of the bytes retrieved by this sort of algorithm are expected to be incorrect. We can run the KSA with the retrieved key to test its correctness, but if the test fails (the KSA does not produce the expected initial permutation), we get no clue to where the mistake is. However, we observe that we do not need to limit ourselves to a single key byte, but rather consider all candidates for all possible sums of key bytes suggested by the equations, and select the combination with the highest total weight. Once we fix the chosen value for the first sum, we can continue to another, ordered by the weight, until we have the entire key. There is no need to consider sequences which are linearly dependent in prior sums. For example, if we have already fixed the values of K[0] + K[1] and K[0], there is no need to consider suggestions for K[1]. Therefore, we need to set the values of exactly l sums in order to retrieve the full key. Moreover, each value we select allows us to substantially reduce the number of sums we need to consider for the next step, as it allows us to merge the counters of some of the sums (for example, if we know that K[0] + K[1] = 50 then we can treat suggestions for K[0] = 20 together with K[1] = 30).

A natural extension to this approach is trying also the value with the second highest counter, in cases where the highest counter is found wrong. More generally, once a value is found wrong, or a selection of a sequence is found unsatisfactory, backtracking is performed. We denote the number of attempts to be tried on the t-th guess by $\lambda_t$, for 0 ≤ t < l. This method can be thought of as using a DFS algorithm to search an ordered tree of height l + 1, where the degree of vertices on the t-th level is $\lambda_t$ and every leaf represents a key.
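The counter mechanism of the example above (three suggestions 178, 210 and 178 with equal weights) can be sketched in a few lines. The function name `vote` and the representation of a suggestion as a (value, weight) pair are our own choices.

```python
from collections import Counter

def vote(suggestions, n=256):
    """Sum the weights of (value, weight) suggestions and rank the candidates."""
    counters = Counter()
    for value, weight in suggestions:
        counters[value % n] += weight
    return counters.most_common()   # candidates by decreasing total weight

if __name__ == "__main__":
    ranking = vote([(178, 1), (210, 1), (178, 1)])
    print(ranking)        # [(178, 2), (210, 1)]
    print(ranking[0][0])  # 178 is the first guess for K[0...1]
```

Keeping the full ranking, rather than only the winner, is what makes the backtracking step possible: the second entry is the next value to try when the first one is found wrong.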

3.4.3 The Sum of the Key Bytes

Denote the sum of all l key bytes by s, i.e.,

$$s = K[0 \ldots l-1] = \sum_{r=0}^{l-1} K[r].$$

The value of s is especially useful. The linear equations derived from the initial permutation give sums of sequences of consecutive key bytes. If we know the value of s, all the suggestions for sequences longer than l bytes can be reduced to suggestions for sequences which are shorter than l bytes. For example, from the following equations:

$$C_1 = K[0 \ldots 1]$$

$$C_{l+1} = K[0 \ldots l+1] = s + K[0 \ldots 1]$$

$$C_{2l+1} = K[0 \ldots 2l+1] = 2s + K[0 \ldots 1]$$

we get three suggestions $C_1$, $C_{l+1} - s$, and $C_{2l+1} - 2s$ for the value of K[0] + K[1].

After such a reduction is performed, all the remaining suggestions reduce to sums of fewer than l key bytes, of the form $K[i_1 \ldots i_2]$, where $0 \le i_1 < l$ and $i_1 \le i_2 < i_1 + l - 1$. Thus, there are only l · (l − 1) possible sequences of key bytes to consider. Furthermore, the knowledge of s allows us to unify every two sequences which sum up to K[0 . . . l − 1] = s (as described in Section 3.4.2), thus reducing the number of sequences to consider to only l · (l − 1)/2 (without loss of generality, the last byte of the key, K[l − 1], does not appear in the sequences we consider, so each sum we consider is of the form $K[i_1 \ldots i_2]$, for $0 \le i_1 \le i_2 < l - 1$). In turn, there are more suggestions for each of those unified sequences than there were for each original sequence.

Fortunately, besides being the most important key byte sequence, s is also the easiest sequence to retrieve, as it has the largest number of suggestions. Any sum of l consecutive bytes, of the form $K[i+1 \ldots i+l] = C_{i+l} - C_i$, for any i, yields a suggestion for s. In a similar way, we can consider sequences of 2l bytes for suggestions for 2s, and we can continue to consider sequences of αl consecutive bytes, for any integer α. However, for common key lengths, the probability of a correct sum with α > 2 is too low.

As discussed in Section 3.4.2, we may want to consider also the second highest counter and perform backtracking. Our experimental results for the success probabilities of retrieving s are presented in Table 3.3. For each of the key lengths in the table, we give the probability that the value of s is the value with the highest counter, second highest, third highest, or fourth highest. The data in the table was compiled by testing 1,000,000,000 random keys for each of the key lengths, and considering all suggestions with a probability higher than 0.01.

| Key Length | Highest Counter | Second Highest | Third Highest | Fourth Highest |
|------------|-----------------|----------------|---------------|----------------|
| 5          | 0.8022          | 0.0618         | 0.0324        | 0.0195         |
| 8          | 0.5428          | 0.1373         | 0.0572        | 0.0325         |
| 10         | 0.4179          | 0.1604         | 0.0550        | 0.0332         |
| 12         | 0.3335          | 0.1618         | 0.0486        | 0.0287         |
| 16         | 0.2309          | 0.1224         | 0.0371        | 0.0240         |

Table 3.3: Probabilities that s is Among the Four Highest Counters
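A much simplified version of the experiment behind Table 3.3 can be sketched as follows. It collects only the suggestions $C_{l-1}$ and $C_{i+l} - C_i$ (that is, α = 1), gives every suggestion the same weight, and limits the indices by a hypothetical cutoff `max_index` instead of the probability threshold of 0.01 used in the text. All names and parameters are assumptions of this sketch, so its success rate is not expected to reproduce the values of the table.

```python
import random
from collections import Counter

N = 256

def ksa(key):
    s, j = list(range(N)), 0
    for i in range(N):
        j = (j + s[i] + key[i % len(key)]) % N
        s[i], s[j] = s[j], s[i]
    return s

def c_const(s, i):
    """C_i = S[i] - i(i+1)/2 (mod N)."""
    return (s[i] - i * (i + 1) // 2) % N

def counters_for_s(s, l, max_index=64):
    """Equal-weight counters for s = K[0...l-1], built from the permutation s."""
    counters = Counter()
    counters[c_const(s, l - 1)] += 1                # C_{l-1} = K[0...l-1]
    for i in range(max_index - l):                  # C_{i+l} - C_i = K[i+1...i+l]
        counters[(c_const(s, i + l) - c_const(s, i)) % N] += 1
    return counters

def top_hit_rate(l=5, trials=200, seed=2):
    """Fraction of random l-byte keys whose s has the highest counter."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        key = [rng.randrange(N) for _ in range(l)]
        best, _weight = counters_for_s(ksa(key), l).most_common(1)[0]
        hits += best == sum(key) % N
    return hits / trials

if __name__ == "__main__":
    print(top_hit_rate())
```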

3.4.4 Adjusting Weights and Correcting Equations

During the run of the algorithm, we can improve the accuracy of our guesses based on previous guesses. Looking at all suggestions for sequences we have already established, we can identify exactly which of them are correct and which are not, and use this knowledge to gain information about intermediate values of j and S during the execution of the KSA. We assume that if a suggestion $C_{i_2} - C_{i_1}$ for $K[i_1+1 \ldots i_2]$ is correct, then all three events described in Section 3.4.1 occur with a relatively high probability. Namely, we assume that:

• $S_r[r] = r$ for $i_1 + 1 \le r \le i_2$ (follows from event 1 from Section 3.4.1).

• $S[i_1] = j_{i_1+1}$ and $S[i_2] = j_{i_2+1}$ (together, follow from events 2 and 3 from Section 3.4.1).

This information can be used to better assess the probabilities of other suggestions. When considering a suggestion $C_{i_4} - C_{i_3}$ for a sum of key bytes $K[i_3+1 \ldots i_4]$ which is still unknown, if we have an indication that one of the three events described in Section 3.4.1 is more likely to have occurred than predicted by its a-priori probability, the weight associated with the suggestion can be increased. Example 3.1 demonstrates a case in which such information is helpful.

Example 3.1 Assume that the following three suggestions are correct:

1. $K[0 \ldots 9] = C_9$,

2. $K[12 \ldots 16] = C_{16} - C_{11}$,

3. $K[7 \ldots 14] = C_{14} - C_6$,

and assume that for each of them the three events described in Section 3.4.1 hold during the execution of the KSA. From the first suggestion we conclude that $j_{10} = S[9]$, from the second suggestion we learn that $j_{12} = S[11]$, and the third suggestion teaches us that $S_r[r] = r$ for 7 ≤ r ≤ 14 (and in particular for r = 10 and r = 11). It can be inferred from the last three observations and according to the explanation in Section 3.4.1 that under these assumptions $K[10 \ldots 11] = C_{11} - C_9$. Since the probabilities that the assumptions related to $K[10 \ldots 11] = C_{11} - C_9$ hold are larger than the a-priori probability (due to the relation to the other suggestions, which are known to be correct), the probability that this suggestion for K[10 ... 11] is correct is increased.

Similarly, we can gain further information from the knowledge that suggestions are incorrect. Consider values of r for which there are many incorrect suggestions that involve $C_r$, either with preceding $C_{i_1}$ ($C_r - C_{i_1}$, $i_1 < r$) or with succeeding $C_{i_2}$ ($C_{i_2} - C_r$, $i_2 > r$). In such cases we may assume that $S_N[r]$ is not the correct value of $j_{r+1}$, and thus all other suggestions involving $C_r$ are also incorrect.

Consider r's for which there are many incorrect suggestions that pass over r, i.e., of the form $C_{i_2} - C_{i_1}$ where $i_1 < r \le i_2$. In this case, we may assume that during the KSA $S_r[r] \neq r$, and thus all other suggestions that pass over r are also incorrect. All suggestions that pass over r for which $S_r[r] \neq r$ is the only event (of the three events described in Section 3.4.1) that does not hold, must have the same error $\Delta = C_{i_2} - C_{i_1} - K[i_1+1 \ldots i_2]$ (which is expected to be $\Delta = S_r[r] - r$). Thus, if we find that for some r several suggestions that pass over r have the same error ∆, we can correct other suggestions by ∆.

3.4.5 Refining the Set of Equations

We observe that some of the equations can be discarded based on the values of the initial permutation, and some others have alternatives. This observation is also applicable to the equations used by the algorithm of [65], and could have improved its running time and success probabilities.

If S[i′] < i′ for some i′, then the equation derived from S[i′] should be discarded, since x = S[i′] is not expected to satisfy (3.1). In this case, even if Event 1 and Event 3 (of the three events described in Section 3.3) hold, it is clear that Event 2 does not, as the number x has already been swapped in a previous iteration (when i = x), and is not likely to be in location S[i′] after i′ iterations of the KSA.

[Figure 3.1: Two Probable Alternatives to the Positions of the Indices i and j Right Before the Assignment S[i′] ← x Occurred. Panel (a): i = i′ and j points to the entry holding x. Panel (b): i points to the entry holding x and j = i′.]

If S[i′] > i′ for some i′, then an alternative equation may be derived from x = S[i′], in addition to the equation derived by the algorithm of [65]. The equations used by [65] assume that the assignment S[i′] ← x occurred with i = i′, and j = S[j] = x (Figure 3.1(a)). However, in this case, another likely possibility is that the assignment S[i′] ← x occurred with i = S[i] = x, and j = i′ (Figure 3.1(b)). In the latter case, $j_{x+1} = i'$, and the following equation holds with a high probability:

$$i' = K[0 \ldots x] + \frac{x(x+1)}{2}.$$

It can be shown that this equation holds with a probability slightly higher than the probability given by Theorem 3.2 for i = x. We now have two likely possibilities for the value of $j_{x+1}$, i′ and S[x], which yield two alternative equations. Let $\bar{C}_x$ be defined as:

$$\bar{C}_x = S^{-1}[x] - \frac{x(x+1)}{2}.$$

Using this notation, the proposed alternative equation is

$$K[0 \ldots x] = \bar{C}_x.$$

Every time $C_x$ is used to create a suggestion (by subtracting equations), the value $\bar{C}_x$ (if it exists) can replace it to create an alternative suggestion for the same sum of key bytes. It can be shown that the probabilities that $\bar{C}_{x_2} - C_{x_1}$, $C_{x_2} - \bar{C}_{x_1}$ and $\bar{C}_{x_2} - \bar{C}_{x_1}$ hold are slightly higher than the probability that $C_{x_2} - C_{x_1}$ holds (for any $x_1 < x_2$). Note that we do not expect that many equations have such alternatives, because under the assumption that j takes its values uniformly at random, it is much more likely that $j_{i+1} > i$ for small values of i.

Given the two alternatives it is possible to run the algorithm twice, while on each run consider only suggestions derived from the set of equations with one of the two alternatives. However, due to our use of counting methods, both equations can be added to the same set of equations, such that suggestions derived from both alternatives are counted together, in the same counting process.
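The two refinements of this section can be sketched together: equations with S[i] < i are dropped, and an alternative constant is produced for every value x that is found at an index i′ < x. The function name `refine_equations` and the dictionary representation are our own; the conditions follow our reading of the text above.

```python
N = 256

def refine_equations(s):
    """Return (c, c_bar): the retained C_i values and the alternative values.

    c[i]     = S[i] - i(i+1)/2 (mod N), kept only when S[i] >= i
    c_bar[x] = S^{-1}[x] - x(x+1)/2 (mod N), defined only when S^{-1}[x] < x
    """
    inverse = [0] * N
    for position, value in enumerate(s):
        inverse[value] = position
    c, c_bar = {}, {}
    for i in range(N):
        triangular = i * (i + 1) // 2
        if s[i] >= i:               # S[i] < i: the equation is discarded
            c[i] = (s[i] - triangular) % N
        if inverse[i] < i:          # value i sits at an earlier index i'
            c_bar[i] = (inverse[i] - triangular) % N
    return c, c_bar

if __name__ == "__main__":
    s = list(range(N))
    s[2], s[5] = s[5], s[2]         # toy permutation: S[2] = 5, S[5] = 2
    c, c_bar = refine_equations(s)
    print(c[2], 5 in c, c_bar[5])   # 2 False 243
```

Both dictionaries can then feed the same counters, as suggested in the last paragraph of the section.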

3.4.6 Heuristic Pruning of the Search

In Section 3.4.2 we have described the backtracking approach to finding the key as a DFS search on an ordered tree. Once a guessed value is found wrong (the keys obtained from it fail to create the requested permutation) we go back and try the other likely guesses. Naturally, by trying more guesses we increase our chances to successfully retrieve the key, but we increase the computation time as well. If we can identify an internal node as representing a wrong guess, we can skip the search of the entire subtree rooted from it, and thus reduce the computation time.

Section 3.4.2 also describes the merging of counters of different sequences according to previous guesses, which allows us to consider fewer key sequences, with more suggestions for each. If the guesses that we have already made are correct, we expect that after such a merge the value of the highest counter is significantly higher than other counters. If the former guesses are incorrect, we do not expect to observe such behavior, as the counters of different sequences will be merged in a wrong way.

Let $\mu_t$ for 0 ≤ t < l be a threshold for the t-th guess. When considering candidates for the t-th guess, we only consider the ones with a counter value of at least $\mu_t$. The optimal values of the thresholds can be obtained empirically, and depend on the key length (l), the weights given to the suggestions, and the number of attempts for each guess (the $\lambda_t$'s). Even if the use of these thresholds may cause correct guesses to be aborted, the overall success probability may still increase, since the saved computation time can be used to test more guesses.

3.5 The Algorithm

The cornerstones of our method have been laid in Section 3.4. In this section we gather all our previous observations, and formulate them as an algorithm to retrieve the secret key from the initial permutation S. The FIND KEY algorithm (presented in Algorithm 3.2) starts the search by finding s, and calls the recursive algorithm REC SUBROUTINE (Algorithm 3.3). Each recursive call guesses another value for a sum of key bytes, as described in the previous section.

The optimal values of the parameters $\lambda_0, \ldots, \lambda_{l-1}, \mu_1, \ldots, \mu_{l-1}$ used by the algorithm and the weights it assigns to the different suggestions can be empirically estimated so that the success probability of the algorithm and/or the average running time are within a desired range.
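The control flow of the search (ordered guesses, at most $\lambda_t$ attempts on level t, and the threshold $\mu_t$ of Section 3.4.6) can be sketched independently of how the counters are maintained. In the sketch below the callbacks `ranked_candidates` and `verify` are hypothetical placeholders for the counting machinery and for the final KSA check; weight adjustment and counter merging are deliberately left out.

```python
def search(t, guesses, depth, lam, mu, ranked_candidates, verify):
    """Depth-first search over guesses with branching limits and thresholds.

    ranked_candidates(t, guesses) -> list of (counter_value, guess) pairs,
        sorted by decreasing counter value (hypothetical callback).
    verify(guesses) -> the recovered key, or None if it is wrong
        (hypothetical callback).
    """
    if t == depth:
        return verify(guesses)
    for counter_value, guess in ranked_candidates(t, guesses)[:lam[t]]:
        if counter_value < mu[t]:
            return None         # prune: the remaining candidates are weaker
        found = search(t + 1, guesses + [guess], depth, lam, mu,
                       ranked_candidates, verify)
        if found is not None:
            return found
        # otherwise backtrack and try the next candidate on this level
    return None

if __name__ == "__main__":
    secret = [3, 7]
    table = [[(5, 9), (4, 3)], [(6, 7), (2, 1)]]   # toy counters per level
    result = search(0, [], 2, [2, 2], [0, 0],
                    lambda t, g: table[t],
                    lambda g: g if g == secret else None)
    print(result)   # [3, 7]
```

In the toy run the first guess on level 0 (value 9) is wrong, so the search exhausts its subtree, backtracks, and succeeds with the second candidate.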

FIND KEY(S)

1. Build the equations: Compute the values of $\{C_i\}$ and $\{\bar{C}_i\}$, for the indices i where they exist (described in Sections 3.3 and 3.4.5).

2. Sum the weights of suggestions for each of the N candidates for s (described in Section 3.4.3).

3. For x = 1 to $\lambda_0$ do:

   (a) Find a candidate for s with the highest counter, $w_0$, which has not been checked yet, and set $s = K[0 \ldots l-1] = w_0$.

   (b) Mark the correct suggestions for $s = w_0$, adjust weights and correct remaining suggestions accordingly (described in Section 3.4.4).

   (c) Initialize N counters for each sequence of key bytes $K[i_1 \ldots i_2]$ such that $0 \le i_1 \le i_2 < l - 1$, and sum the weights of suggestions for each of them (described in Sections 3.4.1, 3.4.2 and 3.4.3).

   (d) Call REC SUBROUTINE(1) to retrieve the rest of the key. If the correct key is found, return it.

4. Return FAIL.

Algorithm 3.2: The FIND KEY Algorithm

REC SUBROUTINE(t)

1. If t = l, extract the key from all the l guesses made so far, and verify it. If the key is correct, return it. Otherwise, return FAIL.

2. For y = 1 to $\lambda_t$ do:

   (a) Find a combination of key sequence and a candidate for its sum, with the highest counter among the sums of sequences that have not already been guessed. Denote them by $K[i_1 \ldots i_2]$ and $w_t$, respectively, and denote the value of that counter by h.

   (b) If $h < \mu_t$, return FAIL (described in Section 3.4.6).

   (c) Set $K[i_1 \ldots i_2] = w_t$.

   (d) Mark the correct suggestions for $K[i_1 \ldots i_2] = w_t$, adjust weights and correct remaining suggestions accordingly (described in Section 3.4.4).

   (e) Merge the counters which may be unified as a result of the guess from Step 2a (described in Section 3.4.2).

   (f) Call REC SUBROUTINE(t + 1). If the correct key is found, return it. Otherwise, cancel the most recent guess (revert any changes made during the current iteration, including the merging of the counters).

3. Return FAIL.

Algorithm 3.3: The Recursive REC SUBROUTINE Algorithm

3.6 Efficient Implementation

Recall that on each iteration of the algorithm some of the sums of the key bytes are already known (or guessed). The suggestions for the unknown sums are counted using a set of N counters, one counter for each possible value of that sum. In Section 3.4.2 we stated that according to the prior guesses, the suggestions for several sums of key bytes may be counted together (i.e., after a new guess is made, some of the counters may be merged with counters of other sums). This section describes an efficient way to discover which counters should be merged, and how to merge them.

The known bytes induce an equivalence relation between the unknown sums of the key bytes. Two sums are in the same equivalence class if and only if the value of each of them can be computed from the value of the other and the values of known sums. We only need to keep a set of N counters for each equivalence class, as all suggestions for sums which are in the same equivalence class should be counted together. When we merge counters, we actually merge equivalence classes.

We represent our knowledge about the values of the sums as linearly independent equations of the key bytes. After r key sums are guessed, there are r linear equations of the form

$$\sum_{i=0}^{l-1} a_{i,j} K[i] = b_j,$$

for 1 ≤ j ≤ r, where $0 \le a_{i,j} < N$. The equations are represented as a triangular system of equations, in which the leading coefficient (the first non-zero coefficient) of each equation is one. These r equations form a basis of a linear subspace of all the sums we already know. In this representation the equivalence class of any sum of key bytes $K[i_1 \ldots i_2]$ can be found efficiently: We represent the sum as a linear equation of the key bytes, and apply the Gaussian elimination process, such that the system of equations is kept triangular, and the leading coefficient of each equation is one. Sums from the same equivalence class give the same result, as they all extend the space spanned by the r equations to the same larger space spanned by r + 1 equations. The resulting unique equation can be used as an identifier of the equivalence class.

When the counters are merged after a guess of a new value, the same process is applied: we apply Gaussian elimination to the equation representing the current equivalence class in order to discover the equivalence class it belongs to on the next level, and merge the counters. Note that as a result of the Gaussian elimination process we also learn the exact linear mapping between the counters of the current equivalence classes, and the counters of the classes of the next step.
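The equivalence-class lookup of Section 3.6 can be sketched as a reduction of a coefficient vector against the known triangular equations, all modulo N = 256. The names `pivot_of` and `classify`, the vector representation, and the returned (scale, offset) pair describing the mapping between counters are assumptions of this sketch. Since only odd values are invertible modulo 256, a reduced vector with an even leading coefficient is left unnormalised here. The three-argument form of `pow` with exponent −1 requires Python 3.8 or later.

```python
N = 256

def pivot_of(row):
    """Index of the first non-zero coefficient of a row."""
    return next(p for p, a in enumerate(row) if a)

def classify(vec, known):
    """Reduce the sum described by `vec` against the known equations.

    vec   -- coefficients of the sum, one entry per key byte
    known -- list of (row, value) pairs; each row has leading coefficient 1
    Returns (class_id, scale, offset) such that
    sum == scale * class_value + offset (mod N).
    """
    vec, offset = list(vec), 0
    # Process rows by increasing pivot so that cleared pivots stay cleared.
    for row, value in sorted(known, key=lambda rv: pivot_of(rv[0])):
        factor = vec[pivot_of(row)]
        if factor:
            vec = [(v - factor * a) % N for v, a in zip(vec, row)]
            offset = (offset + factor * value) % N
    lead = next((v for v in vec if v), 0)
    scale = 1
    if lead % 2 == 1:                       # invertible leading coefficient
        inverse = pow(lead, -1, N)
        vec = [(v * inverse) % N for v in vec]
        scale = lead
    return tuple(vec), scale, offset

if __name__ == "__main__":
    known = [([1, 1, 0, 0], 50)]            # K[0] + K[1] = 50 was guessed
    print(classify([1, 0, 0, 0], known))    # ((0, 1, 0, 0), 255, 50)
    print(classify([0, 1, 0, 0], known))    # ((0, 1, 0, 0), 1, 0)
```

With K[0] + K[1] = 50 known, the sums K[0] and K[1] receive the same identifier, and the pair (255, 50) states that K[0] = −K[1] + 50 (mod 256), so a suggestion K[1] = 30 is counted together with K[0] = 20, as in the example of Section 3.4.2.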

3.7 Discussion

In this chapter we presented an efficient algorithm for the recovery of the secret key from the initial state of RC4, using the first bytes of the permutation. The presented algorithm can also work if only some of the bytes of the initial permutation are known. In this case, suggestions are derived only from the known bytes, and the algorithm is only able to retrieve values of sums of key bytes for which suggestions exist. However, as a result of the reduced number of suggestions the success rates are expected to be lower. The algorithm can also work if some of the bytes contain errors, as the correct values of the sums of key bytes are still expected to appear more frequently than others.

Since changes to the internal state during the stream generation (PRGA) are reversible, our algorithm can also be applied given an internal state at any point of the stream generation phase. As in [65], our algorithm is also applicable given an intermediate state during the KSA, i.e., $S_i$ (i < N), instead of $S_N$.

We tested the running times and success probabilities of our algorithm for different key lengths, as summarized in Table 3.4. The tests were performed on a Pentium IV 3GHz CPU. The running times presented in the table are averaged over 10000 random keys. We have assigned a weight of two to suggestions with probability higher than 0.05, a weight of one to suggestions with probability between 0.008 and 0.05 and a weight of zero to all other suggestions. The values of the parameters $\lambda_0, \ldots, \lambda_{l-1}, \mu_1, \ldots, \mu_{l-1}$ were chosen in an attempt to achieve the best possible success probability with a reasonable running time. As can be seen in the table, our algorithm is much faster than the one of [65] for the same success rate, and in particular in the case of 5-byte keys, it is about 10000 times faster. Note that with the same computation time, our algorithm achieves about four times the success rate compared to [65] in most presented cases.

| Key Length | Time [sec] | P_Success | Time of Improved [65] ∗ [sec] |
|------------|------------|-----------|-------------------------------|
| 5          | 0.02       | 0.8640    | 366                           |
| 8          | 0.60       | 0.4058    | 2900                          |
| 10         | 1.46       | 0.0786    | 183                           |
| 10         | 3.93       | 0.1290    | 2932                          |
| 12         | 3.04       | 0.0124    | 100                           |
| 12         | 7.43       | 0.0212    | 1000                          |
| 16         | 278        | 0.0005    | 500                           |

∗ Our rough estimation for the time it would take an improved version of the algorithm of [65] to achieve the same P_Success (see footnote 1). The original algorithm of [65] is much slower.

Table 3.4: Empirical Results of The Proposed Attack

Another important advantage of our algorithm over the algorithm of [65] is that when the algorithm of [65] fails to retrieve the key, there is no way to know which of the equations are correct, nor is it possible to retrieve partial information about the key. However, in our algorithm, even if the algorithm fails to retrieve the full key, its first guesses are still likely to be correct, as those guesses are made based on counters with high values. This difference can be exemplified by comparing the success rates of obtaining the sum of key bytes s (Table 3.3) with the success rates of obtaining the entire key (Table 3.4).

Chapter 4

An Improvement of Linear Cryptanalysis with Addition Operations with Applications to FEAL-8X

This chapter presents improvements to the cryptanalysis of the block cipher FEAL. It focuses on a new technique that improves the linear cryptanalysis of ciphers that use the addition operation. Other attacks that are faster than exhaustive search are also presented. The contribution described in this chapter was accepted for publication in the proceedings of SAC 2014 [9]. This is a joint work with Prof. Eli Biham.

4.1 Introduction

FEAL [76] was introduced in 1987 as a fast encryption algorithm which combines the simplicity of software-based operations with an improved security over prior designs. Over the years FEAL inspired the development of many cryptanalytic techniques, including differential and linear cryptanalysis [14, 52]. The best known attacks on FEAL required (until recently) a few hundreds of chosen plaintexts [15] or 16 million known plaintexts [8, 54].

At CRYPTO 2012 Mitsuru Matsui announced a year-long challenge [54] for developing improved attacks on FEAL-8X [57], and an award which will be given to the best attack capable of recovering the key of given sets of known plaintexts with various amounts of data. The attack recovering the key using the smallest number of known plaintexts would be declared the winner. In the course of this year we developed an improved attack capable of recovering the key of FEAL-8X, and three weeks before the deadline we submitted our solution for the challenge set with a million known plaintexts, and were the first to submit a correct solution. A few days later another group submitted a solution for a smaller set of $2^{15}$ known plaintexts. It took us another two weeks to finalize our program with all the additional tricks and to submit the solution for the set of $2^{14}$ known plaintexts, which became the winning solution. The secret key is 5681891EEC34CE1241ED0F52C9C23F65.

In this chapter we present the cryptanalytic attacks that we developed for this challenge, and the techniques that we used to improve linear cryptanalysis. We first describe a linear attack which uses a 6-round approximation and analyzes both the first and last rounds simultaneously, recovering 37 subkey bits in total. We then describe how running it a second time with a different approximation can reduce the number of required plaintexts and find 44 bits of the subkeys. We describe the rest of the steps needed in order to recover the remaining subkeys and show how the FEAL-8X key can be reconstructed from those subkeys. The above mentioned techniques can find the FEAL-8X key given $2^{15}$ known plaintexts in about 26 hours on our computer.
We then present our main contribution – a new partitioning method that can amplify the bias of a linear approximation of addition. The data is partitioned into two sets such that in one of the sets the bias of the linear approximation is stronger than it is when all the messages are considered. Interestingly, we cannot tell in advance which of the two sets is the one with the increased bias, and therefore we try both of them. The amplified bias allows us to reduce the number of plaintexts needed for the attack while keeping the analysis time per plaintext the same. Due to the smaller number of required plaintexts the attack time when using this method even decreases. Incorporating this technique with our previous methods allowed us to find the key given $2^{14}$ known plaintexts in about 14 hours. In the summary of this chapter (Section 4.7) we discuss the differences between our technique and Partitioning Cryptanalysis [37].

In addition to the practical attacks we also discuss attacks that can find the key with fewer plaintexts faster than exhaustive search. We describe an attack that can recover the key given $2^{10}$ known plaintexts in time of $2^{62}$ FEAL-8X encryptions. In addition, we describe attacks in which given only 11–21 known or chosen plaintexts the FEAL-8X key can be recovered with complexity about $2^{80}$ and given 2 or 3 known plaintexts the FEAL-8X key can be recovered with complexity about $2^{96}$. These attacks combine linear cryptanalysis and differential cryptanalysis with exhaustive search of many subkeys, as well as meet in the middle attacks. These attacks exploit the fact that the total size of the subkeys is not sufficiently larger than the size of the key.

The structure of the chapter is as follows: In Section 4.2 we describe FEAL-8X, give two equivalent descriptions of the cipher, and define notations. In Section 4.3 we describe the linear attack that recovers the key given $2^{15}$ known plaintexts. In Section 4.4 we present the new partitioning method and how to recover the key given $2^{14}$ known plaintexts. In Section 4.5 we extend the methods from the previous sections, and describe an attack on $2^{10}$ known plaintexts faster than exhaustive search. Finally, in Section 4.6 we describe the attacks that require only a few known ciphertexts or a few chosen plaintexts. In Appendix 4.A we show how to find the key of FEAL-8X given the subkeys that are found by our attacks, and in Appendix 4.B we describe an efficient implementation of our attacks, that is able to save a factor of about $2^6$ in the attack time.

4.2 The Cipher FEAL-8X

The block size of FEAL-8X is 64 bits and the key size is 128 bits. The key processing algorithm of FEAL-8X (outlined in Figure 4.6) takes the 128-bit key and generates 16 subkeys, denoted by K0–Kf, each of length 16 bits. This algorithm is described in more detail in Appendix 4.A.1.

FEAL-8X is an 8-round Feistel cipher. Before the first round the plaintext is mixed with a 64-bit whitening subkey (K89ab) which is followed by XORing the left half of the data into the right half. The inverse of this operation is performed after the last round, i.e., the left half of the data is XORed into the right half and the result is mixed with a 64-bit whitening key (Kcdef). In each round a function F is computed on the right half of the data and a 16-bit subkey (one of K0–K7), and the output is XORed into the left half. The two halves are then swapped.

[Figure omitted: block diagram of the eight rounds with subkeys K0–K7, the initial whitening (K89, Kab) applied to the plaintext P and the final whitening (Kcd, Kef) producing the ciphertext T, next to the F-function with inputs f0–f3, subkey bytes Ki0 and Ki1, S-box inputs w0–w3 and outputs F0–F3. The S-boxes are $S_i(x, y) = \mathrm{ROL2}(x + y + i \pmod{256})$.]

Figure 4.1: The outline of FEAL-8 and of the F-function

The function F takes four bytes as input, and starts by XORing the first and last bytes into the two middle bytes, and then XORs the subkey into the same bytes. It then applies four S-boxes in the order described in Figure 4.1. Each S-box adds two bytes and an index (0 or 1) and rotates the output by two bits to the left. FEAL-8X and the F-function are outlined in Figure 4.1.
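The description above can be sketched in a few lines of Python. The S-box follows the formula given with Figure 4.1; the order in which the four S-boxes are wired together is not recoverable from the text alone, so the wiring below is an assumption taken from the commonly published FEAL description (it agrees with the T-box discussion in Section 4.4.2). The function names are illustrative only.

```python
def rol2(v: int) -> int:
    """Rotate an 8-bit value left by two bits."""
    return ((v << 2) | (v >> 6)) & 0xFF


def sbox(i: int, x: int, y: int) -> int:
    """S_i(x, y) = ROL2(x + y + i (mod 256)), with index i in {0, 1}."""
    return rol2((x + y + i) & 0xFF)


def feal_f(f: tuple, k: tuple) -> tuple:
    """F-function on four input bytes f and a 16-bit subkey k = (k0, k1).

    Assumed wiring (standard FEAL description, not spelled out in the text):
    S1 and S0 in the middle, then S0 and S1 on the outer bytes.
    """
    f0, f1, f2, f3 = f
    k0, k1 = k
    # XOR the first and last bytes, and the subkey, into the two middle bytes.
    w1 = f1 ^ f0 ^ k0
    w2 = f2 ^ f3 ^ k1
    # Two middle S-boxes first, then the two outer ones.
    out1 = sbox(1, w1, w2)
    out2 = sbox(0, w2, out1)
    out0 = sbox(0, f0, out1)
    out3 = sbox(1, f3, out2)
    return (out0, out1, out2, out3)
```

Each S-box output feeds the next one, which is why a carry in one addition can influence all later bytes of the output.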

Subkeys of   Equivalent description without      Equivalent description without
FEAL-8X      whitening at the beginning          whitening at the end
K89ab        0                                   (K89 ⊕ Kcd ⊕ Kef, Kab ⊕ Kef)
K0           EK0 = mw(K0, K89 ⊕ Kab)             DK0 = mw(K0, Kcd)
K1           EK1 = mw(K1, K89)                   DK1 = mw(K1, Kcd ⊕ Kef)
K2           EK2 = mw(K2, K89 ⊕ Kab)             DK2 = mw(K2, Kcd)
K3           EK3 = mw(K3, K89)                   DK3 = mw(K3, Kcd ⊕ Kef)
K4           EK4 = mw(K4, K89 ⊕ Kab)             DK4 = mw(K4, Kcd)
K5           EK5 = mw(K5, K89)                   DK5 = mw(K5, Kcd ⊕ Kef)
K6           EK6 = mw(K6, K89 ⊕ Kab)             DK6 = mw(K6, Kcd)
K7           EK7 = mw(K7, K89)                   DK7 = mw(K7, Kcd ⊕ Kef)
Kcdef        (K89 ⊕ Kab ⊕ Kcd, Kab ⊕ Kef)        0

Table 4.1: The Subkeys of FEAL-8X and the Actual Subkeys of the Equivalent Descriptions

4.2.1 An Equivalent Description of FEAL-8X

In order to simplify the analysis we prefer to eliminate the whitening keys. This is possible on one end of the cipher by extending the size of the subkeys to 32 bits in each round and by XORing the eliminated whitening key information into all the subkeys. We consider two equivalent descriptions of the cipher. In the first we eliminate the whitening at the beginning of the cipher, and in the second we eliminate the whitening at the end (this latter version is outlined in Figure 4.2). The 32-bit subkeys of the equivalent description are called actual subkeys. We call the actual subkeys of the version with eliminated whitening key at the beginning encryption actual subkeys and denote them by EK0–EK7, while we call the actual subkeys of the version with eliminated whitening key at the end decryption actual subkeys, and denote them by DK0–DK7. To simplify the description we define the function

$$mw(X, Y) = (Y_0,\; Y_0 \oplus Y_1 \oplus X_0,\; Y_2 \oplus Y_3 \oplus X_1,\; Y_3)$$

where $X$ is a 16-bit value, $Y$ is a 32-bit value, and $X_0, X_1, Y_0, Y_1, Y_2, Y_3$ are their individual bytes. Note that $mw(X, Y)$ is just the first part of the F-function before the S-boxes (see Figure 4.1). The mapping between the subkeys of all three descriptions of the cipher (the subkeys of FEAL and the two equivalent descriptions) is summarized in Table 4.1.

In our attacks when we analyze the last rounds of the cipher we assume the whitening at the end is zero, and therefore retrieve the bits of the decryption actual subkeys DK. Similarly, when we analyze the first rounds of the cipher we retrieve the bits of the encryption actual subkeys EK. Note that since there is a linear relation between the subkeys of all three descriptions of FEAL it is possible to target actual subkeys of different descriptions in the same linear attack. For example, the attack presented in Section 4.3.2 targets both EK0 and DK7.

[Figure omitted: the same block diagram as Figure 4.1 with the 32-bit actual subkeys DK0–DK7 in the rounds, the initial whitening (DK89, DKab), a zero final whitening, and the F-function with subkey bytes DKi0–DKi3, S-box inputs w0–w3 and outputs F0–F3. The S-boxes are $S_i(x, y) = \mathrm{ROL2}(x + y + i \pmod{256})$.]

Figure 4.2: Equivalent Description of FEAL-8X Without Whitening at the End
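The reason a whitening value can be folded into a 32-bit actual subkey is that it only enters the F-function through XORs before the S-boxes. The following sketch checks this numerically: applying F with a 16-bit subkey K to an input masked by Y gives the same result as applying the keyed F of the equivalent description, with actual subkey mw(K, Y), to the unmasked input. The S-box wiring is the commonly published FEAL one and is an assumption here, as are all function names.

```python
import random


def rol2(v):
    """Rotate an 8-bit value left by two bits."""
    return ((v << 2) | (v >> 6)) & 0xFF


def sbox(i, x, y):
    """S_i(x, y) = ROL2(x + y + i (mod 256))."""
    return rol2((x + y + i) & 0xFF)


def mw(x, y):
    """mw(X, Y) = (Y0, Y0^Y1^X0, Y2^Y3^X1, Y3) for a 16-bit X and a 32-bit Y."""
    x0, x1 = x
    y0, y1, y2, y3 = y
    return (y0, y0 ^ y1 ^ x0, y2 ^ y3 ^ x1, y3)


def f_actual(f, a):
    """F-function of the equivalent description, keyed by a 32-bit actual subkey."""
    f0, f1, f2, f3 = f
    a0, a1, a2, a3 = a
    w0, w1, w2, w3 = f0 ^ a0, f0 ^ f1 ^ a1, f2 ^ f3 ^ a2, f3 ^ a3
    out1 = sbox(1, w1, w2)      # assumed S-box order (standard FEAL)
    out2 = sbox(0, w2, out1)
    out0 = sbox(0, w0, out1)
    out3 = sbox(1, w3, out2)
    return (out0, out1, out2, out3)


def f_original(f, k):
    """F-function keyed by a 16-bit subkey: the special case Y = 0."""
    return f_actual(f, mw(k, (0, 0, 0, 0)))


def check_equivalence(trials=1000, seed=0):
    """Check F_K(x XOR Y) == F'_{mw(K, Y)}(x) on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = tuple(rng.randrange(256) for _ in range(4))
        y = tuple(rng.randrange(256) for _ in range(4))
        k = tuple(rng.randrange(256) for _ in range(2))
        masked = tuple(a ^ b for a, b in zip(x, y))
        if f_original(masked, k) != f_actual(x, mw(k, y)):
            return False
    return True
```

Calling `check_equivalence()` exercises the identity that underlies each row of Table 4.1, where Y is the whitening material that reaches the corresponding round.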

4.3 First Attack – Finding the Key Using $2^{15}$ Known Plaintexts

In this section we describe a linear attack that requires $2^{15}$ known plaintexts and finds the key in about 26 hours on a server with an Intel(R) Xeon(R) X5650 2.67GHz processor with 12 cores. We first describe a 6-round linear approximation and then the basic attack which performs the analysis on both ends of the cipher simultaneously. We then describe how to use it with a reduced number of plaintexts, and how to recover the rest of the actual subkeys and the full key.

4.3.1 The Linear Approximations

In [61, 4] eight 7-round linear approximations with a bias of about $2^{-9}$ were presented. The attack we present in this chapter uses two 6-round approximations with bias of about $2^{-6}$, which we got by truncating two of the 7-round approximations of [61, 4] by one round. Approximation 1 is outlined in Figure 4.3 and Approximation 2 is in Figure 4.4.

4.3.2 The Basic Attack

The attack we present targets both the encryption actual subkey of the first round (EK0), and the decryption actual subkey of the last round (DK7). The six-round linear approximation covers the six middle rounds of the cipher (rounds 1–6), while the first and last rounds are used for analysis. We found that when using Approximation 1 there are only 37 bits of EK0 and DK7 that affect the parity of the bits in the approximation: 22 bits in the last actual subkey (DK7, given by the mask 03 FF FF 0F), and 15 bits of the first actual subkey (EK0, given by the mask 00 7F 7F 00 and the parity of the two bits 00 80 80 00). The remaining 27 bits of EK0 and DK7 have no impact on the parity of the bits in the linear approximation of the six middle rounds. Therefore, this basic attack finds these 37 bits of the two actual subkeys. The attack is described in Algorithm 4.1.

We also observed that not all 37 bits of the subkeys have the same impact on the bias. While some bits completely throw off the observed bias if guessed incorrectly, others have only a minor impact. We can take advantage of this observation to reduce the running time of the attack by excluding a few such bits with a minor impact on the bias, and searching for them only when the rest of the bits are already known. For example, instead of guessing 15 bits of EK0 with a bias of about $2^{-6}$, we may guess only 13 bits (the 12 bits whose mask is 00 6E 7F 00, and the parity of the two bits 00 80 80 00) with a slightly lower expected bias of $2^{-6.5}$, and save a factor of 4 in computation time.
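The bit counts quoted above can be checked directly from the hexadecimal masks. The short sketch below counts the set bits of each mask and adds one guessed bit for the parity of the two bits 00 80 80 00; the helper name `mask_bits` is illustrative.

```python
def mask_bits(mask_hex: str) -> int:
    """Number of set bits in a mask written as space-separated hex bytes."""
    return bin(int(mask_hex.replace(" ", ""), 16)).count("1")


# Bits of DK7 that affect the parity of Approximation 1.
dk7_bits = mask_bits("03 FF FF 0F")

# Bits of EK0: the masked bits plus one bit for the parity of 00 80 80 00.
ek0_bits = mask_bits("00 7F 7F 00") + 1

# Reduced guess that drops two low-impact bits of EK0.
ek0_reduced_bits = mask_bits("00 6E 7F 00") + 1

total_bits = dk7_bits + ek0_bits
speedup = 2 ** (ek0_bits - ek0_reduced_bits)
```

With these masks the totals come to 22 + 15 = 37 guessed bits, and dropping two bits of EK0 divides the number of candidates by 4, as stated in the text.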

$\lambda_P^{(1)}$ = 00 01 05 04 04 03 10 04x

Round   Output mask of F   Input mask of F   Probability
1       00 01 05 04x       00 00 01 00x      $1/2 \pm 2^{-1}$
2       04 03 11 04x       00 01 05 04x      $1/2 \pm 2^{-2}$
3       0                  0                 $1/2 \pm 2^{-1}$
4       04 03 11 04x       00 01 05 04x      $1/2 \pm 2^{-2}$
5       00 01 05 04x       00 00 01 00x      $1/2 \pm 2^{-1}$
6       04 03 10 04x       10 11 55 54x      $1/2 \pm 2^{-4}$

$\lambda_T^{(1)}$ = 04 03 10 04 10 10 50 50x

Figure 4.3: Approximation 1 – A six-round approximation with bias $2^{-6}$

$\lambda_P^{(2)}$ = 04 01 00 00 1D 00 04 00x

Round   Output mask of F   Input mask of F   Probability
1       04 01 00 00x       01 00 00 00x      $1/2 \pm 2^{-1}$
2       1C 00 04 00x       04 01 00 00x      $1/2 \pm 2^{-2}$
3       0                  0                 $1/2 \pm 2^{-1}$
4       1C 00 04 00x       04 01 00 00x      $1/2 \pm 2^{-2}$
5       04 01 00 00x       01 00 00 00x      $1/2 \pm 2^{-1}$
6       1D 00 04 00x       54 11 10 10x      $1/2 \pm 2^{-4}$

$\lambda_T^{(2)}$ = 1D 00 04 00 50 10 10 10x

Figure 4.4: Approximation 2 – A six-round approximation with bias $2^{-6}$
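The masks of Figures 4.3 and 4.4 can be sanity-checked mechanically. The sketch below assumes the usual Feistel propagation rule for linear masks (in each round the output mask of F equals the current left-half mask, and the input mask of F is XORed into the right-half mask before the halves swap) and reads the first four bytes listed for each round as the output mask of F and the last four as its input mask; both the reading of the figures and the helper names are assumptions. The total bias is combined with the piling-up lemma, $2^{n-1}\prod_i \varepsilon_i$.

```python
from fractions import Fraction

ZERO = "00 00 00 00"

# (output mask of F, input mask of F, e) per round, with round bias 2^-e.
APPROX_1 = {
    "start": ("00 01 05 04", "04 03 10 04"),
    "rounds": [
        ("00 01 05 04", "00 00 01 00", 1),
        ("04 03 11 04", "00 01 05 04", 2),
        (ZERO, ZERO, 1),
        ("04 03 11 04", "00 01 05 04", 2),
        ("00 01 05 04", "00 00 01 00", 1),
        ("04 03 10 04", "10 11 55 54", 4),
    ],
    "end": ("04 03 10 04", "10 10 50 50"),
}
APPROX_2 = {
    "start": ("04 01 00 00", "1D 00 04 00"),
    "rounds": [
        ("04 01 00 00", "01 00 00 00", 1),
        ("1C 00 04 00", "04 01 00 00", 2),
        (ZERO, ZERO, 1),
        ("1C 00 04 00", "04 01 00 00", 2),
        ("04 01 00 00", "01 00 00 00", 1),
        ("1D 00 04 00", "54 11 10 10", 4),
    ],
    "end": ("1D 00 04 00", "50 10 10 10"),
}


def parse(mask: str) -> int:
    """Turn space-separated hex bytes into an integer mask."""
    return int(mask.replace(" ", ""), 16)


def chain_is_consistent(approx) -> bool:
    """Propagate the plaintext mask through the rounds and compare with the end mask."""
    left, right = (parse(m) for m in approx["start"])
    for out_mask, in_mask, _ in approx["rounds"]:
        if parse(out_mask) != left:
            return False
        left, right = right ^ parse(in_mask), left
    # The figures list the final masks without the last swap.
    return (right, left) == tuple(parse(m) for m in approx["end"])


def piled_up_bias(exponents) -> Fraction:
    """Piling-up lemma for round biases 2^-e: 2^(n-1) * prod(2^-e)."""
    bias = Fraction(2) ** (len(exponents) - 1)
    for e in exponents:
        bias *= Fraction(1, 2 ** e)
    return bias
```

Under these assumptions both chains close, and the six round biases combine to $2^{-6}$; truncating to the first five rounds gives the bias $2^{-3}$ used later in the chapter.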

1. For each of the $2^{15}$ candidates for the 15 bits of EK0:
   (a) For each of the $2^{22}$ candidates for the 22 bits of DK7:
       i. For each known plaintext/ciphertext pair (P, C):
          A. Decrypt C by one round using DK7.
          B. Encrypt one round of P using EK0.
          C. Compute the parity of the approximated bits.
       ii. Count the number of messages for which the linear approximation holds and compute the bias.
2. The correct key is expected to be the one with the highest bias.

Algorithm 4.1: Basic Attack on FEAL-NX with $2^{15}$ messages
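The counting and ranking steps of Algorithm 4.1 can be sketched generically. The code below is not the thesis implementation: `parity_for_guess` is a hypothetical callback standing in for steps A–C (partial decryption, partial encryption, and parity of the approximated bits), and the guesses are treated as opaque values.

```python
def empirical_bias(pairs, parity_of):
    """Fraction of pairs for which the approximation holds (parity 0), minus 1/2."""
    holds = sum(1 for p, c in pairs if parity_of(p, c) == 0)
    return holds / len(pairs) - 0.5


def rank_guesses(pairs, guesses, parity_for_guess):
    """Return (|bias|, guess) tuples sorted from the highest absolute bias down.

    parity_for_guess(guess, p, c) is a hypothetical stand-in for peeling off the
    outer rounds under `guess` and evaluating the masked bits of the middle rounds.
    """
    scored = []
    for guess in guesses:
        def parity(p, c, g=guess):
            return parity_for_guess(g, p, c)
        scored.append((abs(empirical_bias(pairs, parity)), guess))
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Keeping the whole ranked list, rather than only the top entry, is what Section 4.3.3 relies on when it retains the N best candidates of each run.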

Clearly, the more data we have at our disposal the more accurate the results are (since it is easier to detect the linear bias). If the available data is a lot larger than required in order to detect the bias then we have more freedom to exclude such minor-impact bits (as the measurement of the bias is only slightly inaccurate). As the number of known plaintexts decreases, the identification of the correct key becomes harder (as the bias is harder to detect), and in this case we usually cannot afford to reduce the bias in return for speeding up the attack.

4.3.3 Matching Subkeys from the Backward and Forward Directions

As noted above, the basic attack does not suffice to find the correct key using $2^{15}$ known plaintexts. In this section we apply the basic attack twice: once in the forward direction, and once in the backward direction.

We first generate a list $L_1$ of the $N$ (for some parameter $N$) keys which exhibit the highest bias according to Approximation 1 in the forward direction, as described in Section 4.3.2. Recall that for each such key we get 15 bits of the first encryption actual subkey EK0, and 22 bits of the last decryption actual subkey DK7.

We now run the attack again in the backward direction, i.e., we use the reverse of Approximation 1. In this run we guess 22 bits of EK0 and 15 bits of DK7. We generate a second list $L_2$ of the $N$ keys that exhibit the highest biases. There is an overlap of 15 bits between the bits we guess in EK0 in both runs, and similarly, an overlap of 15 bits in DK7. Seven bits of EK0 are available only in $L_2$, and seven bits of DK7 are available only in $L_1$. The correct value of these 30 overlapping bits is expected to be in both lists. In such a case, we can easily find the correct value of 30 + 7 + 7 = 44 bits of the actual subkeys as the (usually single) value that has a match in those 30 bits in both lists.

As we noted earlier, some of the bits of the key only have a minor impact on the measured bias if they are guessed incorrectly. If we cannot find a match between an entry in $L_1$ and an entry in $L_2$, we can try looking for entries that have a low Hamming distance in the overlapping bits, and between these prefer entries that differ in bits that are known to have a minor impact on the bias.

This is the most time-consuming part of our attack. When we ran it^1 on the server mentioned above it found the 44 bits of the actual subkeys within 24 hours using $2^{15}$ known plaintexts (12 hours for each call to the basic attack). The correct key bits were among the top $N = 3200$ keys in each list.
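Matching the two candidate lists on their overlapping bits amounts to a join on a common key. A minimal sketch follows, assuming each candidate is some object from which the 30 overlapping bits can be extracted; the extractor callbacks `overlap1` and `overlap2` and the candidate representation are hypothetical, and the fallback search by Hamming distance is not shown.

```python
from collections import defaultdict


def match_candidates(list1, list2, overlap1, overlap2):
    """Pair every candidate of list1 with every candidate of list2 that agrees
    with it on the overlapping bits.

    overlap1 / overlap2 are hypothetical functions returning the overlapping
    bits of a candidate from the forward and backward run, respectively.
    """
    index = defaultdict(list)
    for cand in list1:
        index[overlap1(cand)].append(cand)
    return [(a, b) for b in list2 for a in index.get(overlap2(b), [])]
```

With lists of a few thousand entries, indexing one list by its overlap value keeps the matching cost linear in the list sizes, which is negligible next to the two runs of the basic attack.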

4.3.4 Retrieving the Rest of the Subkeys

In the previous section we found 44 bits of the actual subkeys. In this section we briefly describe additional steps for finding the rest of the bits of EK0 and DK7, as well as the rest of the actual subkeys. The steps are described in the order in which they are performed, as each step assumes knowledge of the subkey bits that are retrieved in the preceding steps.

Finding 8 additional bits of EK0 and DK7

This step is similar to the attack presented in Section 4.3.2, but uses Approximation 2 instead of Approximation 1. Since the linear approximation is different, there are also different bits of the subkeys of Rounds 0 and 7 that affect the parity. The bits of EK0 that affect the parity are given by the mask 3F FF FF 00 and the bits of DK7 are given by the mask 0F FF FF 03. Most of those bits are already known, except for eight bits. The correct values of these remaining eight bits can be identified by standard linear cryptanalysis techniques, similarly to the attack of Section 4.3.2. After this step is performed we know 26 bits in each of EK0 and DK7, a total of 52 actual subkey bits.

^1 With the implementation improvement described in Appendix 4.B.

Finding 4 additional bits of DK7 and 15 bits of DK6

At this point there are still 6 bits missing in the subkey DK7, which are difficult to retrieve by analyzing Round 7. We therefore move on to analyze Round 6 by using a shorter linear approximation. We use the first five rounds of Approximation 1 with a bias of $2^{-3}$, and use it to cover Rounds 1–5. In order to compute the parity of the approximated bits in Round 5 we need to guess the values of four more bits of DK7^2, and 15 bits of DK6. After this step is performed we know a total of 30 bits of DK7 (given by the mask 7F FF FF 7F) and 15 bits of DK6 (given by the mask 00 7F 7F 00 and in addition the parity of bits 00 80 80 00).

Finding 7 additional bits of DK6

This step is similar to the previous step, but this time we use a 5-round approximation obtained from the last five rounds of Approximation 1, which covers Rounds 1–5. There are 22 bits in DK6 that affect the parity of this linear approximation. We already found 15 of them in the previous step, and we should now search for the remaining seven.

Finding 4 additional bits of DK6

We use a 5-round approximation comprised of the last five rounds of Approximation 2, which covers Rounds 1–5. We can obtain four more bits of DK6, and get a total of 26 bits of DK6.

^2 The value of the two remaining bits of DK7 can only be determined when we analyze Round 3. Until then those bits have only a linear effect on the parity of the approximation, and therefore cannot be discovered by methods of linear cryptanalysis. The linear properties of those two bits are used later in the attacks of Section 4.6.

Finding the rest of the subkeys DK1–DK7

In a similar way, we can attack the rest of the rounds until we have all the actual subkeys DK1–DK7. Note that as we progress in the attack, analyzing each additional round becomes easier for two main reasons: First, we use shorter approximations with higher biases, which significantly decrease the chances of errors. Second, since the actual subkeys DK0, DK2, DK4 and DK6 have 16 bits in common (and similarly for DK1, DK3, DK5 and DK7) there are only 16 bits to retrieve in each of those actual subkeys once DK6 and DK7 are fully known.

Finding EK0–EK6

Once we finish recovering the decryption actual subkeys, we can repeat the entire process in the reverse direction in order to find the encryption actual subkeys EK0–EK6.^3 These actual subkeys depend on the whitening key of the plaintext, and are needed in order to retrieve the FEAL-8X key.

Finding The Key Itself

Given DK1–DK7 and EK0–EK6, we apply the algorithm of Appendix 4.A and find the key within a fraction of a second.

4.4 Our Partitioning Technique – Finding The Key Using $2^{14}$ Known Plaintexts

In this section we describe a new technique that can reduce the number of known plaintexts by a factor of 3.1 compared to the algorithm of Section 4.3.2. In this technique we partition the data into several sets, such that the bias of the approximation in some of them is higher than when measured across all the data, with a ratio that overcomes the smaller number of messages in those sets. Therefore, fewer messages are required in order to detect the amplified bias. This technique can be used in other addition-based ciphers to gain a similar improvement.

^3 We note that instead of searching for EK0–EK6, we can continue the analysis in the decryption direction and retrieve the actual subkey DK0 and the whitening key. Once all the decryption actual subkeys DK0–DK7 and the whitening key are known, the encryption actual subkeys EK0–EK7 can easily be computed (see Table 4.1).

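[Figure omitted: the F-function of the approximated round with the actual subkey bytes DK6_0–DK6_3, annotated with the masks of the approximation. The mask on the input bytes of F is 10x 11x 55x 54x and the mask on the output bytes is 04x 03x 10x 04x; the two middle S-boxes (S1 followed by S0) are marked by a rectangle.]

Figure 4.5: The Approximation of the Seventh Round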

4.4.1 A Simplified Example

We apply this technique to Round 6 of the cipher in the inner loop of the algorithm, after the output of the last F-function is already (partially) computed. Therefore, most bits of the inputs to the S-boxes of Round 6 are known up to an XOR with DK6.

At Round 6 we approximate the first S-box by 11 11 → 44 (see Figure 4.5). The input mask 11 11 is approximated to the output mask 44 through the addition operations in the S-boxes (and the rotation), and therefore the quality of the approximation is determined by the carry bits from lower bits into the approximated bits. We are interested in improving our control over the carry bits, which in turn will improve our approximations. For that we identified that some of the bits in the inputs to this S-box (denoted by $w_1$ and $w_2$ in Figure 4.2) in this round are known up to an XOR with the actual subkey DK6 (as mentioned above).

The approximation 11 11 → 44 approximates two bits through the addition operations. One of them involves the addition of the least significant bits of the inputs (mask 01 01 or $w_{1,0} + w_{2,0} = F_{1,2}$, where $w_i$ are the input bytes to the S-boxes and $F_i$ is the output, as denoted in Figure 4.2, and $w_{i,j}$ is bit $j$ of $w_i$). The approximation of this bit has probability 1, as there cannot be a carry into the LSB. The other approximates Bit 4 of both inputs (mask 10 10), the carry to which involves Bit 3 of both inputs ($w_{1,3}$ and $w_{2,3}$, identified by the mask 08 08). If we knew in advance that the unknown values of these two bits are both 0 then it is certain that there cannot be any carry into Bit 4, which would ensure that this approximation will also have probability 1 (bias +0.5). Similarly, when both bits are 1, a carry from this bit to the next one is guaranteed, and therefore we would also be able to make the approximation with probability 1 (knowing that the carry always flips the approximated output, thus the bias is −0.5). In the other cases (where the bits $w_{1,3}$ and $w_{2,3}$ are either 0,1 or 1,0), we do not know what the carry is, but we expect that it would occur in about half of the inputs, which would cause the bias to be much closer to zero. We refer to the four possible cases by the values of $w_{1,3}, w_{2,3}$ as cases 00, 11, 01, and 10, respectively. The bias of the S-box (on all inputs) is close to 0.25, and therefore the bias of the entire Approximation 1 is $0.25\alpha$, for some $\alpha$ that depends on the other parts of the approximation.

If we could choose only plaintexts of cases 00 and 11 and run the attack only on these plaintexts, we would need fewer messages due to the larger bias. Unfortunately, the values of $w_{1,3}$ and $w_{2,3}$ are only known up to an XOR with two missing bits of DK6 (see Figure 4.2):

$$w_{1,3} = f_{0,3} \oplus f_{1,3} \oplus DK6_{1,3}, \qquad w_{2,3} = f_{2,3} \oplus f_{3,3} \oplus DK6_{2,3},$$

and therefore they clearly cannot be chosen or known directly. Nevertheless, the corresponding bits $f_{0,3}$, $f_{1,3}$, $f_{2,3}$ and $f_{3,3}$ in the input of the F-function are all known as the result of the partial guess of the actual subkey DK7. We observe that we can still partition all the data into the same four sets according to $f_{0,3} \oplus f_{1,3}$ and $f_{2,3} \oplus f_{3,3}$, instead of $w_{1,3}$ and $w_{2,3}$, but we do not know which of the four sets have the amplified biases.

Though we cannot identify the two sets with an amplified bias, we can run this inner part of the attack four times, once on each of the sets. We expect the following results: In each set we would have about a quarter of the known plaintexts but in two of them we would have a bias twice as large as we had originally (meaning $0.5\alpha$).^4 Therefore the number of required plaintexts in these sets is about 4 times ($0.5^2/0.25^2$) smaller than would have been needed without applying this technique.

^4 For the purpose of this simplified example we assume that the linear approximation of this S-box is independent of the rest of Approximation 1. We will see later that this is not the case.

A more careful analysis shows that we can merge the two sets with bias 0.5 (with an appropriate sign coefficient) and partition the plaintexts only to two sets. This merges the sets of cases 00 and 11 into one set, and the sets of cases 01 and 10 into another set according to the parity of the two bits $f_{0,3} \oplus f_{1,3}$ and $f_{2,3} \oplus f_{3,3}$. Denote the number of known plaintexts required for the original attack by $m$. As discussed above, the amplified bias can be detected with $m/4$ plaintexts. Since each of the two unified sets has about half of the plaintexts, we deduce that $m/2$ known plaintexts suffice for the partitioning technique.
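The carry behaviour that drives this example can be verified exhaustively for a single S-box addition. The sketch below is only an illustration of the carry argument, not of the full attack: for each choice of Bit 3 of the two inputs it measures how often the addition x + y + i produces a carry into Bit 4. The function name is illustrative.

```python
def carry_into_bit4_rate(i: int, x3: int, y3: int) -> float:
    """Fraction of byte pairs (x, y), with Bit 3 of x equal to x3 and Bit 3 of y
    equal to y3, for which x + y + i carries into Bit 4."""
    total = 0
    carries = 0
    for x in range(256):
        if (x >> 3) & 1 != x3:
            continue
        for y in range(256):
            if (y >> 3) & 1 != y3:
                continue
            total += 1
            # Carry out of the low nibbles is the carry into Bit 4 (0 or 1).
            carries += ((x & 0x0F) + (y & 0x0F) + i) >> 4
    return carries / total


# Carry rate for every case and both S-box indices.
rates = {
    (i, x3, y3): carry_into_bit4_rate(i, x3, y3)
    for i in (0, 1)
    for x3 in (0, 1)
    for y3 in (0, 1)
}
```

In cases 00 and 11 the carry is fully determined (never and always, respectively), while in cases 01 and 10 it occurs for a little under or a little over half of the inputs depending on the S-box index, which is the sense in which the text says the carry happens in about half of the inputs.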

4.4.2 The Attack

The attack follows the lines of the above example, but considers that the details of the approximation of the S-boxes are more complicated than described so far. While for a single S-box and appropriate independence assumptions the technique would work as described, in practice there is a correlation between the approximation of the two middle S-boxes of F. We give the combination of both middle S-boxes the name T-box (marked by a rectangle in Figure 4.5). The joint approximation of the two S-boxes in the T-box cannot be described as a combination of two independent approximations since the input bits to the second S-box are all either inputs of the first S-box or its output. Therefore, a closer examination of the joint distribution is in order.

We computed the joint approximation of the S-boxes (the T-box) with the approximation 11 55 → 02 11 and observed that the partition to two sets (by the value of $f_{0,3} \oplus f_{1,3} \oplus f_{2,3} \oplus f_{3,3}$) has the following effect: In the cases 01 and 10 the bias is increased by a factor of about 2.49 compared to the original bias, while the absolute value of the bias in the other cases (00 and 11) is halved. Therefore, the number of known plaintexts needed by the attack is reduced by a factor of about $2.49^2/2 \approx 3.1$. We also note that there are other possible partitions (by other control bits) that yield an increased bias in one or more of the sets, that can be used for alternative implementations of this technique.

We applied this improvement to the attack of Section 4.3 and successfully reduced the number of required known plaintexts from $2^{15}$ to $2^{14}$. Applying this technique did not add a noticeable overhead to the running time of the attack. In fact, the time it took to recover the 44 bits of the actual subkeys using $2^{14}$ plaintexts was 12 hours, which is about half the time that was required using $2^{15}$ plaintexts (without using this technique). The rest of the attack took about two more hours, and the key was found after 14 hours of computation. The key that was found for the challenge with $2^{14}$ plaintexts is 5681891EEC34CE1241ED0F52C9C23F65.
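The factor of about 3.1 can be traced with the same rule of thumb used in the simplified example, namely that the number of plaintexts needed to detect a bias $\varepsilon$ scales as $1/\varepsilon^2$. As a sketch under that assumption, let $m$ be the number of plaintexts needed with the original bias $\varepsilon$ and $m'$ the number needed with partitioning, where the set with the amplified bias holds about half of the data:

```latex
\[
\frac{m'}{2}\,(2.49\,\varepsilon)^2 \;\approx\; m\,\varepsilon^2
\quad\Longrightarrow\quad
m' \;\approx\; \frac{2\,m}{2.49^2} \;=\; \frac{2\,m}{6.2001} \;\approx\; \frac{m}{3.1}.
\]
```

This is consistent with the reduction from $2^{15}$ to $2^{14}$ plaintexts reported for the challenge, which uses a factor of 2 out of the available 3.1.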

4.5 Attacking FEAL-8X Using $2^{10}$ Known Plaintexts with Complexity $2^{62}$

The methods we described in the previous sections can be used to break FEAL-8X with even fewer known plaintexts in time which is still faster than exhaustive search. In particular, the key can be found given $2^{10}$ known plaintexts in time of about $2^{62}$ FEAL encryptions.

To justify the above claim, we describe an attack on seven rounds of FEAL, which is based on the attack of Section 4.3.2, and then extend it to 8 rounds by exhaustively searching for the subkey of the last round.

The attack on seven rounds of FEAL uses the first five rounds of Approximation 1, with a bias of $2^{-3}$. Similarly to the attack of Section 4.3.2, the approximation covers the five middle rounds and the analysis is performed on the first and last rounds. In each of the first and last rounds there are 15 bits that we need to guess in order to compute the parity of the linear approximation, and therefore the attack requires encrypting/decrypting an equivalent of $2^{15} \cdot 2^{15} \cdot 2^{10} \cdot 2 = 2^{41}$ rounds of FEAL.

In order to extend the attack to eight rounds, we also guess 30 bits of the actual subkey DK7 of the last round (recall that two of the 32 bits have no effect on the parity of the linear approximation). For each candidate for these 30 bits of DK7 we decrypt the last round of all the inputs, and then apply the above attack to the remaining seven rounds. The attack on seven rounds is performed $2^{30}$ times, and therefore the total time complexity is equivalent to computing $2^{30} \cdot 2^{41} = 2^{71}$ rounds of FEAL (or $2^{68}$ encryptions of the full cipher), which is much faster than exhaustively searching for the 128-bit key. When applying the optimization improvements described in Appendix 4.B we get an even lower complexity of about $2^{62}$ FEAL encryptions.^5
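The complexity figures of this section follow from simple exponent arithmetic, which the following sketch reproduces. The only inputs are the numbers stated in the text; the factor of $2^6$ for the implementation improvement is the approximate saving attributed to Appendix 4.B in the introduction of this chapter, and the variable names are illustrative.

```python
from math import log2

ROUNDS_PER_ENCRYPTION = 8  # FEAL-8X has eight rounds

# Seven-round attack: 2^15 guesses for the first round, 2^15 for the last
# round, 2^10 known plaintexts, and two round computations per plaintext.
seven_round_cost = 2**15 * 2**15 * 2**10 * 2

# Eight-round attack: repeat the above for each of the 2^30 guesses of DK7.
eight_round_cost_in_rounds = 2**30 * seven_round_cost
eight_round_cost_in_encryptions = eight_round_cost_in_rounds // ROUNDS_PER_ENCRYPTION

# Approximate saving of the efficient implementation (about 2^6).
optimized_cost = eight_round_cost_in_encryptions // 2**6

exponents = [log2(v) for v in (
    seven_round_cost,
    eight_round_cost_in_rounds,
    eight_round_cost_in_encryptions,
    optimized_cost,
)]
```

The resulting exponents are 41, 71, 68 and 62, matching the figures quoted in the text.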

4.6 Attacks with a Few Known or Chosen Plaintexts

In this section we describe several attacks that require only a few (even 2 or 3) known or chosen plaintexts, which are based on linear cryptanalysis or differential cryptanalysis combined with exhaustive search of most subkeys, as well as meet in the middle attacks.

4.6.1 Differential and Linear Exhaustive Search Attacks

During the work on FEAL we noticed that the actual subkeys of FEAL-8X are mixed very slowly through the encryption function. In particular, we observed that only 112 bits of the actual subkeys are needed in order to decrypt a ciphertext by 5 rounds and compute the data after the third round of the cipher from the ciphertext. In addition, we recalled that there are four independent 3-round linear approximations with probability 1 (creating a total of 15 non-trivial approximations) and two independent 3-round differential characteristics with probability 1 (creating a total of 3 characteristics). These approximations and characteristics can be found in [8, 15].

In the case of the linear approximations with probability 1, each allows us to test one parity bit of the data after the third round and to compare it to a parity bit of the plaintext. Therefore, a total of 4 bits can be tested on each plaintext (except for the first known plaintext, to whose parities we compare). The details of the attack (given 5 known plaintexts) are given in Algorithm 4.2.

The complexity of this attack is $2^{112}$, taking into consideration that the various decryptions need not be computed several times (once by 5 rounds, then by 6, then by 7), since the intermediate values can be cached to save computation time. A careful implementation would require an average computation of only two rounds in each guess for each of the three guessing loops. Thus the total complexity is about $3 \cdot 2 \cdot 2^{112}$ round computations, which equals $0.75 \cdot 2^{112}$ encryptions of FEAL-8X.

⁵ Recall that the key size of FEAL-8X is 128 bits.

1. For each value of the set of subkeys DK3, DK4, DK5, DK6, DK7 (in total these 160 bits contain only 112 independent bits):
   (a) For each plaintext-ciphertext pair (P, C) decrypt the ciphertext by 5 rounds to $D_3$ and compute the parity of each approximation $P \cdot \lambda_P^3 \oplus D_3 \cdot \lambda_T^3$, where $\lambda_P^3 \to \lambda_T^3$ is the mask of the linear approximation in use.
   (b) Discard any guess for which the five results (each of 4 bits, one for each approximation) are not the same.
   (c) Note that at this point only about $2^{96}$ of the guesses of the subkeys remain.
   (d) For each value of the subkey DK2 (16 more bits):
       i. Note that at this point we have about $2^{112}$ guesses of the subkeys.
       ii. We will now use four 2-round approximations $\lambda_P^2 \to \lambda_T^2$ which are based on the last two rounds of the prior ones.
       iii. For each plaintext-ciphertext pair (P, C) decrypt the ciphertext by 6 rounds to $D_2$ and compute the parity of each approximation $P \cdot \lambda_P^2 \oplus D_2 \cdot \lambda_T^2$.
       iv. Discard any guess for which the five results are not the same.
       v. Note that at this point we are left again with only about $2^{96}$ guesses of the subkeys.
       vi. For each value of the subkey DK1 (16 more bits):
           A. Note that at this point we have about $2^{112}$ guesses of the subkeys.
           B. For each plaintext-ciphertext pair (P, C) decrypt the ciphertext by 7 rounds to $D_1$ and compute the XOR of both halves of the whitening key DK89 ⊕ DKab (32 bits in total).
           C. Discard any guess for which the five results are not the same.
           D. Note that at this point we expect that only the correct values of all the above guesses remain.
           E. Complete the rest of the subkeys by guessing DK0 and comparing the resulting DK89 in $2^{16}$ time.
           F. Recover the original key by the algorithm of Appendix 4.A (note that given all the decryption actual subkeys and the whitening key it is easy to compute the encryption actual subkeys needed by that algorithm).

Algorithm 4.2: Breaking FEAL in $2^{112}$ Time and Only 5 Known Plaintexts
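The guess counts quoted in the notes of Algorithm 4.2 follow from simple filtering arithmetic: every known plaintext beyond the first contributes a fixed number of comparison bits. The sketch below, with illustrative function names of our own, reproduces these counts on a log2 scale under the assumption that wrong guesses behave randomly; it is a sanity check of the bookkeeping, not an implementation of the attack.

```python
def survivors_log2(guesses_log2, known_plaintexts, bits_per_text):
    """Expected log2 of the guesses left after comparing every text to the first.

    Each of the (known_plaintexts - 1) comparisons filters bits_per_text bits,
    assuming wrong guesses pass each bit test with probability 1/2.
    """
    return guesses_log2 - (known_plaintexts - 1) * bits_per_text

def algorithm_4_2_trace(known_plaintexts=5):
    """Return the log2 guess counts after steps 1(b), 1(d)iv and 1(d)vi.C."""
    trace = []
    guesses = 112                      # DK3..DK7: 112 independent bits
    guesses = survivors_log2(guesses, known_plaintexts, 4)   # four parity bits
    trace.append(guesses)
    guesses += 16                      # guess DK2
    guesses = survivors_log2(guesses, known_plaintexts, 4)   # four parity bits
    trace.append(guesses)
    guesses += 16                      # guess DK1
    guesses = survivors_log2(guesses, known_plaintexts, 32)  # 32-bit whitening XOR
    trace.append(guesses)
    return trace

def cost_multiplier(loops=3, rounds_per_guess=2, rounds_per_encryption=8):
    """Multiplier c such that the total cost is about c * 2^112 encryptions."""
    return loops * rounds_per_guess / rounds_per_encryption

print(algorithm_4_2_trace())
print(cost_multiplier())
```

A negative last entry in the trace means that, on average, far fewer than one wrong guess is expected to survive, which matches the note in step D.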

A similar attack that uses the 3-round differential characteristics with probability 1 requires only three chosen plaintexts (the plaintexts differ by the two plaintext differences of the two characteristics). Since each differential characteristic predicts 64 bits of the intermediate difference, wrong guesses are eliminated much more effectively, and therefore three chosen plaintexts suffice. The complexity of this attack is also $2^{112}$.

4.6.2 Meet in the Middle Attacks

The attack that requires the smallest number of known plaintexts is a meet in the middle attack. We observe that the number of (independent) bits of the actual subkeys that are required to partially encrypt (or decrypt) four rounds of the cipher is 96. Therefore, a meet in the middle attack using two (or three) known plaintexts computes $2^{96}$ 4-round partial encryptions of two blocks plus $2^{96}$ 4-round partial decryptions of two blocks. This attack also requires $2^{96}$ memory words of size 128 bits (or even 96 bits). The list of about $2^{64}$ (or $2^{96}$) colliding values should then be checked by auxiliary techniques, and be completed to a full key with the same known plaintexts.

An improvement of this attack may reduce the complexity to $2^{80}$, by encrypting or decrypting only three rounds from each end, using 11 known plaintexts. This improvement considers that the F-function in the fourth round can be approximated by the four independent linear approximations with probability 1 (each one is represented by a single parity bit in the output of encryption and a single parity bit in the output of decryption). The fifth round can be approximated similarly. This way, each known plaintext contributes 8 bits to the colliding values (except for the first, whose 8 parity bits are XORed into the parity bits of all the other ones), and thus in order to collide on 80 bits, we need 11 known plaintexts. Each of the $2^{80}$ colliding values can then be checked by auxiliary techniques, and be completed to the full key.

We also note that these meet in the middle attacks can be transformed to memoryless meet in the middle attacks by standard techniques [58, 67]. The simplest implementation of the former encrypts/decrypts three blocks at a time, each encrypted or decrypted by four rounds, resulting in a collision on 192 intermediate data bits, which ensures that the real values of the subkeys are easily identified in time $2^{96}$. The simplest implementation of the latter encrypts/decrypts 21 blocks at a time, each encrypted or decrypted by three rounds, resulting in a collision on 160 intermediate data bits, which ensures that the real values of the subkeys are easily identified in time $21 \cdot (3/8 + 3/8) \cdot 2^{80} \approx 2^{84}$.
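To illustrate the general shape of a (table-based) meet in the middle attack, the following is a minimal sketch on a toy cipher. The toy cipher, its 8-bit half keys, and all function names are assumptions made for this illustration only; it is not FEAL and does not use the FEAL subkey structure. The first known plaintext builds the table from the encryption side, and the remaining pairs filter the colliding key candidates.

```python
MASK = 0xFFFF  # toy 16-bit block

def rotl16(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK

def rotr16(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK

def half_encrypt(key, block):
    """Toy invertible half-cipher keyed by one 8-bit key (illustrative only)."""
    kx = (key << 8) | (key ^ 0xA5)
    x = block
    for r in range(3):
        x = (x + key + r) & MASK
        x = rotl16(x, 5)
        x ^= kx
    return x

def half_decrypt(key, block):
    """Inverse of half_encrypt: undo the three mini-rounds in reverse order."""
    kx = (key << 8) | (key ^ 0xA5)
    x = block
    for r in reversed(range(3)):
        x ^= kx
        x = rotr16(x, 5)
        x = (x - key - r) & MASK
    return x

def encrypt(k1, k2, block):
    return half_encrypt(k2, half_encrypt(k1, block))

def meet_in_the_middle(pairs):
    """Return all (k1, k2) consistent with every known (plaintext, ciphertext) pair."""
    p0, c0 = pairs[0]
    table = {}
    for k1 in range(256):                       # forward half, stored in memory
        table.setdefault(half_encrypt(k1, p0), []).append(k1)
    candidates = []
    for k2 in range(256):                       # backward half, looked up
        for k1 in table.get(half_decrypt(k2, c0), []):
            if all(half_encrypt(k1, p) == half_decrypt(k2, c) for p, c in pairs[1:]):
                candidates.append((k1, k2))
    return candidates

secret = (0x3C, 0xD1)
known = [(p, encrypt(secret[0], secret[1], p)) for p in (0x1234, 0xBEEF, 0x0F0F)]
print(secret in meet_in_the_middle(known))
```

The work is about 2 · 2^8 half-cipher evaluations instead of 2^16 full trials, which mirrors the trade-off between 2^96 partial encryptions/decryptions and 2^96 memory words described above.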

4.7 Summary

We presented the techniques which allowed us to break FEAL-8X with only $2^{14}$ known plaintexts and recover the secret key. This is an improvement over the best known-plaintext attacks prior to this work. Our attack is based on a few improvements and optimizations to linear cryptanalysis, the most important of which is the new partitioning technique, which allowed us to reduce the number of known plaintexts needed for the attack.

In addition to the practical attacks on FEAL-8X we also presented a few attacks which are based on linear and differential cryptanalysis in combination with meet-in-the-middle techniques. Those attacks can find the secret key given only a few messages in time which is faster than exhaustive search.

We wish to discuss the similarities and differences between our partitioning technique and partitioning cryptanalysis [37]. They both partition the data into several sets based on functions that take the plaintexts or ciphertexts and guessed key bits, where each set of the input-partition is related to some linear approximation and expected biases. In that sense, our technique is a variant of partitioning cryptanalysis. However, in partitioning cryptanalysis the expected biases are known in advance for each input block of the partition, and thus the attacker can select the best block and choose all the chosen plaintexts to be in that block. In our case we succeed (in the particular case of the addition operation) in going one step further and dividing the data into partitions such that we do not know which set should have which bias. The identification of the sets is part of the attack, and this is why our technique is a known plaintext attack. But perhaps the most significant improvement of our technique stems directly from the motivation that is the basis of our partition: we use the partition in order to discard (or rather ignore) messages that do not contribute to the linear bias. By doing so the bias in the remaining set is higher, which allows us to reduce the number of messages needed for the attack. We also note that our technique may in some cases be applied both on the plaintext side and on the ciphertext side simultaneously, and gain the extra factor in cases where partitioning cryptanalysis may not.

K0         K1         K2         K3         K4         K5         K6         K7
Z0,0 Z0,1  Z0,2 Z0,3  Z1,0 Z1,1  Z1,2 Z1,3  Z2,0 Z2,1  Z2,2 Z2,3  Z3,0 Z3,1  Z3,2 Z3,3

K8         K9         Ka         Kb         Kc         Kd         Ke         Kf
Z4,0 Z4,1  Z4,2 Z4,3  Z5,0 Z5,1  Z5,2 Z5,3  Z6,0 Z6,1  Z6,2 Z6,3  Z7,0 Z7,1  Z7,2 Z7,3

Table 4.2: A mapping between the standard notation for FEAL subkeys and the notation used in this appendix

4.A Retrieving The FEAL-8X Key from the Actual Subkeys

In [15] an algorithm for retrieving the 64-bit key of FEAL-8 from the actual subkeys is described. This algorithm uses only the subkeys which are found by the differential attack of [15] (i.e., the decryption actual subkeys DK1–DK7). Since the key processing algorithm of FEAL-8X differs from the one of FEAL-8 in several details, the most visible being the increased size of the key, we cannot use the algorithm of [15] as is. In this appendix we improve this algorithm to also use the additional information of the encryption actual subkeys EK0–EK6 in order to find all 128 bits of the FEAL-8X key.

4.A.1 The Key Processing Algorithm

Recall that in each round of the algorithm four bytes of subkeys are generated. The subkeys K0–K7 generated in rounds 0–3 of the key processing algorithm are used as round keys, while the subkeys K8–Kf generated in rounds 4–7 are used as whitening keys. The key processing algorithm is depicted in Figure 4.6. For the purposes of this appendix, we use the notation Zi,j (0 ≤ i ≤ 7, 0 ≤ j ≤ 3) for the bytes of the FEAL-8X subkeys. Zi,j denotes Byte j of the 32-bit output of Round i of the key processing algorithm. Table 4.2 shows the mapping between the notation described in this appendix and the standard notation.

[Figure 4.6: The Key Processing Algorithm and the Fk Function. Round i applies Fk and outputs Zi, with Z0 = (K0, K1), Z1 = (K2, K3), Z2 = (K4, K5), Z3 = (K6, K7), Z4 = (K8, K9), Z5 = (Ka, Kb), Z6 = (Kc, Kd), Z7 = (Ke, Kf).]

The 16 bytes of the FEAL-8X key (denoted Key0–Key15) are used by the key processing algorithm in the following way: The first eight bytes Key0–Key7 are mixed with the inputs of the first three rounds of the algorithm. For convenience we denote

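$$
\begin{aligned}
(Z_{-3,0}, Z_{-3,1}, Z_{-3,2}, Z_{-3,3}) &= 0 \\
(Z_{-2,0}, Z_{-2,1}, Z_{-2,2}, Z_{-2,3}) &= (\mathrm{Key}_0, \mathrm{Key}_1, \mathrm{Key}_2, \mathrm{Key}_3) \\
(Z_{-1,0}, Z_{-1,1}, Z_{-1,2}, Z_{-1,3}) &= (\mathrm{Key}_4, \mathrm{Key}_5, \mathrm{Key}_6, \mathrm{Key}_7)
\end{aligned}
$$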

This notation simplifies the recursive equations we present later. The second half of the key, Key8–Key15, is used to construct qi, 0 ≤ i ≤ 7, which are used in the corresponding rounds of the algorithm. qi is given by:

$$
q_i =
\begin{cases}
(\mathrm{Key}_8, \ldots, \mathrm{Key}_{11}) \oplus (\mathrm{Key}_{12}, \ldots, \mathrm{Key}_{15}) & i = 0, 3, 6 \\
(\mathrm{Key}_8, \ldots, \mathrm{Key}_{11}) & i = 1, 4, 7 \\
(\mathrm{Key}_{12}, \ldots, \mathrm{Key}_{15}) & i = 2, 5
\end{cases}
$$

We use the notation qi,j, 0 ≤ j ≤ 3, to denote Byte j of qi. As explained in Section 4.2 and demonstrated in Table 4.1, the actual subkeys that we found are linear combinations of the FEAL-8X subkeys.
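The selection rule for qi can be written as a short function. This is a minimal sketch; the function name `q` and the representation of the key as a 16-byte `bytes` object are choices made for this illustration.

```python
def q(i, key):
    """Return the four bytes of q_i for a 16-byte FEAL-8X key.

    i = 0, 3, 6 -> (Key8..Key11) XOR (Key12..Key15)
    i = 1, 4, 7 -> (Key8..Key11)
    i = 2, 5    -> (Key12..Key15)
    """
    if len(key) != 16 or not 0 <= i <= 7:
        raise ValueError("expected a 16-byte key and 0 <= i <= 7")
    left, right = key[8:12], key[12:16]
    if i % 3 == 0:
        return bytes(a ^ b for a, b in zip(left, right))
    if i % 3 == 1:
        return bytes(left)
    return bytes(right)

example_key = bytes(range(16))
for i in range(8):
    print(i, q(i, example_key).hex())
```

Byte j of the returned value corresponds to qi,j in the notation above.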

Subkey  Byte 0       Byte 1                            Byte 2                            Byte 3
DK1     Z6,0 ⊕ Z7,0  Z6,0 ⊕ Z7,0 ⊕ Z6,1 ⊕ Z7,1 ⊕ Z0,2  Z6,2 ⊕ Z7,2 ⊕ Z6,3 ⊕ Z7,3 ⊕ Z0,3  Z6,3 ⊕ Z7,3
DK2     Z6,0         Z6,0 ⊕ Z6,1 ⊕ Z1,0                Z6,2 ⊕ Z6,3 ⊕ Z1,1                Z6,3
DK3     Z6,0 ⊕ Z7,0  Z6,0 ⊕ Z7,0 ⊕ Z6,1 ⊕ Z7,1 ⊕ Z1,2  Z6,2 ⊕ Z7,2 ⊕ Z6,3 ⊕ Z7,3 ⊕ Z1,3  Z6,3 ⊕ Z7,3
DK4     Z6,0         Z6,0 ⊕ Z6,1 ⊕ Z2,0                Z6,2 ⊕ Z6,3 ⊕ Z2,1                Z6,3
DK5     Z6,0 ⊕ Z7,0  Z6,0 ⊕ Z7,0 ⊕ Z6,1 ⊕ Z7,1 ⊕ Z2,2  Z6,2 ⊕ Z7,2 ⊕ Z6,3 ⊕ Z7,3 ⊕ Z2,3  Z6,3 ⊕ Z7,3
DK6     Z6,0         Z6,0 ⊕ Z6,1 ⊕ Z3,0                Z6,2 ⊕ Z6,3 ⊕ Z3,1                Z6,3
DK7     Z6,0 ⊕ Z7,0  Z6,0 ⊕ Z7,0 ⊕ Z6,1 ⊕ Z7,1 ⊕ Z3,2  Z6,2 ⊕ Z7,2 ⊕ Z6,3 ⊕ Z7,3 ⊕ Z3,3  Z6,3 ⊕ Z7,3

Table 4.3: Relation Between the Bytes of the Decryption Actual Subkeys and the Subkeys of the Cipher

Subkey  Byte 0       Byte 1                            Byte 2                            Byte 3
EK0     Z4,0 ⊕ Z5,0  Z4,0 ⊕ Z5,0 ⊕ Z4,1 ⊕ Z5,1 ⊕ Z0,0  Z4,2 ⊕ Z5,2 ⊕ Z4,3 ⊕ Z5,3 ⊕ Z0,1  Z4,3 ⊕ Z5,3
EK1     Z4,0         Z4,0 ⊕ Z4,1 ⊕ Z0,2                Z4,2 ⊕ Z4,3 ⊕ Z0,3                Z4,3
EK2     Z4,0 ⊕ Z5,0  Z4,0 ⊕ Z5,0 ⊕ Z4,1 ⊕ Z5,1 ⊕ Z1,0  Z4,2 ⊕ Z5,2 ⊕ Z4,3 ⊕ Z5,3 ⊕ Z1,1  Z4,3 ⊕ Z5,3
EK3     Z4,0         Z4,0 ⊕ Z4,1 ⊕ Z1,2                Z4,2 ⊕ Z4,3 ⊕ Z1,3                Z4,3
EK4     Z4,0 ⊕ Z5,0  Z4,0 ⊕ Z5,0 ⊕ Z4,1 ⊕ Z5,1 ⊕ Z2,0  Z4,2 ⊕ Z5,2 ⊕ Z4,3 ⊕ Z5,3 ⊕ Z2,1  Z4,3 ⊕ Z5,3
EK5     Z4,0         Z4,0 ⊕ Z4,1 ⊕ Z2,2                Z4,2 ⊕ Z4,3 ⊕ Z2,3                Z4,3
EK6     Z4,0 ⊕ Z5,0  Z4,0 ⊕ Z5,0 ⊕ Z4,1 ⊕ Z5,1 ⊕ Z3,0  Z4,2 ⊕ Z5,2 ⊕ Z4,3 ⊕ Z5,3 ⊕ Z3,1  Z4,3 ⊕ Z5,3

Table 4.4: Relation Between the Bytes of the Encryption Actual Subkeys and the Subkeys of the Cipher

Tables 4.3 and 4.4 show the relations between the actual subkeys that our attack finds (DK1–DK7, EK0–EK6) and the subkeys generated by the key processing algorithm (Zi,j). For example, Byte 1 of the decryption actual subkey DK2 is a linear combination of three subkey bytes: Z6,0, Z6,1 and Z1,0.
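As an illustration of how the tables are read, the sketch below extracts the subkey bytes that appear in Bytes 0 and 3 of the actual subkeys, where each entry is either a single Zi,j or the XOR of two of them. The function name and the dictionary keyed by (i, j) are assumptions made for this example; each actual subkey is passed as a sequence of four bytes.

```python
def immediate_subkey_bytes(dk1, dk2, ek0, ek1):
    """Read Z4,0 Z5,0 Z6,0 Z7,0 Z4,3 Z5,3 Z6,3 Z7,3 off Bytes 0 and 3.

    Per Table 4.3: DK2 byte 0 = Z6,0 and DK1 byte 0 = Z6,0 ^ Z7,0 (same for byte 3).
    Per Table 4.4: EK1 byte 0 = Z4,0 and EK0 byte 0 = Z4,0 ^ Z5,0 (same for byte 3).
    """
    return {
        (6, 0): dk2[0],
        (7, 0): dk1[0] ^ dk2[0],
        (6, 3): dk2[3],
        (7, 3): dk1[3] ^ dk2[3],
        (4, 0): ek1[0],
        (5, 0): ek0[0] ^ ek1[0],
        (4, 3): ek1[3],
        (5, 3): ek0[3] ^ ek1[3],
    }

# Build actual-subkey bytes 0 and 3 from arbitrary Z values, then read them back.
z = {(4, 0): 0x11, (5, 0): 0x22, (6, 0): 0x33, (7, 0): 0x44,
     (4, 3): 0x55, (5, 3): 0x66, (6, 3): 0x77, (7, 3): 0x88}
dk1 = [z[6, 0] ^ z[7, 0], 0, 0, z[6, 3] ^ z[7, 3]]
dk2 = [z[6, 0], 0, 0, z[6, 3]]
ek0 = [z[4, 0] ^ z[5, 0], 0, 0, z[4, 3] ^ z[5, 3]]
ek1 = [z[4, 0], 0, 0, z[4, 3]]
print(immediate_subkey_bytes(dk1, dk2, ek0, ek1) == z)
```

Bytes 1 and 2 are set to zero in the example only because they are not used by this step.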

4.A.2 Finding the Key

From Tables 4.3 and 4.4 it is clear that given the actual subkeys we can immediately recover the values of Z4,0, Z5,0, Z6,0, Z7,0, Z4,3, Z5,3, Z6,3, Z7,3 (they are determined by Bytes 0 and 3 of the actual subkeys).

We use the information from the tables and equations derived from the key processing algorithm to retrieve the rest of the values Zi,j and recover the key. Based on the round function of the key processing algorithm and the structure of Fk as depicted in Figure 4.6 we can write the following four equations:

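$$Z_{i,0} = S_0(Z_{i-2,0},\; Z_{i,1} \oplus Z_{i-1,2} \oplus Z_{i-3,2} \oplus q_{i,2}) \tag{4.1}$$

$$Z_{i,1} = S_1(Z_{i-2,1} \oplus Z_{i-2,0},\; Z_{i-2,2} \oplus Z_{i-2,3} \oplus Z_{i-1,0} \oplus Z_{i-3,0} \oplus q_{i,0}) \tag{4.2}$$

$$Z_{i,2} = S_0(Z_{i-2,2} \oplus Z_{i-2,3},\; Z_{i,1} \oplus Z_{i-1,1} \oplus Z_{i-3,1} \oplus q_{i,1}) \tag{4.3}$$

$$Z_{i,3} = S_1(Z_{i-2,3},\; Z_{i,2} \oplus Z_{i-1,3} \oplus Z_{i-3,3} \oplus q_{i,3}) \tag{4.4}$$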
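Equations (4.1)–(4.4) can be evaluated directly once S0 and S1 are fixed. The sketch below assumes the standard FEAL definitions S0(x, y) = ROL2((x + y) mod 256) and S1(x, y) = ROL2((x + y + 1) mod 256), which are not restated in this appendix; the function names are ours.

```python
def rol2(x):
    """Rotate an 8-bit value left by two bits."""
    return ((x << 2) | (x >> 6)) & 0xFF

def s0(x, y):
    return rol2((x + y) & 0xFF)        # assumed standard FEAL S0

def s1(x, y):
    return rol2((x + y + 1) & 0xFF)    # assumed standard FEAL S1

def key_round(z_im1, z_im2, z_im3, q_i):
    """Compute (Z_i,0, Z_i,1, Z_i,2, Z_i,3) from the three previous outputs and q_i.

    Each argument is a sequence of four bytes. The order of evaluation follows
    the data dependencies: Z_i,1 first, then Z_i,2 and Z_i,0, then Z_i,3.
    """
    a = z_im2
    b = [z_im1[j] ^ z_im3[j] ^ q_i[j] for j in range(4)]
    z1 = s1(a[1] ^ a[0], a[2] ^ a[3] ^ b[0])   # Equation (4.2)
    z2 = s0(a[2] ^ a[3], z1 ^ b[1])            # Equation (4.3)
    z0 = s0(a[0], z1 ^ b[2])                   # Equation (4.1)
    z3 = s1(a[3], z2 ^ b[3])                   # Equation (4.4)
    return (z0, z1, z2, z3)

zero = (0, 0, 0, 0)
print(key_round(zero, zero, zero, zero))
```

Iterating `key_round` for i = 0, ..., 7, starting from Z−3 = 0, Z−2 = (Key0, ..., Key3) and Z−1 = (Key4, ..., Key7), would produce all subkey bytes Zi,j under these assumptions.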

We describe here how to use Equation (4.4) in order to derive four bytes of the FEAL-8X key. The rest of the bytes are derived in a similar way from the other three equations (4.1), (4.2) and (4.3).

We start by guessing the value of Z6,2 ⊕ Z7,2 (we iterate over all 256 possible values), and then use the actual subkeys DK1, DK3, DK5 and DK7 to find Z0,3, Z1,3, Z2,3 and Z3,3, respectively. We also guess the values of Key11 and Key15 (the values of those two key bytes determine the values of qi,3 for every i). We test a total of $2^{24}$ cases. We can use the computation of Byte 3 (Equation (4.4)) in rounds 3–7 to get:

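$$
\begin{aligned}
Z_{3,2} &= S_1^{-1}(Z_{3,3}, Z_{1,3}) \oplus Z_{2,3} \oplus Z_{0,3} \oplus \mathrm{Key}_{11} \oplus \mathrm{Key}_{15} \\
Z_{4,2} &= S_1^{-1}(Z_{4,3}, Z_{2,3}) \oplus Z_{3,3} \oplus Z_{1,3} \oplus \mathrm{Key}_{11} \\
Z_{5,2} &= S_1^{-1}(Z_{5,3}, Z_{3,3}) \oplus Z_{4,3} \oplus Z_{2,3} \oplus \mathrm{Key}_{15} \\
Z_{6,2} &= S_1^{-1}(Z_{6,3}, Z_{4,3}) \oplus Z_{5,3} \oplus Z_{3,3} \oplus \mathrm{Key}_{11} \oplus \mathrm{Key}_{15} \\
Z_{7,2} &= S_1^{-1}(Z_{7,3}, Z_{5,3}) \oplus Z_{6,3} \oplus Z_{4,3} \oplus \mathrm{Key}_{11}
\end{aligned}
$$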
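The inversion used above can be sketched as follows, again assuming the standard FEAL definition S1(x, y) = ROL2((x + y + 1) mod 256). Here S1^{-1}(out, first) is taken to mean the second input y for which S1(first, y) = out; this reading of the notation and the function names are our assumptions.

```python
def rol2(x):
    return ((x << 2) | (x >> 6)) & 0xFF

def ror2(x):
    return ((x >> 2) | (x << 6)) & 0xFF

def s1(x, y):
    return rol2((x + y + 1) & 0xFF)    # assumed standard FEAL S1

def s1_inv(out, first):
    """Return y such that s1(first, y) == out."""
    return (ror2(out) - first - 1) & 0xFF

def recover_z_i2(z_i3, z_im2_3, z_im1_3, z_im3_3, q_i3):
    """Solve Equation (4.4) for Z_i,2 given the Byte-3 values and q_i,3."""
    return s1_inv(z_i3, z_im2_3) ^ z_im1_3 ^ z_im3_3 ^ q_i3

# Round-trip check on one arbitrary set of byte values.
z_i2, z_im2_3, z_im1_3, z_im3_3, q_i3 = 0x5A, 0x13, 0xC4, 0x7E, 0x99
z_i3 = s1(z_im2_3, z_i2 ^ z_im1_3 ^ z_im3_3 ^ q_i3)   # forward: Equation (4.4)
print(recover_z_i2(z_i3, z_im2_3, z_im1_3, z_im3_3, q_i3) == z_i2)
```

For i = 3, for example, `q_i3` would be Key11 ⊕ Key15, and for i = 4 it would be Key11, following the definition of qi.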

Once we know Z3,2, we can use DK7 to compute Z6,1 ⊕ Z7,1 (see Table 4.3), and then, once Z6,1 ⊕ Z7,1 is known, compute Z2,2, Z1,2 and Z0,2 from DK5, DK3 and DK1, respectively. At this stage we can also identify and discard some wrong guesses of values we have guessed before. We can check that Z6,2 ⊕ Z7,2 matches the value that we initially guessed, and we can check that Z4,2 ⊕ Z5,2 is consistent with this value and the actual subkeys. We therefore expect that only about $2^8$ of the $2^{24}$ cases still remain.

Applying Equation (4.4) with i = 2 and i = 1, we can obtain two key bytes:

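$$
\begin{aligned}
\mathrm{Key}_7 &= S_1^{-1}(Z_{2,3}, Z_{0,3}) \oplus Z_{1,3} \oplus Z_{2,2} \oplus \mathrm{Key}_{15} \\
\mathrm{Key}_3 &= S_1^{-1}(Z_{1,3}, \mathrm{Key}_7) \oplus Z_{0,3} \oplus Z_{1,2} \oplus \mathrm{Key}_{11}
\end{aligned}
$$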

We can also apply this equation to the first round of the key processing algorithm and verify that our guesses were correct. If the equation

$$0 = S_1^{-1}(Z_{0,3}, \mathrm{Key}_3) \oplus Z_{0,2} \oplus \mathrm{Key}_7 \oplus \mathrm{Key}_{11} \oplus \mathrm{Key}_{15}$$

does not hold then we know that at least one of the values we guessed was wrong, and therefore discard this guess.

Note that so far we have guessed the values of three bytes, but also used three conditions to eliminate wrong guesses, so on average we should have only one wrong guess at this point, in addition to the correct guess. We perform the rest of the steps for each of the remaining cases (that were not discarded).

So far we have obtained four bytes of the FEAL-8X key: Key3, Key7, Key11, Key15. Equations (4.1), (4.2) and (4.3) can now be used to derive the rest of the key bytes in a similar way. Finally, we test the remaining candidates for the key by encrypting one of the known plaintexts and verifying that it is indeed the correct key. This algorithm is very efficient and finds the key in less than a second of computation.

4.B Efficient Implementation

We describe an optimization to the implementation of the attack of Section 4.3.2 which saves a factor of about $2^6$ in the computation time of the attack. This optimization can also be applied to other attacks presented in this chapter that are based on the attack of Section 4.3.2.

Recall that in the attack of Section 4.3.2 (Algorithm 4.1) we iterate over $2^{15}$ possible values for (16 bits of) the encryption actual subkey of the first round (EK0), and $2^{22}$ possible values for (22 bits of) the decryption actual subkey of the last round (DK7). For each of the $2^{37}$ combinations, two rounds of FEAL are encrypted/decrypted for each known plaintext. We denote the number of known plaintexts by m.

We observe that given a known plaintext-ciphertext pair (P, C), the parity of the approximated bits can be written as $b_P \oplus b_C$, where $b_P$ is a bit that depends only on the plaintext and the actual subkey of the first round, and $b_C$ is a bit that depends only on the ciphertext and the actual subkey of the last round. The efficient attack is described in Algorithm 4.3. Assuming a processor with a word size of 64 bits, this optimization lets us compute the parity of 64 plaintexts at the same time, and therefore saves a factor of about $2^6$ in the attack.

1. For each of the $2^{15}$ candidates for the 16 bits of EK0:
   (a) Compute a vector $B_P$ of length m bits, where $(B_P)_i = b_{P_i}$. Save all the vectors in a table.

2. For each of the $2^{22}$ candidates for the 22 bits of DK7:
   (a) Compute a vector $B_C$ of length m bits, where $(B_C)_i = b_{C_i}$.
   (b) For each of the $2^{15}$ candidates for the 16 bits of EK0, get the vector $B_P$ from the table, and compute the number of plaintexts for which the parity of the approximation is 1 by $H(B_P \oplus B_C)$, where H is the Hamming weight function.
   (c) Compute the bias of the approximation.

3. The correct key is expected to be the one with the highest bias.

Algorithm 4.3: Efficient Implementation of the Attack in Algorithm 4.1

We note that this optimized implementation also works with the partitioning technique described in Section 4.4. In Step 2a of the algorithm above, in addition to generating the vector $B_C$ we generate a third vector $W$. The i-th bit of $W$ determines to which set of the partition the i-th plaintext belongs. We can compute the number of plaintexts with a parity of 1 in the bits of the approximation in each of the sets as $H((B_P \oplus B_C) \,\&\, W)$ and $H((B_P \oplus B_C) \,\&\, \overline{W})$, where & is the bitwise-and operator, and $\overline{W}$ denotes the binary complement of $W$.
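The bit-vector counting step can be sketched as follows. This minimal example uses Python integers as bit vectors of arbitrary length rather than 64-bit machine words, and all function names are ours; it illustrates only the XOR, masking and Hamming-weight computations, not the generation of the parity bits themselves.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into an integer; bit i holds bits[i]."""
    value = 0
    for i, bit in enumerate(bits):
        value |= (bit & 1) << i
    return value

def hamming_weight(x):
    return bin(x).count("1")

def parity_ones(bp, bc):
    """Number of texts whose approximation parity b_P xor b_C equals 1."""
    return hamming_weight(bp ^ bc)

def partitioned_parity_ones(bp, bc, w, m):
    """Counts of parity-1 texts inside the set W and inside its complement.

    m is the number of texts; it bounds the complement to the m valid bits.
    """
    x = bp ^ bc
    valid = (1 << m) - 1
    return hamming_weight(x & w), hamming_weight(x & ~w & valid)

bp = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])   # b_P for eight texts
bc = pack_bits([0, 0, 1, 0, 1, 0, 1, 1])   # b_C for the same texts
w = pack_bits([1, 1, 0, 0, 1, 0, 1, 0])    # partition membership
print(parity_ones(bp, bc))
print(partitioned_parity_ones(bp, bc, w, 8))
```

On a 64-bit processor the same idea would be applied one machine word at a time, which is the source of the factor of about $2^6$ mentioned above.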

Bibliography

[1] AMD, Linux Kernel Issue with Systems Using AGP Graphics – Application Note, August 2002. http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/26698.

[2] M. Akgün, P. Kavak and H. Demirci, New Results of the Key Scheduling of RC4, proceedings of INDOCRYPT 2008, LNCS 5365, pp. 40–52, Springer-Verlag, 2008.

[3] Anonymous, RC4 Source Code, CypherPunks mailing list, September 9, 1994. Available at http://cypherpunks.venona.com/date/1994/09/msg00304.html.

[4] K. Aoki, K. Ohta, S. Moriai and M. Matsui, Linear Cryptanalysis of FEAL, IEICE Transactions on Fundamental of Electronics, Communications and Computer Science, Vol. E81-A No. 1, pp. 88–97, 1998.

[5] J. Appelbaum, J. Horchert and C. Stöcker, Shopping for Spy Gear: Catalog Advertises NSA Toolbox, Der Spiegel, 29 December 2013. Online edition: http://www.spiegel.de/international/world/catalog-reveals-nsa-has-back-doors-for-numerous-devices-a-940994.html

[6] A.D. Balsa, The Cyrix 6x86 Coma Bug, http://www.tux.org/~balsa/linux/cyrix/index.html

[7] M. Bellare and P. Rogaway, Optimal Asymmetric Encryption – How to encrypt with RSA (Extended Abstract), Advances in Cryptology, proceedings of EUROCRYPT’94, LNCS 950, pp. 92–111, Springer-Verlag, 1995.

[8] E. Biham, On Matsui’s Linear Cryptanalysis, Advances in Cryptology – EUROCRYPT’94, LNCS 950, pp. 341–355, 1995.

[9] E. Biham and Y. Carmeli, An Improvement of Linear Cryptanalysis with Addition Operations with Applications to FEAL-8X, proceedings of Selected Areas in Cryptography 21, LNCS 8761, pp. 59–76, Springer-Verlag, 2014.

[10] E. Biham, Y. Carmeli and A. Shamir, Bug Attacks, Advances in Cryptology, proceedings of CRYPTO’08, LNCS 5157, pp. 221–240, Springer-Verlag, 2008.

[11] E. Biham and Y. Carmeli, Efficient Reconstruction of RC4 Keys from Internal States, proceedings of Fast Software Encryption 15, LNCS 5086, pp. 270–288, Springer-Verlag, 2008.

[12] E. Biham and O. Dunkelman, Differential Cryptanalysis in Stream Ciphers, Technical Report CS-2007-10, Department of Computer Science, Technion, 2007. Available at https://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-get.cgi/2007/CS/CS-2007-10.pdf.

[13] E. Biham, L. Granboulan and P.Q. Nguyễn, Impossible Fault Analysis of RC4 and Differential Fault Analysis of RC4, proceedings of Fast Software Encryption 12, LNCS 3557, pp. 359–367, Springer-Verlag, 2005.

[14] E. Biham and A. Shamir, Differential Cryptanalysis of DES-like Cryptosystems, Journal of Cryptology, Vol 4, No. 1, 1991.

[15] E. Biham and A. Shamir, Differential Cryptanalysis of Feal and N-Hash, Advances in Cryptology – EUROCRYPT’91, LNCS 541, pp. 1–16, 1991.

[16] E. Biham and A. Shamir, Differential fault analysis of secret key cryptosystems, Advances in Cryptology – CRYPTO’97, LNCS 1294, pp. 513–525, 1997.

[17] J. Black, S. Halevi, H. Krawczyk, T. Krovetz and P. Rogaway, UMAC: Fast and Secure Message Authentication, Advances in Cryptology, proceedings of CRYPTO’99, LNCS 1666, pp. 215–233, Springer-Verlag, 1999.

[18] M. Briceno, I. Goldberg and D. Wagner, A pedagogical implementation of the GSM A5/1 and A5/2 "voice privacy" encryption algorithms, October 1999, available at: http://cryptome.org/gsm-a512.htm.

[19] M. Boesgaard, M. Vesterager, T. Pedersen, J. Christiansen and O. Scavenius, Rabbit: A New High Performance Stream Cipher, proceedings of Fast Software Encryption 10, LNCS 2887, pp. 307–329, Springer-Verlag, 2004.

[20] D. Boneh, R.A. DeMillo and R.J. Lipton, On The Importance of Checking Cryptographic Protocols for Faults, Advances in Cryptology, proceedings of EUROCRYPT’97, LNCS 1233, pp. 37–51, Springer-Verlag, 1997.

[21] C. Burwick, D. Coppersmith, E. D’Avignon, R. Gennaro, S. Halevi, C. Jutla, S.M. Matyas Jr., L. O’Connor, M. Peyravian, D. Safford and N. Zunic, MARS: A Candidate Cipher for AES, AES—The First Advanced Encryption Standard Candidate Conference, Conference Proceedings, 1998.

[22] D. Chaum, Blind Signatures for Untraceable Payments, Advances in Cryptology, Proceedings of CRYPTO’82, pp. 199–203, Plenum Press, 1983.

[23] R.R. Collins, Inside the Pentium II Math Bug, Dr. Dobb’s Portal, August 1997. http://www.ddj.com/184410254

[24] J. Daemen and V. Rijmen, The Design of Rijndael: AES - The Advanced Encryption Standard, Springer-Verlag, 2002.

[25] T. Dierks and C. Allen, The TLS Protocol, Version 1.0, Internet Engineering Task Force, January 1999. Available at ftp://ftp.isi.edu/in-notes/rfc2246.txt.

[26] H. Finney, An RC4 Cycle That Can’t Happen, Usenet newsgroup sci.crypt, September 1994.

[27] S. Fluhrer, I. Mantin and A. Shamir, Weaknesses in the Key Scheduling Algorithm of RC4, proceedings of Selected Areas in Cryptography 8, LNCS 2259, pp. 1–24, Springer-Verlag, 2001.

[28] S.R. Fluhrer and D.A. McGrew, Statistical Analysis of the Alleged RC4 Keystream Generator, proceedings of Fast Software Encryption 7, LNCS 1978, pp. 19–30, Springer-Verlag, 2001.

[29] D. Genkin, A. Shamir and E. Tromer, RSA Key Extraction via Low-Bandwidth Acoustic Cryptanalysis, Advances in Cryptology, proceedings of CRYPTO’14, LNCS 8616, pp. 444–461, Springer-Verlag, 2014.

[30] D. Genkin, I. Pipman and E. Tromer, Get Your Hands Off My Laptop: Physical Side-Channel Key-Extraction Attacks on PCs, proceedings of CHES 2014, to appear.

[31] H. Gilbert, M. Girault, P. Hoogvorst, F. Noilhan, T. Pornin, G. Poupard, J. Stern and S. Vaudenay, Decorrelated Fast Cipher: An AES Candidate, AES—The First Advanced Encryption Standard Candidate Conference, Conference Proceedings, 1998.

[32] J.Dj. Golić, Linear Statistical Weakness of Alleged RC4 Keystream Generator, Advances in Cryptology, proceedings of EUROCRYPT’97, LNCS 1233, pp. 226–238, Springer-Verlag, 1997.

[33] A.L. Grosul and D.S. Wallach, A Related-Key Cryptanalysis of RC4, Technical Report TR-00-358, Department of Computer Science, Rice University, June 2000. Available at http://cohesion.rice.edu/engineering/computerscience/tr/TR_Download.cfm?SDID=126.

[34] S.S. Gupta, S. Maitra, W. Meier, G. Paul and S. Sarkar, Dependence in IV-related bytes of RC4 key enhances vulnerabilities in WPA, proceedings of Fast Software Encryption 21, to appear.

[35] J.A. Halderman, S.D. Schoen, N. Heninger, W. Clarkson, W. Paul, J.A. Calandrino, A.J. Feldman, J. Appelbaum and E.W. Felten, Lest We Remember: Cold Boot Attacks on Encryption Keys, February 2008. Available at http://citp.princeton.edu/pub/coldboot.pdf.

[36] T.R. Halfhill, The Truth Behind the Pentium Bug, BYTE magazine, March 1995. http://www.byte.com/art/9503/sec13/art1.htm

[37] C. Harpes and J.L. Massey, Partitioning Cryptanalysis, proceedings of Fast Software Encryption 4, LNCS 1267, pp. 13–27, 1997.

[38] Intel, FDIV Replacement Program – Statistical Analysis of Floating Point Flaw: Intel White Paper, July 2004. http://support.intel.com/support/processors/pentium/sb/CS-013007.htm

[39] Intel, Intel® Core™2 Duo Processor E8000 and E7000 Series, July 2004. http://www.intel.com/design/processor/specupdt/318733.pdf

[40] Intel, Intel® Pentium® Processor – Invalid Instruction Erratum Overview, November 1997. http://www.intel.com/support/processors/pentium/ppiie/

[41] S. Khazaei and W. Meier, On Reconstruction of RC4 Keys from Internal States, Mathematical Methods in Computer Science, LNCS 5393, pp. 179–189, Springer-Verlag, 2008.

[42] S.T. King, J. Tucek, A. Cozzie, C. Grier, W. Jiang and Y. Zhou, Designing and Implementing Malicious Hardware, presented in LEET 08. http://www.usenix.org/events/leet08/tech/full_papers/king/king.pdf

[43] A. Klein, Attacks on the RC4 Stream Cipher, 2007. Available at http://cage.ugent.be/~klein/RC4/RC4-en.ps.

[44] P.C. Kocher, Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems, Advances in Cryptology – CRYPTO’96, LNCS 1109, pp. 104–113, Springer-Verlag, 1996.

[45] P. Kocher, J. Jaffe and B. Jun, Differential Power Analysis, Advances in Cryptology – CRYPTO’99, LNCS 1666, pp. 388–397, Springer-Verlag, 1999.

[46] L.R. Knudsen, W. Meier, B. Preneel, V. Rijmen and S. Verdoolaege, Analysis Methods for (Alleged) RC4, Advances in Cryptology, proceedings of ASIACRYPT’98, LNCS 1514, pp. 327–341, Springer-Verlag, 1998.

[47] X. Lai, J.L. Massey and S. Murphy, Markov Ciphers and Differential Cryptanalysis, Advances in Cryptology, proceedings of EUROCRYPT’91, LNCS 547, pp. 17–38, Springer-Verlag, 1992.

[48] I. Mantin, Analysis of the Stream Cipher RC4, Master Thesis, The Weizmann Institute of Science, Israel, 2001. Available at http://www.wisdom.weizmann.ac.il/~itsik/RC4/Papers/Mantin1.zip.

[49] I. Mantin, Predicting and Distinguishing Attacks on RC4 Keystream Generator, Advances in Cryptology, proceedings of EUROCRYPT’05, LNCS 3494, pp. 491–506, Springer-Verlag, 2005.

[50] I. Mantin and A. Shamir, A Practical Attack on Broadcast RC4, proceedings of Fast Software Encryption 8, LNCS 2355, pp. 152–164, Springer-Verlag, 2002.

[51] J. Markoff, F.B.I. Says the Military Had Bogus Computer Gear, New York Times, May 9, 2008. http://www.nytimes.com/2008/05/09/technology/09cisco.html

[52] M. Matsui, Linear Cryptanalysis Method for DES Cipher, Advances in Cryptology – EUROCRYPT’93, LNCS 765, pp. 386–397, 1994.

[53] M. Matsui, Key Collisions of the RC4 Stream Cipher, proceedings of Fast Software Encryption 16, LNCS 5665, pp. 38–50, Springer-Verlag, 2009.

[54] M. Matsui, Celebrating the 25th year of FEAL – A new prize problem, rump session of CRYPTO’12. http://crypto.2012.rump.cr.yp.to/19997d5a295baee62c05ba73534745ef.pdf

[55] M. Matsui and A. Yamagishi, A New Method for Known Plaintext Attack of FEAL Cipher, Advances in Cryptology – EUROCRYPT’92, LNCS 658, pp. 81–91, 1992.

[56] A.J. Menezes, P.C. van Oorschot and S.A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1996.

[57] S. Miyaguchi, News on FEAL Cipher, talk at the rump session at CRYPTO’90, 1990.

[58] H. Morita, K. Ohta and S. Miyaguchi, A Switching Closure Test to Analyze Cryptosystems (Extended abstract), Advances in Cryptology – CRYPTO’91, LNCS 576, pp. 183–193, 1992.

[59] S. Mueller, Upgrading and Repairing PCs, Eighth edition, Que Publishing, 1998. http://www.informit.com/content/downloads/que/upgrading/fourteenth_edition/DVD/PCs8th.pdf

[60] National Bureau of Standards, Data Encryption Standard, U.S. Department of Commerce, FIPS pub. 46, January 1977.

[61] K. Ohta and K. Aoki, Linear Cryptanalysis of Fast Data Encipherment Algorithm, Technical report of IEICE, 1994.

[62] L. Osterman, Remembering old CPU bugs, Larry Osterman’s WebLog, February, 2007. http://blogs.msdn.com/larryosterman/archive/2007/02/06/remembering-old-cpu-bugs.aspx

[63] D.A. Osvik, A. Shamir and E. Tromer, Cache Attacks and Countermeasures: The Case of AES, proceedings of CT-RSA 2006, LNCS 3860, pp. 1–20, Springer-Verlag, 2006.

[64] D. Page, Theoretical Use of Cache Memory as a Cryptanalytic Side-Channel, technical report CSTR-02-003, Department of Computer Science, University of Bristol, 2002.

112

Technion - Computer Science Department - Ph.D. Thesis PHD-2015-01 - 2015 [65] G. Paul and S. Maitra, RC4 State Information at Any Stage Reveals the Secret Key, proceedings of Selected Areas in Cryp- tography 14, LNCS 4876, pp. 360–377, Springer-Verlag, 2007.

[66] S.C. Pohlig and M.E. Hellman, An Improved Algorithm for Computing Logarithms Over GF(p) and Its Cryptographic Signif- icance, IEEE Transactions on Information Theory, Vol. 24 No. 1, pp. 106–111, 1978.

[67] J.J. Quisquater and J.P. Delescaille, How Easy is Collision Search, New Results and Application to DES, Advances in Cryp- tology – CRYPTO’89, LNCS435, pp. 408–413, 1990.

[68] R.L. Rivest, M.J.B. Robshaw, R. Sidney and Y.L. Yin, The RC6 Block Cipher, AES—The First Advanced Encryption Standard Candidate Conference, Conference Proceedings, 1998.

[69] R.L. Rivest, A. Shamir and L. Adleman, A Method for Obtaining Digital Signatures and Public-Key Cryptosystems, Communica- tions of the ACM, Vol. 21 (2), pp. 120–126, 1978.

[70] A. Roos, A Class of Weak Keys in the RC4 Stream Cipher, 1995. Two posts in sci.crypt. Available at http://marcel.wanda.ch/Archive/WeakKeys.

[71] B. Screamer, Microsoft’s Digital Rights Management Scheme – Technical Details, October 2001. http://cryptome.org/ms-drm.htm

[72] P. Sepehrdad, S. Vaudenay and M. Vuagnoux, Discovery and Exploitations of New Biases in RC4, proceedings of Selected Areas in Cryptography 18, LNCS 6544, pp. 74–91, Springer-Verlag, 2011.

[73] A. Shamir, RSA for Paranoids, CryptoBytes vol. 1, no. 3, pp. 1–4, 1995.

[74] A. Shamir, R.L. Rivest and L.M. Adleman, Mental Poker, in The Mathematical Gardner, D.A. Klarner (ed.), pp. 37–43, Wadsworth, 1981.

[75] A. Shamir and E. Tromer, Acoustic Cryptanalysis: On Nosy People and Noisy Machines, http://people.csail.mit.edu/tromer/acoustic/.

[76] A. Shimizu and S. Miyaguchi, Fast Data Encipherment Algorithm FEAL, Advances in Cryptology – EUROCRYPT’87, LNCS 304, pp. 267–278, 1988.

[77] V. Shoup, OAEP Reconsidered (Extended Abstract), Advances in Cryptology, proceedings of CRYPTO 2001, LNCS 2139, pp. 239–259, Springer-Verlag, 2001.

[78] Spiegel Staff, Inside TAO: Documents Reveal Top NSA Hacking Unit, Der Spiegel, 29 December 2013. Online edition: http://www.spiegel.de/international/world/the-nsa-uses-powerful-toolbox-in-effort-to-spy-on-global-networks-a-940969-3.html

[79] E. Tews, R.P. Weinmann and A. Pyshkin, Breaking 104 Bit WEP in Less than 60 Seconds, 2007, Available at http://eprint.iacr.org/2007/120.pdf.

[80] US Department of Defense, Defense Science Board Task Force on High Performance Microchip Supply, February 2005. http://www.acq.osd.mil/dsb/reports/2005-02-HPMS_Report_Final.pdf

[81] T. Valich, AMD delays Phenom 2.4 GHz due to TLB errata, The Inquirer, November 2007. http://www.theinquirer.net/gb/inquirer/news/2007/11/18/amd-delays-phenom-ghz-due-tlb

[82] S. Vaudenay and M. Vuagnoux, Passive-only Key Recovery Attacks on RC4, proceedings of Selected Areas in Cryptography 14, LNCS 4876, pp. 344–359, Springer-Verlag, 2007.

[83] A.W. Machado, The Nimbus Cipher: A Proposal for NESSIE, NESSIE Proposal, September 2000.

[84] Wikipedia, MOS Technology 6502. http://en.wikipedia.org/wiki/MOS_Technology_6502


On Bugs and Ciphers: New Techniques in Cryptanalysis

Research Thesis

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Yaniv Carmeli

Submitted to the Senate of the Technion – Israel Institute of Technology, Adar 5775, Haifa, March 2015

The research was done under the supervision of Prof. Eli Biham in the Computer Science Department.

I thank the Technion for the generous financial support of my studies.

Abstract

This work presents three separate and independent contributions to the field of cryptanalysis (the analysis of the strength of ciphers).

The first contribution is a new attack model that we call a "bug attack", which is a kind of side-channel attack. Side-channel attacks exploit the environment in which the cryptographic computations run and use it to attack the cipher running in that environment, as opposed to directly exploiting purely algorithmic weaknesses. Examples of known side channels include the processor's power consumption during encryption, the noise it produces, and the time it takes to perform the computation. The attack we discuss exploits hardware bugs in order to reveal information about the secret cryptographic keys.

The best-known example of a hardware bug is probably the division bug in Intel's Pentium processor, which caused the result of a division to be inaccurate for a small number of uncommon inputs. While for most applications such a bug is a minor nuisance, for cryptographic computations a processor bug can be a security disaster. We show that decrypting ciphertexts on a processor that returns a wrong result when it multiplies one particular pair of inputs can lead to leakage of the secret key, sometimes using only a single message chosen by the attacker.

The bugs that can be exploited by such an attack are not limited to bugs that originate from an error or an oversight of the hardware manufacturers. A resourceful attacker may plant such bugs himself, for example by counterfeiting the hardware or (if the attacker is an intelligence agency) under the auspices of the law. This possibility seems especially tangible following the disclosure of the Snowden documents leaked from the American intelligence agencies. These documents contain evidence that the American authorities tamper with hardware while it is shipped from the manufacturer to the customer in order to plant security holes in it. In addition, the bugs may also originate in software. Many cryptographic applications use third-party packages that implement mathematical operations on large numbers whose representation in computer memory requires more than one word. An attacker who manages to modify the code of such a package can use it to introduce the bug in the multiplication operation.

In this work we present attacks that exploit such a bug on several ciphers and cryptographic schemes, among them RSA (even when it is protected by OAEP), Pohlig-Hellman, and elliptic curve schemes. The attacks we present work against both of the common implementations of the exponentiation algorithm, and they fit several different attack models, including adaptive attacks (in which the attacker can choose the messages), non-adaptive attacks, and attacks in which the attacker has no control over the messages he receives. Finally, we discuss possible ways to protect against bug attacks.

Our second contribution is an efficient algorithm that recovers the secret key of the RC4 stream cipher from the initial internal state. RC4 was first introduced in 1987 as a proprietary algorithm of the RSA company. The encryption algorithm was kept secret until it was published anonymously on the Internet in 1994. To this day, 27 years after its introduction, RC4 is the most widely used stream cipher in software. Among other uses, it protects wireless networks as part of the WEP and TKIP protocols, and protects Internet traffic as part of the SSL and TLS protocols. RC4 is characterized by a short and simple algorithm and by a very large internal state (2048 bits), from which the pseudo-random stream is generated.

In this work we present several observations on the way the initial internal state is derived from the key. These observations allow us to build the inverse algorithm, which recovers the key from the internal state. Our algorithm is several orders of magnitude more efficient than the previously known results. For example, for 40-bit RC4 keys the algorithm recovers the key within 0.02 seconds with a success probability of 86.4%. Even in cases where the algorithm does not find the entire key, it still finds partial information about it. In addition, the algorithm can recover the key even when some of the bytes of the internal state are missing or incorrect (although this reduces its probability of success).

The third contribution is an improvement to linear cryptanalysis of ciphers that involve the addition operation. We demonstrate the improved technique on the cipher FEAL-8X. Since its publication 27 years ago, FEAL has played a central role in the development of cryptanalytic methods, in particular linear cryptanalysis and differential cryptanalysis. To mark 25 years since the publication of FEAL, Mitsuru Matsui, who first published linear cryptanalysis, announced a challenge to improve the existing attacks on the cipher. As part of this challenge, sets of various sizes of plaintexts and their encryptions were published. The winner is the one who, during the year following the announcement of the challenge, finds the key of the smallest set (among the sets whose keys are found). This challenge is what drove us to the improvements presented here.

We describe the attack we developed for this purpose, and the improvements to linear cryptanalysis that we used to recover the secret key given 2^14 known plaintexts, within 14 hours of computation.

A particularly interesting improvement for linear approximations of S-boxes that involve addition partitions all the ciphertexts into several sets, such that the statistical bias of the linear characteristic increases in some of them (even if we do not know in advance in which ones). With such a partition we are able to reduce both the amount of data required for the attack and the required computation time.

In addition to the practical attacks on FEAL-8X, we also present attacks that can find the secret key from a smaller number of known plaintexts and that run faster than exhaustive search. We describe an attack that finds the secret key given 2^10 known plaintexts in time equivalent to 2^62 encryptions, attacks that find the key given 11–21 known plaintexts and run in time 2^80 encryptions, and attacks that, given 2–3 encryptions, find the key in time equivalent to 2^96 encryptions. These attacks combine differential and linear cryptanalysis with exhaustive search and meet-in-the-middle attacks. They exploit the fact that the total length of all the subkeys of this cipher is not large enough compared to the key length.