Discovery and Exploitation of New Biases in RC4
Pouyan Sepehrdad, Serge Vaudenay and Martin Vuagnoux
EPFL CH–1015 Lausanne, Switzerland http://lasecwww.epfl.ch
Abstract. In this paper, we present several weaknesses in the stream cipher RC4. First, we present a technique to automatically reveal linear correlations in the PRGA of RC4. With this method, 48 new exploitable correlations have been dis- covered. Then we bind these new biases in the PRGA with known KSA weak- nesses to provide practical key recovery attacks. Henceforth, we apply a simi- lar technique on RC4 as a black box, i.e. the secret key words as input and the keystream words as output. Our objective is to exhaustively find linear correla- tions between these elements. Thanks to this technique, 9 new exploitable corre- lations have been revealed. Finally, we exploit these weaknesses on RC4 to some practical examples, such as the WEP protocol. We show that these correlations lead to a key recovery attack on WEP with only 9800 encrypted packets (less than 20 seconds), instead of 24200 for the best previous attack.
1 Introduction
RC4 is a stream cipher designed by Ronald Rivest in 1987. It had initially been a trade secret until the algorithm was anonymously posted to the Cypherpunks mailing list in September 1994. Nowadays, RC4 is still widely used: it is the default cipher of the SSL/TLS protocol and a cryptographic primitive of the WEP (Wired Equivalent Privacy) and WPA (Wi-Fi Protected Access) protocols. Its popularity probably comes from its simplicity and the low computational cost of the encryption and decryption operations. Due to its straightforwardness, RC4 sparked extensive research, revealing weaknesses in case of misuse. The most famous example is the attack on the WEP protocol used in IEEE 802.11. WEP was designed to provide confidentiality on wireless communications by using RC4. In order to simplify the key set up, WEP uses fixed keys. In wireless communi- cations, packets may be easily lost. Because of the lack of transport control at the link level, IEEE 802.11 designers chose to encrypt each packet independently. However, RC4 is a stream cipher: the same secret key must never be used twice. To prevent any key repetition, WEP concatenates an initialization vector (IV) to the key, where the IV is a 24-bit value which is publicly disclosed in the header of the protocol. This particu- lar use of RC4 is subject to many weaknesses. However, RC4 is not generally used with an IV and almost all the attacks concerning WEP cannot be applied to the plain RC4. Thus, RC4 is still believed to be secure, even if many weaknesses have been explored. 1.1 Description of RC4
The stream cipher RC4 consists of two algorithms: the Key Scheduling Algorithm (KSA) and the Pseudo Random Generator Algorithm (PRGA). The KSA generates an initial state from a random key K of ` words of n bits as described in Algorithm 1. It starts with an array {0,1,...,N − 1}, where N = 2n and swaps N pairs, depending on the value of the secret key K. At the end, we obtain the initial state SN−1.
Algorithm 1 RC4 Key Scheduling Algorithm (KSA) 1: for i = 0 to N − 1 do 2: S[i] ← i 3: end for 4: j ← 0 5: for i = 0 to N − 1 do 6: j ← j + S[i] + K[i mod `] mod N 7: swap(S[i],S[ j]) 8: end for
Algorithm 2 RC4 Pseudo Random Generator Algorithm (PRGA) 1: i ← 0 2: j ← 0 3: loop 4: i ← i + 1 mod N 5: j ← j + S[i] mod N 6: swap(S[i],S[ j]) 7: keystream word zi = S[S[i] + S[ j] mod N] 8: end loop
Once the initial state SN−1 is created, it is used by the second algorithm of RC4, the PRGA. Its role is to generate a keystream of words of n bits, which will be XORed with the plaintext to obtain the ciphertext. Thus, RC4 computes the loop of the PRGA each time a new keystream word zi is needed, according to Algorithm 2. Note that each time a word of the keystream is generated the internal state of RC4 is updated.
Notation. In this paper, we define all the operators such as addition, subtraction and multiplication in the group Z/NZ. Thus, x + y should be read as (x + y) mod N. The indices of the table respect the C-style programming reference. This means that the first entry of the table has the index 0. Let Si[k] denote the value of the array S at index k, after round i in KSA. S−1 denotes the array where S[i] = i, where i = 0,1,...,N − 1. −1 Let Si [p] be the index of the value p in the array S after round i in KSA. For example, −1 −1 0 Si [Si[k]] = k and Si[Si [p]] = p. Let ji (resp. ji) be the value of j during the round i of KSA (resp. PRGA) where the rounds are indexed with respect to i. Thus, the KSA 0 has rounds 0,1,...,N − 1 and the PRGA has rounds 1,2,.... Let Si denote the array S th 0 after the i round of the PRGA (i.e. S1 is equal to SN−1 with SN−1[1] and SN−1[SN−1[1]] 0 swapped). We also denote SN−1 = S0. In this paper, RC4 is always used with N = 256. Thus, instead of words we may use bytes, which are equivalent. The keystream zi is defined by
0 0 0 0 0 0 0 0 zi = Si[Si[i] + Si[ ji] mod N] = Si[Si−1[ ji] + Si−1[i] mod N] (1)
2iπ Let p be an integer and θ = e p . Unless otherwise mentioned, G denotes the Zp group. The Discrete Fourier transform (DFT) of a function f over Gs is defined as fˆ(c) = ∑ f (x)θ−c•x x∈Gs where • is the dot product.
Previous Work. There are two approaches in the study of cryptanalysis of RC4: attacks based on the weaknesses of the KSA and attacks based on the weaknesses of the PRGA. Concerning the KSA, one of the first weakness published on RC4 was discovered by 0 Roos [28] in 1995. This correlation binds the secret key bytes to the initial state S0. Roos [28] and Wagner [34] identified classes of weak keys which reveal the secret key if the first key bytes are known. This property has been largely exploited against WEP (see [5,9,2,15,16,32,30,29]). Another study of cryptanalysis of the KSA is the secret key 0 recovery when the initial state S0 is known [25,1]. On the PRGA, The analysis has been largely motivated by distinguishing attacks [7,6,19,21] or initial state reconstruction from the keystream bytes [8,31,14,22] with complexity of 2241 for the best state recov- ery attack. Relevant studies of the PRGA reveal biases in the keystream output bytes in [20,27]. Mironov in [23] recommends that the first 512 initial keystream bytes must be discarded to avoid these weaknesses. Jenkins published in 1996 on his website [11] two biases in the PRGA of RC4. These biases have been generalized by Mantin in his Master Thesis [18] as useful states. Paul, Rathi and Maitra [26] discovered in 2008 a biased output index of the first keystream word generated by the PRGA. Ultimately, the last bias on the PRGA has been experimentally discovered by Maitra and Paul [17]. In practice, key recovery attacks on RC4 must bind KSA and PRGA weaknesses to correlate secret key words to keystream words. Some biases on the PRGA [13,26,17] have been successfully bound to the Roos correlation [28] to provide known plaintext attacks.
Our Contributions. Since, almost all known PRGA correlations have been experimen- tally found, we propose a method to exhaustively reveal new weaknesses in the PRGA. From 4 known biases in the PRGA, we have found 48 additional new exploitable cor- relations thanks to this technique. To provide key recovery attacks on RC4, we must bind KSA and PRGA weaknesses, we show that some of these new correlations can be bound to KSA vulnerabilities and lead to new key recovery attacks. Subsequently, we present another technique which does not consider the KSA and the PRGA, but only RC4 as a black box with the secret key words as input and the keystream words as output. Similar to the previous technique, we exhaustively search for correlations to rediscover the 3 known biases. Thanks to this technique, we have discovered 9 addi- tional exploitable correlations between the secret key and the keystream words. Then, we exploit the known and new weaknesses against plain RC4 (with 48 first keystream words known by the attacker) and we obtain a key recovery attack with a complexity of 2122.06 instead of 2128 for RC4 with N = 256 and a key length of 16 bytes. We also show that some of these correlations can be applied to WEP, decreasing the number of required encrypted packets to 9800. The best previous key recovery attacks on WEP needed 24200 encrypted packets [32,29] for the same success probability. This permits to recover a WEP key of 104 bits with a passive ciphertext only attack in less than 20 seconds in practice. This new attack is the best key recovery on WEP to our knowledge.
Structure of the Paper. In Section 2 we briefly explore known correlations in the PRGA of RC4. Section 3 details the technique used to visually represent correlations in the PRGA. Then, we describe the new biases discovered with our technique. In Sec- tion 4 we bind some of these new PRGA correlations with known KSA weaknesses to provide practical attacks on RC4. In Section 5 we detail another kind of exhaustive search where RC4 is a black box with the secret key words as input and the keystream words as output. Then, we present the new correlations found with this technique. Sec- tion 6 briefly describes a practical application of these biases to RC4 and RC4 with an IV such as used by WEP and WPA. Finally, we conclude.
2 Known Correlations in the PRGA of RC4
Jenkins Correlation. In 1996, Robert Jenkins described in his website [11] two biased correlations experimentally found on the PRGA of RC4. The first correlation considers 0 0 0 0 th the case where Si[i] + Si[ ji] = ji. Thus, the i keystream byte given by Equation (1) becomes
0 0 0 0 0 0 0 0 zi = Si[Si[i] + Si[ ji]] = Si[ ji] = ji − Si[i] (2) which holds with probability of 2/N instead of 1/N with i = 1,2,.... The second corre- 0 0 0 th lation appears when Si[i] + Si[ ji] = i. In this case, the i byte of the keystream is given by
0 0 0 0 0 0 0 zi = Si[Si[i] + Si[ ji]] = Si[i] = i − Si[ ji] (3) which holds with probability of 2/N.
Mantin and Shamir Correlation. Mantin and Shamir [20] discovered a biased distri- bution for the second keystream word. 0 Theorem 1. Assume an initial state S0 is chosen randomly. The probability that the second keystream word z2 of RC4 is 0 is approximately 2/N instead of 1/N. Paul, Rathi and Maitra Correlation. In 2008, Paul, Rathi and Maitra [26] described a biased correlation between the three first words of the secret key and the first keystream word z1 of RC4. 0 Theorem 2. Assume that the initial state S0 is chosen uniformly at random from the set of all possible permutations of the set {0,1,...,N −1}. Then the probability distribution 0 0 0 0−1 of the output index S1[1]+S1[S0[1]] = S1 [z1] that selects the first byte of the keystream output is given by
1 for odd x N 0 0 0 1 − 2 for even x 6= P S1[1] + S1[ ji] = x = N N(N−1) 2 2 1 N − N(N−1) for even x = 2
3 Visual Representation of Correlations in the PRGA
In general, the methods used to find correlations in RC4 are either opportunistic or not given. Papers tend to describe the characteristics of the biases without revealing the techniques used to discover them. We propose to describe some simple but efficient techniques to highlight weaknesses in the PRGA through exhaustive search on a subset of elements. We define a set of linear equations which contains all the known biased correlations of the PRGA described in the previous section. Our objective is to highlight linear correlations between the internal values of a round of the PRGA and the keystream word 0 0 0 0 generated by this round i.e. the subset of elements {i, ji,Si[i],Si[ ji],zi}. The correlations previously discovered by Jenkins, Mantin and Shamir and Paul, Rathi and Maitra must be rediscovered with this method. Surprisingly, some new biases are found as well. We define the linear equations as
0 0 0 0 (a0 · i + a1 · ji + a2 · Si[i] + a3 · Si[ ji] + a4 · zi) mod N = b (4) where the ai’s are elements of Z/256Z and b is a fixed value in Z/256Z. This defines 248 linear equations. To reduce this number, we decompose these equations into 256 subgroups. Each of them corresponds to a specific round (i.e. i is fixed). Thus, both a0 · i and b can be merged into one value and Equation (4) becomes
0 0 0 0 (c0 · ji + c1 · Si[i] + c2 · Si[ ji] + c3 · zi) mod N = C (5) where C = (b − a0 · i) mod 256 and ci’s are elements of Z/256Z. Since the number of linear equations is still too large, we limit the coefficients set of the ci’s to {−1,0,1}. Indeed, this set is enough to include all the previously known biased correlation in the PRGA. We obtain 256 graphs of 81 linear equations. We compute 256 first rounds of the PRGA with 109 randomly chosen RC4 secret keys of 16 bytes and we verify all the linear equations described by Equation 5. For every equation, a counter is incremented when it holds. Subsequently, we represent these counters as a graph to visually illustrate potential biases. Human brains are very efficient to visually detect anomalies in uniform Counters
C Equations
Fig. 1: 3D representation of the equations of the second round of the PRGA according to C and the counters of the equations. The objective of this graph is to visually detect biased.
distributions (see Figure 1). Below we give the biased correlations found for the 256 first keystream bytes (from z1 to z256) generated by the PRGA. Every coefficient ci has been replaced by the corresponding element to provide an easier reading of the table 0 0 (i.e. ji must be read as c0, Si[i] as c1, etc.). Correlations with zi (i.e. c3 6= 0) are called New XXX and biases without zi (i.e. c3 = 0) are named New noz XXX. In Figure 2, we confirm the presence of known biases such as the Jenkins cor- relations. More interestingly, new biases in the PRGA appear. Some of them have a probability of success which depends on the value of i. Some rounds of the PRGA provide additional biased correlations. In Figure 3, we give extra biases which appear in round 1. Figure 4 depicts the additional correlations in the second round of the PRGA. Finally, Figure 5 describes further biased correlations in rounds 0 mod 16 of the PRGA. The probability of the biased correlations New 001 and New 002 depends on the round of the PRGA and the value C. In Figure 6 we show the success probability of New 001 according to C and the rounds 1, 16, 32, 64, 128, 192 and 256 of the PRGA. Similarly, we compute the evolution of the success probability of the biased correlations New 002 in Figure 8. Figure 7 gives the 3D representation of the same bias. Figure 9 depicts the same bias for all the first 256 rounds of the PRGA using a 3D representation.
3.1 Spectral Approach to Derive New Biases
Another systematic method to derive linear relations is to use Fourier transform of the type f of the distribution that some state bits of RC4 is following. We can use exactly the same approach to derive a linear relation between the main key bits and the key s stream. Deploying this method over Z256, we can derive some good linear relations. We call a linear relation good if the probability of Equation (4) occurring is much higher 1 than expected ( N ). We use the 4-tuple defined above as an example. In fact, we can use exactly the same method to derive biases for linear relations in Section 5.2 between the secret key and keystream. To deal with this problem, we assume G = Z256 and we 0 0 0 0 4 query RC4 for N vectors Vt ∈ ( ji,Si[i],Si[ ji],zi)t ∈ G for 1 ≤ t ≤ N and we define a 0 0 0 0 ji Si[i] Si[ ji] zi C Probability Remark 1 -1 0 -1 0 2/N Jenkins Equation (2) 0 0 1 1 i 2/N Jenkins Equation (3) 0 1 1 -1 0 1.9/N New 000 0 1 1 -1 1 0.89/N New 001 ...... 0 1 1 -1 255 1.25/N New 001 0 1 1 1 0 0.95/N New 002 ...... 0 1 1 1 255 0.95/N New 002 1 1 0 0 0 0.95/N New noz 000 ...... 1 1 0 0 255 0.95/N New noz 000 1 1 -1 0 i 2/N New noz 001 1 -1 1 0 i 2/N New noz 002 1 -1 0 0 1 0.9/N New noz 003 ...... 1 -1 0 0 255 1.25/N New noz 003 1 -1 0 0 0 1.9/N New noz 004 0 0 1 0 i + 1 1.36/N New noz 005 ...... 0 0 1 0 255 0.9/N New noz 005 0 0 1 0 i 2.34/N New noz 006
Fig. 2: correlations experimentally observed for rounds 1,2,3,...,256 of the PRGA. Note that probability of biases New 000, New noz 005 and New noz 006 decrease ac- cording to i. The probabilities given in this table correspond to round 3 (i.e. i = 3). New noz 004 and New noz 006 are not biased when i = 1.
0 0 0 0 ji Si[i] Si[ ji] zi C Probability Remark 0 1 0 -1 0 0.95/N New 003 0 1 1 0 2 1.95/N Paul, Rathi and Maitra 1 1 0 0 2 1.94/N New noz 014
Fig. 3: Additional biased correlations experimentally observed using Equation (5) in the first round of the PRGA (i.e. i = 1).
function f as follows.
1 N f (x) = (x) = counter ∑ 1Vt =x N t=1 0 0 0 0 ji Si[i] Si[ ji] zi C Probability Remark 0 0 0 1 0 2/N Mantin and Shamir [20] 1 -1 1 -1 0 2/N New 004 1 1 0 -1 even 1.0183/N New 005 1 1 0 -1 odd 1.0316/N New 006 1 0 1 0 6 2.37/N New noz 007 1 0 -1 0 255 0.75/N New noz 008 1 -1 1 0 0 2/N New noz 009 0 -1 1 0 0 0.95/N New noz 010
Fig. 4: Additional biased correlations experimentally observed using Equation (5) in the second round of the PRGA (i = 2). Note that the probability of success of correlations New 005 and New 006 decreases according to the value C.
0 0 0 0 ji Si[i] Si[ ji] zi C Probability Remark 0 0 0 1 -i 1.0411/N New 007 0 0 1 -1 i 1.0500/N New 008 0 0 1 1 -i 1.0338/N New 009 0 1 1 0 -i 1.1107/N New noz 011 0 1 0 0 -i 1.1276/N New noz 012 0 1 -1 0 -i 1.1067/N New noz 013
Fig. 5: Additional biased correlations experimentally observed using Equation (5) in rounds 0 mod 16 of the PRGA. Note that the probability of success of these correla- tions decreases according to the value i and become barely exploitable when i > 48. Probabilities given in this table come from round 16.
0.005 Round 001 0.0048 Round 016 0.0046 Round 032 Round 064 0.0044 Round 128 Round 192 0.0042 Round 256 0.004 0.0038
Probability of Success 0.0036 0.0034 0 50 100 150 200 250 300 C
Fig. 6: Probability of success of biased correlation New 001, according the value C for some rounds of the PRGA.
where x ∈ G4. In fact, f is the type of the distribution the 4-tuples are following. De- ploying DFT on f , we have fˆ(c) = ∑θ−c.x f (x) = ∑ ∑ θ−C f (x) = ∑θ−C. Pr[c . v = C] x C x:c.x=C C
Then, we follow this approach: we compute | fˆ(c)|2 for c’s such that this norm is high. Filtering those c’s yields ”good” c’s. Prob Prob Round
Round C C
Fig. 7: Probability of biased correlation New 001, according the value C for the first 256 rounds of the PRGA, using a 3D representation.
0.0041 Round 001 0.00405 Round 016 0.004 Round 032 Round 064 0.00395 0.0039 0.00385 0.0038 0.00375
Probability of Success 0.0037 0.00365 0 50 100 150 200 250 300 C
Fig. 8: Probability of success of biased correlation New 002, according the value C for some rounds of the PRGA.
C C
Round Prob
Round Prob
Fig. 9: Probability of biased correlation New 002, according the value C for the first 256 rounds of the PRGA, using a 3D representation.
We construct the table | fˆ(c)|2 of all linear masks and filter out c’s that leads to small value for | fˆ(c)|2. This remains us some c’s. Then, we can exhaustively search in the c’s left and find the good c0s. This method is much faster than exhaustive search. Assume 0 0 0 we consider the linear relation between the elements of the vector ( ji,Si[ ji],zi) instead 0 0 0 0 of ( ji,Si[i],Si[ ji],zi). This already gives us all biases quite fast. This method can be 0 easily generalized to the case when Si[i] is also involved and it is dramatically faster than exhaustive search. The complexity of this method for the triplet case is 3.232, while for exhaustive search the complexity reaches to 247 if N = 107, which is a reasonable number of sam- ples we found out experimentally. This will gain us all the biases for the triplet over 3 Z256. Using this method, we found out some new biases. We only list those which are not artifact of known biases and which can be bound with biases of KSA. They are listed 0 0 in Fig 10. Any time the coefficient of Si[ ji] is one in the table, we can use that equation to bind it like what would be explained in the next section for binding New 008 and New 009 biases. This method removes the restriction on coefficients to be only in the 4 set {−1,0,1}. Using these technique, we can recover all biases in Z256 in a reasonable time.
0 0 0 0 ji Si[i] Si[ ji] zi C i Probability Remark 0 0 1 1 240 16 1.04/N New 010 0 0 1 50 224 16 1.04/N New 011 0 0 1 68 192 16 1.05/N New 012 0 0 1 98 224 16 1.04/N New 013 0 0 1 148 192 16 1.05/N New 014 0 0 1 162 224 16 1.05/N New 015 0 0 1 186 96 16 1.03/N New 016 0 0 1 187 80 16 1.04/N New 017 0 0 1 251 80 16 1.04/N New 018 0 0 2 19 208 16 1.04/N New 019 0 0 2 127 16 16 1.04/N New 020 0 0 2 147 208 16 1.04/N New 021 0 0 2 255 16 16 1.04/N New 022 0 0 4 59 80 16 1.04/N New 023 0 0 4 123 80 16 1.04/N New 024 0 0 8 19 208 16 1.04/N New 025 0 0 8 55 144 16 1.03/N New 026 0 0 8 81 240 16 1.03/N New 027 0 0 8 215 144 16 1.03/N New 028 0 0 8 241 48 16 1.03/N New 029 0 0 8 243 208 16 1.04/N New 030 0 0 32 39 144 16 1.04/N New 031 0 0 32 191 16 16 1.04/N New 032
0 0 Fig. 10: correlations in PRGA derived using DFT. If coefficient of Si[ ji] is one, that equation can be bound with KSA biases.
After investigation, it seems that all the listed biases are artifact of a new conditional bias which is 0 0 Pr[S16[ j16] = 0|z16 = −16] = 0.038488 So far, we have no explanation about this new bias.
4 Binding PRGA and KSA Weaknesses
Since we have new PRGA biased correlations, we have to bind them with KSA weak- nesses to provide key recovery attacks.
4.1 Known Binding between KSA and PRGA Weaknesses
Already known bindings between KSA and PRGA have been exploited. In 2006, Klein [12,13] demonstrated that the Jenkins correlation of the PRGA and some weaknesses in the KSA can be combined.
0 0 Pj Si[ ji] = i − zi from Equation (3) with Pj = 2/N (6) 0 0 0 Si[ ji] = Si−1[i] step 6 of the PRGA (7) 0 0 P 0 N−2 Si−1[i] = Si[i] P = ((N − 1)/N) (8) Si[i] = Si−1[ ji] KSA (9)
ji = Si−1[i] + ji−1 + K[i] step 6 of the KSA (10)
From (6) with respectively (7), (8), (9) and (10) we have
PKlein −1 K[i] = Si−1 [i − zi mod N] − Si−1[i] − ji−1 mod N (11) which holds with probability ! 2 N − 1N−2 N − 2 N − 1N−2 1.36 P = · + · 1 − ≈ (12) Klein N N N(N − 1) N N
In 2007, Vaudenay and Vuagnoux [32] improved this attack by using the Roos correla- tion and the repetition of the secret key modulo `. Thus, the sum of the secret key bytes can be recovered with
i PKleinImproved i · (i + 1) ∑ K[y] = i − zi − mod N (13) y=0 2 with success probability of PKleinImproved(i) defined by