Cache Timing Analysis of RC4
Thomas Chardin¹, Pierre-Alain Fouque², and Delphine Leresteux³

¹ DGA Engineering and Integration, 7 rue des Mathurins, 92221 Bagneux Cedex
² Département d'informatique, École normale supérieure, 45 rue d'Ulm, F-75230 Paris Cedex 05
³ DGA Information Superiority, BP7, 35998 Rennes Armées

Abstract. In this paper we present an attack that recovers the whole internal state of RC4 using a cache timing attack model first introduced in the cache timing attack of Osvik, Shamir and Tromer against some highly efficient AES implementations. In this model, the adversary can obtain some information related to the elements of a secret state used during the encryption process. Zenner formalized this model for LFSR-based stream ciphers. In this theoretical model, inspired by practical attacks, we propose a new state recovery analysis on RC4 using a belief propagation algorithm. The algorithm works well and its soundness is proved for known or unknown plaintext; for a practical attack, it only requires that the attacker query the RC4 encryption process byte by byte. Depending on the processor, our simulations show that we need between 300 and 1,300 keystream bytes and a computation time of less than a minute.

Keywords: cryptanalysis, stream cipher, RC4, cache timing analysis.

1 Introduction

Some side-channel attacks were recently formalized in theoretical work by Dziembowski and Pietrzak [9], who model powerful adversaries that can learn a bounded amount of arbitrary information about the internal state. Here we consider information coming from cache attacks, which is of the same kind but more practical, since it corresponds to real attacks that have been carried out against AES implementations [18,7,4,22,3]. Concretely, when the cipher looks up a value in a table, a whole cache line is filled, containing the requested value but not limited to it.
This mechanism improves performance since, in general, when a program needs some data, it also requests the neighbouring data soon after. In 2006, Osvik, Shamir and Tromer proposed an attack on some AES implementations that use look-up tables to implement the S-box, and showed that the adversary can learn the high-order bits of the index looked up, but neither the whole index itself nor the corresponding value of the table. These attacks are rather practical since they have been implemented [18,7] against classical implementations used in the OpenSSL library. Other cache attacks target DSA [1] or ECDSA [8] operations in the OpenSSL library, exploiting branch prediction on instructions.

J. Lopez and G. Tsudik (Eds.): ACNS 2011, LNCS 6715, pp. 110–129, 2011. © Springer-Verlag Berlin Heidelberg 2011

To gain more information from cache monitoring, Osvik et al. propose to run a concurrent process at the same time as the encryption process. The attacker can evict data from the cache using this second process, which begins by reading a large table to flush the cache. Then the encryption process is run; finally, the attacker tries to read the elements of his table again. If an element is still in the cache, the access is fast (cache hit); otherwise the access is slow (cache miss), since the element has been evicted from the cache. Consequently, although the adversary is not allowed to read the cache, cache lines correspond to lines in memory, so if the adversary knows how the encryption process organizes its data in memory (the address of the whole table, for instance), knowing which cache lines have been evicted allows him to recover the index (or a part of it) of the value looked up by the encryption process.
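The relation between a table index and the cache line that an observer sees can be sketched as follows. This is only an illustration under assumed parameters that do not come from the paper: a 256-entry table of 4-byte entries, 64-byte cache lines, and a table aligned on a cache-line boundary. The names `cache_line_of` and `candidate_indices` are hypothetical.

```python
# Illustrative parameters (assumptions, not taken from the paper):
# 4-byte table entries, 64-byte cache lines, table aligned on a line boundary.
ENTRY_SIZE = 4
LINE_SIZE = 64
ENTRIES_PER_LINE = LINE_SIZE // ENTRY_SIZE  # 16 entries share one cache line


def cache_line_of(index: int) -> int:
    """Return the cache line (relative to the table start) holding table[index]."""
    return (index * ENTRY_SIZE) // LINE_SIZE


def candidate_indices(line: int) -> range:
    """Return every table index stored in the given cache line."""
    start = line * ENTRIES_PER_LINE
    return range(start, start + ENTRIES_PER_LINE)


if __name__ == "__main__":
    secret_index = 0xA7  # index used by the (simulated) encryption process
    observed_line = cache_line_of(secret_index)
    # The observer learns only the line, i.e. the high-order bits of the index.
    print(observed_line, list(candidate_indices(observed_line)))
```

With these assumed parameters, index 0xA7 falls in line 10, which it shares with indices 160 to 175: the observer learns the four high-order bits of the index and nothing about the four low-order bits.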
Indeed, we do not recover the whole index since the cache is filled line by line: we know that the encryption process has read some element of the line, but not exactly which one. Moreover, if the encryption process performs many table lookups, we do not learn the order of the indexes, since we time our own process, which runs after the encryption process. These practical analyses allow us to consider such attacks on encryption schemes through a new security model. For example, Zenner et al. propose to study the security of LFSR-based stream ciphers in [23,15].

RC4 is a stream cipher designed in 1987 by Ron Rivest and widely used in many standards such as SSL, WEP and WPA TKIP. The internal state of RC4 is composed of two indexes and a permutation over $\mathbb{F}_{256}$. The initialization of the permutation table depends on the secret key (whose size varies between 1 and 256 bytes); the table is then updated during the generation of the keystream. Many attacks have been proposed on RC4 since its design was published in 1994, but none of them really breaks RC4. The bad initialization used in the WEP protocol and the key schedule algorithm of RC4 have been attacked by Fluhrer, Mantin and Shamir in [10]. A recent improvement has revealed new linear correlations in RC4 that make it possible to mount key recovery attacks on WEP and WPA [21]. Since then, from a cryptographic point of view, this scheme has not been broken despite its many statistical properties. Finally, more powerful attacks have been taken into account, for instance fault attacks by Hoch and Shamir and by Biham et al. in [12,6]. However, the number of faults is rather high: $2^{16}$ for the most efficient attack.

Previous Work. Our analysis is related to the one published in 1998 by Knudsen et al. in [14], which tries to recover the internal state from the keystream. Once the internal state is recovered, it is possible to run the algorithm backward, and efficient algorithms allow the key to be recovered [5].
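The remark that a recovered internal state lets one run the algorithm backward can be illustrated with the standard RC4 PRGA step and its inverse. This is a minimal sketch, not the key recovery procedure of [5]: the function names are hypothetical, and a seeded random permutation stands in for the state that the key schedule would produce.

```python
import random


def prga_step(S, i, j):
    """Advance the RC4 state by one step in place; return (i, j, keystream byte)."""
    i = (i + 1) % 256
    j = (j + S[i]) % 256
    S[i], S[j] = S[j], S[i]
    z = S[(S[i] + S[j]) % 256]
    return i, j, z


def prga_step_back(S, i, j):
    """Undo one PRGA step in place; return the previous (i, j)."""
    S[i], S[j] = S[j], S[i]   # undo the swap, restoring the previous table
    j = (j - S[i]) % 256      # S[i] is again the value that was added to j
    i = (i - 1) % 256
    return i, j


if __name__ == "__main__":
    # Assumption: a random permutation replaces the key-scheduled state.
    S = list(range(256))
    random.Random(0).shuffle(S)
    initial = list(S)

    i = j = 0
    for _ in range(1000):
        i, j, _z = prga_step(S, i, j)
    for _ in range(1000):
        i, j = prga_step_back(S, i, j)

    print(S == initial and (i, j) == (0, 0))
```

Each step is a bijection on the triple (i, j, S), which is why knowing the full state at any time gives the state at every earlier time.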
Though an improvement was proposed in 2000 [11] and another in 2008 [16], such attacks remain impractical, having a time complexity of $2^{241}$ operations for the full RC4 version. The basic idea of the "deterministic" attacks (Section 4 of [14] and [16]) is to guess some values of the table and then check whether these guesses are consistent with the output keystream. These algorithms perform a clever search by guessing bytes when they need them, and then use a backtracking approach when a contradiction appears. However, a huge number of values have to be guessed, so the complexity is relatively high in the end. This is basically the algorithm of [14]. Maximov and Khovratovich in [16] improve this algorithm by looking at the equations of RC4:

$$
\begin{aligned}
i_t &= i_{t-1} + 1 \\
j_t &= j_{t-1} + S_{t-1}[i_t] \\
S_t[i_t] &= S_{t-1}[j_t], \quad S_t[j_t] = S_{t-1}[i_t] \\
Z_t &= S_t[k] \quad \text{where } k = S_t[i_t] + S_t[j_t]
\end{aligned}
$$

In the algorithm of [14], the number of unknowns is 4 ($j$, $S[i]$, $S[j]$ and $k = S[i] + S[j]$), even though they are related. Maximov and Khovratovich solve these equations by noting that if $j$ is known at different times $t$, then so is $S[i]$, and the number of unknowns is reduced to 2. Then they show that it is possible to obtain the value of $j$ at consecutive times $t$, and also to detect such patterns from the keystream. The attack begins by locating in the keystream a good pattern which gives information about the internal state and $j$; since the equations are then simpler, the complexity is lower. The solving of such linear systems with non-linear terms has also recently been extended by Khovratovich et al. [13] to more complex systems of equations, in the context of differential trails for hash functions.

Finally, Knudsen et al. propose a "probabilistic" algorithm in Section 5 of [14], which differs from the deterministic one: the idea is that the output keystream gives conditions on the internal secret state, which lead to a conditional probability distribution $\Pr(S[i] = v \mid Z_t = z)$.
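Returning briefly to the deterministic approach, the observation that knowing j at consecutive times determines S[i] follows directly from the update equation for j: rearranging it gives S_{t-1}[i_t] = j_t - j_{t-1} modulo 256. The sketch below checks this on a simulated state. The function names are hypothetical, and a seeded random permutation is assumed in place of a key-scheduled state.

```python
import random


def prga_trace(S, steps):
    """Run the PRGA on a copy of S; record (j_{t-1}, j_t, S_{t-1}[i_t]) per step."""
    state = list(S)
    i = j = 0
    trace = []
    for _ in range(steps):
        i = (i + 1) % 256
        s_i = state[i]                      # S_{t-1}[i_t], read before the swap
        j_prev, j = j, (j + s_i) % 256
        state[i], state[j] = state[j], state[i]
        trace.append((j_prev, j, s_i))
    return trace


def recover_s_i(j_prev, j_curr):
    """Recover S_{t-1}[i_t] from two consecutive values of j."""
    return (j_curr - j_prev) % 256


if __name__ == "__main__":
    # Assumption: a random permutation replaces the key-scheduled state.
    S = list(range(256))
    random.Random(1).shuffle(S)
    ok = all(recover_s_i(jp, jc) == s for jp, jc, s in prga_trace(S, 1000))
    print(ok)
```

Since i is a public counter, each pair of consecutive values of j therefore fixes one table entry at a known position, which is the sense in which the number of unknowns drops.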
Now, the internal state is represented by a table of probability distributions: each element $S[i]$ of the table is associated with a probability distribution over the 256 possible values. At the beginning, for all $i$ and $v$, $\Pr(S[i] = v) = 1/256$. Then, according to the output keystream byte $Z_t$, an a posteriori distribution is computed using Bayes' rule and the previous values in the distribution table, and the algorithm finally updates the distribution table accordingly. This probabilistic algorithm does not work if no additional information is used. Knudsen et al. partially fill the table at the beginning with correct values. Their experiments show that they need 170 values for the algorithm to converge. They use the same idea (used later by Maximov and Khovratovich): they fill the table in such a way that consecutive values of $j$ can be found, which makes the equations easier. The algorithm we propose here is different from the one described in [14]; however, they have in common the way they use the structure of the PRGA (Pseudo-Random Generation Algorithm) to propagate constraints on the values of the elements of the secret state used by RC4.

Our Results. RC4 is a good candidate for cache timing analysis since it uses a rather large table and the indexes of the lookups give information about the table. In this paper, we present a probabilistic algorithm that recovers the current state of the permutation table.
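For concreteness, the probability-table representation recalled in the Previous Work paragraph can be sketched as follows. This illustrates only the data structure attributed to [14], not the algorithm proposed in this paper and not the Bayesian update itself. The way a known entry is recorded here (zeroing the same value in the other rows and renormalizing them, because S is a permutation) is an assumption made for the example, and the function names are hypothetical.

```python
N = 256


def uniform_table():
    """Return P with P[i][v] = Pr(S[i] = v) = 1/256 for all i and v."""
    return [[1.0 / N] * N for _ in range(N)]


def fix_entry(P, i, v):
    """Record that S[i] = v is known with certainty (modifies P in place)."""
    P[i] = [0.0] * N
    P[i][v] = 1.0
    # Assumption: since S is a permutation, no other position can hold v,
    # so v is ruled out elsewhere and the affected rows are renormalized.
    for r in range(N):
        if r == i:
            continue
        P[r][v] = 0.0
        total = sum(P[r])
        if total > 0.0:
            P[r] = [p / total for p in P[r]]


if __name__ == "__main__":
    P = uniform_table()
    fix_entry(P, 3, 7)  # pretend the attacker knows S[3] = 7
    print(P[3][7], P[4][7], round(sum(P[4]), 9))
```

Pre-filling a number of entries in this way corresponds to the partially known table from which Knudsen et al. start.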