<<

2013 IEEE Symposium on Security and

Lucky Thirteen: Breaking the TLS and DTLS Record Protocols

Nadhem J. AlFardan and Kenneth G. Paterson Information Security Group, Royal Holloway, University of London Egham, Surrey TW20 0EX, UK : {nadhem.alfardan.2009, kenny.paterson}@rhul.ac.uk

Abstract—The (TLS) protocol to build a for transporting application-layer aims to provide confidentiality and integrity of data in transit data. Other major components are the (D)TLS Handshake de facto across untrusted networks. TLS has become the secure Protocol, which is responsible for , session protocol of choice for Internet and mobile applications. DTLS is a variant of TLS that is growing in importance. In this establishment and ciphersuite negotiation, and the TLS paper, we distinguishing and plaintext recovery attacks Alert Protocol, which carries error and manage- against TLS and DTLS. The attacks are based on a delicate ment traffic. Setting aside dedicated authenticated timing analysis of decryption processing in the two protocols. algorithms (which are yet to see widespread support in TLS We include experimental results demonstrating the feasibility of or DTLS implementations), the (D)TLS Record Protocol the attacks in realistic network environments for several differ- ent implementations of TLS and DTLS, including the leading uses a MAC-Encode-Encrypt (MEE) construction. Here, the OpenSSL implementations. We provide countermeasures for plaintext data to be transported is first passed through a the attacks. Finally, we discuss the wider implications of our MAC algorithm (along with certain header ) to create attacks for the cryptographic design used by TLS and DTLS. a MAC tag. The supported MAC algorithms are all HMAC- Keywords-TLS, DTLS, CBC-mode encryption, timing attack, based, with MD5, SHA-1 and SHA-256 being the allowed plaintext recovery hash algorithms in TLS 1.2 [9]. Then an encoding step takes place. For the RC4 , this just involves I. INTRODUCTION concatenation of the plaintext and the MAC tag, while for CBC-mode encryption (the other possible option), the TLS is arguably the most widely-used secure communica- plaintext, MAC tag, and some encryption of a tions protocol on the Internet today. Starting life as SSL, the specified format are concatenated. In the encryption step, protocol was adopted by the IETF and specified as TLS 1.0 the encoded plaintext is encrypted with the selected cipher. [7]. It has since evolved through TLS 1.1 [8] to the current In the case where CBC-mode is selected, the version TLS 1.2 [9]. Various other RFCs define additional is DES, 3DES or AES (with DES being deprecated in TLS TLS cryptographic algorithms and extensions. TLS is now 1.2). Following [18], we refer to this MEE construction as used for securing a wide variety of application-level traffic MEE-TLS-CBC. We provide greater detail on its operation and has become a serious rival to IPsec for general VPN in the (D)TLS Record Protocol in Section II. usage. It is widely supported in client and server software The widespread use of TLS (and the increasing use and in cryptographic libraries for embedded systems, mobile of DTLS) makes the continued study of the security of devices, and web application frameworks. these protocols of great importance. Indeed, the evolution The DTLS protocol is a close relative of TLS, developed of the TLS Record Protocol has largely been driven by from TLS by making minimal changes so as to allow it cryptographic attacks that have been discovered against it, to operate over UDP instead of TCP [16]. This makes including those in [25], [5], [17], [3], [4], [10], [18], [1]. DTLS suitable for use where the costs of TCP connection Of particular interest lately have been attacks based on the establishment and TCP retransmissions are not warranted, use of chained initialisation vectors (IVs) for CBC-mode for example, in voice and gaming applications. DTLS exists in SSL and TLS 1.0, in particular, the so-called BEAST in two versions, DTLS 1.0 [20], which roughly matches TLS attack [10] which has its roots in [23], [17], [3], [4]. This 1.1 and DTLS 1.2 [21], which aligns with TLS 1.2. attack achieved full plaintext recovery against TLS, but Both TLS and DTLS are actually protocol suites, rather only in scenarios where an attacker can gain access to a than single protocols. The main component of (D)TLS that chosen plaintext capability, perhaps by inducing the to concerns us here is the Record Protocol, which uses sym- first download malicious code into his browser. metric key cryptography (block ciphers, stream ciphers and Despite this strong requirement, the BEAST attack attracted MAC algorithms) in combination with sequence numbers significant industry and media attention in 2011. Amongst The second author’s research was supported by an EPSRC Leadership the possible countermeasures are upgrading to TLS 1.1 or Fellowship, EP/H005455/1. 1.2, the inclusion of a dummy zero-length message prior to

1081-6011/13 $26.00 © 2013 IEEE 526 DOI 10.1109/SP.2013.42 each real TLS message, or the abandonment of CBC-mode The applicability of the attacks is also implementation- encryption in favour of RC4 or an authenticated encryption dependent, because of the manner in which different imple- algorithm. mentations interpret the RFCs. We have investigated several The other major line of attacks against the TLS Record different open-source implementations of TLS and DTLS, Protocol comprises [25], [5], [17], [18], [1] and relates and found all of them to be vulnerable to our new attacks to how the padding that is required in MEE-TLS-CBC is or variants of them (or even old attacks in one case). handled during decryption. The problems here all stem from We have implemented a selection of the attacks in an the fact that the padding is added after the MAC has been experimental setting. As with earlier attacks, completely computed and so forms unauthenticated data in the encoded breaking TLS is challenging because the attacks create plaintext. Taken altogether, the attacks in [25], [5], [17], [18], “broken” TLS records and so consume many TLS sessions. [1] show that handling padding arising during decryption Nevertheless, our basic attack can extract full plaintext processing is a delicate and complex issue for MEE-TLS- for the current OpenSSL implementation of TLS assuming CBC. the attacker is located, say, in the same LAN segment as It is the case that all these attacks on CBC-mode in the targeted TLS client or server, using roughly 223 TLS TLS could be avoided by adopting RC4 or a dedicated sessions to reliably recover a block of plaintext in a multi- authenticated encryption mode, or perhaps by redesigning session attack scenario like that considered in [5]. Such a (D)TLS to use only an Encrypt-then-MAC construction. scenario is applicable when, for example, an application pro- However, RC4 is not an option for DTLS, and not NIST- tocol performs automatic TLS reconnection and recommended for TLS [6]; meanwhile authenticated encryp- retransmission. Given its complexity, this basic attack would tion modes are only available in TLS 1.2, which is not seem to present only a theoretical threat. However, variants 1 yet widely supported. Redesigning (D)TLS would require of it are much more effective: even more radical changes than adopting TLS 1.2. So it would be fanciful to “wish away” MEE-TLS-CBC, and all • The distinguishing attacks against TLS are quite prac- the complexity that this entails: this is an option that is tical for OpenSSL, requiring just a handful of sessions firmly embedded in the TLS and DTLS RFCs, in widespread in order to reliably tell apart the of chosen use, and will remain so for the foreseeable future. On the messages. other hand, we might hope that after more than a decade of • Breaking DTLS implementations is fully practical even intensive study, we would have arrived at a point where we for a remote attacker, since we can exploit the fact that understand how to implement MEE-TLS-CBC securely. In DTLS errors are non-fatal to mount the attacks in a this paper, we show that this is not the case. single session, and reuse the amplification techniques from [1] to boost the delicate timing signals on which A. Our Results our attacks depend. We present a family of attacks that apply to CBC-mode • We also have more efficient partial plaintext recov- in all TLS and DTLS implementations that are compliant ery attacks on TLS and DTLS. For example, against with TLS 1.1 or 1.2, or with DTLS 1.0 or 1.2. They also OpenSSL TLS, an attacker who knows one of a apply to implementations of SSL 3.0 and TLS 1.0 that block in either of the last two byte positions can reliably incorporate countermeasures (imple- recover each of the remaining bytes in that block using mentations that do not are of course already vulnerable to 216 sessions. known attacks). • The complexity of all our attacks can be reduced using The attacks come in various distinguishing, partial plain- language models and sequential statistical techniques text recovery, and full plaintext recovery flavours. For the as in [5], [10]. As a simple example, if the plaintext plaintext recovery attacks, no chosen-plaintext capability is is base64 encoded, as is the case for HTTP basic needed, in contrast to the BEAST attacks: the attacks can be access authentication and cookies, then the number of mounted by a standard man-in-the-middle (MITM) attacker TLS sessions needed to recover a block reduces from who sees only and can inject of his roughly 223 to 219. own composition into the network. The details of which • In the web setting, our techniques can be combined specific attacks are possible depends on the exact size of with those used in the BEAST attack [10]: client- MAC tags output by the MAC algorithm negotiated by the side malware running in the browser can be used to Handshake Protocol, and also on the fact that the exactly 13 initiate all the needed TLS sessions, with an HTTP bytes of header data are incorporated in the MAC calculation cookie being automatically injected by the browser in (hence our title). a predictable location in the plaintext stream in each session. The malware can also control the location of 1SSL Pulse (://www.trustworthyinternet.org/ssl-pulse/) reported that only 11.4% of 200,000 surveyed support TLS 1.2 in January 2013; the cookie such that there is only one unknown byte in most major browsers currently do not support TLS 1.2. the target block at each stage of the attack. The attacker

527 then combines the “one known byte” variant of our exploiting a timing difference in TLS decryption processing. attack and the base64 optimisation above (assuming the In essence, in OpenSSL, if the padding was incorrectly sensitive part of the cookie is base64 encoded). Putting formatted, then no MAC check was performed, while if all of these improvements together, we estimate that the padding was correct, then the MAC check was done. HTTP cookies can be recovered using 213 sessions per In turn, this meant faster production of an error message in byte of cookie (with all the sessions being automatically the “invalid padding” case than in the “valid padding” case. generated). Note that the malware does not need the Thus the padding oracle was revealed through a timing side- ability to inject chosen plaintext into an existing TLS channel. A complication for full plaintext recovery is that session for this attack. in TLS, the corresponding error messages are fatal, leading to the termination of the TLS session. To overcome this, B. How the Attacks Work Canvel et al. considered the multi-session setting, wherein Our new attacks exploit the fact that, when badly for- it is assumed that the same plaintext is transmitted in the matted padding is encountered during decryption, a MAC same position in the ciphertext in many sessions. Moeller check must still be performed on some data to prevent the [17] subsequently pointed out that not doing padding format known timing attacks. But what data should be used for checks is not an option, since this enables even simpler that calculation? The TLS 1.1 and 1.2 RFCs recommend attacks. The correct solution, as advocated in TLS 1.1 and checking the MAC as if there was a zero-length pad. As TLS 1.2, is to check the padding format carefully, report a noted in those RFCs: single error message for padding and MAC failures, and This leaves a small timing channel, since MAC to make the record processing time essentially the same performance depends to some extent on the size whether or not the padding is correct. Most recently, in of the data fragment, but it is not believed to be [1], we showed that the OpenSSL implementation of DTLS large enough to be exploitable, due to the large did not adopt the known attack countermeasures. We also block size of existing MACs and the small size of introduced novel timing amplification techniques to build the timing . full plaintext recovery attacks against this implementation We confirm that there are indeed small timing differences, of DTLS, even though DTLS has no explicit error messages but, contrary to what is written in the RFCs, they can be to time. exploited. In short, provided there is a fortuitous alignment Theoretical support for the MEE construction used in of various factors such as the size of MAC tags, the block (D)TLS can be found in [12], [14], [18]. In particular, cipher’s block size, and the number of header bytes, then Paterson et al. [18] gave the first positive security results there will be a time difference in the time that it takes to for a fully accurate model MEE-TLS-CBC that includes process TLS records having good and bad padding, and this all the details of the CBC-mode encoding step (which in- difference will show up in the time at which error messages corporates padding), proving that MEE-TLS-CBC provides appear on the network. This timing side-channel can then be Length Hiding Authenticated-Encryption security, provided “wrangled” into revealing plaintext data via careful statistical that its MAC and CBC-mode block cipher components analysis of multiple timing samples. satisfy natural security properties, that the MAC tags are long enough, and that it is implemented so that decryption C. Disclosure does not reveal the cause of any failures. Our attacks exploit Given the large number of affected implementations, we the fact that current implementations of (D)TLS fail to meet first notified the IETF TLS Working Group chairs, the this last assumption. So our attacks do not contradict the IETF Security Area directors and the IRTF Crypto Forum results of [18], but instead relativize their applicability to Research Group (CFRG) chairs of our attacks in November practice. 2012. We then began the process of contacting individual In independent work, Pironti et al. [19] identify effectively vendors. Patches to address our attacks have been issued by the same timing channel in TLS that we exploit. However OpenSSL, NSS, GnuTLS, PolarSSL, CyaSSL, MatrixSSL, they dismiss it as being “too small to be measured over , F5, BouncyCastle and Oracle. For further details, see the network” and instead focus on using it to recover the full version [2]. information about message lengths. The recent CRIME attack exploits the optional use of compression in TLS in D. Further Details on Related Work combination with a chosen plaintext capability to mount a Padding oracle attacks began with Vaudenay [25], who plaintext recovery attack. showed that the presence of a padding oracle, that is, an oracle telling an attacker whether the padding was correctly II. THE (D)TLS RECORD PROTOCOL formatted or not, could be leveraged to build a decryption capability. Canvel et al. [5] showed that such an oracle could We focus on the cryptographic operation of the TLS be obtained for the then-current version of OpenSSL by and DTLS Record Protocols in the case of CBC-mode

528 SQN HDR Payload transmitted over the then has the : ||C MAC HDR

where C is the concatenation of the ciphertext blocks Ci Payload MAC tag Padding (including or excluding the IV depending on the particular Encrypt SSL or TLS version). Note that the sequence number is not transmitted as part of the message. In DTLS, the data Ciphertext transmitted over the wire is the same as in TLS, except that SQN is included as part of the record header and the CBC- Figure 1. D(TLS) encryption process mode IV is always explicit. Simplistically, the decryption process reverses this se- quence of steps: first the ciphertext is decrypted block by encryption. The core encryption process is illustrated in block to recover the plaintext blocks: Figure 1 and explained in more detail below. P = D (C ) ⊕ C , Data to be protected by TLS or DTLS is received from the j Ke j j−1 application and may be fragmented and compressed before where D denotes the decryption algorithm of the block further processing. An individual record R (viewed as a byte cipher. Then the padding is removed, and finally, the MAC sequence of length at least zero) is then processed as follows. is checked, using the header information (and, in TLS, The sender maintains an 8-byte sequence number SQN which a version of the sequence number that is maintained at 2 is incremented for each record sent , and forms a 5-byte field the receiver). Finally, in DTLS, the sequence number is HDR consisting of a 2-byte version field, a 1-byte type field, optionally checked for replays. and a 2-byte length field. It then calculates a MAC over In reality, much more sophisticated processing than this is the bytes SQN||HDR||R; let T denote the resulting MAC needed. The receiver should check that the ciphertext size is tag. Note that exactly 13 bytes of data are prepended to the a multiple of the block size and is large enough to contain record R here before the MAC is computed. The size of at least a zero-length record, a MAC tag of the required the MAC tag is 16 bytes (HMAC-MD5), 20 bytes (HMAC- size, and at least one byte of padding. After decryption, the SHA-1), or 32 bytes (HMAC-SHA-256). We let t denote receiver should check that the format of the padding is one this size in bytes. of the possible patterns when removing it, otherwise attacks The record is then encoded to create the plaintext P are possible [17] (SSL allows a loose padding format, while by setting P = R||T ||pad. Here pad is a sequence of no specific padding checks are enforced during decryption in padding bytes chosen such that the length of P in bytes TLS 1.0, so both are potentially vulnerable to the attacks in is a multiple of b, where b is the block-size of the selected [17]). Typically this is done by examining the last byte of the block cipher (so b =8for 3DES and b =16for AES). In plaintext, treating it as a padding length byte padlen, and all versions of TLS and DTLS, the padding must consist of using this to dictate how many additional bytes of padding p +1 copies of some byte value p, where 0 ≤ p ≤ 255. should be removed. But care is needed here, since blindly In particular, at least one byte of padding must always be removing bytes could result in an underflow condition: there added. So examples of valid byte sequences for pad are: needs to be sufficient bytes in the plaintext to remove a total “0x00”, “0x01||0x01” and “0x02||0x02||0x02”. The padding of padlen+1 bytes and leave enough bytes for at least may extend over multiple blocks, and receivers must support zero-length record and a MAC tag. the removal of such extended padding. If all this succeeds, then the MAC can be recomputed In the encryption step, the encoded record P is encrypted and compared to the MAC tag in the plaintext. If the using CBC-mode of the selected block cipher. TLS 1.1 and padding fails to be correctly formatted, then implementations 1.2 and both versions of DTLS mandate an explicit IV, should continue to perform a MAC check anyway, to avoid which should be randomly generated. TLS 1.0 and SSL use providing a timing side-channel of the type exploited in [5]. a chained IV; our attacks work for either option. Thus, the But since the padding format is incorrect in this case, it’s ciphertext blocks are computed as: not immediately clear where the padding ends and the MAC tag is located: in effect, the plaintext is now unparseable. Cj = EK (Pj ⊕ Cj−1) e The solution recommended in TLS 1.1 and 1.2 (and by where Pi are the blocks of P , C0 is the IV, and Ke is the extension, also in DTLS 1.0 and 1.2) is to assume zero- key for the block cipher E. For TLS (and SSL), the data length padding, interpret the last t bytes of the plaintext as a MAC tag, interpret the remainder as the record R and run 2In fact, in DTLS, this 8-byte field is composed from a 16-bit epoch MAC verification on SQN||HDR||R. This has been adopted number and a 48-bit sequence number. We will abuse terminology and refer throughout to the 8-byte field as being the sequence number for both in OpenSSL and elsewhere; GnuTLS on the other hand TLS and DTLS. removes padlen +1 bytes from the end of the plaintext,

529 takes the next t bytes as the MAC, interprets what is left as for the outer hash operation, for a total of 4 compression R and then runs MAC verification on SQN||HDR||R. function evaluations. Messages M containing from 56 up to For TLS, any error arising during decryption should be 64 + 55 = 119 bytes can be encoded in two 64-byte blocks, treated as fatal, meaning an encrypted error message is sent meaning that the inner hash is done in 3 compression func- to the sender and the session terminated with all keys and tion evaluations, with 2 more being required for the outer other cryptographic material being disposed of. For DTLS, operation, for a total of 5. In general, an extra compression such errors may be rated non-fatal and the session would function evaluation is needed for each additional 64 bytes proceed to process the next ciphertext. of message data, with the exact number needed being given −55 It should now be apparent that implementing the basic by the formula  64  +4, where is the message length decryption processing of TLS and DTLS requires some in bytes. A single compression function evaluation takes care in implementation, with there being significant room typically around 500 to 1000 hardware cycles (depending on for coding errors and inadequate parsing. Moreover, this the hash function and details of the implementation), giving should all be implemented in such a way that the processing a time in the sub-μs range for modern processors. time does not leak anything about the plaintext (including Recall that in TLS the MAC is computed on plaintext the padding bytes). As we shall see, this has proved to after removing padding. Hence, one might expect the total be a challenge for implementers: no implementation we running time for decryption processing to reveal some infor- examined gets it completely correct, and the advice from mation about the size of the depadded plaintext, perhaps up TLS 1.1 and 1.2 that one should extract and check the to a resolution of 64 bytes in view of the above discussion. MAC tag as if the padding were of zero-length leaves an Our exploits this, but we will show that exploitable timing side-channel. much more is possible.

A. Details of HMAC III. A DISTINGUISHING ATTACK As mentioned above, TLS and DTLS exclusively use the In this section we describe a simple distinguishing attack HMAC algorithm [13], with HMAC-MD5, HMAC-SHA- against the MEE-TLS-CBC construction as used in TLS. 3 1, and HMAC-SHA-256 being supported in TLS 1.2. To This is a warm-up to our plaintext recovery attacks, but compute the MAC tag T for a message M with key Ka, we note that even a distinguishing attack against such an HMAC applies the specified hash algorithm H twice, in an important protocol would usually be regarded as a significant iterated fashion: weakness.

T = H((Ka ⊕ opad)||H((Ka ⊕ ipad)||M)). Recall that in a distinguishing attack, the attacker gets to choose pairs of messages (M0,M1). One of these is Here opad and ipad are specific 64-byte values, and the encrypted, Md, say, and the resulting ciphertext is given key Ka is zero-padded to bring it up to 64 bytes before the to the attacker. The attacker’s task is to decide the value of XOR operations are performed. For all the hash functions the bit d. To prevent the attacker from winning trivially, we H used in TLS, the application of H itself uses an encoding require that M0 and M1 have the same length. step called Merkle-Damgard˚ strengthening. Here, an 8-byte We focus on the case where b =16, i.e. the block cipher length field followed by padding of a specified byte format is AES. A variant of the attack works for b =8. Suppose the are appended to the message M to be hashed. The padding MAC algorithm is HMAC-H where H is either MD5, SHA- is at least 1 byte in length and aligns the data on a 64- 1 or SHA-256. Let M0 consist of 32 arbitrary bytes followed byte boundary. The relevant hash functions also have an by 256 copies of 0xFF. Let M1 consist of 287 arbitrary bytes iterated structure, processing messages in chunks of 64 bytes followed by 0x00. Note that both messages have 288 bytes, (512 bits) using a compression function, with the output and hence fit exactly into 18 plaintext blocks. Our attacker of each compression step being chained into the next step. submits the pair (M0,M1) for encryption and receives a The compression function in turn involves a complex round MEE-TLS-CBC ciphertext HDR||C.NowC consists of a structure, with many basic arithmetic operations on data CBC-mode encryption of an encoded version of Md, where being involved in each round. the encoding step adds a MAC tag T and some padding In combination, these features mean that HMAC imple- pad. Because the end of Md aligns with a block boundary, mentations for MD5, SHA-1 and SHA-256 have a distinctive the additional bytes T ||pad are encrypted in separate blocks M timing profile. Messages of length up to 55 bytes can from Md. The attacker now forms a new ciphertext HDR||C be encoded into a single 64-byte block, meaning that the in which C keeps the same 16-byte IV as C (if explicit IVs first, inner hash operation in HMAC is done in 2 com- are being used), but truncates the non-IV part of C to 288 pression function evaluations, with 2 more being required bytes. This has the effect of removing those blocks of C that contain T ||pad. 3TLS ciphersuites using HMAC with SHA-384 are specified in RFC 5289 Now the attacker submits HDR||C for decryption. If (ECC cipher suites for SHA256/SHA384) and RFC 5487 (Pre-Shared Keys SHA384/AES) but we do not consider this algorithm further here. the record underlying C was M0, then the plaintext P

530 corresponding to C appears to end with the valid 256- a smartphone. Furthermore, the jitter may be significantly byte padding pattern 0xFF ...0xFF. In this case, all of these reduced when the adversary runs as a separate process bytes are removed, and the remaining 32 bytes of plaintext on the machine performing TLS decryption. This may be are interpreted as a short message and a MAC tag. For possible in virtualised environments, e.g. in a cloud scenario example, if H is SHA-1, then we have a 12-byte message as explored in [22]. The attack also destroys the TLS session, and a 20-byte MAC tag. The MAC verification fails (with since in TLS such errors are fatal. The attack can be iterated overwhelming probability), and an error message is returned across L sessions, with Md being encrypted in each session, to the attacker. If the underlying record was M1, then P and statistical processing used to extract the timing signal. appears to end with the valid 1-byte padding pattern 0x00. In DTLS, there are no error messages, but the techniques In this case, a single byte is removed, and the remaining of [1] can be applied to solve this problem. There, the 287 bytes of plaintext are interpreted as a long message and authors send a packet containing a ciphertext C closely fol- a MAC tag. Again, the MAC verification fails and an error lowed by a DTLS message, with the latter always provoking message is returned to the attacker. a response message. Any timing difference arising from the Notice that when d =0,soC encrypts M0, a short decryption of C then shows up as a difference in the arrival message consisting of 13 bytes of header plus at most 16 time of the response messages. The signal amplification bytes of message (when the hash algorithm is MD5) is techniques from [1] can also be used to boost the timing passed through the MAC algorithm. To calculate the MAC difference – here, the idea is to send multiple packets all requires 4 evaluations of H’s compression function. On the containing C in quick succession, to create a cumulative other hand, when d =1, C encrypts M1, and a long message timing difference (since each time C is processed, it will be consisting of 13 bytes of header plus at least 255 bytes of processed in the same way). message is passed through the MAC algorithm. Then to In the attack as described, we have used 288 byte mes- calculate the MAC requires at least 8 evaluations of H’s sages. This ensured that there were sufficient bytes left compression function, at least 4 more than for the d =0 after the removal of padding to leave room for a message case. Hence, we expect the time it takes to produce the (possibly of zero length) and a MAC tag. This ensures error message on decryption failure to be somewhat larger that C passes any sanity checks that might be applied if d =1than when d =0, on the order of a couple of μs during decryption. However, these sanity checks might be for a modern processor. This timing difference then allows, exploitable in variants of our basic attack. For example, in theory, a distinguishing attack on the MEE-TLS-CBC an implementation that finds it does not have enough bytes construction used in TLS. left to contain a MAC after depadding may choose to skip MAC verification altogether, leading to an increased timing A. Practical Considerations difference. In describing the attack, we have ignored the time taken Note that the attack would still work as described if the to remove padding. This is different for the two messages truncated MACs specifed for TLS in [11] were used, since being processed, and the difference is opposite to that for the full HMAC-H computation is still performed but only MAC checking in that padding removal for M0 takes longer certain bytes of the computed tag are compared to bytes of than for M1. Similarly, we have ignored any other timing the plaintext. differences that might arise during other processing steps. In We report on the successful implementation of this attack practice, as we will see in Section V, these differences turn in Section V. out to be smaller than the MAC timing difference. IV. PLAINTEXT RECOVERY ATTACKS The attack exploits the requirement from the (D)TLS RFCs that implementations be able to properly decrypt A. General Approach records having variable length padding, but does not require As we have seen in the previous section, the processing implementations to actually send records containing such time for a (D)TLS record (and therefore the appearance padding. A variant attack is possible in case only minimum- time of error messages) will depend on the amount of length padding is supported, but involves a smaller timing padding that the receiver interprets the encoded plaintext as signal. containing. However, by placing a target ciphertext block at In TLS, the error messages are sent over the network, the end of the encrypted record, an attacker can arrange that and so can easily be detected by the attacker. However, the plaintext block corresponding to this block is interpreted these messages are subject to network jitter, and this may as padding, and hence make the processing time depend on be large enough to swamp the timing difference arising plaintext bytes. But, it seems that large amounts of valid from the 4 extra compression function evaluations. On the padding are needed to create a significant timing difference, other hand, the timing signal may be quite large when and this is difficult to arrange in a plaintext recovery attack. the cryptographic processing is performed in a constrained We show that this barrier to plaintext recovery can be environment, e.g. on an 8-bit or 16-bit processor, or even on overcome under certain circumstances.

531 ∗ Let C be any ciphertext block whose corresponding 2) P4 ends with a valid padding pattern of length at plaintext P ∗ the attacker wishes to recover. Let C denote least 2 bytes: in this case, at least 2 bytes of padding the ciphertext block preceding C∗. Note that C may be the are removed, and the next 20 bytes are interpreted as IV or the last block of the preceding ciphertext if C∗ is the a MAC tag T . This leaves a record R of length at first block of a ciphertext. We have: most 42 bytes, meaning that MAC verification is then 55 P ∗ = D (C∗) ⊕ C. performed on a message of length at most bytes. Ke 3) P4 ends with any other byte pattern: in this case, the For any block B of plaintext or ciphertext, we write byte pattern does not correspond to valid padding. B =[B0B1 ...Bb−1], where Bi denote the bytes of B. Following the prescription in the TLS 1.1 and 1.2 ∗ ∗ ∗ ∗ In particular, we have P =[P0 P1 ...Pb−1]. RFCs, the plaintext is treated as if it contains no bytes As usual, we assume that the attacker is capable of eaves- of padding, so the last 20 bytes are interpreted as a dropping on the (D)TLS-protected communications and of MAC tag T , and the remaining 44 bytes of plaintext injecting messages of his choice into the network. For TLS, are taken as the record R. MAC verification is then or DTLS with sequence number checking disabled, we do performed on a 57-byte message. not need the ability to prevent messages from reaching their In all cases, the MAC verification will fail (with over- destination. Nor do we require a chosen-plaintext capability. whelming probability) and an error message produced. No- B. Full Plaintext Recovery tice that, in accordance with the discussion in Section II-A, in Cases 1 and 3, the MAC verification will involve 5 For simplicity of presentation, in what follows, we assume evaluations of the compression function for SHA-1, while the CBC-mode IVs are explicit (as in TLS 1.1, 1.2 and b =16 Case 2 only requires 4 evaluations. Therefore, we can hope DTLS 1.0, 1.2). We also assume that (so our to distinguish Case 2 from Cases 1 and 3 by timing the block cipher is AES). It is easy to construct variants of b =8 appearance of the error message on the network. Here our attacks for implicit IVs and for . We begin by the timing difference is that needed for a single SHA- considering only TLS, with details for DTLS to follow. 1 compression function evaluation (compared to 4 such We also assume that the TLS implementation follows the evaluations in our distinguishing attack). Notice that the advice in the TLS 1.1 and 1.2 RFCs about checking the size of the header, 13 bytes, in conjunction with the MAC MAC as if there was a zero-length pad when the padding is tag size, 20 bytes, are critical in generating this distinctive incorrectly formatted. We will examine the security of other timing behaviour. implementation options in Section VI. Most importantly, In Case 2, assuming that the plaintext has no special and for reasons that will become clear, we assume for the structure, the most likely padding pattern to arise is the t =20 moment that (so that the MAC algorithm is HMAC- one of length 2, namely 0x01||0x01, with all longer padding t =16 t =32 SHA-1). We consider and (HMAC-MD5 and patterns being roughly 256 times less likely. Thus, if the HMAC-SHA-256) shortly. Δ Δ 16 attacker selects a mask in such a way that he detects Let be a block of bytes and consider the decryption Case 2 after submitting Catt(Δ) for decryption, then he can of a ciphertext Catt(Δ) of the form infer that P4 ends with 0x01||0x01, and, using the equation att ∗ ∗ ∗ C (Δ) = HDR||C0||C1||C2||C ⊕ Δ||C P4 = P ⊕ Δ, can now recover the last 2 bytes of P . (In fact, by repeating the attack with a mask Δ that is in which there are 4 non-IV ciphertext blocks, the penulti- Δ C ⊕ Δ C modified from in the third-to-last byte, the attacker can mate block is an XOR-masked version of and easily separate the case of a length 2 padding pattern from the last block is C∗. The corresponding 64-byte plaintext is P = P ||P ||P ||P all longer patterns.) 1 2 3 4 in which The question remains: how does the attacker trigger Case ∗ P ∗ P4 = DKe (C ) ⊕ (C ⊕ Δ) 2, so that he can extract the last 2 bytes of ? Recall that = P ∗ ⊕ Δ. the attacker has the freedom to select Δ. By injecting a sequence of ciphertexts Catt(Δ) with values of Δ that vary P Notice that 4 is closely related to the unknown, target over all possible values in the last 2 bytes Δ14, Δ15, then ∗ plaintext block P . We consider 3 distinct cases, which (in the worst case) after 216 trials, the attacker will surely between them cover all possibilities for what can happen select a value for Δ such that Catt(Δ) triggers Case 2. during decryption of Catt(Δ): Once the last 2 bytes of P ∗ have been extracted, the 1) P4 ends with a 0x00 byte: in this case, a single byte of attacker can more efficiently recover the remaining bytes padding is removed, the next 20 bytes are interpreted of P ∗, working from right to left. This phase is essentially as a MAC tag T , and the remaining 64 − 21 = 43 identical to Vaudenay’s original padding oracle attack [25]. bytes of plaintext are taken as the record R.MAC For example, to extract the third-to-last byte, the attacker verification is then performed on a 13 + 43 = 56-byte can use his new knowledge of the last two bytes of P ∗ to message SQN||HDR||R. now set Δ14, Δ15 so that P4 ends with 0x02||0x02. Then he

532 att ∗ generates candidates C (Δ) as before, but modifying Δ13 knows the value of the byte P14. Then he sets the starting 8 ∗ only. After at most 2 trials, he will produce a ciphertext value of Δ such that Δ14 = P14 ⊕ 0x01, so that when which falls into case 2 again, which reveals he has managed Catt(Δ) is decrypted, the second-to-last byte of P4 already ∗ 8 to set a value 0x02 in the third-to-last byte of P4 = P ⊕Δ. equals 0x01. Then he iterates over the 2 possible values ∗ From this, he can recover P13. Recovery of each subsequent for Δ15, eventually finding one such that P4 has its last two byte in P ∗ requires at most 28 trials, giving a total of 14·28 bytes equal to 0x01||0x01, triggering Case 2. He can then trials to complete the extraction of P ∗. proceed to recover the rest of P ∗ with the same complexity 1) Practical considerations: In practice, for TLS, there as before. Overall, this attack, which recovers 15 bytes of are two severe complications. Firstly, the TLS session is plaintext with 1-out-of-2 of the last bytes of the target block destroyed as soon as the attacker submits his very first attack known, consumes only 15L · 28 sessions, where L is the ciphertext. Secondly, the timing difference between the cases number of trials used for each Δ value in each byte position. is very small, and so likely to be hidden by network jitter This can be further reduced by combining the two variants. and other sources of timing difference. For example, for base64 encoded plaintext, only 15L · 26 The first problem can be overcome for TLS by mounting sessions are needed to decrypt a block. a multi-session attack, wherein we suppose that the same 3) Combining Lucky 13 with the BEAST: A significant plaintext is repeated in the same position over many sessions limitation of our attacks as described so far is their consump- (as in [5], for example). We have used masks Δ in such a tion of many TLS sessions. This limitation can be overcome way that no further modification to the attack is needed to by combining our attacks with techniques from the BEAST cater for this setting – of course blocks C and C∗ change attack [10] to target TLS-protected HTTP cookies. for each session. Specifically, in the context of a communicat- The second problem can be overcome in the same multi- ing with a over TLS, the user can be induced into session setting by iterating the attack many times for each downloading malware into his browser from a rogue . Δ value and then performing statistical processing of the This malware, perhaps implemented in Javascript, can then recorded times to estimate which value of Δ is most likely initiate all the TLS sessions need for our attack, with to correspond to Case 2. In practice, we have found that a the browser automatically appending the targetted HTTP basic percentile test (and even averaging) works well – see cookie to the browser’s initial HTTP request. Furthermore, Section V for further details. Assuming that L trials are used by adjusting the length of that initial HTTP request, the for each Δ value, the attack as described consumes roughly malware can ensure that there is only one unknown byte of L · 216 sessions, with one ciphertext Catt(Δ) being tried in HTTP cookie plaintext in each target ciphertext block. This each session. allows our remote attacker to carry out the variant attack 2) More efficient variants: The attack complexity can be described immediately above. Assuming the targeted part of significantly reduced by assuming that the language from the cookie is base64 encoded, the attack consumes L · 26 which plaintexts are drawn can be modelled using a finite- sessions per byte of HTTP cookie. As we will discuss in length Markov chain. This is a fair assumption for natural more detail in Section V, we found that setting L =27 languages, as well as application-layer protocol messages yields reliable plaintext recovery in our experimental set- such as HTML, XML etc. This model can be used to up, giving us an attack that recovers HTTP cookies using drive the selection of candidate plaintext bytes in order of roughly 213 sessions per unknown byte of cookie. decreasing likelihood, and from this, determine the bytes of Δ needed to test whether a guess for the plaintext bytes leads C. Plaintext Recovery for Other MAC Algorithms to valid padding or not. Similar techniques were used in [5], A critical feature of our attack above is the relationship [10] in combination with sequential statistical techniques to between the size of the header included in the MAC calcu- reduce the complexity of recovering low-entropy plaintexts. lation (fixed at h =13bytes), the MAC tag size t, and the Note that this approach does not work well if TLS’s optional block size b. For example, if TLS happened to be designed compression is used. Another possibility is that the plaintext such that h =12, then, with t =20and b =16, a similar bytes are drawn from a reduced space of possibilities. For case analysis as before shows that our ciphertext Catt(Δ) example, in HTTP basic access authentication, the username would have the property of having faster MAC verification and password are Base64 encoded, meaning that each byte if P4 also ends with the single byte 0x00 (the valid padding of plaintext has only 64 possible values. Similar restrictions pattern of length 1). This would allow an improved 28 attack often apply to the sensitive parts of HTTP cookies. against TLS with CBC-mode and HMAC-SHA-1. In some In a related attack scenario, if the attacker already knows sense, 13 is lucky, but 12 would have been luckier! one of the last two bytes of P ∗, he can recover the other Similarly, we have (less efficient) variants of our attacks byte with much lower complexity than our analysis so far for HMAC-MD5 and HMAC-SHA-256, where the tag sizes would suggest. This is then a plaintext recovery attack with t are 16 and 32 bytes, respectively. In fact, because here partially-known-plaintext. For example, suppose the attacker t is a multiple of b, the analysis is largely the same in

533 both cases, and we consider only HMAC-MD5 in detail. V. E XPERIMENTAL RESULTS FOR OPENSSL This time Catt(Δ) is such that we fall into Case 2 (valid A. Experimental Setup padding with a message of size at most 55 bytes, giving ∗ fast MAC verification) only if P4 = P ⊕ Δ ends with We ran version 1.0.1 of OpenSSL on the client and the a valid padding of length 6 or more. With no additional server. In our laboratory set-up, a client, the attacker and the information on P ∗ the attacker would need (worst case) 248 targeted server are all connected to the same VLAN on a attempts to construct the correct Δ so as to trigger this case; 100Mbps Ethernet switch. The targeted server was running detecting that he had done so would be more difficult in on a single core processor machine operating at 1.87 GHz view of the large number of candidate Δ values. This is with 1 GByte of RAM, while the attacker was running on not an attractive attack, especially in view of the practical a dual core processor machine operating at 3.4 GHz, with 2 considerations for TLS mentioned above. On the other hand, GByte of RAM. we do have attractive partially-known-plaintext attacks for To simulate the (D)TLS client, we made use of HMAC-MD5 and HMAC-SHA-256. For example, if any 5 s_client, a generic tool that is available as part of the out of the last 6 bytes of P ∗ are known, we can recover the OpenSSL distribution package. We modified s_client’s remaining 11 bytes using 11L · 28 sessions. The attack can source code to satisfy our testing requirements. We also also be made more efficient if the plaintext has low entropy, developed a basic Python script that calls s_client when by trying candidates for the last 6 bytes of P ∗ in order needed. Our attack code is written in C and is capable of of decreasing probability and then recovering the remaining capturing, manipulating and injecting packets of choice into bytes of P ∗ once the right 6-byte candidate is found. This the network. would be an good option for password recovery, for example. In the case of TLS, the attacker captures the “targeted” A similar analysis can be carried out for truncated MAC packet, manipulates it and then sends the crafted version to algorithms, as per [11]. For example, for an 80-bit (10-byte) the targeted server causing the TLS session to terminate. MAC tag, if any 11 out of the last 12 bytes of P ∗ are known, This crafted packet forces the client and the targeted server we can recover the remaining 5 bytes using 5L·28 sessions. to lose TCP synchronization, causing delay in the TCP Finally, we note that the “Lucky 13 + BEAST” attacks connection shutdown. To speed up the TCP connection tear work equally well, no matter what the MAC tag size is. down, the attacker sends spoofed RST packets to the client and the targeted system upon detecting the TLS encrypted alert message, forcing both systems to independently destroy D. Applying the Attacks to DTLS the underlying TCP structure associated with the terminated So far we have focussed on TLS. The changes needed to TLS session. handle DTLS are the same as for our distinguishing attack All the timing values presented in the paper are based in Section III: we can use the techniques of [1] to amplify on hardware cycles, which are specific to processor speed. the timing differences and to emulate TLS’s error messages. For example, 187 hardware cycles on our targeted server The amplification capability reduces the attack complexity operating at speed of 1.87 GHz translate to an absolute dramatically: essentially, we can accurately test each Δ value timing of 100 ns. To count the hardware cycles, we made using just a few packet trains instead of requiring L trials. use of an existing C library licensed under GNU GPL v34. There is one further critical difference that we wish to emphasise: as already noted, DTLS does not treat errors B. Statistical Analysis arising during decryption as being fatal. This means that the The network timings we collect in each experiment entire attack against DTLS can be carried out in a single are from skewed distribution(s) with long and many session, that is, without requiring the same plaintext to be outliers. However, we found that using basic statistical repeated in the same position in the plaintext across multiple techniques (medians and, more generally, percentiles) was sessions, and without waiting for the Handshake Protocol to sufficient to analyse our data. rerun for each session. These differences brings our attack well within the bounds C. Distinguishing Attack for OpenSSL TLS of practicality for DTLS. This is particularly so if DTLS’s Figure 2 shows the experimental distribution of timing optional checking of sequence numbers is disabled. Even if values for the TLS distinguishing attack described in Sec- this is not the case, the attacks are quite feasible in practice, tion III. The figure indicates that, with enough samples, it provided enough DTLS messages are available, or if the should be possible to distinguish encryptions of message M0 upper layer protocol being protected by DTLS produces (consisting of 32 arbitrary bytes followed by 256 copies of replies to sent messages in a consistent manner. These issues 0xFF) from encryptions of message M1 (consisting of 287 are discussed at greater length in [1] and the next section, arbitrary bytes followed by 0x00). where we report on the successful implementation of our attacks for the OpenSSL implementation of TLS and DTLS. 4code..com/p/fau-timer

534 takweebyte where attack .PanetRcvr tak o pnS TLS OpenSSL for Attacks Recovery Plaintext D. available. is sample one only ubro ape r vial.Teatc led a a has already when attack guessing The over moderate available. advantage a only significant are if even samples reliable of is attack number the attack; that distinguishing evident is concrete it this for probabilities success au sgetrthan greater is value eann iigsmls n hnoutput then and samples, timing remaining L einsre-iedcyto iea ucinof function a as time decryption server-side median P enfrteepce au of value expected the for seen n tako pnS L,soigfse rcsigtm ntecs of case the in time processing distinguish- faster for showing removed) TLS, (outliers M values OpenSSL timing on of attack Distribution ing 2. Figure n hntyn l osbevle of values possible all trying then and h atclrvle of values particular the lal,tedt sniir u h dp at “dip” the but noisier, is setup. data experimental some our the has in Clearly, timings network network distri- corresponding median the the of shows bution on 4 Figure message attack success. of error an prospect that timing other indicate on for times time based server-side processing These the in values. stability byte the is notable Also ecluaeatrsodvalue threshold a calculate we n forces one hsivle setting involves This 15 O ∗ 0 iigsmls le ules aclt h eino the of median the calculate outliers, filter samples, timing )Prilpanetrecovery: plaintext Partial 1) eue ipetrsodts obidacnrt attack: concrete a build to test threshold simple a used We PEN (in = Probability red S TLS SSL 0xFF 0.00001 0.00002 0.00003 0.00004 0.00005 0.00006 oprdto compared ) 1.50 P 0  adaeCycles Hardware 15 la euto npoesn iecnbe can time processing in reduction clear A . ∗ 10 6 ⊕ ITNUSIGATC UCS PROBABILITIES SUCCESS ATTACK DISTINGUISHING 1.51 Δ  128 P 64 32 16 10 L 15 8 4 2 1 15 6 ∗ 1.52 M T Δ oas equal also to a ercvrdwhen recovered be can 1 14  and ucs Probability Success 10 P (in 6 oforce to 14 ∗ 1.53 al I Table  blue 0 acltdb Attacker by Calculated  = 10 fi sls.TbeIsosthe shows I Table less. is it if 0.992 0.983 0.951 0.914 0.858 0.769 0.756 ). 6 T 0x01 Δ 1.54 1 ae npoln,gather profiling, on based 15 eto Vdsrbsan describes IV Section P  0x01 10 14 namely , ∗ 6 Δ (so 1.55 ⊕ 15 iue3sosthe shows 3 Figure .  Δ dniyn which identifying , 10 Δ 6 L 14 1.56 14 1 Δ =1 oequal to P  Δ ftemedian the if 15 = 10 14 ∗ 15 6  ..when i.e. , 1.57 = 0x00 sknown. is =  Δ 0xFE 10 6 0xFE 15 0x01 and ) for . is , . 535 et ftepretl au is for value value percentile the if test: The ftetmn itiuin(esrdover (measured distribution timing the of rcsigtime. processing nti ae h takrwudne oa of total a need IV. would Section attacker the from case, attack this recovery In plaintext full the perform iue3 pnS L einsre iig i adaecce)when cycles) hardware (in timings server median TLS P OpenSSL 3. Figure oa esoscnue nteatc crepnigto (corresponding attack of number the different in for a values different consumed represents figure sessions least the total at in on based curve is Each figure the in data-point distinguishable. clearly hardware of terms in timings network median TLS OpenSSL when cycles 4. Figure ubro esosnee is needed sessions of number ucs rbblt f09 when 0.93 of probability success ed ofse rcsigtime. processing faster to leads ubro esosis sessions of number ucs rbblt f1when 1 of probability as success increases probability a success attack the the expected, of As median. the including minimised ie n ftels w ye.W aentimplemented not have We bytes. two variant. last block, this a the from of bytes one unknown given 15 recovering to easily extend 14 ∗ iue5sosscespoaiiisfrteatc.Each attack. the for probabilities success shows 5 Figure )Fl litx recovery: plaintext Full 2) would attack the that anticipate we results, these Given = x 0x01

ai ersnstepretl sdi u statistical our in used percentile the represents -axis Hardware Cycles Hardware Cycles Calculated by Adversary Calculated on Server 1.288 1.289 1.290 1.291 1.292 600 12 800 12 000 13 200 13 400 13 600 13 800 13 000 14 1.286 1.287 ti vdn htarneo ecnie okwell, work percentiles of range a that evident is It . Δ P and 14 15 ∗        0 10 10 10 10 10 10 10 P 6 6 6 6 6 6 6 = h n o hc the which for one the 0 15 ∗ 0x01 = L 0xFF 50 h ubro rasfreach for trials of number the , and 2 50 15 sexpected, As . P . 15 ∗ L 100 = 100 p   nrae.W led reach already We increases. hnw aea h correct the as take we then , h etse ol eto be would step next The 15 15 0xFF 2 16 L L iial,w aea have we Similarly, . 150 sexpected As . Δ =2 150 =2 15 p t ecnievalue percentile -th = 8  7  hr h total the where , hr h total the where , 15 0xFE 15 64 200  200 L  0xFE 0xFE experiments. L ape)is samples) Δ ed ofaster to leads · 15 Δ 2 16 250 250 = value). trials 0xFE o recovering for server. the from remote native of is a values attacker than Higher the rather message). depend message error we error TLS since levels application-layer higher noise an somewhat the on of generally that overhead are (note the DTLS setups for enduring connection without TLS but and TLS TCP the with On happen values. timing to stable achieving hand, of for other number choice good total that a the experimentally is found and have We probability injected. success packets attack the selects off attacker repeated The corresponding value. is the mask process for This waits reply. and Heartbeat request Heartbeat DTLS a h rvosydsrbdatcst takDL.Nwthe Now ( DTLS. number attack a to with sends attacks attacker combination described in previously [1] the from techniques amplification and DTLS OpenSSL for Attacks Recovery Plaintext E. iue7sostescespoaiiisfrrecovering IV. for Section probabilities in of success the described step shows attack first 7 recovery the Figure effectively plaintext is full attack the this DTLS; OpenSSL is and work would attack full with the TLS that below for results indicate Our strongly be TLS. DTLS for can implemented for attack not block recovery have plaintext successfully We that full time. been the in shorter have much bytes a block in remaining a recovered the of bytes then However, two recovered, setup. our last in the hours once 64 around take tear-down would sessions with and the example, to set-up due For case connection time times. the TLS of and In amount TCP considerable 2. a underlying Case takes triggers this value TLS, of mask which discover to percentile-based recovery: plaintext partial TLS OpenSSL recovering for probabilities success 5. Figure eoeigteukonpanetbt ihonly with byte plaintext unknown the recovering 10 ( L iue6sostepretl-ae ucs probabilities success percentile-based the shows 6 Figure sepandi eto VD ecnuetetiming the use can we IV-D, Section in explained As eas odce -yercvr takagainst attack recovery 2-byte a conducted also We tcnb enta h taki eyefcie reliably effective, very is attack the that seen be can It . 0 =2 . 266 P

15 Success Probabilities ∗ 3 . .Ee for Even ). 0.0 0.2 0.4 0.6 0.8 1.0 when 0                         n                L =1       n P 20       =2 15 ∗ =10        sidctv fwa ih eexpected be might what of indicative is 2      suigthat assuming 7 8       letslowly. albeit , ras( trials 40       gi,teatc svr effective, very is attack the Again, .       n Percentiles       L fcatdpces olwdby followed packets, crafted of )       =2 L P 60       15 ∗       =1       7 assuming n       eetmt htthe that estimate we ,tescesprobability success the ), P 80 and       14  ∗            L skon for known, is n       P 100       14 L nodrt trade- to order in ∗ ol eue if used be could       2 2 2 2 2 2 known. ie o each for times 17 16 15 14 13 12 Trials Trials Trials Trials Trials Trials       L L L L L L 2       2 2 2 2 2 2 11 9 8 7 4 6 5      n  =10 trials n P 2 14 23 = ∗ 536 led ec oethan more reach already iue6 pnS TSprilpanetrcvr:percentile-based recovery: plaintext partial DTLS recovering OpenSSL for probabilities success 6. Figure iue8 pnS TS2bt eoey ecniebsdsuccess percentile-based recovery: 2-byte DTLS OpenSSL recovering for probabilities 8. Figure probability ( success trials with bytes both recovering success percentile-based recovery: 2-byte DTLS OpenSSL recovering for probabilities 7. Figure tak iue8sosorrslsfor results recovery plaintext our full shows a 8 to Figure easily attack. extend should attack the hnacul fhp wyfo h evr rta very reliable the where get that scenarios to or realistic needed more server, are be i.e. the there would Nevertheless, from remote, Given results. sessions away is of server. hops attacker numbers expect of the large the would couple when as a we fail LAN than involved, to same differences attacks the the timing in small situated the not is Environments Network Challenging More F. -yercvr srlal given that reliable see We is TLS. for recovery model 2-byte experimental an as serves recall ehv o odce xeiet hr h attacker the where experiments conducted not have We

Success Probabilities Success Probabilities Success Probabilities L 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 =2 0 0 0                                                                            3                                 .Teqaiyo hs eut seiec that evidence is results these of quality The ).                               20     20 20                                                              40        P P       40 40              14 14 ∗ ∗              80%          Percentiles Percentiles Percentiles        and and                60              P 60 60              P P          15 ucs aeusing rate success ∗          15 15     ∗ ∗              with    , ,       80              n n 2           80 80    23          =1 =10           P             14    ∗           ras( trials . 100              . n          known, 100 100                              =1 2 2 2 2 2 2 2 2 2 2 2 2 2 14 13 12 11 21 20 19 25 24 23 22 21 20 L Trials Trials Trials Trials Trials Trials Trials Trials Trials Trials Trials Trials Trials n 0 hc we which , . =2 =10 93 2              L L L L L L L L L L L L L     22          2 2 2 2 2 2 2 2 2 2 2 2 2 5 4 3 6 9 7 6 4 5 3 8 5 4              for 7 trials. . ;we ); 2 19 proximity requirement can be met, for example when a a non-null record, no length sanity tests will fail. hostile network service provider attacks its customers, or MAC verification is then performed on a message in cloud computing environments. For DTLS, the timing SQN||HDR||R that contains at most 311 bytes. signals can be amplified, effectively by an arbitrary amount, In both cases, the MAC verification will fail (with over- and so we would expect to be able to mount the attacks whelming probability) and an error message produced. No- remotely. tice that, in accordance with the discussion in Section II-A, in Case 1, the MAC verification will involve 9 evaluations of VI. OTHER IMPLEMENTATIONS OF TLS the compression function for SHA-1, while Case 2 requires 5 1) GnuTLS: The GnuTLS implementation of MEE- at most 8 evaluations. Therefore, we can hope to distinguish TLS-CBC deals with bad padding in a different way to that the two cases by careful timing, as previously. recommended in the RFCs: instead of assuming zero-length Now the single-byte plaintext recovery attack is straight- padding, it uses the last byte of plaintext to determine how forward: the attacker injects a sequence of ciphertexts many plaintext bytes to remove (whether or not those bytes Catt(Δ) with values of Δ that vary over all possible values 8 are correctly formatted padding). Since this approach is a in the last byte Δ15, then (in the worst case) after 2 natural alternative to the RFCs’ advice for handling bad trials, the attacker will surely select a value for Δ such that padding we analyse it here; for full details of the analysis, Catt(Δ) triggers Case 1. When this is detected, he knows see the full version [2]. that P20 ends with a 0x00 byte and can infer the value of the ∗ ∗ Firstly, we point out that GnuTLS-style processing is just last byte of P via the blockwise equation P20 = P ⊕ Δ. as vulnerable to distinguishing attacks as RFC-compliant For further improvements to this attack and analysis for processing. Indeed, the attack described in Section III will HMAC-MD5 and HMAC-SHA-256, see [2] 6 work just as before . We next present an attack that recovers We note that we have not found attacks for GnuTLS-style the rightmost byte of plaintext in any target block for processing that can extract more than the last byte of the C∗ GnuTLS-style padding processing. Let denoting the target block. This is not surprising in view of the fact that C target ciphertext block, denote the previous ciphertext the decryption time for GnuTLS-style processing depends Δ 16 block and denote a mask block of bytes. We consider only on the last byte of plaintext. Catt(Δ) the decryption of a ciphertext of the form: We worked with version 3.0.21 of GnuTLS to implement att ∗ C (Δ) = HDR||C0||C1||C2|| ...||C18||C ⊕ Δ||C the above attacks. We found that the network timings for error messages were much noisier than for OpenSSL. in which there are 20 non-IV ciphertext blocks, the penul- We also identified some coding errors in GnuTLS which timate block is an XOR-masked version of C and the last hampered our attack. Nevertheless, we were still able to ∗ block is C , the target ciphertext block. The corresponding reliably recover up to 4 bits of the last byte of each block. 320-byte plaintext is P = P1||P2|| ...||P19||P20 in which For details, see the full version [2]. Whilst extracting less ∗ plaintext than our OpenSSL attack, far fewer TLS sessions P20 = DK (C ) ⊕ (C ⊕ Δ) e were required for GnuTLS. This indicates that ignoring = P ∗ ⊕ Δ. the recommendations of the RFCs can have severe security Now we need consider only two distinct cases, which consequences. 7 between them cover all possibilities: 2) NSS: Network Security Services (NSS) is an open- source set of libraries implementing, amongst other things, 1) P20 ends with a 0x00 byte: in this case, a single byte of TLS. It is widely used, including in Mozilla client prod- padding is removed, the next 20 bytes are interpreted ucts and . The NSS code8 takes the same as a MAC tag T , and the remaining 320 − 21 = 299 approach as in GnuTLS, potentially rendering the code bytes of plaintext are taken as the record R.MAC vulnerable to an attack recovering a single byte of plaintext verification is then performed on a 13 + 299 = 312- per block. byte message SQN||HDR||R. 3) PolarSSL: We also examined the PolarSSL9 imple- 2) P20 ends with any other byte value: in this case, at mentation of TLS. The code10 behaves in much the same least two bytes of “padding” are removed, the next way as OpenSSL, setting a variable padlen to 0 if the 20 bytes are interpreted as a MAC tag T , and the padding check fails, and then verifying the MAC on a remaining bytes of plaintext are taken as the record record stripped of padlen bytes. This would render it R. Because the starting message length, at 320 bytes, is long enough to allow for the removal of 256 bytes 7http://www.mozilla.org/projects/security/pki/nss of padding and a 20-byte MAC whilst still leaving 8We worked with version 3.13.6 available at https://ftp.mozilla.org/pub/ mozilla.org/security/nss/releases/NSS 3 13 6 RTM/src/. 5www.gnu.org/software/gnutls/ 9polarssl.org/ 6In fact, since the attack only involves plaintexts which are correctly 10We worked with version 1.1.4 available at polarssl.org/trac/browser/ padded, it will work for any correct decryption algorithm. trunk/library/ssl tls.c.

537 vulnerable to the attacks described in Section IV. However, B. Use RC4 in its default configuration, PolarSSL does not send any TLS The simplest countermeasure for TLS is to switch to using alert messages when decryption errors are encountered. This the RC4 stream cipher in place of CBC-mode encryption. protects PolarSSL against our attack but also means that it However, this is not an option for DTLS. When a stream is not RFC-compliant in this aspect. cipher is used in TLS, no padding is required. Consequently 11 4) yaSSL: The yaSSL embedded SSL library, CyaSSL, none of the attacks in this paper will work. RC4 is widely is targetted at embedded and real-time operating system supported in implementations of TLS, the same countermea- environments. It appears to have rather few known vulner- sure is effective against the BEAST attack, and has been 12 abilities, with only 5 being reported in the CVE database fairly widely adopted in response to BEAST. The use of a 13 since 2005. The CyaSSL code does not perform proper stream cipher in a MEE construction is well-supported by padding checks, but instead just examines the last byte of theory [12]. But the first bytes of keystream output by the plaintext and uses this to determine how many bytes to RC4 generator have certain small biases, and TLS does not remove. This approach renders the code vulnerable to the discard these before starting encryption. For this reason, we old attack from [17] which recovers one byte of plaintext do not recommend using RC4. per block. This was the only implementation that we found that still contains this basic flaw. C. Use Authenticated Encryption 5) Java: We have examined the BouncyCastle14 and Another possibility is to switch from MEE-TLS-CBC to OpenJDK15 Java implementations of TLS. using a dedicated authenticated encryption algorithm, such The BouncyCastle code does careful sanity checking of as AES-GCM or AES-CCM which were standardised for use the padding length (as indicated by the last byte of plaintext) in TLS in RFCs 5288 [24] and 6655 [15], respectively. In but treats the padding as having length 1 if the padding theory, this should obviate all attacks based on weaknesses format, when checked, is found to be incorrect (a variable in the MEE construction. However, we cannot rule out paddingsize is set to 0, but then the plaintext size is implementation errors, and we are not aware of any detailed reduced by an amount paddingsize+minLength where analysis of implementations of these algorithms in (D)TLS minLength is set to be 1 larger than the MAC tag size). for potential side-channels. A further issue is that authenti- This deviates slightly from the recommendation of the RFCs cated encryption was only added in TLS 1.2, and this version to treat the padding as having length zero, but still allows of TLS is not yet widely supported in implementations. our attacks in Sections III and IV to be applied (for Case 3 of the main plaintext recovery attack in Section IV, MAC D. Careful implementation of MEE-TLS-CBC decryption verification ends up being performed on a 56-byte message, Our final option is to implement MEE-TLS-CBC decryp- but this will still involve 5 evaluations of the compression tion more carefully. function for SHA-1). The key requirement is to ensure uniform processing The OpenJDK code appears follow the recommendation time for all MEE-TLS-CBC ciphertexts of a given size. of the RFCs in treating the padding as having zero length if That is, the total processing time should depend only on the padding format, when checked, is found to be incorrect. the ciphertext size, and not on any characteristics of the This is because this case is trapped by exception handling, underlying plaintext (including padding). The basic principle during which the variable defining the plaintext length is not to be followed in achieving this is quite simple: since changed. This potentially renders it vulnerable to our attacks the major timing differences arise from MAC processing, in Sections III and IV. implementations should make sure the same amount of MAC processing is carried out no matter what the underlying VII. COUNTERMEASURES plaintext indicates the message length to be. However, A. Add Random Time Delays this simple principle is complicated by the need to also A natural reaction to timing-based attacks is to add perform careful sanity checking on the underlying plaintext random time delays to the decryption process to frustrate whilst avoiding the introduction of yet more timing side- statistical analysis. In fact, this countermeasure is surpris- channels, and to make sure appropriate amounts of MAC ingly ineffective, as we demonstrate in the full version [2]. processing are performed even when these checks fail. A further complication arises because the number of bytes to 11 yassl.com/yaSSL/Home.html be examined in the padding check depends on the last byte of 12www.cvedetails.com/vulnerability-list/vendor id-3485/Yassl. 13We worked with version 2.3.0 available at yassl.com/yaSSL/Source/ the last plaintext block, and so, even if the MAC processing output/src/internal.c.html. is made uniform, the running time of the padding check may 14www.bouncycastle.org/viewcvs/viewcvs.cgi/java/crypto/src/org/ still leak a small amount of information about the plaintext. bouncycastle/crypto/tls/TlsBlockCipher.java?view=markup With these remarks in mind, we now proceed to give a 15hg.openjdk.java.net/jdk7/l10n/jdk/file/3598d6eb087c/src/share/classes/ sun/security/ssl/SSLSocketImpl.java and hg.openjdk.java.net/jdk7/2d/jdk/ detailed prescription of how to achieve constant-time pro- file/85fe3cd9d6f9/src/share/classes/sun/security/ssl/CipherBox.java cessing of MEE-TLS-CBC ciphertexts, incorporating suit-

538 able sanity checking. In what follows, we let plen denote 0.00006 the length (in bytes) of the plaintext P obtained immediately 0.00005 after CBC-mode decryption of the ciphertext, padlen ty 0.00004 i denote the last byte of that plaintext interpreted as an integer l i between 0 and 255, and t denote the length of the MAC tags 0.00003

(in bytes). Also, let HDR, SQN denote the (D)TLS record 0.00002 Probab header and the expected value of the sequence number for 0.00001 this record. Our recommended procedure is then as follows: 0 1.54 106 1.55 106 1.56 106 1.57 106 1.58 106 1.59 106 1.60 106 1.61 106 1) First sanity check the ciphertext: check that its length Hardware Cycles Calculated by Attacker in bytes is a multiple of the block-size b and is at least max{b, t +1} (for chained IVs) or b +max{b, t +1} Figure 9. Distribution of timing values (outliers removed) for distinguish- (for explicit IVs). If these conditions are not met, then ing attack on OpenSSL TLS, using our decryption procedure. return fatal error. 2) Decrypt the ciphertext to obtain plaintext P ;now plen will be a multiple of b and at least max{b, t+1}. 3) If t + padlen +1> plen, then the plaintext is not evaluations and not full MAC computations. These give a long enough to contain the padding (as indicated by MAC computation time that is the same irrespective of how the last byte of plaintext) plus a MAC tag. In this case, much padding is removed (and equal to that carried out in run a loop as if there were 256 bytes of padding, with earlier steps). a dummy check in each iteration. Then let P denote the first plen − t bytes of P , compute a MAC on We have implemented the above procedure by modi- SQN||HDR||P and do a constant-time comparison of fying OpenSSL version 1.0.1, the same version used for the computed MAC with the last t bytes of P . Return our attacks. We then ran our distinguishing attack from fatal error. Section III against the modified code. Each packet in the 4) Otherwise (when t + padlen +1 ≤ plen), check attack passes the padding check, but fails MAC verification, the last padlen +1 bytes of P to ensure they are all causing the server to close the TLS session and send an equal (to the last byte of P ), ensuring that the loop encrypted alert message. Figure 9 shows the distribution of does check all the bytes (and does not stop as soon as timing values (in hardware cycles) after implementing our the first mismatch is detected). If this fails, then run procedure. This figure should be compared with Figure 2: a loop as if there were 256 − padlen − 1 bytes of visual inspection alone shows that the timing difference is padding, with a dummy check in each iteration, and substantially reduced. In fact, the separation between the then do a MAC check as in the previous step. Return medians of the two distributions is reduced from about fatal error. 8500 to about 1100 hardware cycles (from around 2.5μs 5) Otherwise (the padding is now correctly formatted) to 0.32μs). In turn, this small separation means that 128 run a loop as if there were 256−padlen−1 bytes of sessions are needed to achieve a distinguishing success padding, doing a dummy check in each iteration. Then probability of 0.68, whereas, prior to our modifications, let P denote the first plen−padlen−1−t bytes of just 1 session was enough to give a success probability of P , and let T denote the next t bytes of P (the remain- 0.756. For the plaintext recovery attack, the adversary will der of P is valid padding). Run the MAC computation have access to timing differences roughly one quarter of on SQN||HDR||P to obtain a MAC tag T . Then set this, i.e. roughly 80ns on our hardware. Notice also that L1 =13+plen−t, L2 =13+plen−padlen−1−t, the two distributions are reversed compared to Figure 2, i.e. L1−55 L2−55 0xFF and perform an additional  64 − 64  MAC processing packets now takes longer, on average, than compression function evaluations (on dummy data). for 0x00 packets. We believe that this is caused by overhead Finally, do a constant-time comparison of T and T . introduced by a SHA1_Update function call that occurs If these are equal, then return P . Otherwise, return for 0xFF packets but not 0x00 packets. fatal error. To achieve further reductions in timing difference would When implementing the above procedure, it would be require a more sophisticated “constant time” programming tempting to omit seemingly unnecessary computations that approach. The OpenSSL patch in versions 1.0.1d, 1.0.0k are performed, for example when t + padlen +1 > plen. and 0.9.8y addressing the attacks in this paper provides an However, these are needed to prevent other timing side- example of how to do this. The complexity of the OpenSSL channels like those reported in [1] for the GnuTLS imple- patch is notable, with around 500 lines of new ‘C’ code mentation of DTLS. Notice also that the dummy computa- being required. For further discussion and explanation, see tions performed in the last step are compression function www.imperialviolet.org/2013/02/04/luckythirteen.html.

539 VIII. DISCUSSION [6] C. M. Chernick, C. Edington III, M. J. Fanto, and R. Rosen- thal. Guidelines for the Selection and Use of Trans- We have demonstrated a variety of attacks against im- port Layer Security (TLS) Implementations. In NIST Spe- plementations of (D)TLS. We reiterate that the attacks are cial Publication 800-52, June 2005, National Institute of ciphertext-only, and so can be carried out by the standard Standards and Technology. Available at http://csrc.nist.gov/ MITM attacker, without a chosen-plaintext capability. The publications/nistpubs/800-52/SP-800-52.pdf , 2005. [7] T. Dierks and C. Allen. The TLS Protocol Version 1.0. RFC attacks that are possible depend crucially on low-level imple- 2246, Internet Engineering Task Force, 1999. mentation details, as well as factors such as the relationship [8] T. Dierks and E. Rescorla. The Transport Layer Security between the MAC tag size t and the block size b. All (TLS) Protocol Version 1.1. RFC 4346, Internet Engineering implementations we examined were vulnerable to one or Task Force, 2006. more attacks. It is an interesting open question as to whether [9] T. Dierks and E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.2. RFC 5246, Internet Engineering similar timing attacks could be developed against the TLS Task Force, 2008. encryption operation using a variant of the CRIME attack. [10] T. Duong and J. Rizzo. Here come the ⊕ Ninjas. Unpublished For TLS, we need a multi-session attack, with, in some manuscript, 2011. cases, many sessions. This limits the practicality of the [11] D. Eastlake 3rd. Transport Layer Security (TLS) Extensions: attacks, but note that they be further improved using standard Extension Definitions. RFC 6066, 2011. [12] H. Krawczyk. The order of encryption and authentication techniques such as language models and sequential estima- for protecting communications (or: How secure is SSL?). In tion. They can also be enhanced in a BEAST-style attack CRYPTO, pages 310–331, 2001. to enable efficient recovery of HTTP cookies. The timing [13] H. Krawczyk, M. Bellare, and R. Canetti. HMAC: Keyed- differences we must detect are close to or below the levels Hashing for Message Authentication. RFC 2104 (Informa- of jitter one typically finds in real networks. In particular, tional), 1997. [14] U. Maurer and B. Tackmann. On the soundness of our attacker needs to be positioned relatively close (in terms authenticate-then-encrypt: formalizing the malleability of of network hops) to the machine being attacked. Still, the symmetric encryption. In ACM CCS, pages 505–515, 2010. attacks should be considered as a realistic threat to TLS, [15] D. McGrew and D. Bailey. AES-CCM Cipher Suites for and we have described a range of suitable countermeasures. Transport Layer Security (TLS). RFC 6655 (Proposed Stan- The attacks are much more serious for DTLS, because dard), 2012. [16] N. Modadugu and E. Rescorla. The Design and Implemen- of this protocol’s tolerance of errors and because of the tation of Datagram TLS. In NDSS, 2004. availability of timing amplification techniques from [1]. Very [17] B. Moeller. Security of CBC ciphersuites in SSL/TLS: careful implementation of the MEE-TLS-CBC decryption Problems and countermeasures, 2004. http://www.openssl. algorithm is needed to thwart these amplification techniques. org/∼bodo/tls-cbc.txt. In view of this, we highly recommend the use of a suitable [18] K. G. Paterson, T. Ristenpart, and T. Shrimpton. Tag size does matter: Attacks and proofs for the TLS record protocol. authenticated encryption algorithm in preference to CBC- In ASIACRYPT, pages 372–389, 2011. mode for DTLS. [19] A. Pironti, P.-Y. Strub, and K. Bhargavan. Identifying website users by TLS traffic analysis: New attacks and effective ACKNOWLEDGEMENTS countermeasures. Technical Report 8067, INRIA, September 2012. We thank Xuelei Fan, David McGrew, Adam Langley, [20] E. Rescorla and N. Modadugu. Datagram Transport Layer Brad Wetmore and the anonymous reviewers for useful Security. RFC 4347, Internet Engineering Task Force, 2006. feedback. We also thank Eric Rescorla for pointing out [21] E. Rescorla and N. Modadugu. Datagram Transport Layer that our attacks can be enhanced in the web setting using Security Version 1.2. RFC 6347, Internet Engineering Task Force, 2012. BEAST-like techniques. [22] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: exploring information leakage in REFERENCES third-party compute clouds. In E. Al-Shaer, S. Jha, and A. D. Keromytis, editors, ACM Conference on Computer and [1] N. AlFardan and K. G. Paterson. Plaintext-recovery attacks Communications Security, pages 199–212. ACM, 2009. against Datagram TLS. In NDSS, 2012. [23] P. Rogaway. Problems with proposed IP cryptography. [2] N. AlFardan and K. G. Paterson. Lucky thirteen: Breaking Unpublished manuscript, 1995. http://www.cs.ucdavis.edu/ the TLS and DTLS record protocols. Full version of this ∼rogaway/papers/draft-rogaway--comments-00.txt. paper, available from www.isg.rhul.ac.uk/tls, 2013. [24] J. Salowey, A. Choudhury, and D. McGrew. AES Galois [3] G. V. Bard. The vulnerability of SSL to chosen plaintext Counter Mode (GCM) Cipher Suites for TLS. RFC 5288 attack. IACR Cryptology ePrint Archive, 2004:111, 2004. (Proposed Standard), 2008. [4] G. V. Bard. A challenging but feasible blockwise-adaptive [25] S. Vaudenay. Security Flaws Induced by CBC Padding - chosen-plaintext attack on SSL. In SECRYPT, pages 99–109, Applications to SSL, IPSEC, WTLS ... In EUROCRYPT, 2006. pages 534–546, 2002. [5] B. Canvel, A. P. Hiltgen, S. Vaudenay, and M. Vuagnoux. Password Interception in a SSL/TLS Channel. In D. Boneh, editor, CRYPTO, volume 2729 of LNCS, pages 583–599. Springer, 2003. ISBN 3-540-40674-3.

540