<<

Revisiting AES-GCM-SIV: Multi-user Security, Faster Derivation, and Better Bounds

Priyanka Bose1, Viet Tung Hoang2, and Stefano Tessaro1

1 Dept. of Computer Science, University of California Santa Barbara. 2 Dept. of Computer Science, Florida State University, USA.

Abstract. This paper revisits the multi-user (mu) security of symmetric , from the perspective of delivering an analysis of the AES- GCM-SIV AEAD scheme. Our end result shows that its mu security is comparable to that achieved in the single-user setting, in a strong sense. In particular, even when instantiated with short keys (e.g., 128 bits), the security of AES-GCM-SIV is not impacted by the collisions of two user keys, as long as each individual nonce is not re-used by too many users. This is the first example of a scheme shown to have this property inthe mu setting. Our bounds also substantially improve upon existing analyses (Gueron and Lindell, CCS ’17; Iwata and Seurin, ePrint ’17) even in the single-user setting, in particular when messages of variable lengths are encrypted. We also validate security when adopting an optimization of the key-derivation function of AES-GCM-SIV that halves its complexity (over the latest proposal). On the way, we provide a number of results of independent interest. In particular, we advocate analyzing mu security in a setting where the data processed by every user is bounded, and where user keys are gener- ated according to arbitrary, possibly correlated distributions. This model refines works on mu security by providing a more meaningful parame- ter regime, but also allows to unify this with a treatment of re-keying. In this model, we analyze the security of counter-mode encryption and Wegman-Carter MACs. We then lift these analyses to GCM-SIV and AES-GCM-SIV.

Keywords: Multi-user security, AES-GCM-SIV, authenticated encryp- tion, concrete security

1 Introduction

This work continues the study of the multi-user (mu) security of symmetric , the setting where the adversary distributes its resources to attack multiple instances of a , with the end goal of compromising at least one of them. This attack model was recently the object of extensive scrutiny [2, 9, 17, 18, 23, 26, 32], and its relevance stems from the en masse deployment of symmetric cryptographic schemes, e.g., within billions of daily TLS connections. 2 Bose, Hoang and Tessaro

The main goal is to study the degradation in security as the number of users increases.

Our contributions. This paper will extend this line of work in different ways. The most tangible contribution is a complete analysis in the mu setting of the AES-GCM-SIV [14] scheme by Gueron, Langley, and Lindell, a scheme for au- thenticated encryption with associated data (AEAD) which is undergoing eval- uation to become the object of an RFC. Our main result will show that the scheme’s security does not degrade in the mu setting, in a sense much stronger than what was claimed in previous mu analyses. Also, we abstract the require- ment needed for AES-GCM-SIV’s key-derivation step, and show that a very simple KDF is sufficient for high security. Beyond this, our analysis also delivers conceptual and technical insights of wider interest. Concretely, our result will highlight the benefit of ensuring limited nonce re-use across different users (e.g., by choosing nonces randomly). We show that in this setting AES-GCM-SIV does not suffer any impact from key-collisions, in particular allowing security to go beyond the Birthday barrier (wrt the key length) even in the multi-user setting. The resulting analysis is particularly in- volved, and calls for a precise understanding of the power of verification queries (for which nonce re-use cannot be restricted). Previous analyses of AE schemes (specifically, those of [9]) do not ensure security when two users have the same key, thus forcing either an increase of key length or a worse security guarantee. On the way, we advocate a refined model of mu security where the amount of data processed by each user is bounded. This is consistent with the fact that either users would re-key after a certain amount of data has been processed, or, even without explicit re-keying, the lifespan of a key is generically short. The model also naturally lends itself to a modular treatment of nonce-based re- keying in AES-GCM-SIV. This will, in particular, rely on analyses in this regime of (randomized) counter-mode encryption, as well as of the universal-hash based PRF underlying AES-GCM-SIV. We now continue with a more detailed overview of our results.

Multi-user security. The notion of multi-user (mu) security was introduced by Bellare, Boldyreva and Micali [3] for public-key encryption, although in the symmetric setting the notion already appeared implicitly earlier [4]. For example, in the mu definition of encryption security under chosen plain- text attacks, each user 푖 is assigned a (secret) key 퐾푖, and the attacker can make encryption queries Enc(푖, 푀), which result in either an encryption of 푀 under 퐾푖 (in the real world), or an equally long random (in the ideal world). The goal is to distinguish what is the case. Assessing security in this model is interesting and non-trivial. Take for ex- ample randomized counter-mode encryption (CTR), based on a block with key length 푘 and block length 푛, which we model as an ideal cipher. For any single-user adversary encrypting, in total, 퐿 blocks of data and making 푝 퐿2 푝 queries to the ideal cipher, the is bounded by 휖푠푢(퐿, 푝) ≤ 2푛 + 2푘 (a proof was given e.g. in [5]). If the attacker now adaptively distributes its queries 3 across up to 푢 users, a simple hybrid argument shows that there can be at most a multiplicative-factor loss 푢 in the bound, or more precisely, the bound is now 2푢퐿2 푢(푝+퐿) 휖푚푢(퐿, 푝, 푢) ≤ 푢·휖푠푢(퐿, 푝+퐿) ≤ 2푛 + 2푘 . Usually, we do not want to force any particular 푢, and would like the adversary to encrypt its budget of 퐿 blocks adaptively across as many users as it sees fit. And a-priori, we cannot prevent the adversary to pick one of the following worst-case choices: (1) Querying one message only with length 퐿, or (2) Querying 퐿 messages with length 1. Thus, in 2퐿3 퐿푝+퐿2 the worst case, the bound becomes 휖푚푢(퐿, 푝) ≤ 2푛 + 2푘 . A number of recent works [2,17,18,26,32] have shown (mostly, in the context of PRF security) that this is an overly pessimistic point of view: the security loss can be much smaller, and often 휖푚푢(퐿, 푝) ≈ 휖푠푢(퐿, 푝) holds.

A fresh look at mu security. Paradoxically, this ideal-outcome scenario could still be somewhat pessimistic. Indeed, it is well possible that for CTR (or any other scheme) in the mu setting, even if 휖푚푢(퐿, 푝) ≈ 휖푠푢(퐿, 푝), the matching attack is a single-user attack, requiring a single honest user to encrypt 퐿 ≈ 2푛/2 blocks under the same key. For 푘 = 푛 = 128, this boils down to multiple exabytes of data to be encrypted with the same key. This is not likely, regardless of the computational power of the adversary, because users may re-key, or simply because a user could not be brought to encrypt that much. If we assume instead an upper bound 퐵 on the number of blocks encrypted by each user, an 퐿-block adversary for 퐿 > 퐵 would be forced to spread its effort across at least 퐿/퐵 users. As one of our first results, we show that for CTR, the advantage of such an attacker is upper bounded by

퐿퐵 퐿2 푎푝 + + . 2푛 2푛+푘 2푘 for some constant 푎. This bound already shows some themes we will lift to AES-GCM-SIV later such as: (1) Beyond-birthday security is possible, e.g., for 푘 = 푛 = 128 and 퐵 = 232, the bound is of the order 퐿/296 + 푝/2128; (2) The role of local computation – captured here by the number of ideal-cipher queries 푝 – is independent of 퐿, and the number of users, and very similar to the single-user case, and (3) The bound is independent of the number of users. Previous results on mu security target deterministic security games, such as PRFs/PRPs [2, 17, 18, 26, 32] or deterministic AE [9, 23], and security falls apart when more than 2푘/2 users are , and their keys collide. Here, key- collisions are irrelevant, and security well beyond 2푘/2 users is possible. This viewpoint generalizes that of Abdalla and Bellare [1], who were first to observe, in a simpler model, that re-keying after encrypting 퐵 blocks increases security. In a way, what we argue here is that analyses of schemes under re-keying, and under multi-user attacks, are essentially addressing two interpretations of the same technical problem once the right modeling is adopted. In this model, we will also derive a bound for the PRF security of GMAC+, the PRF underlying AES-GCM-SIV. Eventually, we will use these analyses for our final result on AES-GCM-SIV. 4 Bose, Hoang and Tessaro

AES-GCM-SIV: Prior work and new mu bounds. AES-GCM-SIV pushes the re-keying philosophy a bit further, making it nonce based – i.e., to encrypt a message 푀 with a nonce 푁, we first derive nonce-key 퐾푁 from the master key and 푁, using a key-derivation function KD, and then encrypt the message 푀 with nonce 푁 under key 퐾푁 using a base AE scheme AE. The intuition is that here nonces play the roles of users, and a security bound in the mu setting for AE should become a bound on the end scheme in the single-user setting, where now 퐵 is a bound on the amount of data encrypted per nonce, rather than per user. In particular, existing analyses [16, 20] exploit this, and for the single-user security as a nonce-misuse resistant scheme (so-called mrae security) of AES-GCM-SIV with key length 128 bits, they show an advantage bound of order 푄 푄퐵2 ℓ 푄푅 푝 + + max + , 296 2128 2128 2128 for any adversary that makes at most 푝 ideal-cipher queries, encrypts at most 퐵 blocks per nonce, uses at most 푄 < 264 nonces in encryption/verification queries, where 푅 is the maximum number of repetition of a nonce, and ℓmax is the maximal length of a verification query. Here, for suitable instantiations of the key-derivation function (which includes in particular simpler ones than those currently under consideration) we will show that the upper bound for the multi-user mrae security of AES-GCM-SIV is of order 퐿퐵 푑(푝 + 퐿) + , 2128 2128 where 퐿 is an upper bound on the overall number of encrypted/verified blocks, 퐵 is a bound on the number of blocks encrypted per user-nonce pair, and 푑 is a bound on the the number of users across which any nonce is re-used. This shows a number of surprising things: First off, our bound is an improvement even in the single-user case, as 푑 = 1 vacuously holds, and even if we use the 퐿퐵 KDF considered in previous works. The term 2128 can be much smaller than 푄퐵2 2128 , as in many settings 푄 and 퐿 can be quite close (e.g., if most messages are very short). In fact, the point is slightly more subtle, and to our advantage, and we elaborate on it at the end of the introduction. Second, if 푑 is constant (which we can safely assume if nonces are randomly chosen), security does not degrade as the number of users increases. In particular, the security is unaffected by key collisions. If 푑 cannot be bounded, we necessarily need to increase the key to 푑(푝+퐿) 256 bits, and in this case the second term becomes 2256 . Finally, we have no assumption on the data amount of verification queries per user-nonce pair (other than the overall bound 퐿), whereas the bounds in prior works can become weak if there is a very long verification query, and the adversary uses only a single nonce among verification queries.3

3 Since the penalty of verification queries can be very high, [16] has to assume that the 푛 2 푛 term ℓmax푄푅/2 is smaller than 푄퐵 /2 ; this assumption might hold in settings where servers adopt DDOS countermeasures to terminate connections of too many invalid verification queries. 5

This shows that AES-GCM-SIV enjoys great quantitative security in the mu setting, and in fact presents a number of unique features. The rest of the introduction will explain how we get to our result.

Review of AES-GCM-SIV. Let us first however review AES-GCM-SIV [14] a little more in detail. The scheme is based on GCM-SIV+, a slight modification of GCM-SIV, proposed in [15]. This relies in turn on SIV (“synthetic IV”) [31], an AEAD scheme which combines a PRF F and an encryption scheme SE (only meant to be CPA secure) to achieve nonce-misuse resistance. For message 푀, nonce 푁, and associated data 퐴, the encryption of SIV results into a ciphertext 퐶 obtained as

IV ← F(퐾F, (푀, 푁, 퐴)) , 퐶 ← SE.E(퐾E, 푀; IV) , where 퐾F and 퐾E are the two components of the secret key, and SE.E(퐾E, 푀; IV) is the function of SE run with IV IV. In GCM-SIV+, SE is counter mode, and F is what we call GMAC+, a Wegman- Carter MAC [34] similar to, but different from, the one used in GCM[25]. It composes an xor-universal with 푛-bit key, with a with block length 푛 and key length 푘. GMAC+’s total key length is hence 푘 + 푛 bits. (As we target AES, 푛 = 128 and 푘 ∈ {128, 256}.) A difference from the original SIV scheme is that the same block cipher key is used across GMAC+ and counter-mode, but appropriate domain separation is used. Finally, AES-GCM-SIV enhances GCM-SIV+ via nonce-based key derivation. That is, now we have a single key 퐾, and to encrypt with a nonce 푁, we first derive a (푛 + 푘)-bit key 퐾푁 ← KD(퐾, 푁) using a key-derivation function KD, + and then encrypt using GCM-SIV with the key 퐾푁 .

Mu-security and nonce-based key derivation. We will be able to combine our analyses for CTR and GMAC+ into an analysis for GCM-SIV+ in the bounded mu model. This is less obvious than it would initially seem, because due to the key re-use, the technique for generic composition used in the original SIV scheme fails. The leading terms here are of order similar to the above bound for CTR. The question is now whether nonce-based key derivation achieves its purpose in the mu setting, where 퐵 is now a bound on the number of blocks encrypted per nonce-user pair. Indeed, say the master secret key 퐾 has length 푘 = 128. Then, should the number of users exceed 2푘/2 = 264, with high probability two users will end up with identical keys. If we treat KD as a PRF, like [16, 20] do, all security will vanish at this point. Indeed, the existing mu analysis of GCM succumbs to this problem [9], and the problem seem unavoidable here too, since we are considering a deterministic security game.

Bounded nonce re-use. The way out from this problem leverages the setting where nonces are not re-used too often, which is what we are concerned with, and in particular we assume a nonce is re-used by at most 푑 users. Consider the canonical attack to break privacy of the scheme: Fix a sufficiently long message 푀 and nonce 푁, and encrypt them over and over for different users, and if 6 Bose, Hoang and Tessaro the same ciphertext appears twice after roughly 2푘/2 queries, we know we are likely to be in the real world, as this is not likely to happen in the ideal world, where are random. This however requires us to re-use the same nonce roughly 2푘/2 times for different users. A first interesting point we observe isthat KD need not become insecure after 2푘/2 queries as a PRF if the number of re- uses of a nonce is sufficiently small. We will confirm this to be true for alarge class of KDF constructions we consider (see below). Unfortunately, this is not enough. The catch is that the above argument only applies to privacy. We need to guarantee authenticity, too, and even though we are bounding the number of re-uses of a nonce in encryption queries, it is not meaningful to do the same for verification queries. It seems that key collisions are back to haunt us. However, we will show that this is not necessarily the case. To get some intuition, consider the security of KD as a MAC, i.e., the adver- sary issues, in a first stage, queries (푖, 푁), getting KD(퐾푖, 푁), but respecting the constraint that no nonce is used more than 푑 times across different 푖’s where 푑 is relatively small. Then, in a second stage, the adversary gets to ask unrestricted verification queries with input (푖, 푁, 푇 ), except for the obvious requirement that (푖, 푁) must be previously un-queried. The adversary wins if KD(퐾푖, 푁) = 푇 for one of these verification queries. At first glance, a collision 퐾푖 = 퐾푗 could help if we have queried (푖, 푁) in the first stage, learnt 푇 , and now can submit (푗, 푁, 푇 ) in the second. Unfortunately, however, we need to be able to have detected such collisions. This is hard to do during the first stage, even with many queries, due to the constraint of reusing 푁 only 푑 times. Thus, the only obvious way to exploit this would be to try, for each of the 푞 first-stage queries (푖, 푁) with corresponding output 푇 , to query (푗, 푁, 푇 ) for many 푗 ̸= 푖. This would however require roughly 2푘 trials to succeed. Finally, note that while it may be that we ′ ′ ′ ask two verification queries (푖, 푁, 푇 ) and (푗 , 푁 , 푇 ) where 퐾푖 = 퐾푗, this does not seem to give any help in succeeding, because a verification query does not reveal the actual output of KD on that input. Confirming this intuition is not simple. We will however do so for a specific class of natural KD constructions outlined below, and point out that the setting of AE is even harder than studying the security of KD itself as a MAC. Indeed, our KD is used to derive keys for GMAC+ and CTR at the same time, and we need to prove unpredictability of the overall encryption scheme on a new pair (푁, 푖) which was previously unqueried. This is the most technically involved part of the paper.

A simpler KDF. Finally, let us address how we instantiate KD. The construc- tion of KD from [14] is truncation based, and makes 4 (for 푘 = 128), respectively 6 (for 푘 = 256) calls to a block cipher to derive a key. A recent proposal [20] suggests using the so-called XOR construction to achieve higher security, as mul- tiple analyses [7, 11, 22, 28, 30] confirm better bounds than for truncation [12]. Still, the resulting KD would need 4 resp. 6 calls. They also consider a faster construction, based on CENC [19], which would require 3 resp. 4 calls. Rather than following the route of analyzing these concrete constructions, we apply our result to a general class of KDFs which includes in particular 7 all of these proposals, but also simpler ones. For instance, our bounds apply to the following simple KDF, a variant of which was in the initial AES-GCM- SIV proposal, but was discarded due to security concerns.4 Namely, given the underlying block cipher 퐸, the KDF outputs

KD(퐾, 푁) = 퐸(퐾, pad(푁, 0)) ‖ 퐸(퐾, pad(푁, 1)) (1) for 푘 = 푛 and 푁 an nl-bit string, with nl ≤ 푛 − 2, and, analogously, for 푘 = 2푛, one can extend this by additionally concatenating 퐸(퐾, pad(푁, 2)). Here, pad is a mapping with the property that the sets {pad(푁, 0), pad(푁, 1), pad(푁, 2)} defined by each 푁 are disjoint. This approach seems to contradict common sense which was adopted in the new KDF variants for AES-GCM-SIV, because the derived keys are not truly random. The point is that existing proofs (implicitly or explicitly) leverage the multi-user security of 퐸, in a setting with independent random secret keys. However, we observe (validating this in the ideal-cipher model) that mu security bounds are robust to deviation of the key distribution, and in particular additional constraints of sub-keys and part of them being distinct generally only make matters better, rather than worse.5 We note that a crucial point of our analyses is that we do not prove PRF security of these KDFs. Rather, we study the distributions on keys they induce, and then (implicitly) rely on the security of the underlying components using keys obtained from (slightly) non-uniform distributions. In platforms that support AES hardware acceleration, the difference in per- formance between the KDF in Equation (1) and the current one in AES-GCM- SIV is not important, as demonstrated via the experiments in [14]. Still, we believe it is important for schemes to be minimal, and thus to understand the security of simplest possible instantiations of the KDF.

Sub-optimality of POLYVAL. We also observe that the universal hash POLYVAL of AES-GCM-SIV suffers from a suboptimal design issue. That is, if both the message and the associated data are the empty string, then their hash image under POLYVAL is always 0128, regardless of the hash key. This does not create any issue in the single-user setting, and thus has not received any attention so far. However, in the multi-user setting, it substantially weakens + + 퐿퐵 푑(푝+퐿) the security of GCM-SIV and GMAC to 2128 + 2128 , despite their use of 256-bit keys. Had the in POLYVAL been done properly so that the hash image of empty strings under a random key has a uniform distribution, the secu- + + 퐿퐵 퐿푝 rity of GCM-SIV and GMAC could be improved to 2128 + 2256 , meaning this bound is independent of the number 푑 of users that reuse any particular nonce. While this issue does not affect the concrete security bound of AES-GCM-SIV,

4 Thus, our analysis shows that this proposal would have been a good and more efficient choice. 5 A similar key-derivation scheme has been used to derive sub-keys from tweaks in the setting of FPE within the DFF construction [33]. This was then formalized and studied in [6], though we stress that their analysis is quite different from ours, and considers a much less demanding setting. 8 Bose, Hoang and Tessaro this change is recommended especially if GCM-SIV+ or GMAC+ are used as standalone schemes.

Relation to existing works. As mentioned above, two recent papers [16,20] have analyzed AES-GCM-SIV in the su setting, and we want to elaborate further on our improvement in this special case. As we argue above, their bound contains a term of the order 푄퐵2/2푛, which we improve to 퐿퐵/2푛. The fact that the latter is better is not quite obvious. Indeed, it is not hard to improve the term 2 푛 ∑︀푄 2 푛 푄퐵 /2 in [16,20] to 푖=1 퐵푖 /2 , where 퐵푖 is a bound on the number of blocks encrypted with the 푖-th nonce. This seems to address the point that different amounts of data can be encrypted for different nonces. The crucial point is that we capture a far more general class of attacks by only limiting the adversary in terms of 퐿, 푝, and 푑. For instance, for a parameter 퐿, consider the following single-user adversary using 푄 = 퐿/2 nonces. It will select a random subset of the 푄 nonces, of size 퐿/(2퐵), for which it encrypts 퐵 blocks of data, and for the remaining 퐿/2 − 퐿/(2퐵) nonces, it only encrypts one block of data. In our bound, we still get a term 퐿퐵/2푛. In contrast, with the parametrization adopted by [16, 20], we can only set 푄 = 퐿/2 and 퐵푖 = 퐵 for all 푖 ∈ [푄], because any of the nonces can, a priori, be used to encrypt 퐵 blocks. This ends up giving a term of magnitude 퐿퐵2/2푛, however, which is much larger. For 퐵 = 232, the difference between 퐿/264 and 퐿/296 is enormous. Switching to the type of bounds we consider is not just aesthetics: The adver- sary can of course make even more convoluted, adaptive choices in its attack pat- tern. The analysis needs to handle these, and this is non-trivial. This type of ques- tion was the object of several recent works in the mu regime [2,17,18,23,26,32].

Standard vs ideal-model. We also note that the bound of [20] is expressed in the standard model, and contains a term 푄휖, where 휖 is the advantage of a PRF adversary 풜′ against the cipher 퐸, making 퐵 queries. The catch is that 휖 is very sensitive to the time complexity of 풜′, which we approximate with the number of ideal-cipher queries 푝. Thus, 푄휖 is of order 푄(퐵2/2푛 + 푝/2푘). While [20] argues that 푄퐵2/2푛 is the largest term, the ideal model makes it evident that the hidden term 푄푝/2푘 is likely to be far more problematic in the case 푛 = 푘. Indeed, 푝 ≥ 푄 and 퐵2 ≤ 푄 are both plausible (the attacker can more easily invest local computation than obtain honest under equal nonces), 푄2 푘/2 and this becomes 2푘 . This shows security is bounded by 2 . The work of [23] on classical GCM also seemingly focuses on the standard model and thus seems to fail to capture such hidden terms. In contrast, [16] handles this properly. We stress that we share the sentiment that ideal-model analysis may oversim- plify some security issues. However, we find them a necessary evil when trying to capture the influence of local computation in multi-user attacks, which isa fundamental part of the analysis.

Outline of this paper. We introduce basic notions and security definitions in the multi-user setting in Section3. Then, in Section4, we study the security of our basic building blocks, CTR and GMAC+, in the multi-user setting. In 9

Section5, we analyze SIV composition when keys are re-used across encryption and PRF, and observe this to work in particular for the setting of GCM-SIV. Finally, Section6 studies our variant of AES-GCM-SIV with more general key derivation.

2 Preliminaries

Notation. Let 휀 denote the empty string. For a finite set 푆, we let 푥 ←$ 푆 denote the uniform sampling from 푆 and assigning the value to 푥. Let |푥| denote the length of the string 푥, and for 1 ≤ 푖 < 푗 ≤ |푥|, let 푥[푖, 푗] (and also 푥[푖 : 푗]) denote the substring from the 푖th bit to the 푗th bit (inclusive) of 푥. If 퐴 is an , we let 푦 ← 퐴(푥1,... ; 푟) denote running 퐴 with randomness 푟 on inputs 푥1,... and assigning the output to 푦. We let 푦 ←$ 퐴(푥1,...) be the resulting of picking 푟 at random and letting 푦 ← 퐴(푥1,... ; 푟). In the context that we use a blockcipher 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛, the block length of a {︀ ⌈︀ ⌉︀}︀ string 푥, denoted |푥|푛, is max 1, |푥|/푛 .

Systems and Transcripts. Following the notation from [17] (which was in turn inspired by Maurer’s framework [24]), it is convenient to consider interac- tions of a distinguisher 퐴 with an abstract system S which answers 퐴’s queries. The resulting interaction then generates a transcript 휏 = ((푋1, 푌1),..., (푋푞, 푌푞)) of query-answer pairs. It is well known that S is entirely described by the prob- abilities pS(휏) that if we make queries in 휏 to system S, we will receive the answers as indicated in 휏. We will generally describe systems informally, or more formally in terms a set of oracles they provide, and only use the fact that they define a corresponding probabilities pS(휏) without explicitly giving these probabilities.

The H-coefficient technique. We now describe the H-coefficient technique of Patarin [10, 29]. Generically, it considers a deterministic distinguisher 풜, in- teracting with system S0 or with system S1. Let 풳0 and 풳1 be random variables for the transcripts defined by these interactions with S0 and S1, and a bound on the distinguishing advantage of 풜 is given by the statistical distance SD(풳0, 풳1). Lemma 1. [10, 29] Supposed we can partition transcripts into good and bad p (휏) transcripts. Further, suppose that there exists 휖 ≥ 0 such that 1 − S0 ≤ 휖 for pS1 (휏) every good transcript 휏 such that pS1 (휏) > 0. Then,

SD(풳1, 풳0) ≤ 휖 + Pr[풳1 is bad] .

3 Multi-user Security of Symmetric Primitives

We revisit security definitions for basic symmetric primitives in the multi-user setting. We will in particular extend existing security definitions to impose overall 10 Bose, Hoang and Tessaro bounds on the volume of data processed by each user, however we will relegate this matter to theorem statements restricting the considered adversaries, rather than hard-coding these bounds in the definitions.

3.1 Symmetric and

We define AE syntax here, as well as natural multi-user generalizations ofclas- sical security notions for confidentiality and integrity. Since this paper will both with probabilistic and deterministic schemes, we define both, following the treatment of Namprempre, Rogaway, and Shrimpton [27]. Our notational con- ventions are similar to those from [9].

IV-based encryption. An IV-based symmetric encryption scheme SE consists of two , the randomized encryption algorithm SE.E and the deter- ministic decryption algorithm SE.D, and is associated with a corresponding key length SE.kl ∈ N and initialization-vector (IV) length SE.vl ∈ N. Here, SE.E takes as input a secret key 퐾 ∈ {0, 1}SE.kl and a 푀 ∈ {0, 1}*. It then SE.vl ′ samples IV ←$ {0, 1} , deterministically computes a ciphertext core 퐶 from ′ 퐾, 푀 and IV, and returns 퐶 ← IV ‖ 퐶 . We often write 퐶 ←$ SE.E퐾 (푀) or 퐶 ←$ SE.E(퐾, 푀). If we want to force the encryption scheme to run on a spe- cific IV, then we write SE.E(퐾, 푀; IV). The corresponding decryption algorithm SE.D takes as input a key 퐾 ∈ {0, 1}SE.kl and a ciphertext 퐶 ∈ {0, 1}*, returns either a plaintext 푀 ∈ {0, 1}*, or an error symbol ⊥. For correctness, we require that if 퐶 is output by SE.E퐾 (푀), then SE.D퐾 (퐶) returns 푀. We allow all algorithms to make queries to an ideal primitive 훱, in which case this will be made explicit when not clear from the context, e.g., by writing SE[훱] in lieu of SE.

Chosen-plaintext security for IV-based encryption. We re-define the traditional security notion of ind-security for the multi-user setting. Our defini- tion will however incorporate a general, stateful key-generation algorithm KeyGen which is invoked every time a new user is spawned via a call to the New or- acle. KeyGen is a parameter of the game, and it takes additionally some input string aux which is supplied by the adversary. The traditional mu security setting would have KeyGen simply output a random string, and ignore aux, but we will consider a more general setting to lift mu bounds to the key-derivation setting. The game is further generalized to handle an arbitrary ideal primitive (an ideal cipher, a , or a combination thereof) via an oracle Prim.6 Also note that the oracle Prim can simply trivially provide no functionality, in which case we revert to the standard-model definition. We note that the key-generation algorithm KeyGen does not have access to the oracle Prim.

6 If Prim is meant to handle multiple primitives, we assume they can be accessed through the same interface by pre-pending to the query a prefix indicating which primitive is meant to be queried. 11

mu-ind Game GSE,KeyGen,훱 (풜) Enc(푖, 푀)

st0 ← 휀; 푣 ← 0; 푏 ←$ {0, 1} If 푖∈ / {1, . . . , 푣} then return ⊥ ′ New,Enc,Prim Prim 푏 ←$ 풜 퐶1 ←$ SE.E (퐾푖, 푀) ′ |퐶1| Return (푏 = 푏) 퐶0 ←$ {0, 1} Return 퐶푏 New(aux) 푣 ← 푣 + 1 (퐾푣, st푣) ←$ KeyGen(st푣−1, aux)

mu-mrae Game GAE,KeyGen,훱 (풜) New(aux)

st0 ← 휀; 푣 ← 0; 푏 ←$ {0, 1} 푣 ← 푣 + 1 ′ New,Enc,Vf,Prim 푏 ←$ 풜 (퐾푣, st푣) ←$ KeyGen(st푣−1, aux) Return (푏′ = 푏)

Vf(푖, 푁, 퐶, 퐴) Enc(푖, 푁, 푀, 퐴) If 푖∈ / {1, . . . , 푣} then return ⊥ If 푖∈ / {1, . . . , 푣} then return ⊥ If (푖, 푁, 퐶, 퐴) ∈ 푉 [푖] then return true If (푖, 푁, 푀, 퐴) ∈ 푈[푖] then return ⊥ Prim If 푏 = 0 then return false 퐶1 ← AE.E (퐾푖, 푁, 푀, 퐴) |퐶 | Prim $ 1 푀 ← AE.D (퐾푖, 푁, 퐶, 퐴) 퐶0 ← {0, 1} Return (푀 ̸= ⊥) 푈[푖] ← 푈[푖] ∪ {(푖, 푁, 푀, 퐴)} 푉 [푖] ← 푉 [푖] ∪ {(푖, 푁, 퐶푏, 퐴)} Return 퐶푏

Fig. 1: Security definitions for chosen-plaintext security of IV-based encryp- tion (top), as well as nonce-misuse resistance for authenticated encryption (bottom). We assume (without making this explicit) that Prim implements the ideal- primitive 훱.

mu-ind Given an adversary 풜, the resulting game is GSE,KeyGen,훱 (풜), and is depicted on the left of Figure1. The associated advantage is

mu-ind [︀ mu-ind ]︀ AdvSE,KeyGen,훱 (풜) = 2 · Pr GSE,KeyGen,훱 (풜) − 1 . Whenever we use the canonical KeyGen which outputs a random string regardless mu-ind of its input, we will often omit it, and just write AdvSE,훱 (풜) instead.

Authenticated encryption scheme. An authenticated encryption scheme AE with associated data (also referred to as an AEAD scheme), the algorithms AE.E and AE.D are both deterministic. In particular, AE.E takes as input a secret key 퐾 ∈ {0, 1}AE.kl, a nonce 푁 ∈ {0, 1}AE.nl, a plaintext 푀 ∈ {0, 1}*, and the associated data 퐴, and returns the ciphertext 퐶 ← AE.E(퐾, 푁, 푀, 퐴). The corresponding decryption algorithm AE.D takes as input a key 퐾 ∈ {0, 1}AE.kl, the nonce 푁, the ciphertext 퐶 ∈ {0, 1}*, and the associated data 퐴, and returns either a plaintext 푀 ∈ {0, 1}*, or an error symbol ⊥. We require that if 퐶 is output by AE.E퐾 (푀, 푁, 퐴), then AE.D퐾 (퐶, 푁, 퐴) returns 푀. Our security notion for AE is nonce-misuse-resistant: Ciphertexts produced by encryptions with the same nonce are pseudorandom as long as the encryptions 12 Bose, Hoang and Tessaro are on different messages or associated data, even if they are for the same nonce. mu-mrae Our formalization of AE multi-user security in terms of GAE,KeyGen,훱 (풜) is that of Bellare and Tackmann [9], with the addition of a KeyGen algorithm to handle arbitrary correlated key distributions. It is depicted in Figure1, at the bottom. Given an adversary 풜 and a key-generation algorithm KeyGen, we are then going to define

mu-mrae [︀ mu-mrae ]︀ AdvAE,KeyGen,훱 (풜) = 2 · Pr GAE,KeyGen,훱 (풜) − 1 .

As above, KeyGen is omitted if it is the canonical one. We say that an adversary is 푑-repeating if among the encryption queries, an adversary only uses each nonce for at most 푑 users. We stress that we make no assumption on how the adversary picks nonces for the verification queries, and for each individual user, the adversary can repeat nonces in encryption queries as often as it wishes. If nonces are chosen arbitrarily then 푑 can be as big as the number of encryption queries. If nonces are picked at random then 푑 is a small constant.

A key-. We now show that for any AE scheme AE that uses the canonical KeyGen, if an adversary can choose nonces arbitrarily then there is an attack, using 푞 encryption queries and no verification query, that achieves advantage 푞(푞 − 1)/2AE.kl+3. Suppose that under AE, a ciphertext is always at least as long as the cor- responding plaintext. Fix an arbitrary message 푀 such that |푀| ≥ AE.kl + 2. Fix a nonce 푁 and associated data 퐴. The adversary 풜 attacks 푞 users, and for each user 푖, it queries Enc(푖, 푁, 푀, 퐴) to get answer 퐶푖. If there are distinct 푖 and 푗 such that 퐶푖 = 퐶푗 then it outputs 1, hoping that users 푖 and 푗 have the same key. For analysis, we need the following well-known result; see, for example, [13, Chapter 5.8] for a proof.

Lemma 2 (Lower√ bound for ). Let 푞, 푁 ≥ 1 be integers such that 푞 ≤ 2푁. Suppose that we throw 푞 balls at random into 푁 bins. Then 푞(푞−1) the chance that there is a bin of at least two balls is at least 4푁 . From Lemma2 above, in the real world, the adversary will output 1 if two users have the same key, which happens with probability at least 푞(푞 − 1)/2AE.kl+2. In contrast, since the ciphertexts are at least |푀|-bit long, in the ideal world, it outputs 1 with probability at most 푞(푞 − 1)/2|푀|+1 ≤ 푞(푞 − 1)/2AE.kl+3. Hence

푞(푞 − 1) 푞(푞 − 1) 푞(푞 − 1) Advmu-mrae(풜) ≥ − = . AE,훱 2AE.kl+2 2AE.kl+3 2AE.kl+3

3.2 Multi-user PRF Security We consider keyed functions F : {0, 1}F.kl ×{0, 1}F.il → {0, 1}F.ol, possibly making queries to an ideal primitive 훱. Here, note that we allow F.il = *, indicating a 13

mu-prf Game GF,KeyGen,훱 (풜) New(aux) Eval(푖, 푀)

푣 ← 0; st0 ← ∅ 푣 ← 푣 + 1 If 푖∈ / {1, . . . , 푣} return ⊥ Prim 푏 ←$ {0, 1} (퐾푣, st푣) ←$ KeyGen(st푣−1, aux) 푌1 ← F (퐾푖, 푀) ′ New,Eval,Prim 푏 ←$ 풜 휌푣 ←$ Func(F.il, F.ol) 푌0 ←$ 휌푖(푀) F.kl Return (푏′ = 푏) 퐾푣 ←$ {0, 1} Return 푌푏

Fig. 2: Definitions of multi-user PRF security. Again, Prim implements the ideal prim- itive 훱. variable-input-length function. We define a variant of the standard multi-user version of PRF security from [4] using (as in the previous section) a general algorithm KeyGen to sample possibly correlated keys. Concretely, let Func(il, ol) be the set of all functions {0, 1}il → {0, 1}ol, where, once again, il = * is allowed. We give the multi-user PRF security game in Figure2, on the left. There, F’s access to 훱 is modeled by having oracle access to Prim, here. For any adversary 풜, and key generation algorithm KeyGen, we define mu-prf [︁ mu-prf ]︁ AdvF,KeyGen,훱 (풜) = 2 · Pr GF,KeyGen,훱 (풜) − 1 . As usual, we will omit KeyGen when it is the canonical key generator outputting independent random keys.

3.3 Decomposing AE Security While the notion mu-mrae is very strong, it might be difficult to prove that an AE scheme, say AES-GCM-SIV meets this notion, if one aims for beyond- birthday bounds. We therefore decompose this notion into separate privacy and authenticity notions, as defined below.

mu-priv Privacy. Consider the game GAE,KeyGen,훱 (풜) in Fig.3 that defines the (misuse- resistant) privacy of an AE scheme AE, with respect to a key-generation algo- rithm KeyGen, and an ideal primitive 훱. Define mu-priv mu-priv AdvAE,KeyGen,훱 (풜) = 2 Pr[GAE,KeyGen,훱 (풜)] − 1 . Under this notion, the adversary is given access to an encryption oracle that either implements the true encryption or returns a random string of appropri- ate length, but there is no decryption oracle. If the adversary repeats a prior encryption query then this query will be ignored.

mu-auth Authenticity. Consider the game GAE,KeyGen,훱 (풜) in Fig.3 that defines the (misuse-resistant) authenticity of an AE scheme AE, with respect to a key- generation algorithm KeyGen, and an ideal primitive 훱. Define

mu-auth mu-auth AdvAE,KeyGen,훱 (풜) = 2 Pr[GAE,KeyGen,훱 (풜)] − 1 . Under this notion, initially a bit 푏 is set to 0 and the adversary is given an encryption oracle that always implements the true encryption, and a verification 14 Bose, Hoang and Tessaro

mu-priv mu-auth Game GAE,KeyGen,훱 (풜) Game GAE,KeyGen,훱 (풜)

푣 ← 0; st0 ← 휀; 푏 ←$ {0, 1} 푣 ← 0; st0 ← 휀; 푏 ← 0 ′ New,Enc,Prim New,Enc,Vf,Prim 푏 ←$ 풜 풜 Return (푏′ = 푏) Return (푏 = 1)

New(aux) New(aux) 푣 ← 푣 + 1 푣 ← 푣 + 1 $ (퐾푣, st푣) ←$ KeyGen(st푣−1, aux) (퐾푣, st푣) ← KeyGen(st푣−1, aux)

Enc(푖, 푁, 푀, 퐴) Enc(푖, 푁, 푀, 퐴) If 푖∈ / {1, . . . , 푣} then return ⊥ If 푖∈ / {1, . . . , 푣} then return ⊥ Prim If (푖, 푁, 푀, 퐴) ∈ 푈[푖] then return ⊥ 퐶 ← AE.E (퐾푖, 푁, 푀, 퐴) Prim 퐶1 ← AE.E (퐾푖, 푁, 푀, 퐴) 푉 [푖] ← 푉 [푖] ∪ {(푖, 푁, 퐶, 퐴)} |퐶1| 퐶0 ←$ {0, 1} Return 퐶 푈[푖] ← 푈[푖] ∪ {(푖, 푁, 푀, 퐴)} Vf(푖, 푁, 퐶, 퐴) Return 퐶푏 If 푖 ̸∈ {1, . . . , 푣} then return ⊥ If (푖, 푁, 퐶, 퐴) ̸∈ 푉 [푖] then Prim 푀 ← AE.D (퐾푖, 푁, 퐶, 퐴) If (푀 ̸= ⊥) then 푏 ← 1

Fig. 3: Games to define privacy(left), and authenticity (right) of anAE scheme AE with respect to a key-generation algorithm KeyGen : 풦 × 풩 → {0, 1}AE.kl. The oracle Prim implements the ideal primitive 훱. In the authenticity notion, queries to Vf must be performed after all queries to Enc. oracle. We require that verification queries be made after all evaluation queries. On a verification (푖, 푁, 퐶, 퐴), if there is a prior encryption query (푖, 푁, 푀, 퐴) for an answer 퐶, then the oracle ignores this query. Otherwise, the oracle sets 푏 ← 1 Prim if AE.D (퐾푖, 푁, 퐶, 퐴) returns a non-⊥ answer. The goal of the adversary is to set 푏 = 1.

Relations. Note that in the mrae notion, the adversary can perform encryption and verification queries in an arbitrary order. In contrast, in the authenticity notion, the adversary can only call the verification oracle after it finishes querying the encryption oracle. Still, in Proposition1 below, we show that authenticity and privacy tightly implies mrae security. See AppendixC.

Proposition 1. Let AE be an AE scheme associated with a key-generation al- gorithm KeyGen and an ideal primitive 훱. Suppose that a ciphertext in AE is always at least 푛-bit longer than the corresponding plaintext. For any adversary 풜0 that makes 푞푣 verification queries, we can construct adversaries 풜1 and 풜2 such that

2푞 Advmu-mrae (풜 ) ≤ Advmu-priv (풜 ) + Advmu-auth (풜 ) + 푣 . AE,KeyGen,훱 0 AE,KeyGen,훱 1 AE,KeyGen,훱 2 2푛 15

Any query of 풜1 or 풜2 is produced directly from 풜0. If 풜0 is 푑-repeating then so are 풜1 and 풜2.

4 Multi-User Security of Basic Symmetric Schemes

4.1 Security of Counter-Mode Encryption We study the mu-security of counter mode encryption, or CTR for short. While this is interesting on its own right (we are not aware of any analysis achieving a comparable bound in the literature), we will also use Theorem1 below to obtain security results for AES-GCM-SIV. For this reason, we introduce some extra notions to handle the degree of generality needed for our proof. Also, our result is general enough to suggest an efficient solution to the re-keying problem first studied by Abdalla and Bellare [1].

General IVs. We will consider a general IV-increasing procedure add, which is associated with some maximal message length of 퐿max blocks, and a block length 푛. In particular, add takes an 푛-bit string IV and an offset 푖 ∈ {0, . . . , 퐿max − 1} as inputs, and is such that add(IV, 푖) returns an 푛-bit string, and for all IV, the strings add(IV, 0),..., add(IV, 퐿max − 1) are distinct. We also say that add has min-entropy ℎ if for a random 푛-bit IV, and every 푖 ∈ Z퐿max , add(IV, 푖) takes any value with probability at most 2−ℎ, i.e., its min-entropy is at least ℎ. For example, the canonical IV addition is such that add(IV, 푖) = IV + 푖 푛 푛 (mod 2 ), where we identify 푛-bit strings with integers in Z2푛 . Here, 퐿max = 2 . 32 In contrast, the AES-GCM-SIV will use CTR with 퐿max = 2 , 푛 = 128, and add(IV, 푖) = 1 ‖ IV[2, 96] ‖ (IV[97, 128] + 푖 (mod 232)). Clearly, here, the min- entropy is 127 bits, due to the first bit being set to one.

CTR encryption. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a block cipher, i.e., 퐸(퐾, ·) is a permutation for all 푘-bit 퐾. We denote 퐸(퐾, ·) = 퐸퐾 (·), and −1 퐸퐾 is the inverse of 퐸퐾 . Further, let add be a general IV-increasing procedure with maximal block length 퐿max. We define the IV-based encryption scheme CTR = CTR[퐸, add] with CTR.kl = 푘, and where encryption operates as follows (where we use ←푛 to denote some function which pads a message 푀 into 푛-bit blocks). CTR.E(퐾, 푀): 푛 푛 퐶[0] ← IV ←$ {0, 1} , 푀[1], . . . , 푀[ℓ] ← 푀 If ℓ > 퐿max then return ⊥ For 푖 = 1 to ℓ do 퐶[푖] ← 퐸퐾 (add(IV, 푖 − 1)) ⊕ 푀[푖] Return 퐶[0] ‖ 퐶[1] ‖···‖ 퐶[ℓ]

Decryption CTR.D re-computes the masks 퐸퐾 (add(IV, 푖 − 1)) using 퐶[0] = IV, and then retrieves the message blocks by xoring the masks to the ciphertext. Here, we assume without loss of generality messages are padded (e.g., PKCS#7), so that they are split uniquely into full-length 푛-bit blocks. Our result extends 16 Bose, Hoang and Tessaro easily to the more common padding-free variant where the last block is allowed to be shorter than 푛 bits, and the output of 퐸퐾 (add(IV, ℓ − 1)) is truncated ac- cordingly, since an adversary can simulate the padding-free version by removing the appropriate number of bits from the received ciphertexts.

Security of CTR. We establish the (CPA) security of randomized CTR in the ideal-cipher model for an arbitrary key-generation algorithm KeyGen which produces keys that collide with small probability. In particular, we say that KeyGen is 훼-smooth if for a sequence of keys (퐾1, . . . , 퐾푢) output by an arbitrary 푘 interaction with New, we have Pr[퐾푖 = 퐾] ≤ 훼 for all 푖 and 퐾 ∈ {0, 1} , and −푘 Pr[퐾푖 = 퐾푗] ≤ 훼 for all 푖 ̸= 푗. The canonical KeyGen is 훼-smooth for 훼 = 2 . See AppendixD for the proof.

Theorem 1. Let 퐸 be modeled as an ideal cipher, add have min-entropy ℎ, and KeyGen be 훼-smooth. Further, let 퐿, 퐵 ≥ 1 such that 퐿 ≤ 2(1−휖)ℎ−1, for some 휖 ∈ (0, 1], and let 풜 be an adversary that queries Enc for at most 퐿 푛-bit blocks, and at most 퐵 blocks for each user, and makes 푝 Prim queries. Then,

(︂ 1 1 )︂ Advmu-ind (풜) ≤ 2−푛/2 + (︀퐿퐵 + 퐿2훼)︀ · + + 푎푝훼 , CTR[퐸,add],KeyGen,퐸 2푛 2ℎ

⌈︀ 1.5푛 ⌉︀ where 푎 := 휖ℎ − 1. The bound highlights the benefits when each user only encrypts 퐵 blocks. In particular, assume ℎ = 푛, 훼 = 1/2푘. If 퐵 = 2푏, then the number 퐿 of blocks encrypted overall by the scheme can be as high as 2푛−푏. (The second term has 퐿2 in the numerator, but the denominator is much larger, i.e., 2푛+푘.) Another interesting feature is that the contribution of Prim queries to the bound is independent of the number of users and 퐿.

More on the bound. Previous works [16,20] implicitly give mu security bounds for CTR, but adopt a different model, where the adversary is a-priori constrained in (1) the number of queries 푞, (2) a bound 퐵푖 on the number of blocks encrypted ∑︀푢 2 푛 per user 푖 ∈ [푢]. The resulting bounds contain a leading term 푖=1 퐵푖 /2 , assuming no primitive queries are made (adding primitive queries 푝 only degrades the bound). This is essentially what one can obtain by applying a naïve hybrid argument to the single-user analysis. We discussed the disadvantage of such a bound in the introduction already.

Re-keying, revisited. Also, in contrast to previous works, the above result holds for an arbitrary KeyGen, and only requires very weak randomness from it. This suggests a new and efficient solutions for the re-keying problem of[1]. Let 퐻 : {0, 1}푘 × {0, 1}* → {0, 1}푘 be a hash function, and let KeyGen, on input aux ∈ {0, 1}*, simply output 퐻(퐾, aux) for some master secret key 퐾, and this KeyGen is 훼-smooth if 퐻 is for example POLYVAL from AES-GCM-SIV, where 훼 = ℓ/2푘, and ℓ is an upper bound on the length of aux. We can assume ℓ to be fixed to something short, even 1. Indeed, aux could be a counter, or some other 17

−푛/2 2퐿퐵 2퐿2 short string. The resulting bound (when ℎ = 푛) would be 2 + 2푛 + 2푛+푘 + 푎푝/2푘. Note that this solution heavily exploits the ideal-cipher model – clearly, we are indirectly assuming some form of related-key security on 퐸 implicitly, and one should carefully assess the security of 퐸 in this setting. The results in the model of Abdalla and Bellare [1] are weaker in that they only study more involved key-derivation methods (but with the benefit of a standard-model security reduction), in a more constrained model, where the adversary sequentially queries 퐵 blocks on a key, before moving to the next key. Our model, however, is adaptive, as the adversary can distribute queries as it pleases across users. But difference is not only qualitative, as quantitative bounds in [1] are obtained via naïve hybrid arguments.

4.2 Security of GMAC+

This section deals with an abstraction of GMAC+, the PRF used within the AES- GCM-SIV mode of operation. We show good mu bounds for this construction. The ideas extend similarly to various Wegman-Carter type MACs [34], but we focus here on GMAC+.

The GMAC+ construction. The construction relies on a hash function 퐻 : {0, 1}푛 × {0, 1}* × {0, 1}* → {0, 1}푛, which is meant to satisfy the following properties. (We employ the shorthand 퐻퐾 (푀, 퐴) = 퐻(퐾, 푀, 퐴).)

Definition 1. Let 퐻 : {0, 1}푛 × {0, 1}* × {0, 1}* → {0, 1}푛. We say that 퐻 is 푐-almost XOR universal if for all (푀, 퐴) ̸= (푀 ′, 퐴′), and all 훥 ∈ {0, 1}푛, and 푛 퐾 ←$ {0, 1} ,

푐 · max{|푀| + |퐴| , |푀 ′| + |퐴′| } Pr[퐻 (푀, 퐴) ⊕ 퐻 (푀 ′, 퐴′) = 훥] ≤ 푛 푛 푛 푛 , 퐾 퐾 2푛 where |푋|푛 = max{1, ⌈|푋| /푛⌉} is the block length of string 푋, as defined in Section2. Further, we say it is 푐-regular if for all 푌 ∈ {0, 1}푛, 푀, 퐴 ∈ {0, 1}*, 푛 and 퐾 ←$ {0, 1} ,

푐 · (|푀| + |퐴| ) Pr[퐻 (푀, 퐴) = 푌 ] ≤ 푛 푛 . 퐾 2푛 We say it is weakly 푐-regular if this is only true for (푀, 퐴) ̸= (휀, 휀), and 푛 퐻퐾 (휀, 휀) = 0 for all 퐾.

Remark 1. Note that for POLYVAL as used in AES-GCM-SIV, we can set 푐 = 1.5 provided that we exclude the empty string as input. This is because the empty string results in POLYVAL outputting 0푛 regardless of the key, and thus POLYVAL is only weakly 푐-regular. It is easy to fix POLYVAL so that this does not happen (as the input is padded with its length, it is sufficient to ensure that the length padding of the empty string contains at least one bit with value 1). See AppendixB for more details. 18 Bose, Hoang and Tessaro

We also consider a generic function xor : {0, 1}푛 × {0, 1}nl → {0, 1}푛, for nl < 푛, which is meant to add a nonce to a string. In particular, we require: (1) 휆-regularity: For every 푁 ∈ {0, 1}nl and 푍 ∈ {0, 1}푛, there are at most 휆 strings 푌 ∈ {0, 1}푛 such that xor(푌, 푁) = 푍, (2) injectivity: For every 푌 , xor(푌, ·) is injective, and (3) linearity: For every 푌, 푌 ′, 푁, 푁 ′, we have xor(푌, 푁)⊕ xor(푌 ′, 푁 ′) = xor(푌 ⊕ 푌 ′, 푁 ⊕ 푁 ′).

Example 1. In GCM-SIV and AES-GCM-SIV, one uses

xor(푌, 푁) = 0 ‖ (푌 ⊕ 0푛−nl푁)[2 : 푛] .

This is clearly 2-regular, injective, and linear. Note that here it is important to prepend 0’s to the nonce 푁; if one instead appends 0’s to 푁 then injectivity of xor will be destroyed.

Given 퐻 and xor, as well as a block cipher 퐸 : {0, 1}푘 ×{0, 1}푛 → {0, 1}푛, we define GMAC+ = GMAC+[퐻, 퐸, xor]: {0, 1}푘+푛 × ({0, 1}* × {0, 1}* × {0, 1}nl) → {0, 1}푛 such that

+ GMAC (퐾in ‖ 퐾out, (푀, 퐴, 푁)) = 퐸퐾out (xor(퐻퐾in (푀, 퐴), 푁)) . (2)

Mu-prf security of GMAC+. We upper bound the mu prf advantage for GMAC+; see AppendixE for the proof. We stress here that the adversary’s Eval queries have form (푖, 푀, 퐴, 푁), and the length of such queries is implicitly defined as |푀|푛 + |퐴|푛. We also consider an arbitrary KeyGen algorithm, which outputs pairs of keys 푖 푖 푛 푘 (퐾in, 퐾out) ∈ {0, 1} × {0, 1} . We will only require these keys to be pairwise- close to uniform, i.e., we say that KeyGen is 훽-pairwise almost uniform (AU) 푖 푖 푗 푗 if for every 푖 ̸= 푗, the distribution of (퐾in, 퐾out), (퐾in, 퐾out) is such that very 1 pair of (푛+푘)-bit strings appears with probability at most 훽 22(푛+푘) . Clearly, the canonical KeyGen satisfies this with 훽 = 1, but we will be for instance interested later on in cases where 훽 = 1 + 휖 for some small constant 휖 > 0.

Theorem 2 (Security of GMAC+). Let 퐻 : {0, 1}푛 × {0, 1}* × {0, 1}* → {0, 1}푛 be 푐-almost xor universal and 푐-regular, KeyGen be 훽-pairwise AU, xor be injective, linear, and 휆-regular, and let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a block cipher, which we model as an ideal cipher. Then, for any adversary 풜 making 푞 Eval queries of at most 퐿 푛-bit blocks (with at most 퐵 blocks queries per user), as well as 푝 ideal-cipher queries,

(1 + 퐶)푞퐵 퐶퐿(푝 + 푞) + 훽푞2 Advmu-prf (풜) ≤ + , (3) GMAC+[퐻,퐸,xor],퐵,퐸 2푛 2푛+푘 where 퐶 := 푐 · 휆 · 훽.

Here, parameters are even better than in the case of counter-mode, but this is in part due to the longer key. In particular, this being PRF security, it is unavoidable that security is compromised when more than 2(푘+푛)/2 users are 19 involved. The interesting fact is that partial key collisions (i.e., a collision in the hash keys or in the cipher keys) alone do not help. For example, take 푘 = 푛 = 128, 퐶 = 훽 = 1, 퐵 = 232, 퐿 = 푞퐵, 푞 ≤ 295, then the bound becomes roughly 푞/295 +푝/2128, and note that this is when processing up to 2128 blocks of data.

Weak regularity. We also provide a version of Theorem2 for the case where 퐻 is only weakly 푐-regular. We stress that the security loss is substantial here (and thus if using GMAC+ alone, one should rather make sure 퐻 is 푐-regular), but nonetheless the security is preserved in the case where a nonce 푁 is reused across a sufficiently small number 푑 of users. A proof sketch is in Appendix E.1.

Theorem 3 (Security of GMAC+, weak regularity). Let 퐻 : {0, 1}푛 × {0, 1}*×{0, 1}* → {0, 1}푛 be 푐-almost xor universal and weakly 푐-regular, KeyGen be 훽-pairwise AU, xor be injective, linear, and 휆-regular, and let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a block cipher, which we model as an ideal cipher. Then, for any adversary 풜 making 푞 Eval queries of at most 퐿 푛-bit blocks (with at most 퐵 blocks queries per user), as well as 푝 ideal-cipher queries,

(1 + 퐶)푞퐵 퐶퐿(푝 + 2푞) + 훽푞2 푑(푝 + 푞) Advmu-prf (풜) ≤ + + , (4) GMAC+[퐻,퐸,xor],퐵,퐸 2푛 2푛+푘 2푘 where 퐶 := 푐 · 휆 · 훽, and 푑 is a bound on the number of users re-using any given nonce.

5 SIV Composition with Key Reuse

SIV with key reuse. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Let F : {0, 1}F.kl ×풩 ×{0, 1}* ×{0, 1}* → {0, 1}* be a keyed function, with F.kl ≥ 푘. Let SE : {0, 1}푘 × {0, 1}* → {0, 1}* be an IV-based encryption scheme of IV length 푛. Both F and SE are built on top of 퐸. In a generic SIV composition, the key 퐾in ‖ 퐾out of F and the key 퐽 of SE will be chosen independently. However, for efficiency, it would be convenient if one can reuse 퐾out = 퐽, which GCM-SIV does. Formally, let AE = SIV[F, SE] be the AE scheme as defined in Fig.4.

Results. We consider security of the SIV construction for F = GMAC+ and SE = CTR. We assume that GMAC+ and CTR use functions xor and add, respectively, such that (1) xor is 2-regular, injective, and linear, and xor(푋, 푁) ∈ 0{0, 1}푛−1 for every string 푋 ∈ {0, 1}푛 and every nonce 푁 ∈ {0, 1}nl, and (2) add has min- entropy 푛−1, and add(IV, ℓ) ∈ 1{0, 1}푛−1 for every IV ∈ {0, 1}푛 and every ℓ ∈ N. (Those notions for add and xor can be found in Section 4.1 and Section 4.2 respec- tively.) This assumption holds for the design choice of AES-GCM-SIV. We thus only write CTR[퐸] or GMAC+[퐻, 퐸] instead of CTR[퐸, add] or GMAC+[퐻, 퐸, xor]. Below, we show the mu-mrae security of SIV[GMAC+[퐻, 퐸], CTR[퐸]], with re- spect to a pairwise AU KeyGen, and a 푐-regular, 푐-AXU hash function 퐻; the 20 Bose, Hoang and Tessaro

AE.E(퐾in ‖ 퐾out, 푁, 푀, 퐴) AE.D(퐾in ‖ 퐾out, 푁, 퐶, 퐴) ′ 퐸 IV ← F(퐾in ‖ 퐾out, 푁, 푀, 퐴) IV ‖ 퐶 ← 퐶; 푀 ← SE.D (퐾out, 퐶) 퐸 퐸 퐶 ← SE.E (퐾out, 푀; IV) 푇 ← F (퐾in ‖ 퐾out, 푁, 푀, 퐴) Return 퐶 If 푇 ̸= IV then return ⊥ else return 푀

Fig. 4: The SIV construction (with key reuse) AE = SIV[F, SE] that is built on top of an ideal cipher 퐸.

notion of pairwise AU for key-generation algorithms can be found in Section 4.2. See AppendixF for the proof.

Theorem 4 (Security of SIV). Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Fix 0 < 휖 < 1. Let 퐻 : {0, 1}푛 × {0, 1}* × {0, 1}* → {0, 1}* be a 푐-regular, 푐-AXU hash. Let AE ← SIV[GMAC+[퐻, 퐸], CTR[퐸]]. Then for any 훽-pairwise AU KeyGen and for any adversary 풜 that makes at most 푞 encryption/verification queries whose total block length is at most 퐿 ≤ 2(1−휖)푛−4, and encryption queries of at most 퐵 blocks per user, and 푝 ≤ 2(1−휖)푛−4 ideal-cipher queries,

1 훽푎푝 (3훽푐 + 7훽)퐿2 + 4훽푐퐿푝 Advmu-mrae (풜) ≤ + + AE,KeyGen,퐸 2푛/2 2푘 2푛+푘 (4푐훽 + 0.5훽 + 6.5)퐿퐵 + , 2푛 where 푎 = ⌈1.5푛/(푛 − 1)휖⌉ − 1.

Remarks. The proof of Theorem4 only needs to know that the mu-ind proof of CTR and the mu-prf proof of GMAC+ follow some high-level structure that we will describe below. We do not need to know any other specific details about those two proofs. This saves us the burden of repeating the entire prior proofs in Section 4.1 and Section 4.2. The mu-ind proof of CTR uses the 퐻-coefficient technique and follows this canonical structure:

(i) When the adversary finishes querying, we grant it all the keys. Note thatin the ideal world, the keys are still created but not used.

(ii) For each ideal-cipher query 퐸퐾 (푋) for answer 푌 , the transcript correspond- ingly stores an entry (prim, 퐾, 푋, 푌, +). Likewise, for each query 퐸−1(퐾, 푌 ) for answer 푋, the transcript stores an entry (prim, 퐾, 푋, 푌, −). For each query Enc(푖, 푀) with answer 퐶, we store an entry (enc, 푖, 푀, 퐶). (iii) When the adversary finishes querying, for each entry (enc, 푖, 푀, 퐶), in the real world, we grant it a table that stores all triples (퐾푖, 푋, 퐸(퐾푖, 푋)) for all queries 퐸(퐾푖, 푋) that CTR.E[퐸](퐾푖, 푀; 푇 ) makes, where 퐾푖 is the key of user 푖 and 푇 is the IV of 퐶. In the ideal world, the proof generates a corresponding fake table as follows. If we consider the version of CTR in which messages are padded (e.g., PKCS#7), then one can first parse 21

푛 푛 IV‖퐶1 ‖···‖퐶푚 ← 퐶 and 푀1 ‖···‖푀푚 ← 푀 and then return (퐾푖, 푋1, 퐶1 ⊕ 푛 푀1),..., (퐾푖, 푋푚, 퐶푚 ⊕ 푀푚), where 푋푖 = add(IV, 푖 − 1) and we use ← to denote some function that pads a message into 푛-bit blocks. If one uses the well-known padding-free version of CTR where the last block of the message is allowed to be shorter than 푛-bit, then one first pads 퐶 with random bits so that the last fragmentary block becomes 푛-bit long, and likewise pads 푀 with 0’s so that the last fragmentary block becomes 푛-bit long, and then proceeds as above. (This step can be optionally omitted for the padding version since the adversary can generate the table by itself.)

(iv) Consider a transcript 휏. If there are two tables 풯1 and 풯2 in 휏 that contain triples (퐾, 푋, 푌 ) and (퐾, 푋′, 푌 ′) respectively, and either 푋 = 푋′, or 푌 = 푌 ′, then 휏 must be considered bad. If there is a table 풯 that contains triples (퐾, 푋, 푌 ) and (퐾, 푋′, 푌 ′) such that either 푋 = 푋′, or 푌 = 푌 ′, then 휏 is also considered bad. In addition, if there is a table 풯 that contains a triple (퐾, 푋, 푌 ), and there is an entry (prim, 퐾, 푋′, 푌 ′, ·), and either 푋 = 푋′ or 푌 = 푌 ′, then 휏 is considered bad. The proof may define some other criteria for badness of transcripts.

We say that a CTR transcript is CTR-bad if it is bad according to the criteria defined by the proof of Theorem1. (Note that although not all of those criteria are specified in the structure above, it is enough for our purpose, as our proofof Theorem4 does not need to know those specific details.) The proof of GMAC+ also follows a similar high-level structure. We say that a GMAC+ transcript is GMAC+-bad if it is bad according to the criteria defined by the proof of Theorem2.

Weak regularity. We also provide a version of Theorem4 for the case where 퐻 is only weakly 푐-regular. Again, the security loss is substantial here, but security is preserved if each nonce is reused across a sufficiently small number 푑 of users. A proof sketch is given in Appendix F.1.

Theorem 5 (Security of SIV, weak regularity). Let 퐸 : {0, 1}푘 ×{0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Fix 0 < 휖 < 1. Let 퐻 : {0, 1}푛 × {0, 1}* × {0, 1}* → {0, 1}* be a weakly 푐-regular, 푐-AXU hash. Let AE ← SIV[GMAC+[퐻, 퐸], CTR[퐸]]. Then for any 훽-pairwise AU KeyGen and for any adversary 풜 that makes at most 푞 encryption/verification queries whose total block length is at most 퐿 ≤ 2(1−휖)푛−4, and encryption queries of at most 퐵 blocks per user, and 푝 ≤ 2(1−휖)푛−4 ideal-cipher queries,

1 훽푎푝 (3훽푐 + 7훽)퐿2 + 4훽푐퐿푝 Advmu-mrae (풜) ≤ + + AE,KeyGen,퐸 2푛/2 2푘 2푛+푘 (4푐훽 + 0.5훽 + 6.5)퐿퐵 푑푝 + (2푑 + 푎)퐿 + + , 2푛 2푘 where 푎 = ⌈1.5푛/(푛 − 1)휖⌉ − 1, and 푑 is a bound on the number of users re-using any given nonce. 22 Bose, Hoang and Tessaro

6 AES-GCM-SIV with A Generic Key-Derivation

In this section we consider the mu-mrae security of AES-GCM-SIV with respect to a quite generic class of key-derivation functions. This class includes the current KDF KD0 of AES-GCM-SIV, but it contains another KDF KD1 that is not only simpler but also twice faster. This KD1 was the original KDF in AES-GCM-SIV, but then subsequently replaced by KD0. Our multi-user bound is even better than the single-user bound of Gueron and Lindell [16]. In this section, we assume that GMAC+ and CTR use functions xor and add, respectively, such that (1) xor is 2-regular, injective, and linear, and xor(푋, 푁) ∈ 0{0, 1}푛−1 for every string 푋 ∈ {0, 1}푛 and every nonce 푁 ∈ 풩 = {0, 1}nl, and (2) add has min-entropy 푛 − 1, and add(IV, ℓ) ∈ 1{0, 1}푛−1 for every IV ∈ {0, 1}푛 and every ℓ ∈ N. (Those notions for add and xor can be found in Section 4.1 and Section 4.2 respectively.) This assumption holds for the design choice of AES-GCM-SIV. We thus only write CTR[퐸] or GMAC+[퐻, 퐸] instead of CTR[퐸, add] or GMAC+[퐻, 퐸, xor]. Below, we will formalize the Key-then-Encrypt transform that captures the way AES-GCM-SIV generates session keys for every encryption/decryption. We then describe our class of KDFs.

The KtE transform. Let AE be an AE scheme of nonce space 풩 and let KD : 풦 × 풩 → {0, 1}AE.kl be a key-derivation function. Given KD and AE, the Key- then-Encrypt (KtE) transform constructs another AE scheme AE = KtE[KD, AE] as shown in Fig.5.

AE.E(퐾, 푁, 푀, 퐴) AE.D(퐾, 푁, 퐶, 퐴) 퐽 ← KD(퐾, 푁); 퐶 ← AE.E(퐽, 푁, 푀, 퐴) 퐽 ← KD(퐾, 푁); 푀 ← AE.D(퐽, 푁, 퐶, 퐴) Return 퐶 Return 푀

Fig. 5: The AE scheme AE = KtE[KD, AE] constructed from an AE scheme AE and a key-derivation function KD, under the KtE transform.

Natural KDFs. Let 푛 ≥ 1 be an integer and let 푘 ∈ {푛, 2푛}. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Let pad : 풩 × {0,..., 5} → {0, 1}푛 be a padding mechanism such that pad(푁0, 푠0) ̸= pad(푁1, 푠1) for every distinct pairs (푁0, 푠0), (푁1, 푠1) ∈ 풩 × {0,..., 5}. Let KD[퐸]: {0, 1}푘 ×풩 → {0, 1}푛+푘 be a KDF that is associated with a deterministic algorithm KD.Map :({0, 1}푛)6 → {0, 1}푛+푘. We say that KD[퐸] is natural if on input (퐾, 푁), KD[퐸] first calls 푅0 ← 퐸(퐾, pad(푁, 0)), . . . , 푅5 ← 퐸(퐾, pad(푁, 5)), and then returns KD.Map(푅0, . . . , 푅5). It might seem arbitrary to limit the number of blockcipher calls of a natural KDF to six. However, note that since 푘 ≤ 2푛, the block length of each (푘 + 푛)- bit derived key is at most three. All known good constructions, which we list 23

KD0[퐸](퐾, 푁) KD1[퐸](퐾, 푁)

For 푠 = 0 to 5 do 푅푠 ← 퐸퐾 (pad(푁, 푠)) For 푠 = 0 to 5 do 푅푖 ← 퐸퐾 (pad(푁, 푠)) For 푖 = 0 to 2 do Return (푅0 ‖ 푅1 ‖ 푅2)[1 : 푛 + 푘] 푉푖 ← 푅2푖[1 : 푛/2] ‖ 푅2푖+1[1 : 푛/2] Return (푉0 ‖ 푉1 ‖ 푉2)[1 : 푛 + 푘]

Fig. 6: Key-derivation functions KD0 (left) and KD1 (right). below, use at most six blockcipher calls. Using more would simply make the performance and even the bounds worse. We therefore define a natural KDF to use at most six blockcipher calls.

The current KDF KD0[퐸] of AES-GCM-SIV, as shown in the left panel of Fig.6, is natural; it is defined for even 푛 only. For 푘 = 푛, it can be implemented using four blockcipher calls, but for 푘 = 2푛 it needs six blockcipher calls. Con- sider the KDF KD1[퐸] on the right panel of Fig.6. For 푘 = 푛 it can be imple- mented using two blockcipher calls, and 푘 = 2푛 it needs three blockcipher calls. This KDF is also simpler to implement than KD0. Iwata and Seurin [21] propose to use either the XOR construction [8,11] or the CENC construction [19]. Both XOR and CENC constructions are natural; the former uses four blockcipher calls for 푘 = 푛 and six blockcipher calls for 푘 = 2푛, and the latter uses three and four blockcipher calls respectively. For a natural key-derivation function KD[퐸], we say that it is 훾-unpredictable 푛 15 푛 푛+푘 if for any subset 푆 ⊆ {0, 1} of size at least 16 · 2 and any 푠 ∈ {0, 1} , if the random variables 푅0, . . . , 푅5 are sampled uniformly without replacement 푛+푘 from 푆 then Pr[KD.Map(푅0, . . . , 푅5) = 푠] ≤ 훾/2 . Lemma3 below shows that both KD0[퐸] and KD1[퐸] are 2-unpredictable; see AppendixG for the proof. One might also show that both the XOR and CENC constructions are 2-unpredictable. Therefore, in the remainder of this section, we only consider natural, 2-unpredictable KDFs.

Lemma 3. Let 푛 ≥ 128 be an even integer and let 푘 ∈ {푛, 2푛}. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Then both KD0[퐸] and KD1[퐸] are 2-unpredictable.

Ideal counterpart of natural KDF. For a natural KDF KD[퐸], consider its following ideal version KD[푘]. The key space of KD[푘] is the entire set Perm(푛). It takes as input a permutation 휋 ∈ Perm(푛) and a string 푁 ∈ 풩 , computes 푅푠 ← 휋(pad(푁, 푠)) for all 푠 ∈ {0,..., 5}, and returns KD.Map(푅0, . . . , 푅5). Of course KD[푘] is impractical since its key length is huge, but it will be useful in studying the security of the KtE transform. The following bounds the privacy and authenticity of KtE[KD[푘], AE] via the mu-mrae security of the AE scheme AE; the proof is in AppendixH. In light of that, in the subsequent subsections, we will analyze the difference between security of KtE[KD[퐸], AE] and that of KtE[KD[푘], AE]. 24 Bose, Hoang and Tessaro

KeyGen(st, aux)

(푁, 푖) ← aux; (휋1, 푆1, . . . , 휋푚, 푆푚) ← st If (푖 ∈ {1, . . . , 푚} and 푁 ∈ 푆푖) or (푖 ̸∈ {1, . . . , 푚 + 1}) then //Unexpected input, return a random key anyway 푘+푛 퐾 ←$ {0, 1} ; return (퐾, st) If 푖 ∈ {1, . . . , 푚} then 푆푖 ← 푆푖 ∪ {푁}; st ← (휋1, 푆1, . . . , 휋푚, 푆푚) If 푖 = 푚+1 then 휋푚+1 ←$ Perm(푛); 푆푚+1 ← {푁}; st ← (휋1, 푆1, . . . , 휋푚+1, 푆푚+1) Return (KD[푘](휋푖, 푁), st)

Fig. 7: Key-generation algorithm KeyGen corresponding to KD[푘].

Proposition 2. Let 푛 ≥ 8 be an integer and let 푘 ∈ {푛, 2푛}. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Let KD[퐸] be a natural KDF. Let AE be an AE scheme of key length 푘 + 푛. Let AE = KtE[KD[푘], AE]. Then for any adversaries 풜1 and 풜2, we can construct a key-generation algorithm KD.KeyGen as shown in Fig.7, and an adversary 풜 such that

mu-priv mu-auth mu-mrae Adv (풜1) + Adv (풜2) ≤ 3 Adv (풜) . AE,퐸 AE,퐸 AE,KeyGen,퐸 For any type of queries, the number of 풜’s queries is at most the maximum of that of 풜1 and 풜2, and the similar claim holds for the total block length of the encryption/verification queries. Moreover, the maximum of total block length of encryption queries per user of 풜 is at most the maximum of that per (user, nonce) pair of 풜1 and 풜2. The following lemma says that if KD[퐸] is 2-unpredictable then the con- structed KeyGen in the theorem statement of Proposition2 is 4-pairwise AU; the notion of pairwise AU for key-generation algorithms can be found in Sec- tion 4.2. The proof is in AppendixI. Lemma 4. Let 푛 ≥ 8 be an integer and let 푘 ∈ {푛, 2푛}. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Let KD[퐸] be a natural, 2-unpredictable KDF. Then the corresponding key-generation algorithm KeyGen in Fig.7 is 4-pairwise AU.

Indistinguishability of KD[퐸]. For an adversary 풜, define

dist dist AdvKD[퐸](풜) = 2 Pr[GKD[퐸](풜)] − 1 as the advantage of 풜 in distinguishing a natural KDF KD[퐸] and its ideal dist counterpart KD[푘] in the multi-user setting, where game GKD[퐸](풜) is defined in Fig.8. Under this notion, the adversary is given access to both 퐸 and 퐸−1, an oracle New() to initialize a new user 푣 with a truly random master key 퐾푣 and a secret ideal permutation 휋푣, and an evaluation oracle Eval that either implements KD[퐸] or KD[푘]. We say that an adversary 풜 is 푑-repeating if among its evaluation queries, a nonce is used for at most 푑 users. 25

dist Game GKD[퐸](풜) Eval(푖, 푁)

′ New,Eval,퐸,퐸−1 If 푖 > 푣 then return ⊥ 푣 ← 0; 푏 ←$ {0, 1}; 푏 ←$ 풜 If 푏 = 1 then return KD[퐸](퐾 , 푁) Return (푏′ = 푏) 푖 Else return KD[푘](휋푖, 푁) Procedure New() 푘 푣 ← 푣 + 1; 퐾푣 ←$ {0, 1} ; 휋푣 ←$ Perm(푛)

Fig. 8: Game to distinguish KD[퐸] and its ideal counterpart KD[푘].

Lemma5 below bounds the indistinguishability advantage between KD[퐸] and KD[푘]. The proof is in AppendixJ; it uses some technical balls-into-bins results in AppendixA. Lemma 5. Fix 0 < 휖 < 1. Let 푛 ≥ 16 be an integer and let 푘 ∈ {푛, 2푛}. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Let KD[퐸] be a natural KDF. For any 푑-repeating adversary 풜 that makes at most 푝 ≤ 2푛−4 ideal-cipher queries, and 푞 ≤ 2(1−휖)푛−4 evaluation queries, 1 24푝푞 + 18푞2 푎푝 + 푑(푝 + 3푞) Advdist (풜) ≤ + + KD[퐸] 2푛/2 2푘+푛 2푘 where 푎 = ⌈1.5/휖⌉−1. The theorem statement still holds if we grant the adversary the master keys when it finishes querying.

6.1 Privacy Analysis Lemma6 below reduces the privacy security of KtE[KD[퐸], AE] for a generic AE scheme AE, to that of KtE[KD[푘], AE]; the proof relies crucially on Lemma5. Lemma 6. Fix 0 < 휖 < 1. Let 푛 ≥ 16 be an integer and let 푘 ∈ {푛, 2푛}. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Let KD[퐸] be a natural KDF. Let AE be an AE scheme of key length 푘+푛, and let AE = KtE[KD[퐸], AE]. Consider a 푑-repeating adversary 풜 that makes 푝 ≤ 2푛−5 ideal-cipher queries and 푞 ≤ 2(1−휖)푛−4 encryption queries. Suppose that using AE to encrypt 풜’s encryption queries would need to make 퐿 ≤ 2푛−5 ideal-cipher queries. Then 2 48(퐿 + 푝)푞 + 36푞2 Advmu-priv(풜) ≤ Advmu-priv (풜) + + AE,퐸 KtE[KD[푘],AE],퐸 2푛/2 2푘+푛 2푎(퐿 + 푝) + 2푑(퐿 + 푝 + 3푞) + , 2푘 where 푎 = ⌈1.5/휖⌉ − 1. Proof. We first construct an adversary 풜 that tries to distinguish KD[퐸] and KD[푘]. Adversary 풜 simulates game Gmu-priv(풜), but each time it needs to gen- AE,퐸 erate a session key, it uses its Eval oracle instead of KD[퐸]. However, if 풜 26 Bose, Hoang and Tessaro previously queried Eval(푖, 푁) for an answer 퐾, next time it simply uses 퐾 without querying. Finally, adversary 풜 outputs 1 only if the simulated game dist returns true. Let 푏 be the challenge bit in game GKD[퐸](풜). Then

Pr[Gdist (풜) ⇒ true | 푏 = 1] = Pr[Gmu-priv(풜)], and KD[퐸] AE,퐸 dist mu-priv Pr[GKD[퐸](풜) ⇒ false | 푏 = 0] = Pr[GKtE[KD[푘],AE],퐸(풜)] .

Subtracting, we get

dist 1(︀ mu-priv mu-priv )︀ Adv (풜) = Adv (풜1) − Adv (풜1) . KD[퐸] 2 AE,퐸 KtE[KD[푘],AE],퐸

Note that 풜 makes at most 푝 + 퐿 ≤ 2푛−4 ideal-cipher queries, and 푞 Eval queries. Moreover, 풜 is also 푑-repeating. Hence using Lemma5,

1 24(퐿 + 푝)푞 + 18푞2 푎(퐿 + 푝) + 푑(퐿 + 푝 + 3푞) Advdist (풜) ≤ + + . KD[퐸],KD[푘] 2푛/2 2푘+푛 2푘 Putting this all together,

2 48(퐿 + 푝)푞 + 36푞2 Advmu-priv(풜) ≤ Advmu-priv (풜) + + AE,퐸 KtE[KD[푘],AE],퐸 2푛/2 2푘+푛 2푎(퐿 + 푝) + 2푑(퐿 + 푝 + 3푞) + . 2푘 This concludes the proof. ⊓⊔

6.2 Authenticity Analysis

In Section 6.1, we bound the privacy advantage by constructing a 푑-repeating ad- versary distinguishing KD[퐸] and KD[푘], and then using Lemma5. This method does not work for authenticity: the constructed adversary might be 푞-repeating, because there is no restriction of the nonces in verification queries, and one would end up with an inferior term 푞(퐿 + 푝 + 푞)/2푘. We instead give a dedicated analysis.

Restricting to simple adversaries. We say that an adversary is simple if for any nonce 푁 and user 푖, if the adversary uses 푁 for an encryption query of user 푖, then it will never use nonce 푁 on verification queries for user 푖. Lemma7 below reduces the authenticity advantage of a general adversary against KtE[KD[퐸], AE] to that of a simple adversary; the proof is in AppendixK, and is based on the idea of splitting the cases of where the adversary forges on a fresh (푁, 푖) pair and where it does not, and the latter can be handled using Lemma5 above. Handling the former is the harder part, which we deal with below. We discuss the bound however below, and give an overview of the proof. 27

Lemma 7. Let 푛 ≥ 16 be an integer and let 푘 ∈ {푛, 2푛}. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Let KD[퐸] be a natural KDF. Let AE be an AE scheme of key length 푛 + 푘, and let AE = KtE[KD[퐸], AE]. Let 풜0 be a 푑-repeating adversary that makes at most 푞 ≤ 2(1−휖)푛−4 encryption/verification queries and 푝 ≤ 2푛−5 ideal-cipher queries. Suppose that using AE to encrypt 풜0’s encryption queries and decrypt its verifi- cation queries would need to make 퐿 ≤ 2푛−5 ideal-cipher queries. Then, we can construct an adversary 풜1 and a simple adversary 풜2, both 푑-repeating, such that

mu-auth mu-auth mu-auth AdvAE,퐸 (풜0) ≤ AdvKtE[KD[푘],AE],퐸(풜1) + AdvAE,퐸 (풜2) 2 48(퐿 + 푝)푞 + 36푞2 2(푎 + 푑)퐿 + 2(푎 + 푑)푝 + 6푑푞 + + + , 2푛/2 2푛+푘 2푘 where 푎 = ⌈1.5/휖⌉ − 1. Any query of 풜1 or 풜2 is also a query of 풜0.

Handling simple adversaries. Lemma8 below shows that the AE scheme KtE[︀KD[퐸], SIV[GMAC+[퐻, 퐸], CTR[퐸]]]︀ has good authenticity against simple adversaries, for any 2-unpredictable, natural KDF KD[퐸]. The proof is in Ap- pendixL; it also uses some technical balls-into-bins results in AppendixA. Note that here we can handle both regular and weakly regular hash functions. (If we instead consider just regular hash functions, we can slightly improve the bound, but the difference is inconsequential.)

Lemma 8. Fix 0 < 휖 < 1 and let 푎 = ⌈1.5/휖⌉ − 1. Let 푛 ≥ 128 be an integer, and let 푘 ∈ {푛, 2푛}. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Let 퐻 : {0, 1}푛 × {0, 1}* × {0, 1}* → {0, 1}푛 be a hash function that is either 푐-regular or weakly 푐-regular. Let KD[퐸] be a natural, 2-unpredictable KDF. Let AE = SIV[GMAC+[퐻, 퐸], CTR[퐸]] and AE = KtE[KD[퐸], AE]. Let 풜 be a 푑-repeating, simple adversary that makes at most 푝 ≤ 2(1−휖)푛−8 ideal-cipher queries, and 푞 ≤ 2(1−휖)푛−8 encryption/verification queries whose total block length is at most 퐿 ≤ 2(1−휖)푛−8. Then

3 11푞 288(퐿 + 푝)푞 + 36푞2 + 48푐(퐿 + 푝 + 푞)퐿 Advmu-auth(풜) ≤ + + AE,퐸 2푛/2 2푛 2푛+푘 (8푎 + 7푎2 + 3푑)푞 (푛푎 + 6푎 + 6푑)퐿 + 6(푎 + 푑)푝 + + . 2푘 2푘 푞 푝푑 Discussion. The bound in Lemma8 consists of three important terms 2푛 , 2푘 , 푛푎퐿 and 2푘 , each corresponding to an actual attack. Let us revisit these, as this will be helpful in explaining the proof below. First, since the IV length is only 푛-bit long, even if an adversary simply outputs 푞 verification queries in a random fash- 푞 푝푑 ion, it would get an advantage about 2푛 . Next, for the term 2푘 , consider an ad- versary that picks a long enough message 푀 and then makes encryption queries (1, 푁, 푀, 퐴),..., (푑, 푁, 푀, 퐴) of the same nonce 푁 and associated data, for an- swers 퐶1, . . . , 퐶푑 respectively. (Recall that the adversary is 푑-repeating, so it cannot use the nonce 푁 in encryption queries for more than 푑 users.) By picking 28 Bose, Hoang and Tessaro

푝 candidate master keys 퐾1, . . . , 퐾푝 and comparing 퐶푖 with AE.E(퐾푗, 푁, 푀, 퐴) for all 푖 ≤ 푑 and 푗 ≤ 푝, the adversary can recover one master key with probability 푝푑 about 2푘 . 푛푎퐿 Finally, for the term 2푘 , consider the following attack. The adversary first picks a nonce 푁 and 푝 candidate keys 퐾1, . . . , 퐾푝, and then queries 푅0,푗 ← 푗 푗 퐸퐾 (퐾푗, pad(푁, 0)), . . . , 푅5,푗 ← 퐸(퐾푗, pad(푁, 5)) for every 푗 ≤ 푝. Let 퐾in‖퐾out ← KD.Map(푅0,푗, . . . , 푅5,푗). Now, if some 퐾푗 is the master key of some user 푖 then 푗 푗 퐾in ‖ 퐾out will be the session key of that user 푖 for nonce 푁. The adversary then picks an arbitrary ciphertext 퐶, and then computes 푀푗 ← CTR[퐸].D(퐾푗, 퐶) and −1 푗 푉푗 ← 퐸 (퐾out, 푇 ) for each 푗 ≤ 푝, where 푇 is the IV of 퐶. The goal of the adver- sary is to make a sequence of 푞 verification queries (1, 푁, 퐶, 퐴),..., (푞, 푁, 퐶, 퐴), for an ℓ-block associated data 퐴 that it will determine later. (Recall that in verification queries, the adversary can reuse a nonce across as many usersas it likes.) To maximize its chance of winning, the adversary will iterate through every possible string 퐴* of block length ℓ, and let count(퐴*) denote the number 푗 * of 푗’s that xor(퐻(퐾in, 푀푗, 퐴 ), 푁) = 푉푗. Then it picks 퐴 as the string to max- imize count(퐴). The proof of Lemma8 essentially shows that with very high 푛푎퐿 probability, we have count(퐴) ≤ 푛푎(ℓ + |퐶|푛) ≤ 푞 , and thus the advantage of 푛푎퐿 this attack is bounded by 2푘 .

Proof ideas. We now sketch some ideas in the proof of Lemma8. First consider an adversary that does not use the encryption oracle. Assume that the adver- sary does not repeat a prior ideal-cipher query, or make redundant ideal-cipher queries. For each query 퐸퐾 (푌 ) of answer 푌 , create an entry (prim, 퐾, 푋, 푌, +). −1 Likewise, for each query 퐸퐾 (푌 ) of answer 푋, create an entry (prim, 퐾, 푋, 푌, −). Consider a verification query (푖, 푁, 퐶, 퐴). Let 퐾푖 be the secret master key of user 푖, and let 퐾in ‖ 퐾out be the session key of user 푖 for nonce 푁. Let 푇 be the IV of 퐶. The proof examines several cases, but here we only discuss a few selective ones. If there is no entry (prim, 퐾푖, 푋, 푌, ·) such that 푋 ∈ {pad(푁, 0),..., pad(푁, 5)} then given the view of the adversary, the session key 퐾in ‖ 퐾out still has at least 푘 + 푛 − 1 bits of (conditional) min-entropy. In this case, the chance that AE.D(퐾in ‖ 퐾out, 푁, 퐶, 푀) returns a non-⊥ answer is roughly 1/2푛. Next, suppose that there is an entry (prim, 퐾, 푋, 푌, −) such that 퐾 = 퐾푖 and 푋 ∈ {pad(푁, 0),..., pad(푁, 5)}. By using some balls-into- bins analysis,7 we can argue that it is very likely that there are at most 6푎 entries (prim, 퐾*, 푋*, 푌 *, −) such that 푋* ∈ {pad(푁, 0),..., pad(푁, 5)}. Hence the chance this case happens is at most 6푎/2푘. Now consider the case that there are entries (prim, 퐾푖, pad(푁, 0), 푅0, +),..., 푛−1 (prim, 퐾푖, pad(푁, 5), 푅5, +), and (prim, 퐾out, 푉, 푇, −), with 푉 ∈ 0{0, 1} and 퐾in ‖퐾out ← KD.Map(푅0, . . . , 푅5). This corresponds to the last attack in the dis- cussion above. We need to bound Pr[Bad], where Bad is the the event (i) this case happens, and (ii) 푉 = xor(퐻(퐾in, 푀, 퐴), 푁), where 푀 ← CTR[퐸].D(퐾out, 퐶).

7 We note that this is not the classic balls-into-bins setting, because the balls are thrown in an inter-dependent way. In AppendixA we analyze this biased balls-into- bins setting. 29

This is highly non-trivial because somehow the adversary already sees the keys 퐾푖 and 퐾in ‖ 퐾out, and can adaptively pick (퐶, 퐴), as shown in the third attack above. To deal with this, we consider a fixed (푖*, 푁 *, 퐶*, 퐴*). There are at most 푝 * * * * septets 풯 of entries (prim, 퐾, pad(푁 , 0), 푅0, +),..., (prim, 퐾, pad(푁 , 5), 푅5, +) * 푛−1 ′ * * and (prim, 퐽, 푈, 푇 , −), with 푈 ∈ 0{0, 1} and 퐽 ‖ 퐽 ← KD.Map(푅0, . . . , 푅5). We then show that the chance that there are 푛ℓ푎 such septets 풯 such that ′ * * * 1−(3ℓ푛+2푛) * xor(퐻(퐽 (풯 ), 푀 (풯 ), 퐴 ), 푁 ) = 푈(풯 ) is at most 2 , where ℓ = |퐶 |푛+ * * * |퐴 |푛 ≥ 2 and 푀 (풯 ) ← CTR[퐸].D(퐽(풯 ), 퐶 ). Hence, regardless of how the ad- versary picks (푖, 푁, 퐶, 퐴) from all possible choices of (푖*, 푁 *, 퐶*, 퐴*), the chance ′ that there are 푛푎(|퐶|푛 +|퐴|푛) septets 풯 such that xor(퐻(퐽 (풯 ), 푀(풯 ), 퐴), 푁) = 푈(풯 ), where 푀(풯 ) ← CTR[퐸].D(퐽(풯 ), 퐶), is at most

∞ ∞ ∞ ∑︁ ∑︁ ∑︁ ∑︁ 2 1 21−(3푛ℓ+2푛) ≤ 22푛ℓ+2푛 · 21−(3푛ℓ+2푛) = ≤ . 2푛ℓ 2푛 ℓ=2 (푖*,푁 *,퐶*,퐴*) ℓ=2 ℓ=2 * * |퐶 |푛+|퐴 |푛=ℓ

1 푛푎·E[|퐴|푛+|퐶|푛] Thus Pr[Bad] ≤ 2푛 + 2푘 . Now we consider the general case where the adversary 풜 might use the en- cryption oracle. Clearly if for each encryption query (푖, 푁, 푀, 퐴), we grant the adversary the session key KD[퐸](퐾푖, 푁), where 퐾푖 is the master key of user 푖, then it only helps the adversary. Recall that here the adversary is simple, so it cannot query Enc(푖, 푁, 푀, 퐴) and later query Vf(푖, 푁, 퐶′, 퐴′). We also let the adversary compute up to 퐿 + 푝 ideal-cipher queries, so that the encryption oracle does not have to give the ciphertexts to the adversary. Effectively, we can −1 view that 풜 is in the following game 퐺0. It is given access to 퐸/퐸 and an oracle Eval(푖, 푁) that generates KD[퐸](푖, 푁). Then it has to generate a list of verification queries. The game then tries to decrypt those, and returns true only if some gives a non-⊥ answer. To remove the use of the Eval oracle, it is tempting to consider the vari- ant 퐺1 of game 퐺0 where Eval instead implements KD[푘], and then bound the gap between 퐺0 and 퐺1 by constructing a 푑-repeating adversary 풜 distin- guishing KD[퐸] and KD[푘]. However, this approach does not work because it is impossible for 풜 to correctly simulate the processing of the verification queries. Instead, we define game 퐺1 as follows. Its Eval again implements KD[푘], but after the adversary produces its verification queries, the game tries to program 퐸 so that the outputs of Eval are consistent with KD[퐸] on random master 푛+푘 keys 퐾1, 퐾2, · · · ←$ {0, 1} . (But 퐸 still has to remain consistent with its past ideal-cipher queries.) Of course it is not always possible, because the fake Eval might have generated some inconsistency. In this case, the game returns false, meaning that the adversary loses. If there is no inconsistency, then after the programming, the game processes the verification queries as in 퐺0.

To bound the gap between 퐺0 and 퐺1, we will construct a 푑-repeating adver- sary 풜 distinguishing KD[퐸] and KD[푘], but additionally, it wants to be granted the master keys after it finishes querying. Note that Lemma5 applies to this 30 Bose, Hoang and Tessaro key-revealing setting. Now, after the adversary 풜 finishes querying, it is granted the master keys and checks for inconsistency between the outputs of Eval and the ideal-cipher queries. If there is inconsistency then 풜 outputs 0, indicating that it has been dealing with KD[푘]. Otherwise, it has to simulate the process- ing of the verification queries. However, although it knows the keys now, itcan no longer queries 퐸. Instead, 풜 tries to sample an independent blockcipher 퐸˜, subject to (1) 퐸˜ and 퐸 agree on the outputs of the past ideal-cipher queries, and the outputs of Eval are consistent with KD[퐸˜] on master keys 퐾1, 퐾2,.... It then processes the verification queries using this blockcipher 퐸˜ instead of 퐸.

Although the game 퐺1 above does not completely remove the use of the Eval oracle, it still creates some sort of independence between the sampling of the master keys, and the outputs that the adversary 풜 receives, allowing us to repeat several proof ideas above.

Handling general adversaries. Combining Lemmas7 and8, we immedi- ately obtain the following result.

Lemma 9. Fix 0 < 휖 < 1 and let 푎 = ⌈1.5/휖⌉ − 1. Let 푛 ≥ 128 be an integer, and let 푘 ∈ {푛, 2푛}. Let 퐸 : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Let 퐻 : {0, 1}푛 × {0, 1}* × {0, 1}* → {0, 1}푛 be a hash function that is either 푐-regular hash or weakly 푐-regular. Let KD[퐸] be a natural, 2-unpredictable KDF. Let AE = SIV[GMAC+[퐻, 퐸], CTR[퐸]] and AE = KtE[KD[퐸], AE]. Let 풜 be a 푑-repeating adversary that makes at most 푝 ≤ 2(1−휖)푛−8 ideal-cipher queries, and 푞 ≤ 2(1−휖)푛−8 encryption/verification queries whose total block length is at most 퐿 ≤ 2(1−휖)푛−8. Then we can construct a 푑-repeating adversary 풜 such that

5 11푞 336(퐿 + 푝)푞 + 72푞2 Advmu-auth(풜) ≤ Advmu-auth (풜) + + + AE,퐸 KtE[KD[푘],AE],퐸 2푛/2 2푛 2푛+푘 48푐(퐿 + 푝 + 푞)퐿 (8푎 + 7푎2 + 9푑)푞 + (푛푎 + 8푎 + 8푑)퐿 + 8(푎 + 푑)푝 + + . 2푛+푘 2푘 Moreover, any query of 풜 is also a query of 풜.

6.3 Unwinding Mu-Mrae Security The following Theorem6 concludes the mu-mrae security of AE scheme AE = KtE[KD[퐸], SIV[GMAC+[퐻, 퐸], CTR[퐸]]]; the proof is in AppendixM. Note that here we can handle both regular and weakly regular hash functions. (If we instead consider just regular hash functions, we can slightly improve the bound, but the difference is inconsequential.)

Theorem 6 (Security of AES-GCM-SIV). Let 푛 ≥ 128 be an integer, and let 푘 ∈ {푛, 2푛}. Fix 0 < 휖 < 1 and let 푎 = ⌈1.5푛/(푛 − 1)휖⌉ − 1. Let 퐸 : {0, 1}푘 ×{0, 1}푛 → {0, 1}푛 be a blockcipher that we will model as an ideal cipher. Let 퐻 : {0, 1}푛 ×{0, 1}* ×{0, 1}* → {0, 1}푛 be a 푐-AXU hash function. Moreover, 31 either 퐻 is 푐-regular, or weakly 푐-regular. Let KD[퐸] be a natural, 2-unpredictable KDF. Let AE = SIV[GMAC+[퐻, 퐸], CTR[퐸]] and AE = KtE[KD[퐸], AE]. Let 풜 be a 푑-repeating adversary that makes at most 푝 ≤ 2(1−휖)푛−8 ideal-cipher queries, and 푞 ≤ 2(1−휖)푛−8 encryption/verification queries whose total block length isat most 퐿 ≤ 2(1−휖)푛−8 and encryption queries of at most 퐵 blocks per (user, nonce) pair. Then,

10 (17푎 + 4푎2 + 24푑 + 푛푎)퐿 + (22푎 + 13푑)푝 Advmu-mrae(풜) ≤ + AE,퐸 2푛/2 2푘 (48푐 + 30)퐿퐵 (303 + 108푐)퐿2 + (192 + 96푐)퐿푝 + + . 2푛 2푛+푘

We note that one way that 푑 can be kept small is by choosing nonces ran- domly, or at least with sufficient entropy. Then, by a classical balls-into-bins analysis, if 푞 is quite smaller than 2nl, where nl is the nonce length, which holds in practice for nl = 96, then the value 푑 is bounded by a constant with high probability. We also point out that if 푑 cannot be bounded, then our security bound still gives very meaningful security guarantees if 푘 = 2푛 (i.e., this would have us use AES-256). As there is a matching attack in the unbounded 푑 case, which just exploits key collisions, this suggests the need to increase keys to 256 in the multi-user case. However, many uses in practice will have 푑 bounded, and for these 128-bit keys will suffice.

Acknowledgments. We thank Shay Gueron and Yehuda Lindell for insightful feedback. Priyanka Bose and Stefano Tessaro were supported by NSF grants CNS- 1553758 (CAREER), CNS-1423566, CNS-1719146, CNS-1528178, and IIS-1528041, and by a Sloan Research Fellowship. Viet Tung Hoang was supported in part by NSF grant CICI-1738912 and the First Year Assistant Professor Award of Florida State University.

References

1. M. Abdalla and M. Bellare. Increasing the lifetime of a key: a comparative analysis of the security of re-keying techniques. In T. Okamoto, editor, ASIACRYPT 2000, volume 1976 of LNCS, pages 546–559. Springer, Heidelberg, Dec. 2000. 2. M. Bellare, D. J. Bernstein, and S. Tessaro. Hash-function based PRFs: AMAC and its multi-user security. In M. Fischlin and J.-S. Coron, editors, EUROCRYPT 2016, Part I, volume 9665 of LNCS, pages 566–595. Springer, Heidelberg, May 2016. 3. M. Bellare, A. Boldyreva, and S. Micali. Public-key encryption in a multi-user set- ting: Security proofs and improvements. In B. Preneel, editor, EUROCRYPT 2000, volume 1807 of LNCS, pages 259–274. Springer, Heidelberg, May 2000. 4. M. Bellare, R. Canetti, and H. Krawczyk. Pseudorandom functions revisited: The cascade construction and its concrete security. In 37th FOCS, pages 514–523. IEEE Computer Society Press, Oct. 1996. 32 Bose, Hoang and Tessaro

5. M. Bellare, A. Desai, E. Jokipii, and P. Rogaway. A concrete security treatment of symmetric encryption. In 38th FOCS, pages 394–403. IEEE Computer Society Press, Oct. 1997. 6. M. Bellare and V. T. Hoang. Identity-based Format-Preserving Encryption. In CCS 2017, 2017. 7. M. Bellare and R. Impagliazzo. A tool for obtaining tighter security analyses of pseudorandom function based constructions, with applications to PRP to PRF conversion. Cryptology ePrint Archive, Report 1999/024, 1999. http://eprint. iacr.org/1999/024. 8. M. Bellare, T. Krovetz, and P. Rogaway. Luby-Rackoff backwards: Increas- ing security by making block non-invertible. In K. Nyberg, editor, EUROCRYPT’98, volume 1403 of LNCS, pages 266–280. Springer, Heidelberg, May / June 1998. 9. M. Bellare and B. Tackmann. The multi-user security of authenticated encryption: AES-GCM in TLS 1.3. In M. Robshaw and J. Katz, editors, CRYPTO 2016, Part I, volume 9814 of LNCS, pages 247–276. Springer, Heidelberg, Aug. 2016. 10. S. Chen and J. P. Steinberger. Tight security bounds for key-alternating ciphers. In P. . Nguyen and E. Oswald, editors, EUROCRYPT 2014, volume 8441 of LNCS, pages 327–350. Springer, Heidelberg, May 2014. 11. W. Dai, V. T. Hoang, and S. Tessaro. Information-theoretic indistinguishability via the chi-squared method. In J. Katz and H. Shacham, editors, CRYPTO 2017, Part III, volume 10403 of LNCS, pages 497–523. Springer, Heidelberg, Aug. 2017. 12. S. Gilboa and S. Gueron. Distinguishing a truncated random permutation from a random function. Cryptology ePrint Archive, Report 2015/773, 2015. http: //eprint.iacr.org/2015/773. 13. S. Goldwasser and M. Bellare. Lecture notes on cryptography. Summer Course “Cryptography and Computer Security” at MIT, 1999. 14. S. Gueron, A. Langley, and Y. Lindell. AES-GCM-SIV: Specification and analysis. Cryptology ePrint Archive, Report 2017/168, 2017. http://eprint.iacr.org/ 2017/168. 15. S. Gueron and Y. Lindell. GCM-SIV: Full nonce misuse-resistant authenticated encryption at under one cycle per byte. In I. Ray, N. Li, and C. Kruegel:, editors, ACM CCS 15, pages 109–119. ACM Press, Oct. 2015. 16. S. Gueron and Y. Lindell. Better bounds for block cipher modes of operation via nonce-based key derivation. In CCS 2017, 2017. 17. V. T. Hoang and S. Tessaro. Key-alternating ciphers and key-length extension: Exact bounds and multi-user security. In M. Robshaw and J. Katz, editors, CRYPTO 2016, Part I, volume 9814 of LNCS, pages 3–32. Springer, Heidelberg, Aug. 2016. 18. V. T. Hoang and S. Tessaro. The multi-user security of double encryption. In J. Coron and J. B. Nielsen, editors, EUROCRYPT 2017, Part II, volume 10211 of LNCS, pages 381–411. Springer, Heidelberg, May 2017. 19. T. Iwata. New blockcipher modes of operation with beyond the birthday bound security. In M. J. B. Robshaw, editor, FSE 2006, volume 4047 of LNCS, pages 310–327. Springer, Heidelberg, Mar. 2006. 20. T. Iwata and Y. Seurin. Reconsidering the security bound of AES-GCM-SIV. Cryptology ePrint Archive, Report 2017/708, 2017. http://eprint.iacr.org/ 2017/708. 21. T. Iwata and Y. Seurin. Reconsidering the security bound of aes-gcm-siv. Cryptol- ogy ePrint Archive, Report 2017/708, 2017. ://eprint.iacr.org/2017/708. 33

22. S. Lucks. The sum of PRPs is a secure PRF. In B. Preneel, editor, EURO- 2000, volume 1807 of LNCS, pages 470–484. Springer, Heidelberg, May 2000. 23. A. Luykx, B. Mennink, and K. G. Paterson. Analyzing multi-key security degra- dation. Cryptology ePrint Archive, Report 2017/435, 2017. http://eprint.iacr. org/2017/435. 24. U. M. Maurer. Indistinguishability of random systems. In L. R. Knudsen, editor, EUROCRYPT 2002, volume 2332 of LNCS, pages 110–132. Springer, Heidelberg, Apr. / May 2002. 25. D. A. McGrew and J. Viega. The security and performance of the Galois/counter mode (GCM) of operation. In A. Canteaut and K. Viswanathan, editors, IN- DOCRYPT 2004, volume 3348 of LNCS, pages 343–355. Springer, Heidelberg, Dec. 2004. 26. N. Mouha and A. Luykx. Multi-key security: The Even-Mansour construction revisited. In R. Gennaro and M. J. B. Robshaw, editors, CRYPTO 2015, Part I, volume 9215 of LNCS, pages 209–223. Springer, Heidelberg, Aug. 2015. 27. C. Namprempre, P. Rogaway, and T. Shrimpton. Reconsidering generic composi- tion. In P. Q. Nguyen and E. Oswald, editors, EUROCRYPT 2014, volume 8441 of LNCS, pages 257–274. Springer, Heidelberg, May 2014. 28. J. Patarin. A proof of security in O(2n) for the xor of two random permutations. In R. Safavi-Naini, editor, ICITS 08, volume 5155 of LNCS, pages 232–248. Springer, Heidelberg, Aug. 2008. 29. J. Patarin. The “coefficients H” technique (invited talk). In R. M. Avanzi, L. Keli- her, and F. Sica, editors, SAC 2008, volume 5381 of LNCS, pages 328–345. Springer, Heidelberg, Aug. 2009. 30. J. Patarin. Introduction to mirror theory: Analysis of systems of linear equalities and linear non equalities for cryptography. Cryptology ePrint Archive, Report 2010/287, 2010. http://eprint.iacr.org/2010/287. 31. P. Rogaway and T. Shrimpton. A provable-security treatment of the key-wrap problem. In S. Vaudenay, editor, EUROCRYPT 2006, volume 4004 of LNCS, pages 373–390. Springer, Heidelberg, May / June 2006. 32. S. Tessaro. Optimally secure block ciphers from ideal primitives. In T. Iwata and J. H. Cheon, editors, ASIACRYPT 2015, Part II, volume 9453 of LNCS, pages 437–462. Springer, Heidelberg, Nov. / Dec. 2015. 33. J. Vance and M. Bellare. Delegatable Feistel-based Format Preserving Encryption mode. Submission to NIST, Nov 2015. 34. M. N. Wegman and L. Carter. New hash functions and their use in and set equality. Journal of Computer and System Sciences, 22:265–279, 1981.

A Biased Balls-Into-Bins

Consider the following game in which we throw 푞 balls into 2푚 bins. The throws can be inter-dependent, but for each 푖-th throw, conditioning on the result of the prior throws, the conditional probability that the 푖-th ball falls into any particular bin is at most 21−푚. Let Balls(푞, 푚) denote the random variable for the number of balls in the heaviest bin in this game. The following result gives a strong concentration bound on Balls(푞, 푚) when 푞 is quite smaller than 2푚. 34 Bose, Hoang and Tessaro

Lemma 10. Fix 0 < 휖 < 1. Let 푚, 푞 ∈ N such that 푞 ≤ 2(1−휖)푚−1. Then [︁ ]︁ Pr Balls(푞, 푚) ≥ ⌈1.5/휖⌉ ≤ 2−푚/2 .

Proof. Let 푠 = 푚 − 1 and 푟 = ⌈1.5/휖⌉. Since the adversary throws at most 푞 balls, there are (︂푞)︂ 푞푟 푞푟 ≤ ≤ 푟 푟! 2 sets of 푟 balls. For each set, the chance that all balls in the set land in the same bin is at most 2−(푟−1)푠. Hence the chance that there are 푟 balls landing in the same bin is at most

푞푟 2푟(1−휖)푠 ≤ = 2−1−(휖푟−1)푠 ≤ 2−(푠/2+1) ≤ 2−푚/2 . 2 · 2(푟−1)푠 2 · 2(푟−1)푠 This concludes the proof. ⊓⊔

Next, we give a concentration bound on Balls(푞, 푚) when 푞 might be quite bigger than 2푚.

Lemma 11. Fix 0 < 휖 < 1 and 푚 ∈ N such that 푚 ≥ 128. Let 푚, ℓ, 푐, 푞 ∈ N such that ℓ ≥ 2 and 푞 ≤ 푐 · 2푚. Then [︁ ]︁ Pr Balls(푞, 푚) ≥ ⌈푐ℓ푚/2⌉ ≤ 2−(3ℓ+2)푚 .

Proof. Let 푠 = 푚 − 1 and 푟 = ⌈푐ℓ푚/2⌉ ≥ 128푐. Since the adversary throws at most 푞 balls, there are (︂푞)︂ 푞푟 ≤ 푟 푟! sets of 푟 balls. For each set, the chance that all balls in the set land in the same bin is at most 2−(푟−1)푠. Hence the chance that there are 푟 balls landing in the same bin is at most 푞푟 (2푐)푟2푟푠 (2푐)푟2푠 (2푐)푟2푚 2푚 ≤ = ≤ = 푟! · 2(푟−1)푠 푟! · 2(푟−1)푠 푟! (푟/푒)푟 (푟/2푒푐)푟 2푚 2푚 ≤ ≤ ≤ 2−(3ℓ+2)푚 , (64/푒)ℓ푚 (8/푒)2푚 · 8ℓ푚 where the second inequality is due to the fact that 푛! ≥ (푛/푒)푛 for every integer 푛 ≥ 1, and the second last inequality is due to the hypothesis that ℓ ≥ 2. This concludes the proof. ⊓⊔ 35

PolyVal[F](퐾, 푀, 퐴) * * 푋 ← 퐴0 ‖ 푀0 ‖ [|퐴|]푛/2 ‖ [|푀|]푛/2 푋1 ··· 푋푚 ← 푋 // Each |푋푖| = 푛 // Interpret 퐾 and 푋1, . . . , 푋푚 as elements in F 푚 푚−1 푌 ← 푋1 ∙ 퐾 ⊕ 푋2 ∙ 퐾 ⊕ · · · ⊕ 푋푚 ∙ 퐾 Return 푌

Fig. 9: The POLYVAL hash function. For a string 푍, we write 푍0* to denote the string obtained by padding 0’s to 푍 until the next 푛-bit boundary. In particular, if |푍| * 푛 푛/2 is divisible by 푛 then 푍0 = 푍 ‖ 0 . For a number 푡 ∈ {0,..., 2 − 1}, we write [푡]푟 to denote an 푟-bit representation of 푡.

B The POLYVAL Hash Function

Let 푛 ≥ 2 be an even integer. Let F be a finite field of 2푛 elements, meaning that we can interpret a string in {0, 1}푛 as an element in F, and vice versa. Assume that the string 0푛 is interpreted as the zero element of F, and the addition operator in F is equivalent to xor in {0, 1}푛. Let ∙ denote the mul- tiplication operator of F. The POLYVAL hash function PolyVal[F]: {0, 1}푛 × {0, 1}* × {0, 1}* → {0, 1}푛 is defined as in Fig.9. Note that if 푀 = 퐴 = 휀 then PolyVal[F](퐾, 푀, 퐴) = 0푛 for any key 퐾.

Weak regularity of POLYVAL. We first show that PolyVal is weakly 1.5- regular. Consider arbitrary (푀, 퐴) ∈ ({0, 1}*)2∖(휀, 휀) and 푍 ∈ {0, 1}푛. Let * * * 푋 ← 퐴0 ‖ 푀0 ‖ [|퐴|]푛/2 ‖ [|푀|]푛/2, where 푍0 denotes the string obtained by padding 0’s to 푍 until the next 푛-bit boundary, and [푡]푛/2 denotes an 푛/2-bit representation of the number 푡. Note that

푚 = |푋|푛 = |퐴|푛 + |푀|푛 + 1 ≤ 1.5(|퐴|푛 + |푀|푛), since |퐴|푛, |푀|푛 ≥ 1. Let 푋1 ··· 푋푚 ← 푋, where each |푋푖| = 푛. Let

푚 푚−1 푓(푥) = 푋1 ∙ 푥 ⊕ 푋2 ∙ 푥 ⊕ · · · ⊕ 푋푚 ∙ 푥 ⊕ 푍 . Note that 푓 is a polynomial of degree at most 푚, and since (푀, 퐴) ̸= (휀, 휀), 푓 푛 is non-zero. Hence 푓 has at most 푚 roots. If we pick 퐾 ←$ {0, 1} , the chance 푛 푛 that 퐾 is one of those 푚 roots is at most 푚/2 ≤ 1.5(|푀|푛 + |퐴|푛)/2 . Hence Hence

푛 Pr [PolyVal[F](퐾, 푀, 퐴) = 푍] = Pr [푓(퐾) = 0 ] 퐾 ←$ {0,1}푛 퐾 ←$ {0,1}푛 1.5(|푀| + |퐴| ) ≤ 푛 푛 2푛 and thus PolyVal is weakly 1.5-regular.

XOR universality of POLYVAL. Next, we show that PolyVal is 1.5-AXU. Consider distinct (푀, 퐴) and (푀 ′, 퐴′) in ({0, 1}*)2, and fix 푍 ∈ {0, 1}푛. Let 36 Bose, Hoang and Tessaro

* * ′ ′ * ′ * ′ ′ 푋 ← 퐴0 ‖ 푀0 ‖ [|퐴|]푛/2 ‖ [|푀|]푛/2, and 푋 ← 퐴 0 ‖ 푀 0 ‖ [|퐴 |]푛/2 ‖ [|푀 |]푛/2. ′ Without loss of generality, assume that 푚 = |푋|푛 ≥ |푋 |푛 = ℓ. Note that

푚 = |푋|푛 = |퐴|푛 + |푀|푛 + 1 ≤ 1.5(|퐴|푛 + |푀|푛),

′ ′ ′ since |퐴|푛, |푀|푛 ≥ 1. Let 푋1 ··· 푋푚 ← 푋 and 푋1 ··· 푋ℓ ← 푋 , where |푋푖| = ′ |푋푗| = 푛. Let

푚 ′ ℓ ′ 푔(푥) = (푋1 ∙ 푥 ⊕ · · · ⊕ 푋푚 ∙ 푥) ⊕ (푋1 ∙ 푥 ⊕ · · · ⊕ 푋ℓ ∙ 푥) ⊕ 푍 .

Note that 푔(푥) is a polynomial of degree at most 푚, and since (푀, 퐴) ̸= (푀 ′, 퐴′), 푛 푔 is non-zero. Hence 푔 has at most 푚 roots. If we pick 퐾 ←$ {0, 1} , the chance 푛 푛 that 퐾 is one of those 푚 roots is at most 푚/2 ≤ 1.5(|푀|푛 + |퐴|푛)/2 . Hence

[︀ ′ ′ ]︀ Pr PolyVal[F](퐾, 푀, 퐴) ⊕ PolyVal[F](퐾, 푀 , 퐴 ) = 푍 퐾 ←$ {0,1}푛

푛 1.5(|푀|푛 + |퐴|푛) = Pr [푔(퐾) = 0 ] ≤ 푛 퐾 ←$ {0,1}푛 2 and thus PolyVal is 1.5-AXU.

Fixing the weak regularity of POLYVAL. As shown above, PolyVal is just weakly regular. There are several ways to make PolyVal regular. For example, instead of padding 푀 and 퐴 with 0’s, one can pad them with 1’s. The resulting construction would be 1.5-regular and 1.5-AXU.

C Proof of Proposition1

Without loss of generality, assume that if a verification query returns true then the adversary 풜0 will simply terminate and return 1. This can only increase its advantage. Assume that 풜0 never repeats an encryption query, and if it queries (푖, 푁, 푀, 퐴) to Enc for a ciphertext 퐶, then subsequently, it will not query (푖, 푁, 퐶, 퐴) to Vf. We now construct an adversary 풜 attacking the mrae security of AE, but it only calls Vf after finishing querying Enc. Adversary 풜 runs 풜0, and uses its Enc and New oracles to respond to the latter’s queries accordingly. For each verification query of 풜0, adversary 풜 simply returns false, but stores the query ′ in a set 푆. When 풜0 terminates and outputs a bit 푏 , adversary 풜 will iterate over queries in its set 푆. For each query (푖, 푁, 퐶, 퐴) in 푆, if there is an encryption query (푖, 푁, 푀, 퐴) with answer 퐶 (that is made after 풜0 queries Vf(푖, 푁, 퐶, 퐴)), then 풜 will terminate and output 1. Otherwise, it will query Vf(푖, 푁, 퐶, 퐴), and if the answer is true, it will again terminate and output 1. If all verification queries return false then 풜 outputs 푏′. Let 푎 and 푏 be the challenge bits of game mu-mrae mu-mrae GAE,KeyGen,훱 (풜) and GAE,KeyGen,훱 (풜0) respectively. Then

mu-mrae mu-mrae Pr[GAE,KeyGen,훱 (풜) | 푎 = 1] = Pr[GAE,KeyGen,훱 (풜0) | 푏 = 1] . 37

Indeed, in the real world, if some verification query (푖, 푁, 퐶, 퐴) of 풜0 can re- turn true then 풜0 will output 1, and so does 풜: either 풜 will eventually query (푖, 푁, 퐶, 퐴) to get answer true, or later there is an encryption query (푖, 푁, 푀, 퐴) of answer 퐶 that makes 풜 outputs 1. If no verification query of 풜0 can return true then 풜 correctly simulates the verification oracle for 풜0, and both will give the same answer 푏′. On the other hand, 2푞 Pr[Gmu-mrae (풜) | 푎 = 0] ≥ Pr[Gmu-mrae (풜 ) | 푏 = 0] − 푣 . AE,KeyGen,훱 AE,KeyGen,훱 0 2푛 Indeed, in the ideal world, adversary 풜 correctly simulates the verification or- acle for 풜0. The answer of the two adversaries will be different only if there is a verification query (푖, 푁, 퐶, 퐴) and a subsequent encryption query (푖, 푁, 푀, 퐴) with the same answer 퐶. For each verification query (푖, 푁, 퐶, 퐴), it can be “tar- geted” by at most 2푠+1 encryption queries, where 푠 = |퐶| − 푛, but the chance that some such encryption query can result in the same ciphertext 퐶 is at most 푠+1 |퐶| 푛 2 /2 = 2/2 . Summing this over 푞푣 verification queries gives us the bound 푛 2푞푣/2 . Hence 2푞 Advmu-mrae (풜) ≥ Advmu-mrae (풜 ) − 푣 . AE,KeyGen,훱 AE,KeyGen,훱 0 2푛 Recall that 풜 always makes verification queries after all encryption queries. We now construct adversaries 풜1 and 풜2. Adversary 풜1 runs 풜, and uses its Enc and New oracles to respond to the latter’s queries accordingly. For the verification queries of 풜, adversary 풜1 simply answers false. When 풜 outputs a ′ ′ bit 푎 , 풜1 also output 푎 . Adversary 풜2 runs 풜 and uses its oracles New and Enc to respond to the latter’s queries accordingly. For each verification query of 풜, adversary 풜2 queries it to its Vf oracle, but always returns false to 풜. Let mu-mrae game 퐺1 correspond to game GAE,KeyGen,훱 (풜) with challenge bit 푎 = 1. Let 퐺2 be identical to game 퐺1, except that the verification oracle will always return false. Let 퐺3 be identical to game 퐺2, except that now the encryption oracle will always return a fresh random answer of appropriate length. Then

mu-auth AdvAE,KeyGen,훱 (풜2) ≥ Pr[퐺1] − Pr[퐺2] because it is impossible for 풜 to distinguish 퐺1 and 퐺2, unless it manages to trigger the verification oracle to return a true answer in game 퐺1. On the other hand, mu-priv AdvAE,KeyGen,훱 (풜1) = Pr[퐺2] − Pr[퐺3] . Summing up,

mu-priv mu-auth AdvAE,KeyGen,훱 (풜1) + AdvAE,KeyGen,훱 (풜2) ≥ Pr[퐺1] − Pr[퐺3] mu-mrae = AdvAE,KeyGen,훱 (풜) 2푞 ≥ Advmu-mrae (풜 ) − 푣 . AE,KeyGen,훱 0 2푛 This concludes the proof. 38 Bose, Hoang and Tessaro

D Proof of Theorem1

Our proof uses the H-coefficient method. We let S0 and S1 be two systems mu-ind which models the oracles accessed by 풜 in the game GAE,퐵 (풜) in the cases where ciphertexts are real (푏 = 1) or random (푏 = 0), respectively. Here, 풜 is deterministic without loss of generality, and transcripts only contain two types of queries:

1. Encryption queries have form (enc, 푖, 푀, (IV, 퐶)), where 푖 indicates the user for which the query has been made, 푀 ∈ {0, 1}* is the plaintext, IV ∈ {0, 1}푛 is the first block of the ciphertext, whereas 퐶 are the remaining blocks. We will normally think of 퐶 as made of 푛-bit blocks 퐶[1], . . . , 퐶[ℓ] and 푀 of blocks 푀[1], . . . , 푀[ℓ]. 2. Ideal-cipher queries have form (prim, 퐾, 푢, 푣), and correspond to the ad- versary making a query to the ideal cipher, either (퐾, 푢) (in the forward direction) or (퐾, 푣) in the backward direction, returning 푢 and 푣, respec- tively.8

We do not record 풜’s New queries explicitly, but add the resulting keys 퐾1, . . . , 퐾푢 to the transcript (note that such keys are generated even in the ideal case, just never used). We also assume without loss of generality that if an encryption query for user 푖 appears, then 푣 has been previously increased beyond 푖 using New queries. Further, let us fix a transcript 휏 be a transcript with 푢 keys 퐾1, . . . , 퐾푢, and 9 let 풦 = 풦(휏) = {퐾1, . . . , 퐾푢}. Also, let q = (enc, 푖, 푀, (IV, 퐶)) ∈ 휏 such that 푀 and 퐶 are made of the 푛-bit blocks 푀[1], . . . , 푀[ℓ] and 퐶[1], . . . , 퐶[ℓ]. Then, we define the following multi-sets (i.e., elements are allowed to be repeated)

푈(q) = {IV + 1,..., IV + ℓ} , 푉 (q) = {퐶[1] ⊕ 푀[1], . . . , 퐶[ℓ] ⊕ 푀[ℓ]} , as well as 퐾(q) = 퐾푖. Then, for any 퐾 ∈ 풦, we let ⋃︁ ⋃︁ 푉 (퐾) = 푉 (q) , 푈(퐾) = 푈(q) . q:퐾(q)=퐾 q:퐾(q)=퐾

Here, union is on multisets. Finally, for each 퐾 ∈ {0, 1}푘, we also define 푃 (퐾) as the set of inputs 푈 such that there exists 푉 with (prim, 퐾, 푈, 푉 ) ∈ 휏.

Good transcripts, and ratio analysis. With this notation, we can give a definition of good/bad transcripts.

Definition 2 (Good and bad transcripts). We say that 휏 is good if the following conditions are satisfied, for all 퐾 ∈ 풦:

8 It will not be necessary for the transcript to record the direction of ideal-cipher queries, i.e., whether the query is in the forward or backward direction. 9 Note that as keys may be repeated, we can only ensure |풦| ≤ 푢. 39

(a) Each element in 푈(퐾) appears once, i.e., there are no repetitions. (b) Each element in 푉 (퐾) appears once, i.e., there are no repetitions. (c) 푃 (퐾) ∩ 푈(퐾) = ∅.

If 휏 is not good, then it is bad.

In the following, we prove that for all good transcript 휏, we have pS0 (휏) ≥ pS1 (휏). First off, define by pKeyGen(퐾1, . . . , 퐾푢) the probability that New query asked indeed generate these keys. Now, with 푁 = 2푛, and 푞 the number of encryption queries,

⎡ |푃 (퐾)|+|푈(퐾)|−1 ⎤ 1 ∏︁ ∏︁ 1 p (휏) = · p (퐾 , . . . , 퐾 ) · . S0 푁 푞 KeyGen 1 푢 ⎣ 푁 − 푖⎦ 퐾∈{0,1}푘 푖=0

where the first term takes into account the random choice of the IVs, thesec- ond the choice of the keys, the third the ideal-cipher evaluations within the ∑︀ encryption and direct primitive queries. Note that 퐾∈{0,1}푘 |푃 (퐾)| = 푝 and ∑︀ 퐾∈풦 |푈(퐾)| = 퐿. On the other hand, in the ideal world,

⎡ |푃 (퐾)|−1 ⎤ 1 1 ∏︁ ∏︁ 1 p (휏) = · p (퐾 , . . . , 퐾 ) · · , S1 푁 푞 KeyGen 1 푢 푁 퐿 ⎣ 푁 − 푖⎦ 퐾∈{0,1}푘 푖=0

since ciphertexts are random. Therefore, pS0 (휏)/pS1 (휏) ≥ 1, and we can use the H-coefficient technique with 휖 = 0, and only need the probability that a transcript generated in an ideal execution is bad.

Probability of a bad transcript. We now turn to computing the proba- bility of a transcript being bad in S1. In particular, denote by 풳1 the transcript generated by 풜’s interaction, and let ℬ푎, ℬ푏 and ℬ푐 be the sets of transcripts which violate (a), (b), or (c) in Definition2. Then, by the union bound,

Pr[풳1 ∈ ℬ] ≤ Pr[풳1 ∈ ℬ푎] + Pr[풳1 ∈ ℬ푏] + Pr[풳1 ∈ ℬ푐] .

We now upper bound the three probabilities separately. Note that because we are in the ideal world, and New queries do not return any output and the generated keys are not used, we can think of KeyGen without loss of generality being run at the end of the execution, and generating the resulting keys.

Case a). We let 퐿1, 퐿2, 퐿3,... be the individual lengths of each query performed by the attacker, which can be chosen adaptively. Also let ℬ푎,푖 for 푖 ∈ [푞] the set of transcripts where the 푖-th query, of length 퐿푖, generates an input to the ideal 40 Bose, Hoang and Tessaro cipher which is used in one of the previous queries using the same key. Then, 푞 ∑︁ Pr[풳1 ∈ ℬ푎] ≤ Pr[풳1 ∈ ℬ푎,푖] 푖=1 푞 ∑︁ ∑︁ [︂ 퐵 퐿 ]︂ ≤ Pr[퐿 = ℓ ] · ℓ · + 훼 푖 푖 푖 2ℎ 2ℎ 푖=1 ℓ푖≥1 푞 [︂ 퐵 퐿 ]︂ ∑︁ = + 훼 E[퐿 ] 2푛 2ℎ 푖 푖=1 [︃ 푞 ]︃ [︂ 퐵 퐿 ]︂ ∑︁ 퐿퐵 퐿2 = + 훼 E 퐿 ≤ + 훼 , 2ℎ 2ℎ 푖 2ℎ 2ℎ 푖=1 because upon generating a new IV for a query encrypting ℓ푖 blocks, there is ℎ probability at most ℓ푖퐵/2 that one of the ℓ푖 offsets add(IV, 0),..., add(IV, ℓ푖 −1) will collide with one of the offsets to encrypt a previous message forthe same ℎ user, and probability ℓ푖 ·퐿/2 that there is a collision with one of the offsets used by some other user. In the latter case, however, such a collision only contributes to the bad event if the keys associated with the two users also collide, thus incurring an additional 훼 multiplicative factor.

Case b). The argument is very similar to the one for Case a). Instead of looking at the offsets add(IV, 푖) generated during an encryption, and checking collisions with previously used offsets, we look at the actual ciphertext blocks which are output (independently and randomly), and make sure they do not provoke the transcript to be in ℬ푏. This gives us a bound of 퐿퐵 퐿2 Pr[풳 ∈ ℬ ] ≤ + 훼 . (5) 1 푏 2푛 2푛

Case c). Recall that when sampling 풳1, the keys 퐾1, . . . , 퐾푢 are sampled at the end of the execution, independently of it. For a transcript 휏, we define by 푍(휏, 푈) to be the maximal number of encryption queries q ∈ 휏 such that 푈 ∈ 푈(q). Also, let 푍 = max푥 푍(풳1, 푥). Then, ∑︁ Pr[풳1 ∈ ℬ푐] ≤ Pr[푍 = 푧] · 푧푝훼 + Pr [푍 > 푎] 푧≤푎 ≤ 푎푝훼 + Pr [푍 > 푎] because for every query (prim, 퐾, 푈, 푉 ), out of 푝 potential ones, there are at most 푍(풳1, 푈) ≤ 푍 queries q = (enc, 푖, 푀, (IV, 퐶)) such that 푈 ∈ 푈(q). Thus, the probability (over the choice of 퐾1, . . . , 퐾푢) that 퐾 = 퐾푖 is 훼. We have then ⌈︀ 1.5푛 ⌉︀ decided to cut the sum at 푧 = 푎 = 휖ℎ − 1, as we are going to justify next that the probability that 푍 exceeds this is negligible. This follows from the following lemma, whose proof is found below, and which follows a classical balls-into-bins approach, with some extra care needed to handle the adaptivity of the adversary. 41

Lemma 12. Let 퐿 ≤ 2(1−휖)ℎ−1. Then, [︂ ⌈︂1.5푛⌉︂]︂ Pr 푍 ≥ < 2−푛/2 . 휖ℎ This concludes the proof. Proof (Of Lemma 12). One can without loss of generality consider the following adaptive balls-into-bins game. There are 푁 = 2푛 bins, corresponding to the block-cipher inputs. At each query, the attacker chooses adaptively a length ℓ, 푛 then a random IV ←$ {0, 1} is chosen, and a ball is placed into the bins add(IV, 푖) for 푖 = 0, . . . , ℓ − 1. We assume without loss of generality the attacker’s lengths always sum up to 퐿, as this only increases 푍. Then, we let 푍푖 be the load of bin 푖, and 푍 = max푖 푍푖. Fix some bin 푖 ∈ [푁]. We will now upper bound 푍푖, for an arbitrary strategy. Note that with respect to the goal of maximizing 푍푖, without loss of generality, the adversary needs only to know whether a ball was thrown into bin 푖 or not after each move. (It can simulate the rest consistently.) Note that when the adversary chooses a random IV and some length ℓ, we have ℓ Pr[푖 ∈ {add(IV, 0),..., add(IV, ℓ − 1)}] ≤ , 2ℎ because each individual item has min-entropy ℎ, and is equal 푖 with probability at most 1/2ℎ. Imagine we modify the game so that when the adversary selects ℓ, we make 2 × ℓ independent attempts to throw a ball into bin 푖, each of them succeeding with probability 1/2ℎ, and if any of this lands into bin 푖, the adversary learns this (for now, we will not reveal how many balls land in bin 푖, only none, or at least one). Then, the probability that one ball lands in 푖 is (︂ )︂2ℓ 1 − 2ℓ 1 2ℓ ℓ 1 − 1 − ≥ 1 − 푒 2ℎ ≥ · = 2ℎ 2 2ℎ 2ℎ

−푥 푥 ℓ where we have used the fact that 푒 ≤ 1 − 2 whenever 푥 ≤ 1, and 2ℎ ≤ 1. ′ Therefore, let 푍푖 be load of 푖 in the modified game, we clearly have Pr[푍푖 ≥ ′ 푚] ≤ Pr[푍푖 ≥ 푚] for every 푚. But note that because all balls are thrown independently, the latter probabil- ity does not become smaller if we consider the setting where 2퐿 balls are thrown ℎ ′′ independently, each hitting 푖 with probability 1/2 . Call the resulting value 푍푖 ′′ denoting the load of 푖, we thus have Pr[푍푖 ≥ 푚] ≤ Pr[푍푖 ≥ 푚]. By repeatedly applying the union bound, we have

푁 푚 푚 ∑︁ (︂2퐿)︂ (︂ 1 )︂ (︂2퐿)︂ Pr[푍 ≥ 푚] ≤ Pr[푍′′ ≥ 푚] ≤ 푁 < 푁 ≤ 2푛−휖ℎ푚 , 푖 푚 2ℎ 2ℎ 푖=1

(︀푎)︀ 푏 (1−휖)ℎ−1 where we have used the fact that 푏 < 푎 and 퐿 ≤ 2 . Now, set 푚 = 1.5푛 −푛/2 ⌈ 휖ℎ ⌉. Then, the above is upper bounded by 2 . ⊓⊔ 42 Bose, Hoang and Tessaro

E Proof of Theorem2

We shall use H-coefficient technique to prove the claimed bound. We define two systems S0 and S1 that represents the real game (푏 = 1) and ideal (푏 = 0) game mu-prf of GGMAC+[퐻,퐸],퐵,퐸(풜). Also, without loss of generality we assume that our adversary 풜 is deterministic and it does not repeat queries. After the adversary 푖 푖 finishes querying, we grant the adversary the key pairs {퐾in, 퐾out}푖=1,...,푢 for all 푢 users spawned by calls to New. (Note that in the ideal world, these keys do not influence the behavior of the system and can be thought as being generated at the end, consistent with the earlier inputs to the New queries.) In addition to the key pairs granted to the adversary, a mu-prf transcript 휏 contains the following two types of queries: – Evaluation queries are the entries of type (eval, 푖, 푀, 퐴, 푁, 푇 ), where 푖 indicates the user that this query targets, 푀 is message, 퐴 the associated data, 푁 is the nonce, and 푇 is corresponding tag. – Primitive queries are of type (prim, 퐾, 푈, 푉 ), which result from a forward ideal-cipher query (퐾, 푈) returning 푉 , or a backward ideal-cipher query (퐾, 푉 ) returning 푈. Again, for a query q, we write q ∈ 휏 to denote its appearance in the transcript. Also, for each 퐾 ∈ {0, 1}푘 we define the following numbers:

⃒ 푖 ⃒ 푞(퐾) = ⃒{(eval, 푖, 푀, 퐴, 푁, 푇 ) ∈ 휏 | 퐾out = 퐾}⃒ 푝(퐾) = |{(prim, 퐾′, 푈, 푉 ) ∈ 휏 | 퐾′ = 퐾}| Defining bad transcripts. We say a transcript 휏 is bad if it satisfies one of the following constraints (it is called good otherwise).

1. There exist two entries (eval, 푖, 푀1, 퐴1, 푁1, 푇1) and (eval, 푗, 푀2, 퐴2, 푁2, 푇2) 푖 푗 such that (푖, 푀1, 퐴1, 푁1) ̸= (푗, 푀2, 퐴2, 푁2) , 퐾out = 퐾out and

xor(퐻퐾푖 (푀1, 퐴1), 푁1) = xor(퐻 푗 (푀2, 퐴2), 푁2) . (6) in 퐾in

2. There exist two entries (eval, 푖, 푀1, 퐴1, 푁1, 푇1) and (eval, 푗, 푀2, 퐴2, 푁2, 푇2) 푖 푗 such that 푇1 = 푇2 and 퐾out = 퐾out. 푖 3. There exist entries (prim, 퐾, 푈, 푉 ) and (eval, 푖, 푀, 퐴, 푁, 푇 ) such that 퐾out = 퐾 and xor(퐻 푖 (푀, 퐴), 푁) = 푈. 퐾in

Transcript ratio. We now need to compare pS1 (휏) and pS0 (휏) for a good tran- script 휏. First off, note that in the ideal world, all queries are replied randomly and independently, and moreover, keys are also chosen independently of the rest * 푖 푖 of the transcript, with a certain probability 푝 = pKeyGen({퐾in, 퐾out}푖=1,...,푢). Further, ideal-cipher queries are also answered independently of evaluation queries. Therefore, 푝(퐾)−1 ∏︁ ∏︁ 1 p (휏) = 푝* · 2−푛푞 · . S1 2푛 − 푖 퐾∈{0,1}푘 푖=0 43

In the real world, note that because the transcript 휏 is good, no two queries to the ideal cipher within Eval queries with the same outer key 퐾 are on the same input, and moreover, such inputs do not appear as inputs of direct Prim queries (even though the outer key itself might). The keys are also generated with probability 푝*. For this reason,

푝(퐾)+푞(퐾)−1 ∏︁ ∏︁ 1 p (휏) = 푝* · , S0 2푛 − 푖 퐾∈{0,1}푘 푖=0

∑︀ and thus in particular, because 퐾 푞(퐾) = 푞, we see that pS0 (휏) replaces the −푛 1 1 푞 factors 2 in the product in pS1 (휏) with other factors of form 2푛−푖 ≥ 2푛 for some 푖 ≥ 0, and therefore, pS0 (휏) ≥ pS1 (휏). Thus, we can use the H-coefficient technique with 휖 = 0, and only need to upper bound the probability that an ideal transcript is bad, which we do next.

Probability of bad transcripts. Let 풳1 be the random variable for the transcript in the ideal system. Let ℬ1, ℬ2, ℬ3 be the sets of transcripts that satisfies1 ( ), (2) and (3) according to the definition of bad transcripts. Then by the union bound,

Pr[풳1 ∈ ℬ] ≤ Pr[풳1 ∈ ℬ1] + Pr[풳1 ∈ ℬ2] + Pr[풳1 ∈ ℬ3]

We now upper bound the three probabilities on the RHS separately.

For (1) and (3), we can assume wlog that the execution has terminated, and 1 1 2 2 the transcript so far is fixed, and the keys 퐾in, 퐾out, 퐾in, 퐾out,... are generated, independently of the execution, and are the only random variables. Assume in particular the execution has involved 푢 users and there are 푞푖 evaluation queries ∑︀푢 for user 푖, and thus 푖=1 푞푖 ≤ 푞. Also, assume that the 푞푖 queries intended for user 푖 have each lengths ℓ푖,1, ℓ푖,2, . . . , ℓ푖,푞푖 blocks where we intentionally arrange ∑︀푞푖 them to be sorted as ℓ푖,1 ≤ ℓ푖,2 ≤ ... ≤ ℓ푖,푞푖 . Clearly, for all 푖, 푗=1 ℓ푖,푗 ≤ 퐵.

We define two subsets ℬ11 and ℬ12 of ℬ1. The first consists of transcripts in which there are two entries (eval, 푖, 푀1, 푁1, 퐴1, 푇1) and (eval, 푖, 푀2, 푁2, 퐴2, 푇2) with (푀1, 퐴1, 푁1) ̸= (푀2, 퐴2, 푁2), and

xor(퐻 푖 (푀1, 퐴1), 푁1) = xor(퐻 푖 (푀2, 퐴2), 푁2) . 퐾in 퐾in

The second set consists of the transcripts with two entries (eval, 푖1, 푀1, 퐴1, 푁1, 푇1) 푖1 푖2 and (eval, 푖2, 푀2, 퐴2, 푁2, 푇2) with 푖1 ̸= 푖2, 퐾out = 퐾out, and

xor(퐻 푖1 (푀1, 퐴1), 푁1) = xor(퐻 푖2 (푀2, 퐴2), 푁2) . 퐾in 퐾in 44 Bose, Hoang and Tessaro

Note that here (푀1, 퐴1, 푁1) = (푀2, 퐴2, 푁2) is allowed.

We start with ℬ11. Then, for any two (푀1, 퐴1, 푁1) ̸= (푀2, 퐴2, 푁2) with (푀1, 퐴1) ̸= (푀2, 퐴2),

Pr[xor(퐻 푖 (푀1, 퐴1), 푁1) = xor(퐻 푖 (푀2, 퐴2), 푁2)] 퐾in 퐾in 푛 = Pr[xor(퐻 푖 (푀1, 퐴1) ⊕ 퐻 푖 (푀2, 퐴2), 푁1 ⊕ 푁2) = 0 ] 퐾in 퐾in 푐 · 휆 · 훽 · max{|푀 | + |퐴 | , |푀 | + |퐴 | } ≤ 1 푛 1 푛 2 푛 2 푛 , 2푛 for the following reasons. The first equality follows by linearity of xor. Then, by 푛 휆-regularity of xor, there are at most 휆 strings 훥 such that xor(훥, 푁1 ⊕푁2) = 0 . By 푐-xor-universality, for any such 훥, the number of keys 푘 in {0, 1}푛 that make the xor of the hashes equal 훥 is at most 푐·max{|푀1|푛 +|퐴1|푛, |푀2|푛 +|퐴2|푛}. By 푛 훽-AU, the probability of each such key is at most 훽/2 . Clearly, if (푀1, 퐴1) = (푀2, 퐴2), but 푁1 ̸= 푁2, then the upper bound also holds vacuously by injectivity of xor. Therefore, by taking the union bound, and exploiting our ordering of queries according to their lengths (recall 퐶 := 훽푐휆),

푢 푢 푞푖 ∑︁ ∑︁ 퐶 · ℓ푖,푗 ∑︁ ∑︁ ℓ푖,푗 Pr[풳1 ∈ ℬ11] ≤ 푛 = 퐶 (푗 − 1) · 푛 ′ 2 2 푖=1 1≤푗 <푗≤푞푖 푖=1 푗=1

푢 푞푖 푢 ∑︁ ∑︁ ℓ푖,푗 ∑︁ 푞푖퐵 퐶푞퐵 ≤ 퐶 푞 ≤ 퐶 ≤ . 푖 2푛 2푛 2푛 푖=1 푗=1 푖=1

We move on to ℬ12. Note that for any two relevant entries, we have that

푖1 푖2 Pr[xor(퐻 푖1 (푀1, 퐴1), 푁1) = xor(퐻 푖2 (푀2, 퐴2), 푁2) ∧ 퐾out = 퐾out] ≤ 퐾in 퐾in 퐶 · min{|푀 | + |퐴 | , |푀 | + |퐴 | } ≤ 1 푛 1 푛 2 푛 2 푛 , 2푛+푘 because of the following reasons: Assume wlog |푀1|푛 + |퐴1|푛 ≥ |푀2|푛 + |퐴2|푛 푖1 (otherwise the argument is symmetric). Then, for every for every 퐾in , there are at most 휆 values 푌 such that xor(퐻 푖1 (푀1, 퐴1), 푁1) = xor(푌, 푁2) by 휆-regularity 퐾in of xor, and for each such 푌 , by 푐-regularity of 퐻, at most 푐·(|푀2|푛 +|퐴2|푛) values 푖2 of 퐾in are such that 퐻 푖2 (푀2, 퐴2) = 푌 . Thus there are at most 퐶 · (|푀2|푛 + 퐾in 푛+푘 푖1 푖1 푖2 푖2 |퐴2|푛) · 2 tuples (퐾in , 퐾out, 퐾in , 퐾out) of keys that provoke the event, and each one of them appears with probability at most 훽/22(푛+푘) by 훽-AU. Thus, overall 푢 푞푖 ∑︁ ∑︁ 푞 · ℓ푖,푗 퐶푞퐿 Pr[풳 ∈ ℬ ] ≤ 퐶 ≤ . 1 12 2푛+푘 2푛+푘 푖=1 푗=1 Hence, we conclude that 퐶푞퐵 퐶푞퐿 Pr[풳 ∈ ℬ ] ≤ + (7) 1 1 2푛 2푛+푘 45

Now, for ℬ3, we use a similar argument. For any one of the 푝 Prim queries (prim, 퐾, 푈, 푉 ) and any of the 푞 Eval queries (eval, 푖, 푀, 퐴, 푁, 푇 ), we have

푖 퐶(|푀|푛 + |퐴|푛) Pr[퐾out = 퐾 ∧ xor(퐻퐾푖 (푀, 퐴), 푁) = 푈] ≤ . in 2푛+푘 Taking a union bound over all 푝 and 푞 queries, yields

푢 푞푖 ∑︁ ∑︁ ℓ푖,푗 퐶푝퐿 Pr[풳 ∈ ℬ ] ≤ 퐶 · 푝 · ≤ . 1 3 2푛+푘 2푛+푘 푖=1 푗=1

Finally, we turn to ℬ2. We partition the set of transcripts into two subsets ℬ21 and ℬ22. The first subset ℬ21 consists of the transcripts which contain two entries (eval, 푖, 푀1, 퐴1, 푁1, 푇1) and (eval, 푖, 푀2, 퐴2, 푁2, 푇2) such that 푇1 = 푇2. The sec- ond subset ℬ22 consists of transcripts with two entries (eval, 푖1, 푀1, 퐴1, 푁1, 푇1) 푖1 푖2 and (eval, 푖2, 푀2, 퐴2, 푁2, 푇2) such that 푇1 = 푇2, 푖1 ̸= 푖2, and 퐾out = 퐾out. Then, for each new query, the probability that the output collides with one of the pre- viously issued queries for the same user is at most 퐵/2푛. Therefore, by the union bound, 푞퐵 Pr[풳 ∈ ℬ ] ≤ ; . 1 21 2푛 In contrast, to enter the second set, note that for each new query, there is proba- bility at most 푞/2푛 that the output collides with one of the previous queries, and the probability that additionally the outer keys collide is at most 훽/2푘. Thus,

훽푞2 Pr[풳 ∈ ℬ ] ≤ . 1 22 2푛+푘 Summing up we get, 푞퐵 훽푞2 Pr[풳 ∈ ℬ ] ≤ + 1 2 2푛 2푛+푘 This concludes the proof.

E.1 Proof of Theorem3

We merely discuss how to adapt the proof of Theorem2 to accommodate the case 푛 that 퐻퐾in (휀, 휀) = 0 for all keys 퐾in, where 휀 denotes the empty string. Further, this yields a proof for Theorem3. The bad transcripts are exactly the same as in Theorem2, the changes are the probabilities that these bad transcripts occur, specifically for the events ℬ12 and ℬ3. Note that we assume an upper bound 푑 on the number of users re-using a particular nonce 푁, and this is going to be used below. 46 Bose, Hoang and Tessaro

Analysis of ℬ12. Recall that we are looking at the probability that there are two transcript entries (eval, 푖1, 푀1, 퐴1, 푁1, 푇1) and (eval, 푖2, 푀2, 퐴2, 푁2, 푇2) with 푖1 푖2 푖1 ̸= 푖2, 퐾out = 퐾out, and

xor(퐻 푖1 (푀1, 퐴1), 푁1) = xor(퐻 푖2 (푀2, 퐴2), 푁2) . 퐾in 퐾in

Note that here (푀1, 퐴1, 푁1) = (푀2, 퐴2, 푁2) is allowed. There are three sub-cases resulting in three different probability terms:

– If (푀1, 퐴1) ̸= (휀, 휀), (푀2, 퐴2) ̸= (휀, 휀), then we are in the same situation as 퐶푞퐿 in Theorem2 above, and get an upper bound . 2푛+푘 – If (푀1, 퐴1) = (휀, 휀) and (푀2, 퐴2) ̸= (휀, 휀), then

푖1 푖2 Pr[xor(퐻 푖1 (푀1, 퐴1), 푁1) = xor(퐻 푖2 (푀2, 퐴2), 푁2) ∧ 퐾out = 퐾out] ≤ 퐾in 퐾in 퐶 · (|푀 | + |퐴 | ) ≤ 2 푛 2 푛 , 2푛+푘

where we have used the regularity of the function output on (푀2, 퐴2). Taking 퐶푞퐿 a union bound over all such pairs, this results in a term . 2푛+푘 – Finally, we consider the case of (푀1, 퐴1) = (푀2, 퐴2) = (휀, 휀). Here, the prob- 푖1 푖2 ability Pr[xor(퐻 푖1 (푀1, 퐴1), 푁1) = xor(퐻 푖2 (푀2, 퐴2), 푁2) ∧ 퐾out = 퐾out] is 퐾in 퐾in −푘 either zero if 푁1 ̸= 푁2, or 2 if 푁1 = 푁2. (This follows from the injective property of xor.) Let now 푞푁 be the number of queries (휀, 휀) with nonce 푁, ∑︀ and thus in particular 푁 푞푁 ≤ 푞, and further 푞푁 ≤ 푑. Then, the overall probability that ℬ12 occurs due to such a pair is at most

∑︁ ∑︁ 푑푞 푞2 /2푘 ≤ 푑 · 푞 /2푘 ≤ . 푁 푁 2푘 푁 푁

Analysis of ℬ3. As in Theorem2, the probability that for one of the 푝 Prim queries (prim, 퐾, 푈, 푉 ) and one of the 푞 Eval queries (eval, 푖, 푀, 퐴, 푁, 푇 ) with 푖 퐶푝퐿 (푀, 퐴) ̸= (휀, 휀) we have 퐾 = 퐾 and xor(퐻 푖 (푀, 퐴), 푁) = 푈 is at most 푛+푘 . out 퐾in 2 In contrast, for a nonce 푁, let 푁 ′ = xor(0푛, 푁). Then, for every Prim query (prim, 퐾, 푁 ′, 푉 ), there are at most 푑 Eval queries (eval, 푖, 휀, 휀, 푁, 푇 ), and the 푖 −푘 probability that 퐾in = 퐾 for any of these is 2 . Therefore, the overall proba- 푝푑 bility that the transcript is in ℬ3 because of such a pair is at most 2푘 .

F Proof of Theorem4

We will use the H-coefficient technique. The real system S0 and ideal system mu-mrae S1 implement game AdvAE,KeyGen,퐸(풜) with challenge bit 1 and 0 respectively. Assume that 풜 does not repeat a prior query (except for New ones), and it does not make redundant ideal-cipher queries. Assume that if the adversary makes 47 an encryption query (푖, 푁, 푀, 퐴) for an answer 퐶 then later it will not make a verification query (푖, 푁, 퐶, 퐴). Since we consider computationally unbounded adversaries, without loss of generality, assume that the adversary is deterministic. When the adversary finishes querying, we grant it all the keys 퐾1, 퐾2, ··· . This should only help the adversary. Beside the revealed keys and the information of the New queries, the transcript stores the following information: – Ideal-cipher queries: for each query 퐸(퐾, 푋) with answer 푌 , create an entry (prim, 퐾, 푋, 푌, +). Likewise, for each query 퐸−1(퐾, 푌 ) with answer 푋, create an entry (prim, 퐾, 푋, 푌, −). – Encryption queries: for each encryption query (푖, 푁, 푀, 퐴) with answer 퐶, store an entry (enc, 푖, 푁, 푀, 퐴, 퐶). Additionally, in the real world, grant the adversary the table of triples (퐾, 푋, 퐸퐾 (푋)) for any query 퐸(퐾, 푋) that CTR[퐸].E(퐽푖, 푀; 푇 ) makes, where 퐽푖 is the 푘-bit suffix of the key 퐾푖 of user 푖, and 푇 is the IV of 퐶. In the ideal world, generate the corresponding fake table as described in Section5. The extra information in the table will only help the adversary. – Verification queries: for each verification query (푖, 푁, 퐶, 퐴) with answer 푏, store an entry (vf, 푖, 푁, 퐶, 퐴, 푏).

+ Now, from such a transcript 휏, we can extract a transcript R1(휏) for GMAC , and another transcript R2(휏) for CTR as follows. The transcript R1(휏) con- sists of the revealed keys, information of the New queries, and all prim entries of 휏, and for each entry (enc, 푖, 푁, 푀, 퐴, 퐶) of 휏, we accordingly store an entry (eval, 푖, 푁, 푀, 퐴, 푇 ) in R1(휏), where 푇 is the IV of 퐶. The transcript R2(휏) con- sists of the 푘-bit suffixes of the revealed keys, information ofthe New queries, and all prim entries of 휏, and for each entry (enc, 푖, 푁, 푀, 퐴, 퐶) of 휏, we accord- + ingly store an entry (enc, 푖, 푀, 퐶) in R2(휏). If (1) R1(휏) is GMAC -good and R2(휏) is CTR-good, and (2) 휏 contains no verification query of answer true, then we additionally grant the adversary the following information:

– In the real world, for each entry (vf, 푖, 푁, 퐶, 퐴, false), we run CTR[퐸].D(퐽푖, 퐶), where 퐽푖 is the 푘-bit suffix of 퐾푖, and grant the adversary the decrypted mes- sage 푀. For each query 퐸(퐾, 푋) of answer 푌 that CTR[퐸].D makes, if there is no entry (prim, 퐾, 푋, 푌, ·) or no triple (퐾, 푋, 푌 ) in all tables then we grant the adversary an entry (dec, 퐾, 푋, 푌 ). – In the ideal world, create a blockcipher 퐸˜ : {0, 1}푘 × {0, 1}푛 → {0, 1}푛 as follows. For each 퐾 ∈ {0, 1}푘, sample 퐸˜(퐾, ·) uniformly random from Perm(푛), subject to the constraint that (i) for any entry (prim, 퐾, 푋, 푌, ·) ˜ ′ ′ in R2(휏), we must have 퐸(퐾, 푋) = 푌 , and (ii) for any triple (퐾, 푋 , 푌 ) in ˜ ′ ′ the tables of R2(휏), we must have 퐸(퐾, 푋 ) = 푌 . This blockcipher can be generated, because R2(휏) is CTR-good. For each entry (vf, 푖, 푁, 퐶, 퐴, false), we run CTR[퐸˜].D(퐽푖, 퐶), where 퐽푖 is the 푘-bit suffix of 퐾푖 and grant the adversary the decrypted message 푀, and the entries (dec, 퐾, 푋, 푌 ) as above.

Defining bad transcripts. A transcript 휏 is bad if one of the following hap- pens: 48 Bose, Hoang and Tessaro

+ + 1. The GMAC -transcript R1(휏) of 휏 is GMAC -bad. 2. The CTR-transcript R2(휏) of 휏 is CTR-bad. 3. There is a table in R2(휏) that contains a triple (퐾, 푋, 푌 ), and there is an entry (eval, 푖, 푁, 푀, 퐴, 푇 ) in R1(휏) such that 퐾 = 퐾out and 푌 = 푇 , where 퐾in ‖ 퐾out is the key 퐾푖 of user 푖. 4. There is an entry (dec, 퐾, 푋, 푌 ) in 휏 and an entry (eval, 푖, 푁, 푀, 퐴, 푇 ) in R1(휏) such that 퐾 = 퐾out and 푌 = 푇 , where 퐾in ‖ 퐾out is the key 퐾푖 of user 푖. 5. There is an entry (vf, 푖, 푁, 퐶, 퐴, false) in 휏 and an entry (eval, 푗, 푁 ′, 푀 ′, 퐴′, 푇 ) ′ in R1(휏) such that 푇 is the IV of 퐶, 퐾out = 퐾out, and xor(퐻(퐾in, 푀, 퐴), 푁) = ′ ′ ′ ′ ′ ′ xor(퐻(퐾in, 푀 , 퐴 ), 푁 ), where 퐾in ‖ 퐾out and 퐾in ‖ 퐾out are the keys 퐾푖 and 퐾푗 of users 푖 and 푗 respectively, and 푀 is the decrypted message associated with the vf entry above. 6. There are entries (prim, 퐾, 푋, 푌 ) and (vf, 푖, 푁, 퐶, 퐴, false) in 휏 such that 퐾 = 퐾out, 푌 = 푇 , and xor(퐻(퐾in, 푀, 퐴), 푁) = 푋, where 퐾in ‖ 퐾out is the key 퐾푖 of user 푖, and 푇 is the IV of 퐶, and 푀 is the decrypted message associated with the vf entry above.

If a transcript is not bad then we say that it is good. Below, let 휖1 be the number that the GMAC+ proof uses to upper bound the probability of bad transcripts, for any adversary 풜 that makes at most 푞 evaluation queries whose total block length is at most 퐿, at most 퐵-block queries per user, and 푝 ideal-cipher queries, and for any 훽-pairwise AU key-generation algorithm. Applying Theorem2 with 휆 = 2, and note that 푞 ≤ 퐿,

(1 + 2훽푐)퐿퐵 2훽푐퐿푝 + (2훽푐 + 훽)퐿2 휖 ≤ + (8) 1 2푛 2푛+푘

훽 Define 휖2 for CTR similarly, for a 2푘 -smooth key-generation algorithm, where the notion of smoothness can be found in Section 4.1. Applying Theorem1 with 훼 = 훽/2푘 and ℎ = 푛 − 1, and note that 푞 ≤ 퐿,

1 3퐿퐵 3훽퐿2 훽푎푝 휖 ≤ + + + (9) 2 2푛/2 2푛 2푛+푘 2푘

Probability of bad transcripts. Let 풳1 be the random variable for the transcript in the ideal system. Let ℬ푗 denote the set of transcripts that vio- lates the 푗th constraint in badness. For the first constraint of badness, consider the following adversary 풜 attacking the mu-prf security of GMAC+[퐻, 퐸], with respect to the key-generation algorithm KeyGen. It runs 풜 and uses its New oracle to respond to the latter’s queries of the same type. For each encryption query (푖, 푁, 푀, 퐴) of 풜, adversary 풜 queries Eval(푖, 푁, 푀, 퐴) to get an answer 푇 , generates a ciphertext core 퐶′ of appropriate length, and then returns 푇 ‖ 퐶′ to 풜. For each verification query of 풜, adversary 풜 simply returns false. When 풜 finishes querying and asks for the keys, 풜 also finishes querying and gives 풜 what it receives. Then the transcript of 풜 in the ideal world has the same distribution as R1(풳1). Since adversary 풜 uses at most 푞 evaluation queries 49 whose total block length is at most 퐿, at most 퐵-block queries per user, and 푝 ideal-cipher queries, Pr[풳1 ∈ ℬ1] ≤ 휖1 . Next, for the second constraint of badness, let KeyGen[푘] be the key-generation algorithm such that, on input (st, aux), runs (퐾, st′) ← KeyGen(st, aux), and ′ 훽 then outputs (퐽, st ), where 퐽 is the 푘-bit suffix of 퐾. Then KeyGen[푘] is 2푘 - smooth. Consider the following adversary 풜* attacking the mu-ind security of SE, with respect to the key-generation algorithm KeyGen[푘]. It runs 풜 and uses its oracle New and Enc to respond to the latter’s queries of the same type. For each verification query of 풜, adversary 풜* simply returns false. When 풜 finishes querying and asks for the keys, then 풜* also finishes querying and gives 풜 the keys that it receives. Then the transcript of 풜* in the ideal world has the same * distribution as R2(풳1). Since adversary 풜 uses at most 푞 encryption queries whose total block length is at most 퐿, at most 퐵-block queries per user, and 푝 ideal-cipher queries, Pr[풳1 ∈ ℬ2] ≤ 휖2 . For the third constraint of badness, consider a sequence of encryption queries (푖1, 푀1, 푁1, 퐴1),..., (푖푞, 푀푞, 푁푞, 퐴푞) with answers 퐶1, . . . , 퐶푞 respectively. Fix ′ ′ 1 ≤ 푟, 푠 ≤ 푞. Let 퐾in ‖ 퐾out and 퐾in ‖ 퐾out be the keys of users 푖푟 and 푖푠 respectively. Consider the table generated by the 푟-th encryption query, and the eval entry generated by the 푠-th encryption query. Recall that this table is generated by (1) padding 퐶푟 with random bits to have full block length, and padding 푀푟 to have full block length, (2) parsing IV ‖ 퐶푟,1 ‖···‖ 퐶푟,푚 ← 퐶푟, and 푀푟,1 ‖···‖ 푀푟,푚 ← 푀푟, with |퐶푟,ℓ| = |푀푟,ℓ| = 푛, and (3) producing (퐾out, 푋1, 퐶푟,1 ⊕ 푀푟,1),..., (퐾out, 푋푚, 퐶푟,푚 ⊕ 푀푟,푚), with 푋ℓ ← add(IV, ℓ − 1). ′ We now compute the probability that 퐾out = 퐾out and 퐶푟,ℓ ⊕푀푟,ℓ = 푇 . Consider the following cases.

Case 1: 푟 ≥ 푠. Hence 퐶푟,ℓ is picked at random, independent of 푀푟,ℓ and 푇 . Since ′ KeyGen is 훽-pairwise AU, the chance that 퐾out = 퐾out and 퐶푟,ℓ ⊕ 푀푟,ℓ = 푇 is 푘+푛 −푛 at most 훽/2 if 푖푟 ̸= 푖푠, and at most 2 if 푖푟 = 푖푠.

Case 2: 푟 < 푠. Then 푇 is picked at random, independent of 푀푟,ℓ and 퐶푟,ℓ. Since ′ KeyGen is 훽-pairwise AU, the chance that 퐾out = 퐾out and 퐶푟,ℓ ⊕ 푀푟,ℓ = 푇 is 푘+푛 −푛 at most 훽/2 if 푖푟 ̸= 푖푠, and at most 2 if 푖푟 = 푖푠. ′ Thus in any case, the chance that 퐾out = 퐾out and 퐶푟,ℓ ⊕ 푀푟,ℓ = 푇 is at most 푘+푛 −푛 훽/2 if 푖푟 ̸= 푖푠, and at most 2 if 푖푟 = 푖푠. Since the total number of triples in all tables is at most 퐿, and there are at most 퐵 eval entries created due to encryp- tion queries for user 푖푠, and note that 푞 ≤ 퐿/2 (as each encryption/verification query consists of at least two blocks, one due to the associated data, and another due to the message/ciphertext), ∑︁ 훽퐿 훽퐵 훽퐿푞 훽푞퐵 훽퐿2 0.5훽퐿퐵 Pr[풳 ∈ ℬ ] ≤ + = + ≤ + . 1 3 2푘+푛 2푛 2푘+푛 2푛 2푘+푛 2푛 1≤푠≤푞 For the fourth constraint of badness, fix an entry (dec, 퐾, 푋, 푌 ) created by de- crypting a verification query of user 푗. Consider an entry (eval, 푖, 푁, 푀, 퐴, 푇 ). 50 Bose, Hoang and Tessaro

Let 퐾out be the 푘-bit suffix of the key 퐾푖 of user 푖, and note that 퐾 is the 푘-bit suffix of the key 퐾푗 of user 푗. There are two cases: Case 1: The verification query above is made before the the encryption query corresponding to the eval entry. Then 푇 is a random string, independent of 푌 . −푛 If 푖 = 푗 then 퐾 = 퐾out, and the chance that 푌 = 푇 is 2 . If 푖 ̸= 푗 then since KeyGen is 훽-pairwise AU, the chance that 퐾 = 퐾out and 푌 = 푇 is at most 훽/2푘+푛. Case 2: The verification query above is made after the the encryption query corresponding to the eval entry. Since there are at most 퐿 dec entries and at most 퐿 triples in the tables, given 푇 , there are still at least 2푛 − 2퐿 − 푝 ≥ 2푛−1 equally likely choices of 푌 . Hence if 푖 = 푗 then 퐾 = 퐾out, and the chance that 푌 = 푇 is at most 2/2푛. On the other hand, since KeyGen is 훽-pairwise AU, if 푘+푛 푖 ̸= 푗 then the chance that 퐾 = 퐾out and 푌 = 푇 is at most 2훽/2 . 푘+푛 Thus in both cases, the chance that 퐾 = 퐾out and 푌 = 푇 is at most 2훽/2 if 푖 ̸= 푗, and at most 2/2푛 if 푖 = 푗. Summing this over at most 푞 eval entries and at most 퐿 dec entries, and note that there are at most 퐵 eval entries per user,

2퐿퐵 2훽퐿푞 2퐿퐵 2훽퐿2 Pr[풳 ∈ ℬ ] ≤ + ≤ + . 1 4 2푛 2푘+푛 2푛 2푘+푛

For the fifth constraint of badness, consider an entry (vf, 푖, 푁, 퐶, 퐴, false) in 풳1, and let 푇 be the IV of 퐶 and 푀 be the associated decrypted message. Note that + if 풳1 ∈ ℬ5 then R1(풳1) is GMAC -good and R2(풳1) is CTR-good. Fix 푗 ≤ 푞. ′ ′ ′ There is at most one entry (eval, 푗, 푁 , 푀 , 퐴 , 푇 ) in R1(휏); otherwise R1(풳1) ′ ′ is not good, and thus 풳1 ̸∈ ℬ5. Let 퐾푖 = 퐾in ‖ 퐾out and 퐾푗 = 퐾in ‖ 퐾out. ′ If 푗 ̸= 푖 then the probability that 퐾out = 퐾out and xor(퐻(퐾in, 푀, 퐴), 푁) = ′ ′ ′ ′ 2푐훽·E[|푀|푛+|퐴|푛] xor(퐻(퐾in, 푀 , 퐴 ), 푁 ) is at most 2푛+푘 , because 퐻 is 푐-regular, xor ′ ′ is 2-regular, and KeyGen is 훽-pairwise AU. If 푖 = 푗 then 퐾in ‖ 퐾out = 퐾in ‖ 퐾out, and we consider three following cases. Case 1: (푀, 푁, 퐴) = (푀 ′, 푁 ′, 퐴′). Let 퐶′ be the answer of Enc(푗, 푁 ′, 푀 ′, 퐴′) as ′ ′ indicated in 풳1. For the blockcipher 퐸˜ above, since 퐶 = CTR[퐸].E(퐾out, 푀 ; 푇 ), ′ ′ we also have 퐶 = CTR[퐸˜].E(퐾out, 푀 ; 푇 ), due to the consistency between 퐸 퐸˜ and 퐸˜. On the other hand, recall that 푀 is generated by running SE.D (퐾out, 퐶). Since 푀 = 푀 ′ and 퐶 and 퐶′ share the same IV, we must have 퐶 = 퐶′. This means that the adversary queries Vf(푖, 푁, 퐶, 퐴) first, and then later queries Enc(푖, 푁 ′, 푀 ′, 퐴′) and accidentally gets the same answer 퐶. This case happens with probability at most 2−|퐶| ≤ 2−푛. Case 2: (푀, 퐴) = (푀 ′, 퐴′), but 푁 ̸= 푁 ′. Due to the injectivity of xor, this case cannot happen. Case 3: (푀, 퐴) ̸= (푀 ′, 퐴′). Since KeyGen is 훽-pairwise AU, 퐻 is 푐-AXU and xor is 2-regular and linear and injective,

′ ′ ′ Pr[xor(퐻(퐾in, 푀, 퐴), 푁) = xor(퐻(퐾in, 푀 , 퐴 ), 푁 )] 2푐훽 · E[︀|푀| + |퐴| + |푀 ′| + |퐴′| ]︀ 2푐훽 · (︀퐵 + E[︀|푀| + |퐴| ]︀)︀ ≤ 푛 푛 푛 푛 ≤ 푛 푛 . 2푛 2푛 51

As the three cases above are mutually exclusive, if 푖 = 푗 then the chance that (︀ [︀ ]︀)︀ ′ ′ ′ 2푐훽· 퐵+E |푀|푛+|퐴|푛 xor(퐻(퐾in, 푀, 퐴), 푁) = xor(퐻(퐾in, 푀 , 퐴 ), 푁 ) is at most 2푛 . Sum over all 푗 ≤ 푞, and then over all 푞 vf entries, and note that 퐵 ≥ 2 and 푞 ≤ 퐿/2 (as each encryption/verification query consists of at least two blocks, one due to the associated data, and another due to the message/ciphertext),

2푐훽푞퐿 2푐훽(퐿 + 푞퐵) 푐훽퐿2 2푐훽퐿퐵 Pr[풳 ∈ ℬ ] ≤ + ≤ + . 1 5 2푛+푘 2푛 2푛+푘 2푛 Finally, for the last constraint, consider an entry (vf, 푖, 푁, 퐶, 퐴, false) and let 푀 be the decrypted message associated with this entry. Let 퐾in ‖ 퐾out be the key of user 푖, and let 푇 be the IV of 퐶. Consider one entry (prim, 퐾, 푋, 푇, ·). Since KeyGen is 훽-pairwise AU, 퐻 is 푐-regular, and xor is 2-regular, the chance that 2훽푐·E[|푀|푛+|퐴|푛] 퐾 = 퐾out and 푋 = xor(퐻(퐾in, 푀, 퐴), 푁) is at most 2푛+푘 . Sum that over all vf entries and 푝 prim entries, 2푐훽퐿푝 Pr[풳 ∈ ℬ ] ≤ . 1 6 2푛+푘 Summing up,

6 ∑︁ Pr[풳1 is bad] ≤ Pr[풳1 ∈ ℬ푗] 푗=1 (2푐훽 + 0.5훽 + 2)퐿퐵 훽(푐 + 3)퐿2 + 2푐훽퐿푝 ≤ 휖 + 휖 + + 1 2 2푛 2푛+푘 1 훽푎푝 (3푐훽 + 7훽)퐿2 + 4훽푐퐿푝 (4푐훽 + 0.5훽 + 6)퐿퐵 ≤ + + + . 2푛/2 2푘 2푛+푘 2푛

Bounding transcript ratio. Fix a good transcript 휏 such that pS1 (휏) > 0. In particular, this means that there is no vf of answer true. Create the multisets 푆1, . . . , 푆5 as follows.

– For each entry (prim, 퐾, 푋, 푌, ·) in 휏, add a triple (퐾, 푋, 푌 ) to 푆1. – For each triple (퐾, 푋, 푌 ) in tables of 휏, add it to 푆2. – For each entry (dec, 퐾, 푋, 푌 ), if (퐾, 푋, 푌 ) ̸∈ 푆3 then add (퐾, 푋, 푌 ) to 푆3. – For each entry (eval, 푖, 푁, 푀, 퐴, 푇 ) in R1(휏), add (퐾out, 푋, 푇 ) to 푆4, where 퐾in ‖ 퐾out is the key of user 푖 in 휏, and 푋 = xor(퐻(퐾in, 푀, 퐴), 푁). – For each entry (vf, 푖, 푁, 퐶, 퐴, false) in 휏, if (퐾out, 푋, 푇 ) ∈/ 푆5 then add this triple to 푆5, where 푇 is the IV of 퐶, 퐾in ‖ 퐾out is the key of user 푖 in 휏, 푀 is the decrypted message associated with this entry indicated by 휏, and 푋 = xor(퐻(퐾in, 푀, 퐴), 푁). Due to (1) the goodness of 휏, (2) the fact that add can only produce outputs starting with 1 but xor produces output starting with 0, and (3) the way we generate dec entries,

– For each 푗 ≤ 5, the multiset 푆푗 contains no item twice, meaning that it is actually a set. 52 Bose, Hoang and Tessaro

– The sets 푆1, ··· , 푆5 are pairwise disjoint. ′ ′ – There are no triples (퐾, 푋, 푌 ) and (퐾, 푋 , 푌 ) in 푆1 ∪ 푆2 ∪ 푆3 ∪ 푆4 such that 푋 = 푋′ or 푌 = 푌 ′.

Now, the probability pS0 (휏) is the chance that all the following events happen: – Samp: If we query New using the queries as indicated in 휏, the generated keys will be the values indicated by 휏. – Real푗, for 1 ≤ 푗 ≤ 4: For each (퐾, 푋, 푌 ) ∈ 푆푗, querying 퐸퐾 (푋) returns 푌 . – Real5: For each (퐾, 푋, 푌 ) ∈ 푆5, querying 퐸퐾 (푋) does not return 푌 .

On the other hand, the probability pS1 (휏) is the chance Samp and Real1 and the following events happen:

– Ideal1: For the padding version of CTR, let 퐶1, . . . , 퐶푞 be the ciphertexts indicated by 휏. For the padding-free version of CTR, let 퐶1, . . . , 퐶푞 be the pre-truncated ciphertexts indicated by 휏.10 Then, if we sample 푞 random strings of length |퐶1|, ··· , |퐶푞| respectively, then we get 퐶1, . . . , 퐶푞 respec- tively. Note that |퐶1| + ··· + |퐶푞| = 푛(|푆2| + |푆4|). 푘 푛 푛 – Ideal2: Create a blockcipher 퐸˜ : {0, 1} × {0, 1} → {0, 1} as follows: for 푘 every 퐾 ∈ {0, 1} , sample 퐸˜(퐾, ·) ←$ Perm(푛), subject to the constraint that for every (퐾, 푋, 푌 ) ∈ 푆1 ∪ 푆2, we have 퐸˜(퐾) = 푌 . Now, for every ′ ′ ′ ′ ′ ′ (퐾 , 푋 , 푌 ) ∈ 푆3, if we query 퐸(퐾 , 푋 ) then we get 푌 .

For each 2 ≤ 푗 ≤ 5, let 푃푗 denote Pr[Real푗 | Real1 ∩ · · · Real푗−1]. As KeyGen does not use 퐸, event Samp is independent of other events, and thus p (휏) Pr[Real ∩ · · · ∩ Real ] 푃 · 푃 · 푃 · 푃 S0 = 5 1 = 2 3 4 5 . pS1 (휏) Pr[Ideal1 ∩ Ideal2 ∩ Real1] Pr[Ideal1 ∩ Ideal2 | Real1]

In the last ratio, since Ideal1 is independent of other events, the denominator can be factored to Pr[Ideal1] · Pr[Ideal2 | Real1]. Moreover, note that Pr[Ideal2 | Real1] = Pr[Real3 | Real1 ∩ Real2] = 푃3. Hence p (휏) 푃 · 푃 · 푃 S0 = 2 4 5 . pS1 (휏) Pr[Ideal1]

푘 For each 퐾 ∈ {0, 1} , let 푍1(퐾), 푍2(퐾), 푍3(퐾), 푍4(퐾) denote the number of triples (퐾, 푋, 푌 ) in 푆1, 푆2, 푆1 ∪ 푆2 ∪ 푆3, 푆4 respectively. Then

푍1(퐾)+푍2(퐾)−1 푗=푍3(퐾)+푍4(퐾)−1 ∏︁ ∏︁ 1 ∏︁ 1 푃 · 푃 = 2 4 2푛 − 푖 2푛 − 푗 푘 퐾∈{0,1} 푖=푍1(퐾) 푗=푍3(퐾)

∏︁ −푛·(푍2(퐾)+푍4(퐾)) −푛(|푆2|+|푆4|) ≥ 2 = 2 = Pr[Ideal1] . 퐾∈{0,1}푘

10 Given a table 풯 and a message 푀, the pre-truncated ciphertext can be obtained as follows. Suppose that 풯 contains (퐾, 푋1, 푌1),..., (퐾, 푋푚, 푌푚). Then the pre- ′ ′ truncated ciphertext is (푌1 ‖···‖ 푌푚) ⊕ 푀 , where 푀 is obtained by padding 0’s to 푀 to have full block length. 53

Thus

pS0 (휏) ≥ 푃5 . pS1 (휏)

We now give a lower bound for 푃5. Note that |푆1 ∪ · · · ∪ 푆4| ≤ 푝 + 퐿 + 푞 ≤ 2푛−1, because (i) there are 푝 ideal-cipher queries in 휏, contributing 푝 triples in 푆1, (ii) each encryption query (푖, 푁, 푀, 퐴) contributes one triple in 푆4, and at most (|푀|푛 + |퐴|푛) triples in 푆2, and (iii) each verification query (푖, 푁, 퐶, 퐴) contributes at most (|퐶|푛 + |퐴|푛) triples in 푆3. Now, for each (퐾, 푋, 푌 ) ∈ 푆5, there are only two cases. ′ ′ ′ Case 1: There is a triple (퐾, 푋 , 푌 ) ∈ 푆1 ∪ · · · ∪ 푆4 such that either (i) 푋 = 푋 but 푌 ′ ̸= 푌 , or (ii) 푌 ′ = 푌 but 푋′ ̸= 푋. In this case, given that 퐸 is consistent 푆1 ∪ · · · ∪ 푆4, if we query 퐸퐾 (푋) then the answer will not be 푌 . ′ ′ ′ Case 2: There is no triple (퐾, 푋 , 푌 ) ∈ 푆1 ∪ · · · ∪ 푆4 such that either 푋 = 푋 ′ or 푌 = 푌 . Hence, conditioning that 퐸 is consistent with 푆1 ∪ · · · ∪ 푆4, since 푛 푛−1 there are at least 2 − |푆1 ∪ · · · ∪ 푆4| ≥ 2 equally likely choices for 퐸퐾 (푋), the conditional probability that 퐸(퐾, 푋) = 푌 is at most 2/2푛.

Hence in both case, conditioning that 퐸 is consistent with 푆1 ∪ · · · ∪ 푆4, if we 푛 query 퐸퐾 (푋) then the conditional probability that we get 푌 is at most 2/2 . 푛 푛 By union bound, 푃5 ≥ 1 − |푆5| · 2/2 ≥ 1 − 2푞/2 . Hence

pS0 (휏) 2푞 0.5퐿퐵 ≥ 1 − 푛 ≥ 1 − 푛 . pS1 (휏) 2 2

F.1 Proof of Theorem5 We now discuss how to adapt the proof of Theorem4 to deal with a weakly regular hash 퐻. The definition of bad transcripts is exactly the same, andso is the bound on the transcript ratio; the changes are the probabilities that bad transcripts occur, specifically for events ℬ1, ℬ5, and ℬ6. Note that we assume an upper bound 푑 on the number of users re-using a particular nonce 푁, and this is going to be used below. Let 풳1 is the random variable for the transcript in the ideal system.

+ Analysis of ℬ1. Let 휖1 be the value that the GMAC proof uses to upper-bound the probability of bad transcripts, for any adversary 풜 that makes at most 푞 evaluation queries whose total block length is at most 퐿, at most 퐵-block queries per user, and 푝 ideal-cipher queries, and for any 훽-pairwise AU key-generation algorithm, assuming that each nonce is reused across at most 푑 users. As in the proof of Theorem4, Pr[풳1 ∈ ℬ1] ≤ 휖1 . The only change here is that now we need to use Theorem3 (instead of Theo- rem2) to obtain 휖1. In particular, applying Theorem3 with 휆 = 2 and note that 푞 ≤ 퐿/2, (1 + 2훽푐)퐿퐵 2훽푐퐿푝 + (2훽푐 + 훽)퐿2 푑(푝 + 퐿) 휖 ≤ + + . 1 2푛 2푛+푘 2푘 54 Bose, Hoang and Tessaro

Analysis of ℬ5. First, consider the case that 풳1 falls into ℬ5 due to some en- tries (vf, 푖, 푁, 퐶, 퐴, false) and (eval, 푗, 푁 ′, 푀 ′, 퐴′, 푇 ) such that either (1) (푀, 퐴) ̸= (휀, 휀) or (2) (푀 ′, 퐴′) ̸= (휀, 휀) or (3) (푀, 퐴) = (푀 ′, 퐴′) and 푖 = 푗, where 푀 is the decrypted message of the verification entry. As in the proof of Theorem4, 푐훽퐿2 2푐훽퐿퐵 this case happens with probability at most 2푛+푘 + 2푛 . Next consider an entry (vf, 푖, 푁, 퐶, 퐴, false) such that both decrypted message 푀 and associated data 퐴 are empty. Consider an entry (eval, 푗, 푁 ′, 푀 ′, 퐴′, 푇 ) ′ ′ such that (푀 , 퐴 ) = (휀, 휀), 푗 ̸= 푖, and 푇 is the IV of 퐶. Let 퐾in ‖ 퐾out and ′ ′ 퐾in ‖ 퐾out be the keys of users 푖 and 푗 respectively. Since 퐻 is weakly regular, ′ ′ ′ 푛 퐻(퐾in, 푀, 퐴) = 퐻(퐾in, 푀 , 퐴 ) = 0 . For these pair of entries to cause 풳1 to fall 푛 푛 ′ ′ into ℬ5, we must have xor(0 , 푁) = xor(0 , 푁 ), meaning that 푁 = 푁 , due to the injectivity of xor. Since the nonce 푁 is used across at most 푑 users, there are ′ at most 푑 choices for the index 푗. On the other hand, the chance that 퐾out = 퐾out is at most 2−푘. Summing this over 푑 choices of 푗, and over 푞 verification queries, we obtain a bound 푞푑/2푘 ≤ 퐿푑/2푘. Hence

푐훽퐿2 2푐훽퐿퐵 퐿푑 Pr[풳 ∈ ℬ ] ≤ + + . 1 5 2푛+푘 2푛 2푘

Analysis of ℬ6. First consider the case that some verification entry, in which either the decrypted message or the associated data is non-empty, causes 풳1 to fall into ℬ6. As in the proof of Theorem4, one can bound the chance that this 2푐훽퐿푝 case happens by 2푛+푘 . Next, consider an entry (vf, 푖, 푁, 퐶, 퐴, false), in which both the decrypted message 푀 and the associated data 퐴 are the empty string. For each entry (prim, 퐾, 푋, 푌, +), view it as throwing a ball into bin 푌 . Likewise, for each entry (prim, 퐾, 푋, 푌, −), view it as throwing a ball into bin 푋. Thus there are at most 푝 ≤ 2(1−휖)푛−1 throws. For each 푗-th throw, given the result of the prior throws, the conditional probability that the 푗-th ball lands into any particular bin is at most 21−푛. From Lemma 10, with probability at least 1 − 2−푛/2, each bin contains at most 푎 balls.

Let 푇 be the IV of 퐶 and let 퐾in ‖ 퐾out be the key of user 푖. Since 퐻 is 푛 weakly regular, 퐻(퐾in, 푀, 퐴) = 0 . From the balls-into-bins result above, there are at most 푎 balls in bin 푇 , and also at most 푎 balls in bin xor(0푛, 푁). Thus there are at most 2푎 entries (prim, 퐾, xor(0푛, 푁), 푇, ·). For each such entry, the −푘 chance that 퐾 = 퐾out is at most 2 . Hence the chance that the verification 푘 entry above causes 풳1 to fall into ℬ6 is at most 2푎/2 . Summing this across at most 푞 verification queries, we obtain a bound 2푎푞/2푘 ≤ 푎퐿/2푘. Hence

2푐훽퐿푝 푎퐿 Pr[풳 ∈ ℬ ] ≤ + . 1 6 2푛+푘 2푘

G Proof of Lemma3

Let 푟 = 푘/푛 ∈ {1, 2}. Suppose that 푅0, . . . , 푅5 are sampled uniformly without 15 푛 replacement from a set 푆 of size at least 16 · 2 . Pick an arbitrary string 퐾 ∈ 55

푛+푘 {0, 1} . Since KD1.Map outputs (푅0 ‖ 푅1 ‖ 푅2)[1 : 푛 + 푘], the chance that KD1.Map(푅0, . . . , 푅5) = 퐾 is at most 1 1 1 2 ≤ ≤ ≤ . 15 푛 푟+1 7 푛 푟+1 3 푛(푟+1) 푘+푛 ( 16 · 2 − 2) ( 8 · 2 ) (7/8) · 2 2

On the other hand, the chance that KD0.Map(푅0, . . . , 푅5) = 퐾 is at most 1 1 1 2 ≤ ≤ ≤ . 15 푛 푛/2 2(푟+1) 29 푛/2 2(푟+1) 6 푛(푟+1) 푘+푛 (⌊( 16 · 2 − 5)/2 ⌋) ( 32 · 2 ) (29/32) · 2 2 This concludes the proof.

H Proof of Proposition2

We will first construct adversaries 풜1 and 풜2 such that

mu-priv mu-mrae Adv (풜1) ≤ Adv (풜1), and AE,퐸 AE,KeyGen,퐸 mu-auth mu-mrae AdvAE,퐸 (풜2) ≤ 2 AdvAE,KeyGen,퐸(풜2) .

If we can do that, one can construct 풜 as follows. It picks a number 푎 ←$ {0, 1, 2}. If 푎 = 0 then it runs 풜1, uses its oracles to answer the latter’s queries accordingly, and outputs the same bit that 풜1 outputs. If 푎 ∈ {1, 2} then it runs 풜2, uses its oracles to answer the the latter’s queries accordingly, and outputs the same bit that 풜2 outputs. Then 1 2 Advmu-mrae (풜) = Advmu-mrae (풜 ) + Advmu-mrae (풜 ) AE,KeyGen,퐸 3 AE,KeyGen,퐸 1 3 AE,KeyGen,퐸 2 1 mu-priv 1 mu-auth ≥ Adv (풜1) + Adv (풜2) . 3 AE,퐸 3 AE,퐸

We now construct 풜1. Without loss of generality, assume that 풜1 does not re- peat a prior query, and assumes that for each encryption query (푖, 푁, 푀, 퐴), it must call New(·) at least 푖 times before, so that user 푖 was initialized. Ad- versary 풜1 initializes a counter 푣 ← 0 and a map 푉 = ⊥, and then runs 풜1. For each encryption query (푖, 푁, 푀, 퐴) of 풜1, if 푉 [푖, 푁] = ⊥ then 풜1 calls New(aux) with aux = (푖, 푁), updates 푉 [푖, 푁] ← 푣 + 1, and increments 푣. It returns Enc(푗, 푁, 푀, 퐴) to 풜1, with 푗 ← 푉 [푖, 푁]. Finally, when 풜1 outputs a bit then 풜1 outputs the same bit. Then

mu-priv mu-mrae Adv (풜1) ≤ Adv (풜1) . AE,퐸 AE,KeyGen,퐸

Next, we construct 풜2 as follows. Without loss of generality, assume that 풜2 does not repeat a prior query, and assumes that for each encryption/verification query (푖, 푁, ·, 퐴), it must call New(·) at least 푖 times before, so that user 푖 was initialized. Adversary 풜2 initializes a counter 푣 ← 0 and a map 푉 = ⊥, 56 Bose, Hoang and Tessaro and then runs 풜2. For each encryption/verification query (푖, 푁, 푋, 퐴) of 풜2, if 푉 [푖, 푁] = ⊥ then 풜2 calls New(aux) with aux = (푖, 푁), updates 푉 [푖, 푁] ← 푣+1, and increments 푣. If this is an encryption query then it returns Enc(푗, 푁, 푋, 퐴) to 풜2 with 푗 ← 푉 [푖, 푁]. Otherwise it calls Vf(푗, 푁, 푋, 퐴), with 푗 ← 푉 [푖, 푁]. Finally, 풜2 will output 1 if and only if some verification query returns true. Let 푐 be the challenge bit of game Gmu-auth(풜 ). Then AE,퐸 2 Pr[Gmu-mrae (풜 ) | 푐 = 1] = Pr[Gmu-auth(풜 )] . AE,KeyGen,퐸 2 AE,퐸 2

On the other hand, if 푐 = 0 then 풜2 always receives false for any verification query. Thus 1 Pr[Gmu-mrae (풜 ) | 푐 = 0] = . AE,KeyGen,퐸 2 2 Summing up, 1 Advmu-mrae (풜 ) = Advmu-auth(풜 ) AE,KeyGen,퐸 2 2 AE,퐸 2 as claimed.

I Proof of Lemma4 For two outputs 퐾 and 퐾′ generated by KeyGen, by symmetry, there are only four cases. Case 1: 퐾 and 퐾′ are independent, random strings. For any two strings (퐽, 퐽 ′) ∈ ({0, 1}푘+푛)2, the chance that (퐾, 퐾′) = (퐽, 퐽 ′) is 1/22(푘+푛). ′ 푘+푛 Case 2: 퐾 = KD[푘](휋푖, 푁) for some 휋푖 ←$ Perm(푛), and 퐾 ←$ {0, 1} . For any two strings (퐽, 퐽 ′) ∈ ({0, 1}푘+푛)2, since KD[퐸] is 2-unpredictable, the chance that (퐾, 퐾′) = (퐽, 퐽 ′) is at most 2 1 2 · = . 2푘+푛 2푘+푛 22(푘+푛) ′ ′ Case 3: 퐾 = KD[푘](휋푖, 푁) for some 휋푖 ←$ Perm(푛), and 퐾 ←$ KD[푘](휋푖, 푁 ), with 푁 ̸= 푁 ′. For any two strings (퐽, 퐽 ′) ∈ ({0, 1}푘+푛)2, since KD[퐸] is 2- unpredictable, the chance that 퐾 = 퐽 is at most 2/2푛+푘. For 푠 ∈ {0,..., 5}, let ′ ′ 푅푠 ← pad(푁, 푠) and 푅푠 ← pad(푁 , 푠). Given (푅0, 휋푖(푅0)),..., (푅5, 휋푖(푅5)), the ′ ′ values of 휋푖(푅0), . . . , 휋푖(푅5) are sampled uniformly without replacement from a 푛 15 푛 set of at least 2 −6 ≥ 16 ·2 . Since KD[퐸] is 2-unpredictable, given that 퐾 = 퐽, the conditional probability that 퐾′ = 퐽 ′ is at most 2/2푘+푛. Hence the chance that 퐾 = 퐽 and 퐾′ = 퐽 ′ is at most 2 2 4 · = . 2푘+푛 2푘+푛 22(푘+푛) ′ ′ Case 4: 퐾 = KD[푘](휋푖, 푁) and 퐾 ←$ KD[푘](휋푗, 푁 ), for 휋푖, 휋푗 ←$ Perm(푛). For any two strings (퐽, 퐽 ′) ∈ ({0, 1}푘+푛)2, since KD[퐸] is 2-unpredictable, the chance that (퐾, 퐾′) = (퐽, 퐽 ′) is at most 2 2 4 · = . 2푘+푛 2푘+푛 22(푘+푛) Combining all cases, KeyGen is indeed 4-pairwise AU. 57

J Proof of Lemma5

We shall use the H-coefficient technique. Let the real system implement game dist GKD[퐸] for challenge bit 푏 = 1 (meaning Eval is always implemented via KD[퐸]), dist and let the ideal system implement game GKD[퐸] for challenge bit 푏 = 0, (mean- ing Eval is always implemented via KD[푘]). Without loss of generality, sup- pose that 풜 never repeats a prior query. Since we consider computationally unbounded adversaries, without loss of generality, assume that the adversary is deterministic. Assume that 풜 makes no redundant queries, meaning that if 풜 queries (퐾, 푥) to 퐸 to get 푦, then it will not query (퐾, 푦) to 퐸−1 to get 푥, and vice versa. Assume that for each evaluation query (푖, 푁), the adversary called New() at least 푖 times before, so that the key 퐾푖 was initialized. When the adversary finishes querying, in the real world, we grant it thekeys 푘 퐾1, 퐾2, ··· . In the ideal world, we instead grant it strings 퐾1, 퐾2, · · · ←$ {0, 1} independent of anything else. This can only help the adversary. Besides the revealed keys, a transcript consists of the following information:

– Evaluation queries: For each query (푖, 푁) to Eval, we will store six entries (eval, 푖, 푥0, 푦0),...,, (eval, 푖, 푥5, 푦5), where each 푥푠 = pad(푁, 푠). For each 푠 ∈

{0,..., 5}, in the real world, 푦푠 = 퐸퐾푖 (푥푠), and in the ideal world, 푦푠 = 휋푖(푥푠), where 휋푖 is the secret permutation of user 푖. Clearly, the answer for this query in both worlds is KD.Map(푦0, . . . , 푦5). Thus we may grant the adversary more information that what it is supposed to receive, but this only helps the adversary. There are 6푞 eval entries. – Ideal-cipher queries: For each query (퐾, 푥) to 퐸 for answer 푦, we store a corresponding entry (prim, 퐾, 푥, 푦, +). Likewise, for a query (퐾, 푦) to 퐸−1 for answer 푥, we store a corresponding entry (prim, 퐾, 푥, 푦, −).

A transcript does not explicitly record the New() queries of 풜, because the adversary is deterministic, and New returns no output.

Defining bad transcripts. We say that a transcript is bad if one of the following happens:

1. There are entries (eval, 푖, 푥′, 푦′) and (prim, 퐾, 푥, 푦, +) such that 퐾 is the key of user 푖 as indicated by the transcript, and 푥 = 푥′. 2. There are entries (eval, 푖, 푥′, 푦′) and (prim, 퐾, 푥, 푦, −) such that 퐾 is the key of user 푖 as indicated by the transcript, and 푥 = 푥′. 3. There are entries (eval, 푖, 푥′, 푦′) and (prim, 퐾, 푥, 푦, +) such that 퐾 is the key of user 푖 as indicated by the transcript, and 푦 = 푦′. 4. There are entries (eval, 푖, 푥′, 푦′) and (prim, 퐾, 푥, 푦, −) such that 퐾 is the key of user 푖 as indicated by the transcript, and 푦 = 푦′. 5. There are entries (eval, 푖, 푥, 푦) and (eval, 푗, 푥, 푦′) of the same input 푥, with 푖 ̸= 푗, such that according to the transcript, users 푖 and 푗 have the same key. 6. There are entries (eval, 푖, 푥, 푦) and (eval, 푗, 푥′, 푦) of the same output 푦, with 푖 ̸= 푗, such that according to the transcript, users 푖 and 푗 have the same key. 58 Bose, Hoang and Tessaro

If a transcript is not bad then we say that it is good.

Probability of bad transcripts. Let 풳1 be the random variable for the transcript in the ideal system. We now bound the probability that 풳1 is bad. Let ℬ푗 be the set of transcripts that violates the 푗th constraint of badness. We first bound the probability that 풳1 meets the first constraint of badness. Consider an entry (prim, 퐾, 푥, 푦, +) in 풳1. Since the adversary is 푑-repeating, there are at most 푑 entries (eval, 푖, 푥, 푦′) of the same input 푥, and for such an −푘 entry, the chance that 퐾푖 = 퐾 in the ideal system, is exactly 2 . Summing this over at most 푝 ideal-cipher queries, we have 푑푝 Pr[풳 ∈ ℬ ] ≤ . 1 1 2푘

Next, we bound the probability that 풳1 meets the second constraint of bad- ness. View each entry (eval, 푖, 푥, 푦) as throwing a ball into bin 푦. Note that our 6푞 ≤ 2(1−휖)푛−1 throws are inter-dependent, but for each 푗-th throw, condition- ing on the result of the prior throws, the chance that the 푗-th ball falls into any particular bin is at most 1/(2푛 − 6푞) ≤ 21−푛. Suppose that each bin contains at most 푎 balls, which happens with probability at least 1 − 2−푛/2, according to Lemma 10. Consider an entry (prim, 퐾, 푥, 푦, −) in 풳1. There are at most 푎 entries (eval, 푖, 푥′, 푦) of the same output 푦, and for such an entry, the chance −푘 that 퐾푖 = 퐾 in the ideal system, is exactly 2 . Summing this over at most 푝 ideal-cipher queries, we have 1 푎푝 Pr[풳 ∈ ℬ ] ≤ + . 1 2 2푛/2 2푘 Next, for the third constraint of badness, consider entries (prim, 퐾, 푥, 푦, +) and ′ ′ (eval, 푖, 푥 , 푦 ) in 풳1. If the ideal-cipher query is made after the evaluation query then given 푦′, the random variable 푦 can take at least 2푛 −6푞 −푝 ≥ 2푛−1 equally likely values. If the ideal-cipher query is made before the evaluation query then given 푦, the random variable 푦′ can take at least 2푛 − 6푞 − 푝 ≥ 2푛−1 equally ′ likely values. In either case, the chance that 푦 = 푦 and 퐾 = 퐾푖 in the ideal system is at most 2/2푘+푛. Summing this over at most 푝 ideal-cipher queries and 6푞 evaluation entries, 12푝푞 Pr[풳 ∈ ℬ ] ≤ . 1 3 2푛+푘 For the fourth constraint of badness, consider entries (prim, 퐾, 푥, 푦, −) and ′ ′ (eval, 푖, 푥 , 푦 ) in 풳1. If the ideal-cipher query is made after the evaluation query then given 푥′, the random variable 푥 can take at least 2푛 −6푞 −푝 ≥ 2푛−1 equally likely values. If the ideal-cipher query is made before the evaluation query then given 푥, the random variable 푥′ can take at least 2푛 − 6푞 − 푝 ≥ 2푛−1 equally ′ likely values. In either case, the chance that 푥 = 푥 and 퐾 = 퐾푖 in the ideal system is at most 2/2푘+푛. Summing this over at most 푝 ideal-cipher queries and 6푞 evaluation entries, 12푝푞 Pr[풳 ∈ ℬ ] ≤ . 1 4 2푛+푘 59

For the fifth constraint of badness, consider a nonce 푁. For each 푁 ∈ {0, 1}푟, let 푄푁 ≤ 푑 be the random variable for the number of evaluation queries that use nonce 푁. For each 푥 ∈ {pad(푁, 0),..., pad(푁, 5)}, there are at most

(︂푄 )︂ (푄 )2 푑푄 푁 ≤ 푁 ≤ 푁 2 2 2 pairs of entries (eval, 푖, 푥, 푦) and (eval, 푗, 푥, 푦′), with 푖 ̸= 푗. For each such pair, −푘 the chance that 퐾푖 = 퐾푗 in the ideal system is 2 . Hence

∑︁ 6푑 · E[푄푁 ] 1 3푑 [︁∑︁ ]︁ 3푑푞 Pr[풳 ∈ ℬ ] ≤ · = · E 푄 ≤ , 1 5 2 2푘 2푘 푁 2푘 푁 푁 ∑︀ where the last inequality is due to the fact that 푁 푄푁 is exactly the total number of evaluation queries. For the last constraint of badness, note that for any entries (eval, 푖, 푥, 푦) and (eval, 푗, 푥′, 푦′), if 푖 ̸= 푗 then 푦 and 푦′ are independent, 푛 uniformly distributed over {0, 1} . For each such pair, the chance that 퐾푖 = 퐾푗 and 푦 = 푦′ is 2−(푘+푛). Summing over at most

(︂6푞)︂ ≤ 18푞2 2 pairs of eval entries, 18푞2 Pr[풳 ∈ ℬ ] ≤ 1 6 2푘+푛 Summing up,

6 ∑︁ 1 24푝푞 + 18푞2 푎푝 + 푑(푝 + 3푞) Pr[풳 is bad] ≤ Pr[풳 ∈ ℬ ] ≤ + + . 1 1 푗 2푛/2 2푘+푛 2푘 푗=1

Transcript ratio. Let S0 be the real system, and S1 be the ideal system. We now show that

pS0 (휏) ≥ pS1 (휏)

for any good transcript 휏 such that pS1 (휏) > 0. Fix such a transcript 휏. Note that in computing pS(휏), for S ∈ {S0, S1}, we can ignore the sign of the ideal- cipher entries, and treat that as a forward query. For a key 퐾 ∈ {0, 1}푘, let 푆1(퐾) = {(eval, 푖, 푥, 푦) | 퐾푖 = 퐾}, and 푆2(퐾) = {(prim, 퐾, 푥, 푦, ·)}. Since 휏 is 푘 good, |푆1(퐾)| is divisible by 6 for every 퐾 ∈ {0, 1} . Suppose that 휏 contains exactly 푢 users, 푞 evaluation queries, and 푝 ideal-cipher queries. Then

|푆2(퐾)|−1 1 ∏︁ ∏︁ 1 p (휏) = 2−푘푢 · · . S1 (︁ )︁푞 푛 푛 푛 2 − 푖 2 ··· (2 − 5) 퐾∈{0,1}푘 푖=0 60 Bose, Hoang and Tessaro

In the real world, because the transcript is good, for each key 퐾, the sets 푆1(퐾) and 푆2(퐾) call 퐸퐾 on different inputs, and thus

|푆1(퐾)|+|푆2(퐾)|−1 ∏︁ ∏︁ 1 p (휏) = 2−푘푢 S0 2푛 − 푖 퐾∈{0,1}푘 푖=0

|푆2(퐾)|−1 ∏︁ 1 ∏︁ 1 ≥ 2−푘푢 . (︁ )︁|푆1(퐾)|/6 2푛 − 푖 퐾∈{0,1}푘 2푛 ··· (2푛 − 5) 푖=0

Since ∑︁ |푆1(퐾)| = 6푞, 퐾∈{0,1}푘 it follows that pS0 (휏) ≥ pS1 (휏).

K Proof of Lemma7

Without loss of generality, assume that if 풜0 queries Enc(푖, 푁, 푀, 퐴) for an answer 퐶, then later it will not query Vf(푖, 푁, 퐶, 퐴). We now construct 풜1. Adversary 풜1 runs 풜0, and uses its Enc and New oracles to respond to the latter’s queries accordingly. For each verification query (푖, 푁, 퐶, 퐴) of 풜0, if there ′ is no prior encryption query (푖, 푁, 푀, 퐴 ) then 풜1 simply ignores it. Otherwise, ′ 풜1 uses its Vf oracle to respond. Finally, when 풜0 outputs a bit 푏 , 풜1 outputs the same bit. Next, we construct 풜2. Adversary 풜2 runs 풜0, and uses its Enc and New oracles to respond to the latter’s queries accordingly. For each verification ′ query (푖, 푁, 퐶, 퐴) of 풜0, if there is some prior encryption query (푖, 푁, 푀, 퐴 ) then 풜2 simply ignores it. Otherwise, 풜2 uses its Vf oracle to respond. When 풜0 outputs a bit, 풜2 outputs the same bit. Then

mu-auth mu-auth mu-auth AdvAE,퐸 (풜0) ≤ AdvAE,퐸 (풜1) + AdvAE,퐸 (풜2) .

mu-auth We now bound AdvAE,퐸 (풜1) by building an adversary 풜 for distinguishing KD[퐸] and KD[푘]. Adversary 풜 simulates game Gmu-auth(풜 ), but each time it AE,퐸 1 needs to generate a session key, it uses its Eval oracle instead of KD[퐸]. However, if 풜 previously queried Eval(푖, 푁) for an answer 퐾, next time it simply uses 퐾 without querying. When 풜1 outputs a bit, adversary 풜 outputs the same dist answer. Let 푐 be the challenge bit in game GKD[퐸](풜). Then

Pr[Gdist (풜) ⇒ true | 푐 = 1] = Pr[Gmu-auth(풜 )], and KD[퐸] AE 1 dist mu-auth Pr[GKD[퐸](풜) ⇒ false | 푐 = 0] = Pr[GKtE[KD[푘],AE],퐸(풜1)] . Subtracting, we get 1 Advdist (풜) = (︀Advmu-auth(풜 ) − Advmu-auth (풜 ))︀ . KD[퐸] 2 AE,퐸 1 KtE[KD[푘],AE],퐸 1 61

Note that 풜 makes at most 푝 + 퐿 ≤ 2푛−4 ideal-cipher queries, and 푞 Eval queries. Moreover, 풜 is 푑-repeating. Hence using Lemma5, 1 24(퐿 + 푝)푞 + 18푞2 푎(퐿 + 푝) + 푑(퐿 + 푝 + 3푞) Advdist (풜) ≤ + + . KD[퐸] 2푛/2 2푘+푛 2푘 Putting this all together,

mu-auth mu-auth mu-auth AdvAE,퐸 (풜0) ≤ AdvKtE[KD[푘],AE],퐸(풜1) + AdvAE,퐸 (풜2) 2 48(퐿 + 푝)푞 + 36푞2 2(푎 + 푑)퐿 + 2(푎 + 푑)푝 + 6푑푞 + + + . 2푛/2 2푘+푛 2푘 This concludes the proof.

L Proof of Lemma8

Without loss of generality, assume that if the adversary makes an encryption query (푖, 푁, 푀, 퐴) for an answer 퐶, then later it will not query Vf(푖, 푁, 퐶, 퐴). Assume that it always calls all 푞 New() queries at the beginning. Assume that the adversary does not repeat prior queries, and does not make redudant ideal- cipher queries.

Transition to game 퐺0. We will construct from 풜 another adversary 풜 that plays game 퐺0 as shown in Fig. 10. The adversary is given oracle access to 퐸 and its inverse, and New() as usual; the latter initializes a user 푣 with mas- 푛 ter key 퐾푣 ←$ {0, 1} upon each call. It is also given another evaluation or- acle Eval(푖, 푁) that implements KD[퐸](퐾푖, 푁). The adversary has to output a tuple (퐼, 푁, 퐶, 퐴) of vectors. We require that 퐼[푗] ∈ {1, . . . , 푣} for every 푗, and if the adversary previously queried Eval(푖, 푁) and then (퐼[푗], 푁[푗]) ̸= (푖, 푁) for all 푗 ≤ |퐼|. The game then iterates through every verification query 퐸 (퐼[푗], 푁[푗], 퐶[푗], 퐴[푗]), and try to decrypt AE.D (퐾퐼[푗], 푁[푗], 퐶[푗], 퐴[푗]). If some verification query results in a non-⊥ answer then the game returns true, meaning the adversary wins the game. Otherwise the game returns false. We now construct the adversary 풜 as shown in Fig. 11. It runs 풜 and uses its New oracle to respond to the latter’s queries of the same type, and maintains a counter 푣 starting at 0. For each encryption query (푖, 푁, 푀, 퐴) of 풜, adversary 풜 will first query Eval(푖, 푁) to get the session key 퐾, and then computes 퐶 ← AE.E퐸(퐾, 푁, 푀, 퐴) and returns the answer 퐶 to 풜. (However, if it previously queried Eval(푖, 푁) before then it simply reuses the prior answer as 퐾.) For each decryption query (푖, 푁, 퐶, 퐴) of 풜, adversary 풜 increments 푣, and updates (퐼[푣], 푁[푣], 퐶[푣], 퐴[푣]) ← (푖, 푁, 퐶, 퐴). Finally, when 풜 terminates, 풜 outputs (퐼, 푁, 퐶, 퐴) that it has maintained. Since 풜 is simple, 풜 does not violate the requirements of game 퐺0. Moreover, 풜 makes at most 푞 evaluation queries, 퐿+푝 ideal-cipher queries (but only 푝 of them are backward queries), and 푞 verification queries. For this constructed adversary 풜,

mu-auth Pr[퐺0] = AdvAE,퐸 (풜) . 62 Bose, Hoang and Tessaro

Game 퐺0 New() New,Eval,퐸,퐸−1 푘 푣 ← 0; 푆 ←$ ∅; (퐼, 푁, 퐶, 퐴) ←$ 풜 푣 ← 푣 + 1; 퐾푣 ←$ {0, 1} For 푗 = 1 to |퐼| do Eval(푖, 푁) If 퐼[푗] ̸∈ {1, . . . , 푣} then return false //Implement KD[퐸](퐾푖, 푁) If (퐼[푗], 푁[푗]) ∈ 푆 then return false If 푖 ̸∈ {1, . . . , 푣} then return ⊥ For 푗 = 1 to |퐼| do 퐾 ← KD[퐸](퐾 , 푁) 퐸 푖 푀 ← AE .D(퐾퐼[푗], 푁[푗], 퐶[푗], 퐴[푗]) 푆 ← 푆 ∪ {(푖, 푁)} If 푀 ̸= ⊥ then return true Return 퐾 Return false

Fig. 10: Game 퐺0 in the proof of Lemma8.

New,Eval,퐸,퐸−1 Adversary 풜 Enc(푖, 푁, 푀, 퐴) −1 푣 ← 0; 풜New,Enc,Vf,퐸,퐸 퐾 ← Eval(푖, 푁) 퐸 Return (퐼, 푁, 퐶, 퐴) 퐶 ← AE.E (퐾, 푁, 푀, 퐴) Return 퐶 Vf(푖, 푁, 퐶, 퐴) 푣 ← 푣 + 1; 퐼[푣] ← 푖 푁[푣] ← 푁; 퐶[푣] ← 퐶; 퐴[푣] ← 퐴

Fig. 11: Constructed adversary 풜 in the proof of Lemma8.

Bounding Pr[퐺0]. Each time 풜 makes a forward query 퐸퐾 (푋), if (1) there is some 푁 ∈ 풩 such that 푋 ∈ 훺 = {pad(푁, 0),..., pad(푁, 5)}, (2) there is no prior backward query 퐸−1(퐾, 푌 ) for answer 푋′ ∈ 훺∖{푋}, then we immediately * ′ grant the adversary the free queries 퐸퐾 (푋 ), for all 푋 ∈ 훺∖{푋}. Thus the adversary makes at most 6(퐿 + 푝) ideal-cipher queries, but at most 푝 of them are backward ones. Assume that 풜 does not repeat prior queries, and it does not make redundant ideal-cipher queries: if it gets 퐸(퐾, 푋) for answer 푌 then it will not later query 퐸−1(푌 ) to get answer 푋 again, and vice versa. Recall that this adversary makes all 푞 New() queries at the beginning.

Let 퐺1 be the following variant of game 퐺0. In 퐺1, the evaluation oracle imple- ments KD[푘]. Specifically, when New() initializes user 푣, it also samples a secret permutation 휋푣 ←$ Perm(푛) along with the key 퐾푣. On query Eval(푖, 푁), the oracle lets 푋푠 ← pad(푁, 푠) for every 푠 ∈ {0,..., 5}, computes 푌푠 ← 휋푖(푋푠), cre- ates internal entries (prim, 퐾푖, 푋푠, 푌푠, +), and returns KD.Map(푌0, . . . , 푌5). For each query 퐸(퐾, 푋) for answer 푌 , the game creates an entry (prim, 퐾, 푋, 푌, +). Likewise, for each query 퐸−1(퐾, 푌 ) for answer 푋, the game creates an entry (prim, 퐾, 푋, 푌, −). So there are at most 6푄 prim entries, where 푄 = 퐿+푝+푞. We say that the entries are incompatible if there are different entries (prim, 퐾, 푋, 푌, ·) and (prim, 퐾, 푋′, 푌 ′, ·) of the same key 퐾 such that either 푋 = 푋′ or 푌 = 푌 ′. If incompatibility happens then game 퐺1 returns false, meaning that the adversary loses the game. Otherwise, the game programs 퐸 to be consistent with the prim 63 entries, and then processes the verification queries as in game 퐺0. In particular, 퐸 in decrypting AE .D(퐾퐼[푗], 푁[푗], 퐶[푗], 퐴[푗]), the KDF of AE is still KD[퐸]. * To bound the gap between games 퐺0 and 퐺1, we construct an adversary 풜 that dist * tries to distinguish KD[퐸] and KD[푘], as defined in game GKD[퐸](풜 ), but it will be additionally granted the master keys 퐾1, . . . , 퐾푞 when it finishes querying. Note that Lemma5 still applies to this key-revealing setting. The goal of 풜* is to simulate 퐺0 if it interacts with KD[퐸], and simulate 퐺1 if it interacts with KD[푘]. To achieve that, it runs 풜, uses its oracles to answer the queries of the latter of the same type, and maintains prim entries for the ideal-cipher queries. When 풜 * finishes querying, 풜 asks to be granted master keys 퐾1, . . . , 퐾푞; at this point it is not allowed to make further queries. Adversary 풜* now creates prim entries corresponding to the past evaluation queries. If there are entries (prim, 퐾, 푋, 푌, ·) and (prim, 퐾, 푋′, 푌 ′, ·) of the same key 퐾, but either (1) 푋 = 푋′ but 푌 ̸= 푌 ′, or (2) 푋 ̸= 푋′ but 푌 = 푌 ′, then 풜* terminates and outputs 0, indicating that it has been interacting with KD[푘]. Otherwise, it will process the verification queries of 풜 as in game 퐺0. However, recall that at this point it cannot make further queries to 퐸/퐸−1. So during the handling of the verification queries, when it is supposed to query 퐸(퐾, 푋), if there is an entry (prim, 퐾, 푋, 푌, ·) then it simply uses the answer 푌 without querying 퐸 at all. If there is no such entry then it 푛 * picks an answer 푌 ←$ {0, 1} ∖푆, where 푆 is the set of all strings 푌 such that there is an entry (prim, 퐾, 푋*, 푌 *, ·), and creates a new entry (prim, 퐾, 푋, 푌, +). If there is a verification query that results in anon-⊥ answer then 풜* outputs 1, indicating that it has been interacting with KD[퐸]. Otherwise it outputs 0. We now analyze the advantage of 풜*. Let 푏 be the challenge bit in the game dist * GKD[퐸](풜 ). Then

dist * Pr[GKD[퐸](풜 ) ⇒ true | 푏 = 1] = Pr[퐺0] . On the other hand, we claim that

푞(18푞 + 144퐿 + 144푝) Pr[Gdist (풜*) ⇒ false | 푏 = 0] ≤ Pr[퐺 ] + . (10) KD[퐸] 1 2푘+푛 Subtracting, we obtain

푞(18푞 + 144퐿 + 144푝) Advdist (풜*) ≥ Pr[퐺 ] − Pr[퐺 ] − . KD[퐸] 0 1 2푘+푛

We now justify Equation (10). Note that the difference between Pr[퐺1] and dist * Pr[GKD[퐸](풜 ) ⇒ false | 푏 = 0] is bounded by the chance that there are distinct entries (prim, 퐾, 푋, 푌, ·) and (prim, 퐾′, 푋′, 푌 ′, ·) such that 퐾 = 퐾′, 푋 = 푋′, ′ and 푌 = 푌 , because in that case, game 퐺1 will terminate prematurely due to incompatibility, but 풜* still proceeds into handling the verification queries. Due to symmetry, there are only three cases. Case 1: The first entry is created by some Eval(푖, 푁) and the second entry ′ by Eval(푗, 푁), with 푖 ̸= 푗. Since 휋푖 and 휋푗 are independent, 푌 and 푌 are 64 Bose, Hoang and Tessaro independent, uniformly random strings. Since 퐾푖 and 퐾푗 are independent with ′ −(푘+푛) 휋푖 and 휋푗, the chance that 퐾푖 = 퐾푗 and 푌 = 푌 is at most 2 . Summing over at most (︂6푞)︂ ≤ 18푞2 2 pairs of entries created by the 푞 evaluation queries, the chance this case happens is at most 18푞2/2푘+푛. Case 2: The first entry is created by some Eval(푖, 푁) and the second entry by querying 퐸(퐾′, 푋′). If the evaluation query is made first then given 푌 , there are still at least 2푛 − 6푄 ≥ 2푛−1 equally likely choices for 푌 ′, and thus the chance ′ ′ 푘+푛 that 퐾푖 = 퐾 and 푌 = 푌 is at most 2/2 . Likewise, if the ideal-cipher query is made first then given 푌 ′, there are still at least 2푛−1 equally likely choices ′ ′ 푘+푛 for 푌 , and thus the chance that 퐾푖 = 퐾 and 푌 = 푌 is also at most 2/2 . Summing this over 36(퐿 + 푝)푞 possible pairs, the chance that this case happens is at most 72(퐿 + 푝)푞/2푘+푛. Case 3: The first entry is created by some Eval(푖, 푁) and the second entry by querying 퐸−1(퐾′, 푌 ′). Similar to case 2, the chance that this case happens is at most 72(퐿 + 푝)푞/2푘+푛. Combining all cases leads us to Equation (10). On the other hand, 풜* is 푑- repeating and makes at most 푞 evaluation queries, and 6(퐿 + 푝) ≤ 2푛−4 ideal- cipher queries. From Lemma5,

1 144(퐿 + 푝)푞 + 18푞2 6푎(퐿 + 푝) + 푑(6퐿 + 6푝 + 3푞) Advdist (풜*) ≤ + + . KD[퐸] 2푛/2 2푘+푛 2푘 Hence 1 288(퐿 + 푝)푞 + 36푞2 6푎(퐿 + 푝) + 푑(6퐿 + 6푝 + 3푞) Pr[퐺 ] − Pr[퐺 ] ≤ + + . 0 1 2푛/2 2푛+푘 2푘

What remains is to bound Pr[퐺1].

Bounding Pr[퐺1]. Consider the following balls-into-bins game. For each entry (prim, 퐾, 푋, 푌, +), view this as throwing a ball to bin 푌 . Likewise, for each entry (prim, 퐾, 푋, 푌, −), view this as throwing a ball to bin 푋. Thus there are at most 6푄 ≤ 2(1−휖)푛−1 throws. For each 푗-th throw, given the result of the prior throws, the conditional probability that the 푗-th ball lands into any particular bin is at most 21−푛. From Lemma 10, with probability at least 1 − 2−푛/2, each bin contains at most 푎 balls. Consider the following balls-into-bins game. For each query Eval(푖, 푁) for answer 푍, we view this as throwing a ball into bin 푍[푛 + 1 : 푛 + 푘]. Likewise, for each 6-tuple (prim, 퐾, pad(푁, 0), 푅0, +),..., (prim, 퐾, pad(푁, 1), 푅5, +) created by querying 퐸, we view this as throwing a ball into bin 푍[푛 + 1 : 푛 + 푘], where (1−휖)푘−1 1 푛 푍 ← KD.Map(푅0, . . . , 푅5). Thus there are at most 푄 ≤ min{2 , 16 · 2 } throws. For each 푗-th throw, given the result of the prior throws, since KD is 2-unpredictable, the conditional probability that the 푗-th ball lands into any 65 particular bin is at most 21−푘. From Lemma 10, with probability at least 1 − 2−푘/2 ≥ 1 − 2−푛/2, each bin contains at most 푎 balls. We claim that for each 푗 ≤ 푞, the probability that the verification query (퐼[푗], 푁[푗], 퐶[푗], 퐴[푗]) makes game 퐺1 return true is at most

11 8푎 + 7푎2 (︁푛푎 48푐(퐿 + 푝 + 푞))︁ + + E[︀|퐶[푗]| + |퐴[푗]| ]︀ + . (11) 2푛 2푘 푛 푛 2푘 2푛+푘 This claim will be justified later. Summing over 푞 verification queries, and ac- counting for the 2/2푛/2 probability due to the two balls-into-bins games,

2 11푞 (8푎 + 7푎2)푞 푛푎퐿 48푐(퐿 + 푝 + 푞)퐿 Pr[퐺 ] ≤ + + + + . 1 2푛/2 2푛 2푘 2푘 2푛+푘 Hence 3 11푞 288(퐿 + 푝)푞 + 36푞2 + 48푐(퐿 + 푝 + 푞)퐿 Advmu-auth(풜) ≤ + + AE,퐸 2푛/2 2푛 2푛+푘 (8푎 + 7푎2 + 3푑)푞 + (푛푎 + 6푎 + 6푑)퐿 + 6(푎 + 푑)푝 + . 2푘 Below, we will prove Equation (11).

Handling each verification query. Fix 푗* ≤ 푞, and let (푖, 푁, 퐶, 퐴) be the 푗*-th verification query. Let 푇 be the IV in 퐶 and let 푀 be the random variable for the decrypted message CTR[퐸].D(퐾out, 푁, 퐶, 퐴), where 퐾in ‖ 퐾out is the random variable for the session key of user 푖 for nonce 푁. The message 푀 is determined from 퐶 if we know all pairs (푋, 퐸(퐾out, 푋)) for every string 푋 ∈ 1{0, 1}푛−1. Consider the following cases.

Case 1: There is no entry (prim, 퐾푖, 푋, ·, ·) with 푋 ∈ {pad(푁, 0),..., pad(푁, 5)}. Assume that all prim entries are compatible; otherwise the adversary simply loses the game. Let 푉 ← xor(퐻(퐾in, 푀, 퐴), 푁). Processing this verification query will return true only if 퐸(퐾out, 푉 ) = 푇 . After programming 퐸, given all prim entries 푘−1 and 퐾푖, since KD is 2-unpredictable, there are at least 2 equally choices for the 퐾out. Assume that 퐾out ̸= 퐾푖, which happens with probability at least 1 − 2/2푘 ≥ 1 − 2/2푛.

Suppose that there is no entry (prim, 퐾out, 푉, ·, ·). Note that 푉 starts with 0. Given all prim entries, (푁, 퐶, 퐴, 퐾푖, 퐾in, 퐾out) and all pairs (푋, 퐸(퐾out, 푋)) for every 푋 ∈ 1{0, 1}푛−1, (i) one can determine 푉 , but (ii) there are at least 2푛−1 − 푛−2 6푄 ≥ 2 equally likely choices for 퐸(퐾out, 푉 ), and thus the chance that 푇 = 푛 퐸(퐾out, 푉 ) is at most 4/2 . ′ ′ Suppose that there is an entry (prim, 퐾out, 푉, 푇 , ·), with 푇 ̸= 푇 . Then after programming 퐸, the chance that 푇 = 퐸(퐾out, 푉 ) is 0.

We now bound the chance that there is an entry (prim, 퐾out, 푉, 푇, ·). Clearly this entry, if exists, is not created by Eval(푖, ·) queries since 퐾out ̸= 퐾푖. First consider the case that either (1) 퐴 ̸= 휀, or (2) 푀 ̸= 휀, or (3) 퐻 is 푐-regular. 66 Bose, Hoang and Tessaro

After programming 퐸, given all prim entries, (푁, 퐶, 퐴, 퐾푖, 퐾out) and all pairs 푛−1 (푋, 퐸(퐾out, 푋)) for every 푋 ∈ 1{0, 1} , (i) one can uniquely determine 푀, but (ii) since KD is 2-unpredictable, for any string 퐾, the conditional probability 1−푛 that 퐾in = 퐾 is at most 2 . Hence, for any entry (prim, 퐾, 푋, ·, ·) that is not created by Eval(푖, ·) queries, due to the (possibly weak) 푐-regularity of 퐻 and the 2-regularity of xor, the chance that 퐾out = 퐾 and 푉 = 푋 is at most 2 2 · 푐 · E[︀|퐶| + |퐴| ]︀ 8푐 · E[︀|퐶| + |퐴| ]︀ · 푛 푛 = 푛 푛 . 2푘 2푛−1 2푘+푛 [︀ ]︀ 48푄푐·E |퐶|푛+|퐴|푛 Summing over 6푄 entries (prim, 퐾, 푋, ·, ·), we obtain a bound 2푛+푘 . Next we consider the case that both 퐴 and 푀 are the empty string, and 퐻 is weakly 푐-regular. Note that one can check if 푀 = 휀 without knowing the key 퐾out, by comparing |퐶| with 푛. Now since 퐻 is weakly regular, 퐻(퐾in, 푀, 퐴) = 0푛, and thus 푉 = xor(0푛, 푁). From the balls-into-bins assumption above, there are at most 2푎 entries (prim, 퐾, 푉, 푇, ·), and the chance that one such 퐾 is 퐾out is at most 2푎/2푘. Summing up, for this case, the chance that the verification query (푖, 푁, 퐶, 퐴) makes game 퐺1 return true is at most 6 48푄푐 · E[︀|퐶| + |퐴| ]︀ 2푎 + 푛 푛 + . 2푛 2푛+푘 2푘

Case 2: There is an entry (prim, 퐾푖, 푍, ·, −) for 푍 ∈ 훺 = {pad(푁, 0),... pad(푁, 5)}. Regardless of how the adversary chooses (푖, 푁), there are at most 6푎 entries (prim, ·, 푉, ·, −) with 푉 ∈ 훺. Since the key 퐾푖 is sampled independent of 6푎 the (prim, ·, ·, ·, −) entries, this case happens with probability at most 2푘 .

Case 3: Entries (prim, 퐾푖, pad(푁, 0), ·, +),..., (prim, 퐾푖, pad(푁, 5), ·, +) exist. Note that those entries are not created by queries Eval(푖, ·), because the ad- versary is not allowed to query Eval(푖, 푁) and then output a verification query (푖, 푁, 퐶, 퐴). Except for entries created by Eval(푖, ·), the key 퐾푖 is independent of the remaining entries. Since KD is 2-unpredictable, assume that 퐾out ̸= 퐾푖, 2푄 1 2푄 which happens with probability at least 1 − 2푘 · 2푘 ≥ 1 − 2푘+푛 . We consider the following sub-cases.

Case 3.1: There is no entry (prim, 퐾out, 푍, 푇, ·), where 푍 is a string starting with 0. Assume that the entries are compatible; otherwise the adversary simply loses the game. After programming 퐸, given all entries, (퐾푖, 퐾in, 퐾out), and 푛−1 all pairs (푋, 퐸(퐾out, 푋)) for every 푋 ∈ 1{0, 1} , (i) one can determine 푀 from 퐶, and then compute 푉 ← xor(퐻(퐾in, 푀, 퐴), 푁), but (ii) either there ′ ′ ′ is an entry (prim, 퐾out, 푉, 푇 , ·) with 푇 ̸= 푇 , meaning 퐸(퐾out, 푉 ) = 푇 ̸= 푇 , or there is no entry (prim, 퐾out, 푉, ·, ·), meaning that there are still at least 푛−1 푛−2 2 −6푄 ≥ 2 equally likely choices for 퐸(퐾out, 푉 ), and therefore the chance 푛 that 퐸(퐾out, 푉 ) = 푇 is at most 4/2 . Hence in this case the chance that the 푛 verification query above can make 퐺1 answer true is at most 4/2 .

Case 3.2: There is an entry (prim, 퐾out, 푍, 푇, +) where 푍 is a string start- ing with 0. Now, the entry (prim, 퐾out, 푍, 푇, +), if exists, does not come from 67

Eval(푖, ·) queries, since 퐾푖 ̸= 퐾out as mentioned above. Note that due to the balls-into-bins assumption above, there are at most 푎2 septets (prim, 퐽, ·, 푇, +), ′ ′ (prim, 퐾, pad(푁 , 0), 푅0, +),..., (prim, pad(푁 , 5), 푅5, +) such that (i) 퐽 is the 푘-bit suffix of KD.Map(푅0, . . . , 푅5), and (ii) those entries are not from the eval- −푘 uation queries of user 푖. For each such septet, the chance that 퐾 = 퐾푖 is 2 . Hence this case happens with probability at most 푎2/2푘.

푛−1 Case 3.3: There is an entry (prim, 퐾out, 푍, 푇, −), where 푍 ∈ 0{0, 1} . Assume that 푍 ̸∈ 훺 = {pad(푁, 0),..., pad(푁, 5)}. This happens with probability at least 1 − 6푎2/2푘, since due to the balls-into-bins assumption above, there are 2 ′ at most 6푎 septets of entries (prim, 퐽, 푍 , 푇, −), (prim, 퐾, pad(푁, 0), 푅0, +),..., (prim, 퐾, pad(푁, 5), 푅5, +) such that 퐽 is the 푘-bit suffix of KD.Map(푅0, . . . , 푅5) and 푍′ ∈ 훺.

If incompatibility does not happen then either (prim, 퐾푖, pad(푁, 0), ·, +),..., (prim, 퐾푖, pad(푁, 5), ·, +) all belong to the ideal-cipher queries, or they all are created by queries Eval(푗, ·) for some 푗 ̸= 푖. Consider all septets 풯 of entries (prim, 퐾, pad(푁, 0), 푅0, +),..., (prim, 퐾, pad(푁, 5), 푅5, +), (prim, 퐽, 푈, 푇, −), in ′ which 퐽 ‖ 퐽 ← KD.Map(푅0, . . . , 푅5), such that (1) either the first six entries belong to the ideal-cipher queries, or they are created by queries Eval(푗, ·), with 푗 ̸= 푖, and (2) 퐽 ̸= 퐾 and (3) 푈 ∈ 0{0, 1}푛−1∖{pad(푁, 0),..., pad(푁, 5)}. For such a septet 풯 , denote 퐾 = MKey(풯 ), 퐽 = OKey(풯 ), 퐽 ′ = IKey(풯 ) and 푈 = Input(풯 ), and let Msg(풯 , 퐶) be the message obtained by decrypt- ing CTR[퐸].D(OKey(풯 ), 퐶). This query (푖, 푁, 퐶, 퐴) makes game 퐺1 return true only if incompatibility does not happen, and there is a septet 풯 such that xor(퐻(IKey(풯 ), Msg(풯 , 퐶), 퐴), 푁) = Input(풯 ) and 퐾푖 = MKey(풯 ).

Now consider the following game 퐺2 that is equivalent of 퐺1, but adds some extra bookkeeping. In the first phase, we let bad ← false, and then run New,Eval,퐸,퐸−1 풜 as usual. After the adversary finishes querying and outputs its verification queries, let (푖, 푁, 퐶, 퐴) be the 푗*-th verification query. Excluding the prim entries created by Eval(푖, ·) queries, we check for compatibility of the remaining entries. If incompatibility happens we terminate the game and re- turn false. Otherwise, among septets 풯 such that MKey(풯 ) = 퐾푖, we check if xor(퐻(IKey(풯 ), Msg(풯 , 퐶), 퐴), 푁) = Input(풯 ), and if this happens, we set bad to true. Note that here OKey(풯 ) ̸= MKey(풯 ) = 퐾푖, and thus the additional calls to 퐸(OKey(풯 ), ·) to compute Msg(풯 , 퐶) will not create further incompat- ibility with the prim entries created by Eval(푖, ·) queries. In the second phase, we check the compatibility of all entries, program 퐸, and proceed to handle the verification queries.

Note that in this case, the verification query (푖, 푁, 퐶, 퐴) makes game 퐺1 return 1 only if game 퐺2 sets bad. Thus it suffices to prove that

1 푛푎 · E[︀|퐶| + |퐴| ]︀ Pr[퐺 sets bad] ≤ + 푛 푛 . 2 2푛 2푘 68 Bose, Hoang and Tessaro

Analyzing game 퐺2. Let game 퐺3 be identical to game 퐺2, but in 퐺3, we drop the second phase. Then

Pr[퐺3 sets bad] = Pr[퐺2 sets bad] since the part we drop does not modify bad. Consider the following game 퐺4 that is identical to game 퐺3, except the following. In game 퐺3, recall that we check xor(퐻(IKey(풯 ), Msg(풯 , 퐶), 퐴), 푁) = Input(풯 ) for only septets 풯 such that MKey(풯 ) = 퐾푖. In 퐺4, we instead check xor(퐻(IKey(풯 ), Msg(풯 , 퐶), 퐴), 푁) = Input(풯 ) for all septets, and then among septets 풯 of affirmative answers, we set bad if there is some 풯 such that MKey(풯 ) = 퐾푖. The two games are equivalent, and thus Pr[퐺4 sets bad] = Pr[퐺3 sets bad]

Let Bad be the event that in game 퐺4, the adversary can find 푛푎(|퐶|푛 + |퐴|푛) or more septets of affirmative answers. Note that the septets and their checking 푛 are independent of the key 퐾푖. Hence it suffices to prove that Pr[Bad] ≤ 1/2 . The difficulty here is that the adversary can adaptively make (푖, 푁, 퐶, 퐴) after seeing the queries and answers. This creates an issue in using the regularity of 퐻 in checking xor(퐻(IKey(풯 ), Msg(풯 , 퐶), 퐴), 푁) = Input(풯 ), since (퐶, 퐴, 푁) might depend on IKey(풯 ). However, we claim that for any fixed choice (푖*, 푁 *, 퐶*, 퐴*),

Pr[Bad ∩ (︀(푖, 푁, 퐶, 퐴) = (푖*, 푁 *, 퐶*, 퐴*))︀] ≤ 21−(3푛ℓ+2푛) (12)

* * where ℓ = |퐶 |푛 + |퐴 |푛 ≥ 2. By union bound, the chance that Bad happens is at most ∞ ∞ ∞ ∑︁ ∑︁ ∑︁ ∑︁ 2 1 21−(3푛ℓ+2푛) ≤ 22푛ℓ+2푛 · 21−(3푛ℓ+2푛) = ≤ . 2푛ℓ 2푛 ℓ=2 (푖*,푁 *,퐶*,퐴*) ℓ=2 ℓ=2 * * |퐶 |푛+|퐴 |푛=ℓ

To justify Equation (12), fix such a value (푖*, 푁 *, 퐶*, 퐴*). Consider the follow- New,Eval,퐸,퐸−1 ing game 퐺5. Initially, we let bad ← false, and run 풜 as usual. After the adversary finishes querying, we ignore its verification queries. Exclud- ing prim entries created by Eval(푖, ·) queries, we check for compatibility of the remaining entries. If incompatibility happens, we terminate the game and re- * turn false. Otherwise, consider all septets 풯 of (prim, 퐾, pad(푁 , 0), 푅0, +),..., * * ′ (prim, 퐾, pad(푁 , 5), 푅2, +), (prim, 퐽, 푈, 푇 , −), with 퐽 ‖퐽 ← KD.Map(푅0, . . . , 푅5) such that (1) either the first six entries belong to the ideal-cipher queries, or they are created by queries Eval(푗, ·), with 푗 ̸= 푖, and (2) 퐽 ̸= 퐾 and (3) 푈 ∈ 0{0, 1}푛−1∖{pad(푁 *, 0),..., pad(푁 *, 5)}. Check

xor(퐻(IKey(풯 ), Msg(풯 , 퐶*), 퐴*), 푁 *) = Input(풯 ) for all such septets 풯 , and if there are 푛푎ℓ or more septets of affirmative answers then we set bad to true. Then

(︀ * * * * )︀ Pr[Bad ∩ (푖, 푁, 퐶, 퐴) = (푖 , 푁 , 퐶 , 퐴 ) ] ≤ Pr[퐺5 sets bad] . 69

Consider the following game 퐺6. It is identical to game 퐺5, but we grant the * adversary the keys 퐾푗, for every 푗 ̸= 푖 , at the beginning. This should only help the adversary. Then

Pr[퐺5 sets bad] ≤ Pr[퐺6 sets bad] .

Consider the following game 퐺7. It is identical to game 퐺6 but here we lazily * implement 휋푗 for every 푗 ̸= 푖 , and also lazily implement 퐸. That is, initially 휋푗 is undefined on all inputs. Only when we need to compute 휋푗(푋) do we sample the output from a proper distribution. Likely, initially 퐸(·, ·) is undefined on all inputs. Only when we need to compute 퐸(퐾, 푋) or 퐸−1(퐾, 푌 ) do we sample the outputs from proper distributions. Moreover, we eagerly do the compatibility checking and programming 퐸. That is, each time the adversary makes an ideal- cipher query we immediately do the compatibility checking and terminate the game if incompatibility is detected. Likewise, each time the adversary makes a query to Eval(푗, ) with 푗 ̸= 푖*, we immediately do the compatibility checking, terminate the game if incompatibility is detected, and program 퐸 otherwise. Since 퐺7 is simply a different implementation of 퐺6,

Pr[퐺7 sets bad] = Pr[퐺6 sets bad] .

* ′ Next, in game 퐺7, for any 푗 ̸= 푖 , querying Eval(푗, 푁 ) either causes premature ′ termination without setting bad, or effectively calls 푅푠 ← 퐸(퐾푗, pad(푁 , 푠)) for every 푠 ∈ {0,..., 5}, and returns KD.Map(푅0, . . . , 푅5). Since the adver- sary knows the keys 퐾푗, without loss of generality, assume that the adversary makes no evaluation query Eval(푗, ·) for any 푗 ̸= 푖*, and makes at most 6푄 ideal-cipher queries: whenever it is supposed to query Eval(푗, 푁 ′), it will query ′ 퐸(퐾푗, pad(푁 , 푠)) for every 푠 ∈ {0,..., 5} instead. (If those queries are redundant then the adversary can reuse the past answers without querying 퐸.) Now, we say that an entry (prim, 퐽, 푈, 푇 *, −) is bad if it belongs to an affirmative septet. Consider the following game 퐺8. It is essentially the same as game 퐺7, but we set bad if there are at least 푛ℓ bad entries. Note that each entry (prim, 퐽, 푈, 푇 *, −) can belong to at most 푎 septets, due to the balls-into-bins assumption above. Hence Pr[퐺7 sets bad] ≤ Pr[퐺8 sets bad] .

Game 퐺9 is essentially the same as game 퐺8, but at the beginning, we grant the adversary free queries 퐸(퐾, pad(푁 *, 0)), . . . , 퐸(퐾, pad(푁 *, 5)), for every 퐾 ∈ {0, 1}푘, and then grant it free queries 퐸(퐾, 푋) for every 퐾 ∈ {0, 1}푘 and 푋 ∈ 1{0, 1}푛−1. (However, if some query 퐸(퐾, 푋) repeats a prior query then we will not grant this query.) Note that our free queries may prohibit the adversary from making some backward queries as those now become redundant. However, the entries (prim, 퐾, 푈, ·, −) corresponding to those backward queries are not bad, because those 푈’s will belong to 1{0, 1}푛−1 ∪ {pad(푁 *, 0),..., pad(푁 *, 5)}. Hence Pr[퐺8 sets bad] ≤ Pr[퐺9 sets bad] .

What is left is to analyze the chance that game 퐺9 sets bad. 70 Bose, Hoang and Tessaro

Analyzing game 퐺9. Consider the following balls-into-bins game. For each 6- * * tuple of queries 퐸(퐾, pad(푁 , 0)), . . . , 퐸(퐾, pad(푁 , 5)) with answer 푅0, . . . , 푅5 respectively, we view them as throwing a ball into bin 푍[푛 + 1 : 푛 + 푘], where 푘 푍 ← KD.Map(푅0, . . . , 푅5). So totally we throw 2 balls. Since KD[퐸] is 2- unpredictable, for the 푗-th throw, given the result of prior throws, the condi- tional probability that the 푗-th ball lands in any particular bin is at most 21−푘. Assume that each bin contains at most ⌈푘ℓ/2⌉ ≤ 푛ℓ balls, which happens with probability at least 1 − 2−(3ℓ+2)푘 ≥ 1 − 2−(3ℓ+2)푛, according to Lemma 11. Now, partition the septets according to their backward queries. Each partition con- tains at most 푛ℓ septets, due to the balls-into-bins assumption above. Since there are at most 푝 backward queries, there are at most 푝 partitions. Now, for each entry (prim, 퐽, 푈, 푇 *, −), for it to be bad, at least one of the 푛ℓ septets in its partition must be an affirmative one. Given all prior prim entries, the random variable 푈 has at least 2푛−1 − 6푄 − 6 ≥ 2푛−2 equally likely values, and thus the conditional probability that this entry is bad is at most 푛ℓ/2푛−2. Hence the chance that there are 푛ℓ bad entries is at most (︂ 푝 )︂(︁ 푛ℓ )︁푛ℓ 푝푛ℓ (︁ 푛ℓ )︁푛ℓ (푛ℓ/64)푛ℓ (푛ℓ/64)푛ℓ 1 ≤ ≤ ≤ ≤ 푛ℓ 2푛−2 (푛ℓ)! 2푛−2 (푛ℓ)! (푛ℓ/푒)푛ℓ 16푛ℓ 1 ≤ , 2(3ℓ+2)푛 where the second equality is based on the hypothesis that 푝 ≤ 2푛−8, the third inequality is due to the fact that 푚! ≥ (푚/푒)푚 for any integer 푚 ≥ 1, and the last inequality is due to the fact that ℓ ≥ 2. Summing up,

1−(3ℓ+2)푛 Pr[퐺9 sets bad] ≤ 2 . This concludes the proof.

M Proof of Theorem6

From Proposition1, we can construct 푑-repeating adversaries 풜1 and 풜2 such that mu-mrae mu-priv mu-auth 2푞 Adv (풜) ≤ Adv (풜1) + Adv (풜2) + . AE,퐸 AE,퐸 AE,훱 2푛 Moreover, any query of 풜1 or 풜2 is also a query of 풜. In particular, 풜1 makes at most 푞 encryption queries of total 퐿 blocks, and at most 퐵 blocks per (user, nonce) pair, and 푝 ideal-cipher queries. Likewise, 풜2 makes at most 푞 encryp- tion/verification queries of total 퐿 blocks, and encryption queries of 퐵 blocks per (user, nonce) pair, and 푝-ideal cipher queries. Let AE* = KtE[KD[푘], AE].

Privacy analysis. We first bound the privacy advantage of 풜1. From Lemma6, 2 mu-priv mu-priv 2 48(퐿 + 푝)푞 + 36푞 Adv (풜1) ≤ Adv * (풜1) + + AE,퐸 AE ,퐸 2푛/2 2푘+푛 2푎(퐿 + 푝) + 2푑(퐿 + 푝 + 3푞) + . 2푘 71

Now, since 푞 ≤ 퐿/2 (as each query consist of at least two blocks, one from associated data, another from plaintext/ciphertext), the bound above can be simplified as 2 mu-priv mu-priv 2 33퐿 + 24퐿푝 (2푎 + 5푑)퐿 + (2푎 + 2푑)푝 Adv (풜1) ≤ Adv * (풜1)+ + + . AE,퐸 AE ,퐸 2푛/2 2푘+푛 2푘

Authenticity analysis. We next bound the authenticity advantage of 풜2. From Lemma9, we can construct an adversary 풜3 such that 2 mu-auth mu-auth 5 11푞 336(퐿 + 푝)푞 + 72푞 Adv (풜 ) ≤ Adv * (풜 ) + + + AE,퐸 2 AE ,퐸 3 2푛/2 2푛 2푛+푘 48푐(퐿 + 푝 + 푞)퐿 (8푎 + 7푎2 + 9푑)푞 + (푛푎 + 8푎 + 8푑)퐿 + 8(푎 + 푑)푝 + + . 2푛+푘 2푘

Moreover, any query of 풜3 is also a query of 풜2. In particular, 풜3 makes at most 푞 encryption/verification queries of total 퐿 blocks, and encryption queries of 퐵 blocks per (user, nonce) pair, and 푝-ideal cipher queries. Again, since 푞 ≤ 퐿/2, the bound above can be simplified as 2 mu-auth mu-auth 5 (186 + 72푐)퐿 + (168 + 48푐)퐿푝 Adv (풜 ) ≤ Adv * (풜 ) + + AE,퐸 2 AE ,퐸 3 2푛/2 2푛+푘 11푞 (12푎 + 4푎2 + 13푑 + 푛푎)퐿 + 8(푎 + 푑)푝 + + . 2푛 2푘

Combining privacy and authenticity. From the analysis above, on the one hand,

mu-mrae mu-priv mu-auth 7 13푞 Adv (풜) ≤ Adv * (풜 ) + Adv * (풜 ) + + AE,퐸 AE ,퐸 1 AE ,퐸 3 2푛/2 2푛 (14푎 + 4푎2 + 18푑 + 푛푎)퐿 + 10(푎 + 푑)푝 + 2푘 (219 + 72푐)퐿2 + (192 + 48푐)퐿푝 + . 2푛+푘 On the other hand, using Proposition2, we can construct an adversary 풜 such that mu-priv mu-auth mu-mrae * AdvAE*,퐸 (풜1) + AdvAE ,퐸 (풜3) ≤ 3 AdvAE,KeyGen,퐸(풜), where the key-generation algorithm KeyGen is given in Fig.7. Adversary 풜 makes at most 푞 encryption/verification queries of total 퐿 blocks, and encryption queries of 퐵 blocks per user, and 푝-ideal cipher queries. Hence 7 13푞 Advmu-mrae(풜) ≤ 3 Advmu-mrae (풜) + + AE,퐸 AE,KeyGen,퐸 2푛/2 2푛 (14푎 + 4푎2 + 18푑 + 푛푎)퐿 + 10(푎 + 푑)푝 + 2푘 (219 + 72푐)퐿2 + (192 + 48푐)퐿푝 + . 2푛+푘 72 Bose, Hoang and Tessaro

Finally, using Theorem4 with 훽 = 4 if 퐻 is 푐-regular, or Theorem5 with 훽 = 4 if 퐻 is weakly 푐-regular

1 (4푎 + 푑)푝 + (2푑 + 푎)퐿 Advmu-mrae (풜) ≤ + AE,KeyGen,퐸 2푛/2 2푘 (12푐 + 28)퐿2 + 16푐퐿푝 (16푐 + 8.5)퐿퐵 + + 2푛+푘 2푛 and note that 푞 ≤ 퐿퐵/4,

10 (17푎 + 4푎2 + 24푑 + 푛푎)퐿 + (22푎 + 13푑)푝 Advmu-mrae(풜) ≤ + AE,퐸 2푛/2 2푘 (48푐 + 30)퐿퐵 (303 + 108푐)퐿2 + (192 + 96푐)퐿푝 + + . 2푛 2푛+푘 This concludes the proof.