This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

1 A New Look at Counters: Don’t Run Like Marathon in a Hundred Meter Race

Avijit Dutta, Ashwin Jha and Mridul Nandi

Abstract—In cryptography, counters (classically encoded as bit strings of fixed size for all inputs) are employed to prevent collisions on the inputs of the underlying primitive which helps us to prove the security. In this paper we present a unified notion for counters, called counter function family. Based on this generalization we identify some necessary and sufficient conditions on counters which give (possibly) simple proof of security for various counter-based cryptographic schemes. We observe that these conditions are trivially true for the classical counters. We also identify and study two variants of the classical counter which satisfy the security conditions. The first variant has message length dependent counter size, whereas the second variant uses universal coding to generate message length independent counter size. Furthermore, these variants provide better performance for shorter messages. For instance, when the message size is 219 bits, AES-LightMAC with 64-bit (classical) counter takes 1.51 cycles per byte (cpb), whereas it takes 0.81 cpb and 0.89 cpb for the first and second variant, respectively. We benchmark the software performance of these variants against the classical counter by implementing them in MACs and HAIFA hash function.

Index Terms—counter, XOR-MAC, protected counter sum, LightMAC, HAIFA, universal hash, cryptography, AES-NI. !

1 INTRODUCTION In cryptography, counters are classically viewed as se- counter-based encoding of message into a collection of s quences of binary strings, h0is, h1is,..., h2 − 1is, where blocks; (2) application of a pseudorandom function (or PRF) hiis is the s-bit binary representation of i for some fixed f to each of the blocks and XORing them together into a integer s. The integer s is an application parameter which hash value, h; and (3) XORing h with f 0(R) where f 0 is remains fixed for all invocations of a scheme. In general a PRF independent of f, and R is either a nonce (non- counters are employed to prevent collisions among the repeating internal state) or a random salt. The original inputs to the underlying cryptographic primitive. This in security proof of XOR-MACs is based on the assumption turn helps in proving the security of the overall scheme. that f and f 0 are (pseudo)random functions. Later Bernstein Counter-based schemes can be broadly classified into two [13] provided an improved analysis when f and f 0 are categories, (1) Counter-as-Input (or CaI) Schemes: where (pseudo)random permutations. the counter value is used as a standalone input to the In 1999, Bernstein proposed the protected counter sum underlying primitive; (2) Counter-as-Encoding (or CaE) (or PCS) [5], to construct stateless and deterministic MACs. Schemes: where the counter is encoded within the input Similar to XOR-MACs, PCS also computes a hash value h to the underlying primitive. CTR [1], GCM [2], and SIV [3], using the first two steps. But in the last step, it computes are some schemes that fall under the former category, while f 0(h), where f 0 is a PRF independent of f. Recently Luykx XOR-MACs [4], PCS [5], and LightMAC [6] fall under the et al. have extended the idea of PCS, where they replaced later category. There are some schemes, such as HAIFA [7], PRFs with PRPs, to construct LightMAC [6]. which can be included in both the categories. In this paper All these MAC constructions have a common underlying we solely focus on CaE schemes. function, that we call h (illustrated in Fig. 1(b)). In fact XOR-MACs are based on the Hash-then-Mask (or HtM) paradigm [14] (illustrated in Fig. 1(c)) by Wegman and 1.1 Some Counter-as-Encoding Schemes Carter. More precisely, if h is an AXU universal hash [15] Counter-based encoding is used in HAIFA [7] (illustrated (defined later in subsection 2.3) and f 0 is a random function in Fig. 1(a)), to enhance the (second preimage) security (or permutation) independent of H, the hash-then-mask of iterated hash functions. Notably, the popular second paradigm is defined as h(M) ⊕ f 0(R). The security of preimage attacks [8], [9] on Merkle-Damgard˚ (or MD) hash these schemes mainly rely on the non-repeating nature of function [10], [11] do not apply on HAIFA. In fact Bouil- R and the universal property of h. PCS and LightMAC lagauet and Fouque [12] have shown that HAIFA has full are based on the Hash-then-PRF (HtPRF) paradigm [14], second preimage security in the random oracle model. [16] (illustrated in Fig. 1(d)), which is simply defined as In 1995, Bellare et al. proposed the so-called XOR- f 0(h(M)). The security of schemes based on this paradigm MACs [4], to construct stateful and probabilistic MACs. rely on the universal property of h. At the highest level, XOR-MAC consist of three steps: (1)

1.2 Motivation • A. Dutta, A. Jha and M. Nandi are with the Indian Statistical Institute, Kolkata, India. Consider the three MAC schemes discussed in the preceding E-mail: [email protected] subsection. All these schemes apply a counter-based input

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

2

...... X1 := h1iskM1 X2 := h2iskM2 Xb := hbiskMb One possible reason is that the security guarantee is not obvious. One has to analyze the counter-as-encoding schemes with variable length messages. In this case a gen- C C C iv .... H(M) eral view of counters could have made this analysis much h0 h1 h2 hb−1 hb easier. (a) In [4] the authors have given a general framework for ...... X1 := h1iskM1 X2 := h2iskM2 Xb := hbiskMb constructing XOR-MACs. But even this general framework is somewhat lacking and does not give a unified yet simple treatment for all possible counter-based encodings. In the f f f main theorem of [5], it is assumed that given a message, ...... h(M) the encoding produces distinct encoded blocks for distinct (b) primitive calls. Although this points towards a general re- R f 0 quirement from any counter-based encoding, the indication is rather implicit. Thus, a formal treatment of counters is 0 M h T M h f T required. Furthermore, this generalization should allow a (c) (d) generic proof technique that applies to all counter-based encodings. Fig. 1: Some CaE schemes. (a) HAIFA Hash Function H based on a A disadvantage of the aforementioned message length compression function C (primitive for this construction); (b) Counter dependent counter scheme is the requirement of prior based AXU hash h based on a random function f (primitive for this knowledge of the message length. In many scenarios this construction); (c) Hash-then-Mask (HtM) MAC; and (d) Hash-then-PRF (HtPRF) MAC based on counter-based AXU hash h as shown in (b); may not be possible. What can be a good approach when Note that the input message M := (M1,...,Mb) is encoded into b the message length is not known beforehand? blocks, where each block Xi := hiiskMi is a concatenation of the s-bits Let us consider an analogous problem from athletics. i M n − s binary representation of and the message block i of size . Consider a race over an unknown distance, where the ath- letes would only know the finishing point once they actually reach there. What should be a good strategy for running this encoding to the message. Given a message M ∈ {0, 1}`, race? One may consider to run like a marathoner hoping ` ≤ L (for fixed integers s and L), the encoding works as that it is a marathon. In this case, clearly the athlete loses if follows: Let M 0 = Mk1k0d where d is the smallest integer the race is a hundred meter sprint. The better solution would such that n − s divides the bit-length (number of bits) of be to start like a sprinter and then gradually reduce speed 0 0 M . Suppose M = (M1,M2,...,Mb), such that for all i ∈ as the race progresses. This would definitely be optimal for {1, . . . , b}, Mi is of bit-length n − s. Then X := (X1 := short runs. h1iskM1,...,Xb := hbiskMb) is the encoded input of M. Our problem is similar to the one just described. We want Recall that, the encoding uses a classical counter with to find a variable length counter that offers near optimal a fixed parameter s. The choice of s may depend on the counter size. Although similar problems are well-studied in nature of the application. More precisely, s should be chosen the field of prefix codes [17], [18] and [19], in such a way that the number of encoded blocks generated to the best of our knowledge it has not seen any interest in for the longest permissible message should not be more than cryptography. It would be interesting to investigate the tech- 2s, to avoid the possible reseting of counter values. niques used in prefix coding and construct a near optimal A typical choice for the maximum permissible message counter scheme when the message length is not known. length would allow 264 encoded blocks, i.e., s is exactly 64 bits. Suppose AES cipher (with block size 128 bits) is used 1.3 Our Contributions to instantiate the LightMAC [6] scheme. Then each call of In this paper we define a general notion of counters and AES will process 64 bits of message, as each encoded block closely study the security of CaE schemes based on this contains 64 bits of counter value and 64 bits of message. So it general view. More precisely, we view a counter as a func- would take 2 AES calls per 128 bits of message, independent tion from non-negative integers to bits-strings (may not be of whether the message size is 1 KB (213 bits), or 1 GB (233 of same sizes). An appropriate part of a message and the bits). In short for a 1 KB message, LightMAC makes about counters are fed into the primitive. We describe a sufficient 128 AES calls. On the other hand when s = 8, the number condition for counters to ensure security guarantee (see sec- of AES calls reduces to 69 (almost twice faster). But this tion 3). We further show that these properties give straight- limits the maximum number of encoded blocks to 28. So a forward generic security proofs (see section 5) for counter- classical counter cannot be efficient over a wide range of based hash function (HAIFA), counter-based universal hash message lengths. functions and MACs (see section 4). When the message size is known beforehand, one can Our generalization of counters, can serve as a general simply choose the best choice of s for the given message guideline for designers as well as developers on how to length, instead of fixing it throughout all the invocations. choose or create a secure counter-based encoding scheme This will definitely speed up the performance for shorter for various applications. In other words, this work can be messages and provide very similar performance when the used as a theoretical starting point in creating/choosing a message size approaches the maximum size. Although this counter-based encoding scheme for practical purposes. idea is very natural, it has not been applied in any algo- For instance, based on our generalizations we are able rithms so far. to identify two efficient alternatives (described in section 3)

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

3

for the standard counter scheme. We have implemented all 2) Collision resistance: It is hard to find a pair (x, x0) these counters (see section 6) for (1) Davis-Meyer based of distinct elements, called collision pair for H, such HAIFA, (2) LightMAC, and (3) XOR-MACs. In this work that H(x) = H(x0). we solely focus on software implementation based com- 3) Second preimage (2PI) resistance: Given a chal- parison. Our implementations use Intel’s AES-NI and SSE4 lenge x, it is hard to find a collision pair of the form instruction set [20]. We have observed that the practical (x, x0) (in this case x0 is also called a second preimage results support the theoretical speed up. For instance, for of x). 1 MB message size, MAC implementations based on new counter schemes use around 35% less clock cycles per The most widely used mode of iteration is Merkle-Damgard˚ byte as compared to the standard MAC implementations hash [10], [11], which is also known as the MD hash func- with 64-bit counter. Similarly HAIFA based on new counter tion. The simple iteration method maintains the collision schemes uses approximately 32% less clock cycles per byte resistance and preimage resistance of the compression func- as compared to the standard HAIFA implementation with tion. But there are multiple second preimage attacks [8], 64-bit counter. These performance characteristics indicate [9] on MD hash function. In [7], Biham and Dunkelman that the two new counter schemes are indeed much more suggested the HAsh Iterative FrAmework (HAIFA) to re- flexible and efficient as compared to the classical counter. place the MD hash function, which would resist the earlier As noted earlier, we do not consider CaI schemes [1], attack strategies. The main idea behind HAIFA is the use [2], [3], in this paper. In these schemes the counter size of counter-based (number of bits hashed so far) encoded f : does not really affects the number of primitive calls. So the input in the iterated hash mode. More formally, let {0, 1}c×{0, 1}n → {0, 1}c w ≤ n counter schemes considered in this paper may not give any be a compression function, 2w ≤ L h := IV improvements. be a fixed integer such that , and 0 be a fixed initial value. Then, given a message M ∈ {0, 1}`, we first d n−w b parse Mk10 kh`iw as (M1,...,Mb) ∈ ({0, 1} ) where d 2 PRELIMINARIES is the smallest nonnegative integer such that ` + w + 1 + d 2.1 Notation is divisible by (n − w). The hash output is defined as hb We write the set of non-negative numbers as N. For a binary where hi’s are defined recursively as hi = f(hi−1, hiiwkMi), s 1 ≤ i ≤ b − 1 h = f(h , h0i kM ) 1 string x = x1x2 ··· xs ∈ {0, 1} , we call |x| := s the , and b b−1 w b (see Fig. 2). length or bit-size of x. For 1 ≤ r ≤ s, let lsbr(x) := For a fixed invocation of the hash function, this method xs−r+1xs−r+2 ··· xs (or msbr(x) := x1x2 ··· xr) denote the differentiates the two inputs of the underlying compression least significant (respectively most significant) r bits of x. function. This fact is used by Bouillaguet and Fouque [12] to s 2n For an integer i < 2 , hiis represents the canonical s-bit show that HAIFA has full (i.e. ) second preimage security binary representation of i. For two binary strings x and y, in the random oracle model. x||y represents the concatenation of two strings x and y. For ≤m m i 2.2.1 Davis-Meyer Compression Function any positive integer m, we write {0, 1} = ∪i=1{0, 1} . s Given a binary string x ∈ {0, 1} we can also view it as a Both MD hash function and HAIFA require an underlying s s nonnegative integer less than 2 . Suppose x 6= 1 then we compression function. An old but popular method of con- define (x + 1) to denote the s-bit binary string after adding structing a compression function out of a blockcipher is due one to x (viewed as an integer). to Davis and Meyer [21], [22]. The Davis-Meyer function has been proved [23] to be a secure compression function in Convention. Throughout this paper, we fix two positive n integers n and L. the ideal cipher model (discussed below). Let e : {0, 1} × {0, 1}c → {0, 1}c be a blockcipher, i.e., for any K ∈ {0, 1}n, n c 1) The integer represents the block size (bit-size) of eK := e(K, ·) is a permutation over {0, 1} . Davis-Meyer c n c the underlying primitive (e.g., a blockcipher), and compression function DMe : {0, 1} × {0, 1} → {0, 1} is L 2) represents the length (bit-size) of the largest mes- defined as DMe(x, K) = eK (x) ⊕ x. sage which can appear for a specific application. n ∗ n ∗ . .... All elements of {0, 1} are called blocks. {0, 1} , ({0, 1} ) h1iwkM1 h2iwkM2 h0iwkMb denote the set of all finite length binary strings, and block ∗ h0 h1 h2 hb−1 hb strings respectively. For any x ∈ {0, 1} , if ` = |x|, then iv e e . .... e h `ˆ= d`/ne denotes the number of blocks in x. B, KB and MB represent bytes, kilo bytes and mega bytes, respectively. Fig. 2: HAIFA based on Davis-Meyer compression function.

2.2 HAIFA Framework in Hash Function The ideal cipher model by Coron et al. [24], [25] is widely Hash function H is a function from {0, 1}∗ to {0, 1}c. Hash used as the de facto model for a blockcipher based hash functions are usually constructed by means of iterating a function. In this model, the adversary has access to e and it’s c n −1 −1 −1 cryptographic compression function f : {0, 1} × {0, 1} → inverse e (defined as e (K, y) = eK (y)). For any fresh {0, 1}c, while trying to maintain the following three require- query (K, x) to e (i.e. (K, x) has not been queried to e before, −1 ments: and there is no e -query (K, y) whose response is x), the adversary obtains a response which is chosen uniformly 1) Preimage resistance: Given a challenge y, it is hard to find x such that H(x) = y. Here x is called 1. For administrative reason, we present a simpler variant of counter preimage of y. in HAIFA. We refer [7] for the actual definition.

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

4

c from {0, 1} \(S1 ∪S2) where S1 and S2 denote the set of all all queries have exactly one block). The prf-prp switching outputs of eK -query (i.e. key is K), and the set of all inputs lemma [33], [34] says that for a keyed function (or permuta- −1 −1 n of eK -query, respectively. Similarly a fresh query to e can tion) FK over the domain {0, 1} , be defined. Thus, a second preimage adversary A, against q2 the HAIFA hash H based on the Davis-Meyer function can prp prf |AdvF (t, q) − AdvF (t, q)| ≤ n+1 . (1) make some e and e−1 queries, and finally it has to return a 2 second preimage for a previously given challenge message. ACOMPOSITION THEOREM:PRF(U) ≡ PRF. It is well- We write, known [14], [16] that composition of an  universal hash −1 function h and a PRF F is a PRF. More precisely, we have the Adv2PI(A) = Pr[M 0 ← Ae,e (M): H(M 0) = H(M)] following result for HtPRF due to Wegman and Carter [14], to denote the second preimage advantage of A against H. and Shoup [16],

Theorem 1 ( [14], [16]). Let GK1,K2 := FK2 ◦ hK1 : D → 2.3 (Almost-XOR) Universal Hash Function {0, 1}n where h is an -universal hash over D. Then, An n-bit hash function h is a (K, D)-family of functions ! prf prf q {h := h(k, ·): D → {0, 1}n} Adv (t, q) ≤ Adv (t0, q, `e) + × , k k∈K defined over its domain G F 2 or message space D and indexed by the key space K. Definition 1 (-AXU hash function [26]). A (K, D)-family where t0 = t + O(qT ) and T denotes the maximum time h is called -Almost-XOR Universal (or AXU) hash for computing h. function, if for any two distinct x and x0 in D and a δ ∈ {0, 1}n, the δ-differential probability 2.5 Message Authentication Code diff [x, x0] := Pr [h (x) ⊕ h (x0) = δ] ≤  h,δ K K K Message authentication codes or MACs are symmetric-key where the random variable K is uniformly distributed primitives which are used to ensure data integrity. The over the set K. working principle of MAC is simple. Whenever Alice wants to send a message M to Bob, she sends (M,T ) where the The maximum δ-differential probability, over all possible pairs tag T = F (M) is the output of tag-generation algorithm. of distinct inputs x, x0, is denoted by ∆ . The maximum K h,δ In case of deterministic MAC, when Bob receives a message differential probability ∆ := max ∆ . If ∆ ≤  then we h δ h,δ h tag pair (M 0,T 0), he verifies the equality T 0 = F (M 0) say that h is an -AXU hash. Multi-linear hash [27], [28], K (if verified, the pair is called a valid pair). We denote this and pseudo-dot-product or PDP hash [27], [29], [30], [31] verification algorithm by V (M,T ) which returns 1 on are some examples of AXU hash functions. In this paper we K valid pair and 0 for invalid. Usually, an adversary may will see an example of a CaE AXU hash function (similar to attempt (multiple times) to forge a tag for a message after PHASH [32]). having seen other tagged messages. We define the forgery Universal Hash Function. When δ = 0, the 0-differential 0 advantage of a forger A against FK as event is equivalent to collision. So we write diffh,0[x, x ] and 0 forge ∆h,0 by collh[x, x ] and collh, respectively, and we call them FK ,VK AdvF (A) = Pr[A wins] (2) collision probabilities. Definition 2 (-universal hash function). A hash family h is where A wins if it submits a valid and fresh (not obtained called -universal (or -U) if through previous FK queries) (M,T ) pair to the verification 0 oracle. The maximum advantage in forging F is defined as collh := maxx6=x0 PrK [hK (x) = hK (x )] ≤ . forge forge AdvF (t, qm, qv) := maxA AdvF (A) 2.4 Pseudorandom Function and Permutation where the maximum is taken over all A which makes at An n-bit random function $D is a function chosen uniformly most qm tag-generation queries and qv verification queries, from the set of all functions from D to {0, 1}n. When, and runs in time at most t. The tag generation algorithm n forge D = {0, 1} , we simply write $n. Similarly, an n-bit random is called (t, qm, qv, )-mac if AdvF (t, qm, qv) ≤ . Note permutation Πn is a permutation chosen uniformly from that the tag-generation algorithm FK can be probabilistic the set of all permutations over {0, 1}n. Given an oracle in nature. When it is deterministic, the unforgeability is adversary A, we define the PRF-advantage of A against a implied by the PRF property of FK . F keyed function K as Lemma 1. [35] If FK is a deterministic keyed function then forge prf −n prf FK $n Adv (t, qm, qv) ≤ Adv (t, qm + qv) + qv2 . AdvF (A) = |Pr[A = 1] − Pr[A = 1]|. F F prf ˆ prf Let AdvF (t, q, `) denote maxA AdvF (A) where the max- 2.5.1 Categories of MAC imum is taken over all A running in time at most t, making CBC-MAC [36], PMAC [37], EMAC [38], HMAC [39], at most q queries such that the longest query has at most `ˆ PCS [5], LightMAC [6] etc. are some examples of determin- many blocks. We similarly define the PRP-advantage of A istic MACs. A probabilistic tag-generation algorithm takes as prp FK Πn an additional random input R along with the message M. AdvF (A) = |Pr[A = 1] − Pr[A = 1]|. $ The tag of the probabilistic MAC, denoted as FK (M) := prp The maximum PRP advantage is denoted as AdvF (t, q) (R,FK (R,M)), is the pair (R,T ). The verification algorithm n (note that for a keyed permutation the domain is {0, 1} , so VK (M, (R,T )) checks whether FK (R,M) = T or not.

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

5

XMACR [4], RMAC [40], MACRX3 [41], EHtM [42], RW- independent of ` and hence for all `, the counter functions MAC [42] and FRMAC [43] are some examples of prob- are same. We call such CFFs message length independent. For a abilistic MACs. Stateful MACs can be viewed as variants message length independent CFF CTR, we simply write ctr of probabilistic MACs. In this case, R is treated as a nonce to denote the counter function ctr`. Note that the standard (unique for each invocation of the tag-generation algorithm) counter is a prefix-free counter. Prefix-free CFF is necessary and stored in an internal memory. We denote the output of to avoid repetitions among the inputs to the primitive (see a stateful MAC as F st(M). WMAC [44], XMACC [4], and Lemma 2 below). We also note that the size of the output of EWCDM [45] are some examples of stateful MACs. Many std` is fixed for all counter values. In general for a CFF CTR, probabilistic/stateful MAC schemes are based on the Hash- if ∀` ∀i, j |ctr`(i)| = |ctr`(j)|, then we say that the CFF has then-Mask paradigm by Wegman and Carter [14]. For these fixed size. Otherwise, we call it a variable size CFF. We will see schemes the MAC security can be bounded by the universal some examples of message length dependent and variable property of the underlying AXU hash function. size CFFs later in the section. n Theorem 2 ( [14]). Let hK : D → {0, 1} be an -AXU hash n and FK0 be a keyed function over {0, 1} . Let 3.1 Counter Function Family Based Message Encoding

GK,K0 (R,M) = FK0 (R) ⊕ hK (M). Recall the encoding scheme discussed in section 1.2. When we use the standard counter STDs with s2n−s ≤ L, we first Then, we have ` 0 parse a message M ∈ {0, 1} as (M1,...,Mb−1,Mb) where Advforge(t, q , q ) ≤ Advprf (t0, q + q ) + q  + `+1 0 1) G$ m v F m v v b = d n−s e, |M1| = · · · |Mb−1| = n − s and |Mb| < (n − q2 s) M = M k · · · kM kM 0 M = M 0k10d 2n+1 . such that 1 b−1 b. Let b b forge prf 0 0 where d = n − s − 1 − |M |. Thus, |Mb| = n − s. We also 2) AdvGst (t, qm, qv) ≤ AdvF (t , qm + qv) + qv. b denote the parsing of M as

n−s STDs 3 A NEW LOOK AT COUNTERS (M1,...,Mb) ←− M or (M1,...,Mb) ←− M In computer science, counters are used to count the number to emphasize the standard CFF. Now we define the b blocks s of times a particular event or process has occurred. In encoding as Xi = std (i)kMi for all 1 ≤ i ≤ b. These addition with counting, counters have a very special role in blocks are used as inputs to the underlying primitive, such cryptography. In CaE schemes, the counter values are used as blockcipher or compression function. to encode the message into distinct blocks. This freshness We extend the same methodology to define the encoding of inputs actually helps in proving (possibly improved) the for any other CFF CTR. For this, we first define the block security of CaE schemes. For example, the input block of function bCTR(`), which associates each message size to a the i-th execution of the underlying compression function unique number of blocks. We have seen that for standard of a HAIFA [7] hash function contains an encoding of counter, bSTD(`) = d(` + 1)/(n − s)e. i. Similarly, in counter-based MACs, e.g. XOR-MACs [4], Definition 4. For any CFF CTR, we define the block func- PCS [5], LightMAC [6], the i-th input of the underlying tion, bCTR(`) as the least integer b such that pseudorandom function or permutation contains an encod- i i hii s b ing of . The classical encoding of is s which is the -bit X binary representation of i for a fixed parameter s. Whenever ` + 1 ≤ (n − |ctr`(i)|) ≤ ` + n. i < 2s, we have distinct binary string corresponding to each i=1 counter value. For counter-based algorithms, the parameter It is easy to see that such a b exists and it is unique. Now s can be chosen as per the needs of the application domain. given a message M ∈ {0, 1}`, ` ≤ L, we parse it as M = For example, if we know beforehand that for an application, M k · · · kM kM 0 where |M | = n − |ctr (i)| for all i < b the message size can not be more than 232, then one can 1 b−1 b i ` and 0 ≤ |M 0| < n − |ctr (b)|. We similarly pad the last block choose s = 32. However, it must remain constant through- b ` to make it compatible with the corresponding counter. More out all the executions of the algorithm. This might affect precisely, we define M = M 0k10d where d = n − |ctr (b)| − the performance for smaller length messages. In this section b b ` 1 − |M 0|. Thus, |M | = n − |ctr (b)|. Finally, we define the we introduce a new and general way of looking at counters b b ` encoding which will provide some tools to improve the performances CTR(M) := (X ,...,X ) of CaE schemes without compromising with the security. 1 b

Definition 3 (Counter function family). A counter function where Xi = ctr`(i)kMi for all 1 ≤ i ≤ b. Thus, we also family (we also use CFF) CTR is a family of counter view the CFF as an encoding function CTR : {0, 1}≤L → n + functions {ctr` : ` ≤ L} where for all ` ≤ L, ctr` : N → ({0, 1} ) .

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

6

Proof: We first prove the “only if” direction. Suppose the probability (or frequency) distribution over integers is there exists i 6= j such that i, j ≤ b and ctr`(i) is a prefix of monotonic, then the expected lengths of the codewords are ctr`(j). Thus, we can find x and y such that ctr`(i)kx = within a constant factor of the optimal expected lengths n CTR ` ctr`(j)ky ∈ {0, 1} . Let (M1,...,Mb) ← M ∈ {0, 1} . for that distribution. Note that a universal code works for 0 ` Then we define M ∈ {0, 1} by replacing Mi and Mj by any countably infinite set, and it only requires a monotonic 0 0 distribution. This is one of the main motivations behind x and y respectively. Thus, it is easy to see that Xi = Xj 0 0 0 the study of universal codes. Our problem falls precisely where CTR(M ) = (X1,...,Xb). To prove the other direction, let us assume Xi = Xj for in this case, as we can always assume a natural monotonic some i 6= j. Therefore, ctr`(i)||Mi = ctr`(j)||Mj. This clearly distribution over the number of blocks. For instance all mes- proves that either ctr`(i) is a prefix of ctr`(j), or ctr`(j) is a sages must have at least one block, so 1 has the maximum prefix of ctr`(i). frequency, followed by 2, and so on. In 1974, Elias proposed the first prefix code with univer- sal property [47]. In 1975, he followed it up by proposing 3.2 Some (Efficient) Alternatives to the STD CFF several new examples of universal codes [50]. Some of We have already demonstrated the classical or standard CFF the popular examples of universal codes are Elias gamma s STD . It is a message-length independent and fixed size coding [18], [50], Elias delta coding [18], [50], Elias omega CFF. In this section we study two new examples of CFFs coding [18], [50], [18], [51], Levenshtein and see their advantages over the standard one. coding [18] and Exp- [18]. Now we describe two examples of universal codes, viz., Elias gamma and opt 3.2.1 STD : A Message Length Dependent CFF delta codings. Counter function of our first CFF is an s bit classical en- Elias Gamma Coding, γ. To code a number i ≥ 1: coding of its argument, where s depends on the message 1) Let j = blog ic, i.e., 2j ≤ i < 2j+1. size instead of a pre-determined fixed parameter. In fact, 2 2) γ(i) := 0jkhii . s is defined as the smallest possible value such that we j+1 can represent all the counter values distinctly for the given Elias Delta Coding, δ. To code a number i ≥ 1: message length. Formally, the counter function is defined as j j+1 1) Let j = blog2 ic, i.e., 2 ≤ i < 2 . opt g std` (i) = hiis, where s = min{g : 2 (n − g) > `}. 2) δ(i) := γ(j + 1)klsbj(hiij+1).

It is an example of message-length dependent and fixed size To represent an integer i, Elias gamma uses 2 log2 i + 1 bits, CFF. Clearly, it is a prefix-free counter. and Elias delta uses log2 i+2 log2(log2 i+1)+1 bits. Clearly, Elias delta is more compact as compared to Elias gamma. r 3.2.2 VAR : A Variable Size CFF CFF from Universal Coding. Theoretically, any universal Till now we have seen counter functions which map integers code U can be used to construct a length-independent vari- s to their s bits binary representation. In case of STD , s is able size prefix-free CFF U-CTR. Suppose U is a universal opt fixed (approx. lg L), whereas in case of STD , s depends on code, then we define U-CTR as follows: the message length ` (approx. lg `). Both STDs and STDopt U-CTR = {u-ctr : ` ≤ L} u-ctr = U, 1 ≤ ` ≤ L. are fixed size CFFs. Now we will construct a CFF which ` where ` maps integers to binary strings of monotonically increasing Clearly, U-CTR is prefix-free as U is universal. Although lengths. A similar problem is well known in the field of one might be tempted to use U-CTR straightaway, there is source coding. We briefly discuss these techniques assuming still some scope of improvement in this generic scheme. that the set of symbols is N. Readers may refer to the Note that the motivation for universal coding is quite dif- references cited for a more detailed exposition. ferent than the cryptographic settings. ∗ Prefix codes. A mapping α : N → {0, 1} , is called a binary First, we are restricting the domain to some L, whereas prefix code [18] over integers, if for all x 6= y ∈ N, α(x) a universal code is defined over N. Intuitively it seems that is not a prefix of α(y) and vice-versa. Here α(x) is called the universal codes must have some redundant information the codeword corresponding to x. Huffman codes [17], [18], that we can avoid. Second, all the CFFs discussed so far Shannon-Fano codes [18], [19], [46] and Universal codes [18], have efficient counter update (generating u-ctr`(i + 1) from [47] are some popular techniques for getting prefix codes. u-ctr`(i)) mechanism, which is not a mandatory requirement Huffman codes and Shanon-Fano codes are in general for universal codes. So we should restrict our focus to better than universal codes, when the probability (or fre- only those universal codes which support efficient update quency) distribution over the integers is known. This is mechanism, such as Elias delta code. Now we present our generally the case in static data compression settings, such second CFF candidate, called VARr, which is a modified as JPEG File Interchange Format [48] which employs a mod- version of δ-CTR. ified form of , and the ZIP file format [49] VARr Counter Function Family. We first fix r, an applica- which uses a modified version of Shannon-Fano coding. Al- tion parameter chosen suitably (we will see very soon how though Huffman codes and Shannon-Fano codes are better to choose r). As VARr would be message length indepen- than universal codes, but they require exact probability (or dent, it would be sufficient to define a function over the frequency) distribution over the integers. domain {1, 2,...,L}. To begin with, we define varr(0) = 0r. Universal Codes. A universal code [18], [47], [50] is a Now, for all i ≤ L, we will recursively define varr(i) given binary prefix code, with the additional property that if that varr(j) has been defined for all j < i. Like STDs

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

7

and STDopt, we increment the counter by one at every 3.2.3 Word Oriented Adaptation of Our Counters r step. In other words, we first define y = var (i − 1) + 1. The counter functions described in the preceding section If the r most significant bits get altered (this is easy to are aimed to minimize the counter size as much as pos- check, as the remaining bits would all be zero), then we sible, keeping all the counter values distinct (i.e., prefix- r r define var (i) = yk0, otherwise we define var (i) = y. free). However, there are different practical issues while r Mathematically, we can write the recursive definition of var implementing these counter functions. The most important for i ≥ 1, as issue is to parse the message into small chunks which are ( compatible with the counter-based encoding. In practice, we r x + 1 if msbr(x) = msbr(x + 1), var (i) = mostly receive messages as a sequence of words (e.g., 8-bit (x + 1)k0 if msb (x) 6= msb (x + 1). r r (byte), or 32-bit words). It would be easier for implementa- where x = varr(i − 1). tion if we can parse the messages in multiples of word size. w Clearly the size of the counter function output increases More formally, let us fix a parameter w (elements of {0, 1} opt,w slowly but monotonically with i. So VARr is an example are called words). We define STD (the word oriented opt of message length independent and variable size CFF. To adaption of STD ) as follows: understand this counter, we demonstrate how the counter opt,w std (i) = hii , where s = min{g : w|g, (n − g)2g > `}. values are computed for r = 3. A boldface zero at the least ` s significant bit represents the added zero bit on expansion. Note that STDopt = STDopt,1. Now we describe how we The underlined bits are the third most significant bits. can generalize our second proposal to VARr,w which fits into word oriented implementation, for r ≤ w. We define 000, 0010, 0011, 01000, 01001, 01010, 01011, 011000,... the counter function recursively as before except that we Now we will describe how one should choose the pa- add 0w instead of single 0. More formally, varr,w(0) = 0w. rameter r given that the limit on the message length is L. For i ≥ 1, let x = varr,w(i − 1). Then, Note that, when the size of the counter reaches r + i for ( x + 1 if msb (x) = msb (x + 1), the first time, the i least significant bits of the counter value varr,w(i) = r r i w are all zero. So the counter will be incremented 2 many (x + 1)k0 if msbr(x) 6= msbr(x + 1). times before the next expansion in counter size. Since we We choose r = 4 for w = 8 in our implementation.2 For need to keep the r most significant bits non-zero (to avoid this choice of r, the counter function is prefix-free. In the repetitions), the following equation must be satisfied: following example we describe how the counter expands 2r −1 X for r = 4 and w = 8. (n − r − i) · 2i > L. 8 8 i=0 00000000,..., 000100000 ,..., 00100000000000000 ,... After doing some algebraic simplifications one can show To the best of our knowledge such word-oriented defini- that the smallest r that satisfies the above equation is ap- tions are not available for delta codes (for obvious reasons). c(n)·n 1 prox. log2 log2 L for L < 2 where 2 ≤ c(n) < 1. Note Even if we use our word-oriented definition, the resulting that for any integer i, code will not give better counter function. This can be r,w r argued by the simple fact that in case of VAR , r is fixed, var` (i) = hblog2icirklsblog i(hiilog i+1), 2 2 so we can simply start with some fixed w and then move whereas as it is. But in case of delta code we have to consider the underlying gamma code also which is variable in nature. δ(i) = γ(blog i + 1c)klsblog i(hiilog i+1). 2 2 2 Handling two variable components may result in overheads r Clearly |var` (i)| is less than |δ(i)| when in the counter size as well as the update mechanism. r−1 i ≥ 22 2 −1. 3.3 Comparison of Rates of Counter Function Families 64 We generally fix the maximum message length to be 2 bits, Recall that we have defined the number of blocks in a r which gives r ≈ 6. So, the average counter size in VAR message of length ` as `ˆ = d`/ne. As we execute some will be less than the average counter size in δ-CTR, when costly primitive function on these blocks, it would be good 6 the number of generated blocks is at least 2 . Specifically to minimize the number of blocks as much as possible. 16 r 2 VAR CTR for around blocks, the average counter size in is Definition 5. We define the rate function rate (`) associ- δ-CTR shorter than the average counter size in by more than ated with a counter-based encoding CTR, as the ratio of 3 bits. Now we show that VARr is a prefix-free CFF. r the number of blocks in the message to the total number Theorem 3. Let r be defined as above then VAR is prefix- of blocks in the encoded message produced by CTR, i.e., free. CTR `ˆ rate (`) = bCTR(`) . i 6= j x = msb (varr(i)) Proof: Let be non-zero and r CTR ` r As `ˆ ≥ `/n, we have rate (`) ≤ . So, one can achieve and y = msbr(var (j)). If x 6= y then x can not be prefix of bn y. So assume x = y. Because of our choice of r, the r most maximum rate one (in this case there is no counter). Now significant bits does not become 0r (except for the input we provide a comparison between the rate functions for the 0). So, varr(i) and varr(j) have same size. As i and j are three CFFs defined above. distinct, the rest of the bits must be different. This proves 2. Our implementation uses a bit different step size to leverage Intel’s the prefix-free property. AES-NI and SSE instructions.

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

8

1) For the standard counter STDs, the rate function is 3.4.2 Estimating the Block Function of VARr,w d`/ne n − lg L For a given message length `, we are interested in a lower ≈ . d(` + 1)/(n − s)e n bound on b(`).3 For simplicity we also assume c = r/w, i.e., r0 = 0. We restrict our analysis to only those `, for which When L is large, we need to choose large s ≈ log L, 2 ∃ k ≤ 2r − 1 such that the number of blocks, which reduces the rate. opt 2) For STD , the rate function is k−1 k ! 0 X w˙ − 1 b(`) = 2r w˙ i = . (8) d`/ne n − log2 ` w˙ − 1 ≈ . i=0 d(` + 1)/(n − log2 `)e n s We find an upper bound for b(`) in the following derivation, Clearly, the rate of this counter is better than STD for all choices of ` < L. k−1 k−1 ! r X X 3) For VAR , the rate function is a complex function in n0 w˙ i − w i · w˙ i ≤ ` + n n and r. We skip the exact algebraic expression here i=0 i=0 ! ! and give an approximation, ww˙ wkw˙ k w˙ k − 1 n0 + − ≤` + n (9) d`/ne w˙ − 1 w˙ k − 1 w˙ − 1 d(` + n − r + 2)/(n − r + 2 − log `)e ! 2 w˙ k − 1 n − r + 2 − log2 `   (n − r + w − wR) .` + n (10) ≈ for `  n . w˙ − 1 n k ! Clearly, the rate of this counter is better than STD w˙ − 1 (n − r + w − log (` + n)) ` + n (11) for small messages and large s. Further the rate is 2 w˙ − 1 . comparable with STDopt for n  r. 1 1 For W ≥ 2, w˙ −1 ≥ w˙ , and for moderately large k, we get 3.4 Word Oriented Rates of Counter Function Families w˙ k w˙ k−1 ≈ 1. Using these facts we get (10) from (9). Similarly We will assume that n, `, L ∈ Mw, where Mw denotes the wk . log2(` + n) (experimental results show that wk ≈ n set of all multiples of w. We assume that L = o(2 ). log2 `), which gives (11) from (10). Using (8) and (11) we s opt The word oriented rate functions for STD and STD get, are pretty similar to their bit oriented counterparts. For ` + n STDs s b(`). (12) instance, in , we can simply choose to be some n − r + w − log2(` + n) w multiple of . So we only derive the word oriented rate r,w function for VARr. Rate Function for Word Oriented VAR . Finally we get the following approximation for lower bound on the rate r,w 3.4.1 Estimating Parameter r in VAR function of word oriented VARr,w, Let c = dr/we. We know that the following inequality holds r,w n + w − r − log (` + n) for the correct value of r, rateVAR (`) 2 & n 2r −1 X (n − cw − iw)2iw+cw−r ≥ L (3) for w  n  `. Note that we derived this function for i=0 some restricted ` values. But our experimental results show 0 0 r w that this holds for all `. In Table 1 we provide the exper- Let n := n − cw, r := cw − r, r˙ = 2 , and w˙ = 2 . Now imental results for the word oriented rate functions of the we will get a lower bound on r, three candidate counter functions. Fig. 3 gives a graphical r˙−1 X 0 comparison between the rates of the three CFFs. (n0 − iw)2iw+r ≥ L i=0 4,8 r˙−1 r˙−1 ! 1.4 VAR 0 X X STDopt,8 2r n0 w˙ i − w i · w˙ i ≥ L STD64 i=0 i=0 1.2 STD32    r˙  r˙  STD16 r0 0 ww˙ w˙ − 1 wr˙w˙ 2 n + − ≥ L (4) STD8 w˙ − 1 w˙ − 1 w˙ − 1 1 r0  0 r˙−1 r˙−1 Rate 2 (n + 2w) 2w ˙ − ww˙ ≥ L (5) 0.8 w˙ r˙ (2n − w)≥ 2L (6) 0.6 where (4) can be obtained by simple algebraic simplifica- 1 2 tions. As W ≥ 2, we have w˙ −1 ≤ w˙ . Using this we get (5). 0.4 Also c ≥ 1 and r0 ≤ w − 1, which gives us (6). Now simple 6 8 10 12 14 16 18 20 algebraic simplifications give us the upper bound on r, Input Length (in Log Base 2)   2L  Fig. 3: Rate plot for the three CFFs. log2 2n−w r≥ log (7) 2  w  3. Note that we have dropped VARr,w from the superscript as it is For w  n  L, we get r ≈ log2 log2 L − log2 w. obvious.

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

9 TABLE 1: Comparison between the rates offered by the three candidate CFFs for block size, n = 128. subsections hold, when we instantiate these schemes with CTR ∈ {STD, STDopt, VARr}. STDs Length STDopt,8 VAR4,8 s = 8 bits s = 16 bits s = 32 bits s = 64 bits 128B 0.89 0.80 0.73 0.47 0.89 0.89 4.1 Counter Based HAIFA 256B 0.89 0.84 0.73 0.48 0.89 0.89 512B 0.91 0.86 0.74 0.49 0.91 0.89 Let CTR be a CFF. We present a counter based HAIFA hash 1KB 0.93 0.86 0.74 0.50 0.93 0.88 function, called CtHAIFA, based on an n-bit (both blocksize 2KB 0.93 0.87 0.75 0.50 0.93 0.88 and keysize is n bits) Davis-Meyer compression function 4KB - 0.87 0.75 0.50 0.87 0.88 ≤L DM. For a message M ∈ {0, 1} , let h0 = IV (an n bit 8KB - 0.87 0.75 0.50 0.87 0.88 constant) and 16KB - 0.87 0.75 0.50 0.87 0.88 32KB - 0.87 0.75 0.50 0.87 0.88 64KB - 0.87 0.75 0.50 0.87 0.86 hi = DM(hi−1,Xi), ∀ i ∈ {1, . . . , b} 128KB - 0.87 0.75 0.50 0.87 0.80 256KB - 0.87 0.75 0.50 0.87 0.77 where Xi is the i-th element of CTR(M). We define CtHAIFA 512KB - 0.87 0.75 0.50 0.87 0.76 based on CTR as, 1MB - - 0.75 0.50 0.75 0.76 CtHAIFA(M) = DM(hb, h|M|in).

4 COUNTER-AS-ENCODING CONSTRUCTIONS Note that this definition is a bit different than the one Now we will analyze various CaE schemes, such as AXU described earlier. Here we use h|M|in as the last block hash, MAC and HAIFA hash function based on our CFFs. instead of h0iwkMb. This has been done just for the sake In the last section, we defined prefix-free counter. For any of simplicity. The security result can also be proved for such prefix-free counter CTR and for all M, we know that the original definition with some additional notations. We X1,...,Xb are distinct where CTR(M) = (X1,...,Xb). present our main result on CtHAIFA in the following the- With a slight abuse of notation, we let CTR(M) to denote orem. We postpone the proof of this theorem to the next the set {X1,...,Xb}. Observe that prefix-free property does section. not say anything about the collisions between the encoded CTR blocks of two distinct messages. Clearly we will have some Theorem 4. Let be a prefix-free and injective CFF. CtHAIFA intersection between CTR(M) and CTR(M 0) (viewed as a Then, has full second preimage security. More A set) for any counter function family. specifically, for any second preimage adversary that makes at most q queries, we have Definition 6. A prefix-free CFF CTR is called injective if for 0 0 all M 6= M , CTR(M) 6= CTR(M ) (as a set). 2PI 3q AdvCtHAIFA(A) ≤ n . Now we provide a sufficient condition for injective counter 2 function families. Lemma 3. Let CTR be a prefix-free CFF. It is injective if it Remark 1. Although the fixed points in Davis-Meyer com- satisfies the following condition pression functions can be easily computed, this does not help in inverting any arbitrary hash value. This can 0 0 ∀ `, ` ≤ L, b(`) = b(` ) ⇒ ctr` = ctr`0 . be argued as follows: Let hi be some hash value and the adversary aims to compute a pair (hi−1, m) such ` 0 `0 Proof: Suppose for some M ∈ {0, 1} ,M ∈ {0, 1} , that DMe(hi−1, m) = hi. Suppose the adversary queries 0 0 CTR(M) = CTR(M ). We require to show that M = M . (m, y) to the decryption oracle, then the probability 0 −1 Note that, CTR(M) = CTR(M ) ⇒ {X1,...,Xb(`)} = that em (y) = x such that em(x) ⊕ x = hi is at most 0 0 0 n {X1,...,Xb(`0)}. So b(`) = b(` ) = b. By given condition, 1/(2 − i + 1). we have ctr` = ctr`0 and we denote it simply by ctr. Now 0 we first show that Xi = Xi for all i. As two sets are equal Remark 2. Dean [52] showed an attack on the MD hash 0 for any i ≤ b, there must exist j so that Xi = Xj. This function when the underlying compression function has implies that one of ctr(i) and ctr(j) is prefix of the other, and efficiently computable fixed points for random message hence i = j (as the counter function is block-wise collision- blocks. That attack cannot be extended to CtHAIFA as the 0 free). Since counter functions for ` and ` are same we have underlying CFF has block-wise collision free property. 0 0 Mi = Mi for all i. This proves that M = M . Recall that all our candidate CFFs are prefix-free. Further Remark 3. In Crypto 2005, Coron et al. [24] proved that MD it is obvious to see that they also satisfy the condition for hash with prefix-free encoding is indifferentiable from injectiveness. So we get the following corollary. a random oracle. Here prefix-free refers to the property opt r Corollary 1. CTR ∈ {STD, STD , VAR } is a prefix-free that the encoding of M is not a prefix of the encoding and injective CFF. of M 0, if M 6= M 0. Recall that a prefix-free CFF means ctr (i) is not a prefix of ctr (j) if i 6= j. Not all encoding In the following subsections we present various CaE ` ` schemes based on prefix-free counters are prefix-free schemes. The security of all these schemes follow directly encodings. In fact length padding is necessary to get from the prefix-free and injective property of the underlying prefix-free encodings. CFF. Specifically, all the security results in the following

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

10 4.2 AXU Hash Function 4.4 Necessity of Prefix-free and Injectivity Let CTR be a CFF. We define a counter based AXU-hash In all the security results discussed in this section, the function, denoted CtH, based on an n-bit pseudorandom security proof is implied by the prefix-free and injective ` permutation EK and CTR. Let M ∈ {0, 1} . We define property of the CFF CTR. In other words, prefix-free and CtH (M) = E (X ) ⊕ · · · E (X ) injective properties are sufficient for security. Now we show EK ,CTR K 1 K b that these properties are also necessary in case of CtH, where CTR(M) = (X1,...,Xb). Note that this hash func- CtMAC1 and CTMAC2. tion is a generalization of the hash functions used in [4], Case 1: Suppose CTR doesn’t have injective prop- [5], [6]. It has a simple design that requires only E com- k erty. As CTR is a deterministic function (encodings putation and XOR operations. Furthermore the hash can be are deterministic), one can easily find two distinct computed completely in parallel. Now we show that if we messages M,M 0 such that CTR(M) = CTR(M 0). replace E by a random permutation, we get an AXU hash K Case 2: Suppose CTR is not prefix-free, i.e., for function. The proof is postponed to the next section. some ` ≤ L, we have i < j ∈ N such that ctr`(i) Theorem 5. Let CTR be a prefix-free and injective CFF. Then, (j) n CTR n−1 is a prefix of ctr` . Thus, we can find two pairs CtHΠ ,CTR is 2/2 -AXU hash if b (L) ≤ 2 , where 0 0 0 n (c, c ) and (d, d ) such that ctr`(i)kc = ctr`(j)kc and L is the length of the largest message. 0 ctr`(i)kd = ctr`(j)kd . Using these two pairs we can construct two distinct messages M and M 0 such 4.3 MAC that the i-th encoded block collides with the j-th 0 Now that we have an AXU hash function, we can apply encoded block for both M and M , i.e., they cancel 0 standard methods, such as Hash-then-PRF and Hash-then- out each other in CtH(M) and CtH(M ). Mask, to obtain MAC schemes from CtH. Let CTR be a CFF. M,M 0 ` 0 Therefore in both the cases we can find , such that Given any M ∈ {0, 1} with ` > n, we write M = M km Pr[CtH(M) ⊕ CtH(M 0) = 0] = 1. This leads to trivial where m is a block. We define forgery attacks on CtMAC1 and CtMAC2. Hence prefix-free 0 0 CtMac1 0 (M) = Π ( (M ) ⊕ m) and injective property are necessary. Πn,Πn n CtHΠn CtH is an AXU hash function when CTR is prefix-free and 0 5 SECURITY PROOFSOF CTHAIFA AND CTH injective. So the modified scheme CtHΠn (M ) ⊕ m is a universal hash when CTR is prefix-free and injective. Hence 5.1 Proof of Theorem 4 [2PI Security of CtHAIFA] we can apply composition theorem to show that CtMac1 is We use a similar approach as used in [12] for HAIFA a pseudorandom function. We have the following theorem. based on random oracle. Let A be a second preimage Theorem 6. CtMac1 := CtMac1 Let EK1 ,EK2 be defined based adversary for CtHAIFA that makes q distinct queries to on two independently chosen keyed blockcipher, and let the underlying ideal cipher. Note that in the ideal cipher CTR be a prefix-free and injective CFF. Then, model, A can make both encryption and decryption queries 2 to the underlying block cipher. We also assume that the prf q prp Adv (t, q, `ˆ) ≤ + Adv (t0, `qˆ ) CtHAIFA(M) CtMac1 2n−1 E adversary computes . The adversary can do this by making encryption query (Mi, hi−1) and storing The proof of theorem 6 follows directly from theorem 1 (hi−1,Mi, hi := eMi (hi−1) ⊕ hi−1) for all i ∈ [b]. Actually and 5. The above construction is deterministic. A can compute any DM(h, m) by making encryption query CtMAC1 is based on HtPRF paradigm. Now we use HtM (m, h) to e. Let the sequence of intermediate chaining values paradigm to define probabilistic and stateful MACs that for the challenge message M be (h0, h1, . . . , hb, hb+1). are motivated from XOR-MACs. Let R be a seed (either We denote the i-th query of A by the tuple random or nonce). We define CtMac2 (R,M) = EK1 ,EK2 (xi, mi, ci, yi, oi) such that ci is the counter value, mi is E (R) ⊕ CtH (M). By using Theorem 2 we can have K2 EK1 the message value, xi is some chaining value, and yi = following results on the MAC security if probabilistic and DM(xi, cikmi) ⊕ xi. Further, stateful CtMac2. i o = 0 Theorem 7. Let CtMac2 := CtMac2E ,E be defined based 1) if the -th query is an encryption query then i K1 K2 A c km , x y := e (x ) on two independently chosen keyed blockcipher. Then, and queried i i i and got i cikmi i . 2 2) if the i-th query is a decryption query then oi = 1 forge 0.5q prp 0 qv −1 Adv (t, q , q , `ˆ) ≤ +Adv (t , `ˆ(q +q ))+ and A queried cikmi, yi and got xi := e (yi). CtMac2st m v 2n E m v 2n cikmi Now suppose that A produces a valid second preimage 2 0 forge q prp 0 qv M . We denote this event by Win. Let Win1 and Win2 denote Adv (t, qm, qv, `ˆ) ≤ + Adv (t , `ˆ(qm + qv)) + 0 0 CtMac2$ 2n E 2n the events Win ∧ |M| 6= |M | and Win ∧ |M| = |M |, respectively. Clearly Win = Win ∪ Win . Therefore, Remark 4. Note that CtMAC2 can be shown to have full 1 2 MAC security [4] when instantiated with a random func- Pr[Win] ≤ Pr[Win1] + Pr[Win2]. tion. The security reduces to q2/2n when block ciphers are used for instantiation. This is due to the PRF-PRP We bound the two events as follows: 0 switching lemma (Equation 1). Since we give a concrete 1) If Win1 is true then |M| 6= |M |. In this case we 0 implementation of stateful MAC based on AES block must have h|M|in 6= h|M |in. Therefore the adver- cipher, we have to switch from PRF to PRP model. sary found a second preimage for the last block,

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

11 TABLE 2: Baseline CPB of AES using AES-NI. i.e., there exist a query i ∈ {1, . . . , q} such that 0 cikmi = h|M |in 6= h|M|in and yi ⊕ xi = hb+1. encryption key scheduling 1 (cycles/byte) (cycles) This is possible with probability 2n−i+1 . Bounding q (AES, serial) 6.7 117 over q queries we have Pr[Win1] ≤ 2n−q . 0 (AES, parallel) 0.69 117 2) If Win2 is true then |M| = |M |. In this case, if A succeeds, then A was successful in creating a connection to some intermediate chaining value 6.2 Implementing Counter Function Families h j. Suppose this connection is established at the The three counter function families basically differ in their i-th query, then ci must be equal to ctr`(j) and s 1 selection of counter sizes. STD is a standard counter with opt yi ⊕ xi = hj. This probability is again 2n−i+1 . q a fixed size s; STD computes the (optimal) counter size Pr[Win ] ≤ r Bounding over q queries we have 2 2n−q . s based on the length of the input; and VAR dynamically Combining 1 and 2 we have, changes the counter size s based on the number of input 2q 3q blocks processed so far. Pr[Win] ≤ ≤ . In a practical application STDs requires a beforehand 2n − q 2n heuristic on the typical input lengths to be encountered. The result follows. In situations where the input lengths are inconstant, this scheme may not be that efficient. If the input length is 5.2 Proof of Theorem 5 [AXU security of CtH] known beforehand, then STDopt is the best option as it r 0 fixes the optimal (machine-dependent) counter size. VAR Let M 6= M and CTR(M) = S = {X1,...,Xb}, 0 0 has an edge over the other two in the worst case scenario, CTR(M ) = S = {X1,...,Xb0 }. As CTR is injective, there exists at least one block X which appears exactly once in S i.e., when neither the input length is known nor the inputs or S0. Thus, for any δ ∈ {0, 1}n, have predictable lengths. This scenario is quite frequent in server-side applications. 0 opt r Pr[CtHΠn,CTR(M) ⊕ CtHΠn,CTR(M ) = δ] Although STD and VAR are much more flexible in

= Pr[Πn(X) = ⊕x∈S1∪S2\{X}Πn(x)]. terms of the changes in the input length, they do require some book-keeping operations. STDopt requires the compu- The result follows by conditioning on the outputs tation of a suitable counter size for the given input length. X r of the random permutation on all blocks except . Similarly, VAR requires some kind of mechanism to update the counter size when the current counter reaches its limit. STDs does not incur such book-keeping overheads, as it Remark 5. Note that the above result has been proved s implicitly in [4], [5], [6] for the standard counter. Our is fixed for the construction. Thus, STD with s = 32 uses 10 result generalizes their results for any prefix-free and 32-bit counter irrespective of whether the input size is 2 20 opt injective counter function family. or 2 bits. For STD , we pre-store the set of counter sizes ({8, 16, 32, 64}) along with their respective maximum input lengths. Although this incurs a small increment in code size, 6 COMPARISON AND EMPIRICAL RESULT it reduces the number of micro-operations. For VARr, the In this section, we compare the performance of the three overhead can be reduced significantly by reducing the num- candidate counter function families via their application in ber of times the counter size is increased. For example, we different CaE schemes. In particular, we present pipelined increase the size in steps of multiple of 8 · 2i (i.e. 8,16,32,64) software implementation of counter based MACs, and se- (instead of 8 · i). All three counters were implemented in rial implementation of HAIFA based on DM compression 64-bit registers. function. We instantiate the MAC and DM compression Table 3 gives the book-keeping cost incurred by the three opt function using AES-128 [53] block cipher. AES cipher is counters. The cost for STD is significantly lower than used to exploit the AES-NI instruction set available on the other two candidates as in this case the major book- new generation Intel microprocessors. The source code is keeping operation is executed only once per message. For r s publicly available at [54]. VAR and STD the check for last message block accounts for most of the book-keeping cost. VARr has a complex s 6.1 Platform Setup counter update mechanism as compared to STD . This leads to a further increase in its book-keeping cost. Fig. 4 presents As mentioned earlier we use AES-128 (key size 128 bits) these characteristics graphically. as the underlying block cipher. We use performance data for all inputs of length ` = 2k bits where 7 ≤ k ≤ 20. We implemented all the schemes on Intel’s Sandy Bridge 6.3 Performance of MACs microarchitecture using AES-NI and SSE4 intructions. All We summarise the performance results of the deterministic measurements were taken on a single core of Intel Xeon E5- MACs in Table 4 and the stateful MACs in Table 5. We de- 2640 processor at 2.5Ghz with Turbo Boost disabled. The liberately leave out the performance results for probabilistic warmup parameter is 250000 and the data is averaged over MAC, as apart from the random number generation phase, 1000000 repetitions. All results will be either in number of it is exactly similar to the stateful MAC. Fig. 5 and 6 give cycles per byte or number of cycles. The baseline perfor- graphical characteristics of the performance data. STDs is mance for AES using AES-NI instruction set is presented in implemented for s = w · 2i, where w = 8 bits is the word Table 2. size, and i is varied over {0, 1, 2, 3}. Similar values are used

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

12

TABLE 3: Cycles per byte comparison between the book-keeping 2.5 operations performed by the three CFF candidates. CtMAC1-VAR4,8 CtMAC1-STDopt,8 s CtMAC1-STD64 STD opt,8 4,8 Length STD VAR 2 32 s = 8 bits s = 16 bits s = 32 bits s = 64 bits CtMAC1-STD CtMAC1-STD16 128B 0.17 0.20 0.22 0.27 0.33 0.18 8 CtMAC1-STD 256B 0.15 0.17 0.18 0.27 0.18 0.17 per Byte 1.5 512B 0.14 0.15 0.16 0.23 0.13 0.15

1KB 0.13 0.14 0.16 0.22 0.09 0.15 Cycles 2KB 0.14 0.13 0.15 0.22 0.09 0.15 4KB - 0.14 0.15 0.22 0.10 0.16 1 8KB - 0.14 0.16 0.22 0.09 0.16 16KB - 0.13 0.15 0.22 0.09 0.15 6 8 10 12 14 16 18 20 32KB - 0.14 0.15 0.22 0.09 0.15 64KB - 0.14 0.16 0.22 0.09 0.16 Message Length (in Log Base 2) 128KB - 0.14 0.16 0.22 0.09 0.17 Fig. 5: CPB plot for the three CtMAC1 candidates. 256KB - 0.16 0.18 0.23 0.09 0.18 512KB - 0.17 0.19 0.24 0.09 0.19 1MB - - 0.19 0.24 0.08 0.19 similar cpb of around 0.8-0.9. For a fixed value of s in STDs, VARr outperforms STDs, 0.5 until the input length is significantly smaller than 2s, or the VAR4,8 r STDopt,8 counter size in VAR is smaller than s. For example, for 64 r s 0.4 STD s = 32 VAR is faster than STD , when the input length 32 STD ` ≤ 128KB (220 bits). At 256 KB, the counter size in VARr STD16 8 0.3 STD expands to 32 bits which causes a change in the slope of

per Byte the curve. Even when the input length lies in the optimal range the gain in using STDs is marginal when compared Cycles 0.2 to the flexibility offered by VARr over a diverse range of input lengths. r 0.1 Note that the effective counter size in VAR is always r bits less than the actual counter size. This is because the 6 8 10 12 14 16 18 20 first r bits are reserved. For example when the counter size Message Length (in Log Base 2) is 16 bits, only 12 bits are used. This reduces the maximum 23 19 Fig. 4: CPB plot for book-keeping operations. message length for 16-bit counter from 2 bits to 2 bits. TABLE 5: Cycles per byte comparison between the software perfor- mance of the three CtMAC2st candidates. in STDopt and VARr while choosing (or expanding) the counter size. STDs Length STDopt,8 VAR4,8 s = 8 bits s = 16 bits s = 32 bits s = 64 bits TABLE 4: Cycles per byte comparison between the software perfor- 128B 1.21 1.28 1.47 2.05 1.31 1.23 mance of the three CtMAC1 candidates. 256B 1.03 1.11 1.30 1.74 1.07 1.04 512B 0.94 1.04 1.11 1.62 0.94 1.00 STDs Length STDopt,8 VAR4,8 1KB 0.91 0.92 1.08 1.58 0.87 0.94 s = 8 bits s = 16 bits s = 32 bits s = 64 bits 2KB 0.84 0.90 1.03 1.54 0.79 0.89 128B 1.26 1.74 1.79 1.96 1.37 1.24 4KB - 0.89 1.02 1.53 0.85 0.88 256B 1.25 1.26 1.29 1.74 1.27 1.26 8KB - 0.88 1.01 1.52 0.82 0.87 512B 1.00 1.02 1.18 1.63 1.00 1.03 16KB - 0.88 1.01 1.51 0.82 0.87 1KB 0.88 0.98 1.07 1.57 0.88 0.91 32KB - 0.87 1.01 1.51 0.82 0.87 2KB 0.83 0.92 1.05 1.54 0.79 0.91 64KB - 0.87 1.00 1.51 0.81 0.89 4KB - 0.90 1.02 1.52 0.86 0.89 128KB - 0.87 1.00 1.52 0.81 0.95 8KB - 0.88 1.01 1.51 0.83 0.87 256KB - 0.87 1.00 1.52 0.81 0.99 16KB - 0.88 1.00 1.51 0.83 0.88 512KB - 0.88 1.01 1.52 0.82 1.01 32KB - 0.88 1.01 1.51 0.82 0.87 1MB - - 1.01 1.52 0.95 1.02 64KB - 0.88 1.00 1.51 0.81 0.89 128KB - 0.87 1.00 1.51 0.82 0.95 256 KB - 0.87 1.01 1.51 0.82 0.98 6.4 Performance of HAIFA 512KB - 0.88 1.01 1.51 0.82 1.00 1MB - - 1.01 1.51 0.95 1.01 We summarise the performance results of CtHAIFA in Ta- ble 6. The graphical representation is illustrated in Fig. 7. Some notes on the characteristics. It is evident from Fig. 5 The comparison results are similar to those obtained earlier, and 6 that STDopt gives the fastest MACs among all the in case of deterministic MACs and stateful MACs. three candidates. Also observe that when the input length is comparable to the counter size both STDs and VARr closely 7 CONCLUSION resemble the performance curve for STDopt. For instance, in In this paper we presented a generalized notion of counter Table 4, look at the entries corresponding to the range 8 KB function families. We also tried to formalize the crypto- (216) to 64 KB (219 bits). In this range all three counters offer graphically significant properties required from a counter

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

13

2.5 20 CtMAC2st-VAR4,8 CtHAIFA-VAR4,8 CtMAC2st-STDopt,8 CtHAIFA-STDopt,8 CtMAC2st-STD64 CtHAIFA-STD64 2 CtMAC2st-STD32 CtHAIFA-STD32 CtMAC2st-STD16 15 CtHAIFA-STD16 CtMAC2st-STD8 CtHAIFA-STD8 per Byte 1.5 per Byte Cycles Cycles 10

1

6 8 10 12 14 16 18 20 6 8 10 12 14 16 18 20

Message Length (in Log Base 2) Message Length (in Log Base 2)

Fig. 6: CPB plot for the three CtMAC2st candidates. Fig. 7: CPB plot for the three CtHAIFA candidates.

TABLE 6: Cycles per byte comparison between the software perfor- CtHAIFA mance of the three candidates. studied prefix-free and injective counters. Although

STDs we studied HAIFA hash function which is a serial Length STDopt,8 VAR4,8 s = 8 bits s = 16 bits s = 32 bits s = 64 bits mode CaE scheme, our study was mainly motivated 128B 7.42 8.16 8.96 13.83 7.64 7.58 by those parallel mode CaE schemes such as XOR- 256B 7.37 7.76 8.95 13.33 7.31 7.47 MAC [4] and LightMAC [6]. We showed that prefix- 512B 7.17 7.55 8.69 13.15 7.00 7.53 free and injective properties are necessary for these 1KB 7.02 7.55 8.71 13.07 6.92 7.65 parallel mode of operations. It would be interesting 2KB 6.95 7.48 8.63 13.03 6.88 7.62 to investigate whether the prefix-free property can 4KB - 7.43 8.65 13.00 7.43 7.59 be relaxed in case of serial mode of operation. 8KB - 7.40 8.61 12.92 7.42 7.57 2) New CFF schemes. We have shown a general way 16KB - 7.42 8.61 12.95 7.42 7.60 32KB - 7.43 8.63 12.99 7.41 7.62 of using universal codes as CFFs. Can we have other 64KB - 7.41 8.64 12.97 7.41 7.78 such generic techniques? Also, it would be interest- 128KB - 7.41 8.65 12.98 7.41 8.37 ing to further investigate the CFFs mentioned in this 256KB - 7.42 8.66 12.99 7.42 8.66 paper. 512KB - 7.44 8.67 12.99 7.43 8.82 3) Application in CaI schemes. Although counter size 1MB - - 8.66 12.99 8.65 8.90 does not affect the rate of CaI schemes, it would be interesting to see whether we can get some ad- scheme. Based on these properties we gave straightforward vantage by replacing the standard counter function STDopt VARr proofs for some CaE schemes such as HAIFA hash function, with one of or . One can also investi- XOR universal hash and MACs. gate generic proof techniques for CaI schemes based Further we presented two efficient alternatives for the on CFF properties. standard counter scheme. One of these alternatives, namely, r the VAR counter function family is a nice application of REFERENCES universal codes in cryptography. To the best of knowledge [1] M. Dworkin, “Recommendation for Block Cipher Modes of Oper- this relationship has not been studied before. Finally we ation: The CMAC Mode for Authentication,” NIST, U. S. Depart- gave software comparison of MAC and HAIFA construc- ment of Commerce, NIST Special Publication 800-38B, 2005. tions based on the three counter function families. Based [2] D. A. McGrew and J. Viega, “The security and performance of the galois/counter mode (GCM) of operation,” in Proc. INDOCRYPT on our findings we can set the following guidelines for 2004, 2004, pp. 343–355. choosing one among the three counter function families for [3] P. Rogaway and T. Shrimpton, “Deterministic authenticated- a specific area of application: encryption: A provable-security treatment of the key-wrap prob- opt lem,” in Proc. EUROCRYPT 2006, 2006, pp. 373–390. 1) STD is the best option when the input length is [4] M. Bellare, R. Guerin,´ and P. Rogaway, “XOR macs: New methods known beforehand. for message authentication using finite pseudorandom functions,” in Proc. CRYPTO 1995, 1995, pp. 15–28. 2) If the input length is not known: [5] D. J. Bernstein, “How to Stretch Random Functions: The Security a) VARr is better when the message length can of Protected Counter Sums,” J. Cryptol., vol. 12, pp. 185–192, 1999. [6] A. Luykx, B. Preneel, E. Tischhauser, and K. Yasuda, “A MAC vary arbitrarily over a wide range. s mode for lightweight block ciphers,” in Proc. Fast Software Encryp- b) STD works best when the average input tion - FSE 2016, 2016, pp. 43–59. length is close to the application limit. [7] E. Biham and O. Dunkelman, “A Framework for Iterative Hash Functions - HAIFA,” IACR Cryptology ePrint Archive, 2007. r In general, VAR seems to be the best candidate that works [8] J. Kelsey and B. Schneier, “Second preimages on n-bit hash func- n for wide range of input lengths with comparable perfor- tions for much less than 2 work,” in Proc. EUROCRYPT 2005, 2005, pp. 474–490. mance curve. [9] E. Andreeva, C. Bouillaguet, O. Dunkelman, P. Fouque, J. J. Hoch, The paper shows following new research directions. J. Kelsey, A. Shamir, and S. Zimmer, “New second-preimage attacks on hash functions,” J. Cryptol., vol. 29, pp. 657–696, 2016. 1) Parallel vs Serial mode. In this paper we defined a [10] R. C. Merkle, “One way hash functions and DES,” in Proc. general notion of counter function families and then CRYPTO 1989, 1989, pp. 428–446.

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected]. This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2017.2710125

14

[11] I. Damgard,˚ “A design principle for hash functions,” in Proc. Construction,” in Proc. Fast Software Encryption - FSE 2002, 2002, CRYPTO 1989, 1989, pp. 416–427. pp. 237–251. [12] C. Bouillaguet and P.-A. Fouque, “Practical Hash Functions Con- [41] M. Bellare, O. Goldreich, and H. Krawczyk, “Stateless evaluation structions Resistant to Generic Second Preimage Attacks Beyond of pseudorandom functions: Security beyond the birthday bar- the Birthday Bound,” Submitted for publication, (2010). rier,” in Proc. CRYPTO 1999, 1999, pp. 270–287. [13] D. J. Bernstein, “Stronger Security Bounds for Wegman-Carter- [42] K. Minematsu, “How to thwart birthday attacks against macs via Shoup Authenticators,” in Proc. EUROCRYPT 2005, 2005, pp. 164– small randomness,” in Proc. Fast Software Encryption - FSE 2010, 180. 2010, pp. 230–249. [14] M. N. Wegman and L. Carter, “New hash functions and their use [43] E.´ Jaulmes and R. Lercier, “FRMAC, a proc. fast randomized in authentication and set equality,” J. Comput. Syst. Sci., vol. 22, pp. message authentication code,” IACR Cryptology ePrint Archive, no. 265–279, 1981. 166, 2004. [15] L. Carter and M. N. Wegman, “Universal classes of hash func- [44] J. Black and M. Cochran, “MAC reforgeability,” in Proc. Fast tions,” J. Comput. Syst. Sci., vol. 18, pp. 143–154, 1979. Software Encryption - FSE 2009, 2009, pp. 345–362. [16] V. Shoup, “A Composition Theorem for Universal One-Way Hash [45] B. Cogliati and Y. Seurin, “EWCDM: an efficient, beyond-birthday Functions,” in Proc. EUROCRYPT 2000, 2000, pp. 445–452. secure, nonce-misuse resistant MAC,” in Proc. CRYPTO 2016, 2016, [17] D. A. Huffman, “A method for the construction of minimum- pp. 121–149. redundancy codes,” Proc. IRE, vol. 40, pp. 1098–1101, 1952. [46] R. M. Fano, “The transmission of information,” Research Labora- [18] D. Salomon, Variable-length Codes for Data Compression. Secaucus, tory of Electronics at MIT, Technical Report No. 65, 1949. NJ, USA: Springer-Verlag New York, Inc., 2007. [47] P. Elias, “Minimum times and memories needed to compute the [19] C. E. Shannon, “A mathematical theory of communication,” Bell values of a function,” J. Comput. Syst. Sci., vol. 9, pp. 196–212, 1974. System Technical Journal, vol. 27, pp. 379–423,623–656, 1948. [48] ISO/IEC 10918-5:2013, Information technology – Digital compression [20] S. Gueron, “Intel R advanced encryption standard (aes) new in- and coding of continuous-tone still images: JPEG File Interchange structions set,” Intel Corporation, Tech. Rep., 2010. Format (JFIF), ISO/IEC, 2013. [21] B. Preneel, R. Govaerts, and J. Vandewalle, “Hash Functions Based [49] ZIP File Format Specification, PKWARE Inc., 2014. on Block Ciphers: A Synthetic Approach,” in Proc. CRYPTO 1993, [50] P. Elias, “Universal codeword sets and representations of the 1993, pp. 368–378. integers,” IEEE Trans. Inf. Theory, vol. 21, pp. 194–203, 1975. [22] B. Preneel, “Davies–Meyer,” in Encyclopedia of Cryptography and [51] A. S. Fraenkel and S. T. Klein, “Robust universal complete codes Security. Springer-Verlag, 2011. for transmission and compression,” Discrete Applied Mathematics, [23] J. Black, P. Rogaway, and T. Shrimpton, “Black-box analysis of vol. 64, pp. 31–55, 1996. the block-cipher-based hash-function constructions from PGV,” in [52] R. Dean, “Formal Aspects of Mobile Code Security,” Ph.D. disser- Proc. CRYPTO 2002, 2002, pp. 320–335. tation, Princeton University, 1999. [24] J. Coron, Y. Dodis, C. Malinaud, and P. Puniya, “Merkle-Damgard˚ [53] Announcing the ADVANCED ENCRYPTION STANDARD (AES), Revisited: How to Construct a Hash Function,” in Proc. CRYPTO NIST, U. S. Department of Commerce Fedral Information Process- 2005, 2005, pp. 430–448. ing Standards Publication 197, 2001. [25] J. Coron, J. Patarin, and Y. Seurin, “The random oracle model and [54] A. Dutta, A. Jha, and M. Nandi. [Online]. Available: https: the ideal cipher model are equivalent,” in Proc. CRYPTO 2008, //github.com/ashwinj/counter-as-encodings 2008, pp. 1–20. [26] P. Rogaway, “Bucket Hashing and Its Application to Fast Message Authentication,” J. Cryptol., vol. 12, pp. 91–115, 1999. [27] S. Halevi and H. Krawczyk, “MMH: Software Message Authenti- cation in the Gbit/Second Rates,” in Proc. Fast Software Encryption - FSE 1997, 1997, pp. 172–189. [28] P. Sarkar, “A New Multi-Linear Universal Hash Family,” Des. Codes Cryptography, vol. 69, pp. 351–367, 2013. [29] S. Winograd, “A New Algorithm for Inner Product,” IEEE Trans. Comput., vol. 17, pp. 693–694, 1968. [30] J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway, “UMAC: Fast and Secure Message Authentication,” in Proc. CRYPTO 1999, 1999, pp. 216–233. [31] T. Krovetz, “Message Authentication on 64-bit Architectures,” IACR Cryptology ePrint Archive, no. 37, 2006. [32] K. Minematsu and T. Matsushima, “New Bounds for PMAC, TMAC, and XCBC,” in Proc. Fast Software Encryption - FSE 2007, 2007, pp. 434–451. [33] M. Bellare and P. Rogaway, “The Security of Triple Encryption and a Framework for Code-Based Game-Playing Proofs,” in Proc. EUROCRYPT 2006, 2006, pp. 409–426. [34] D. Chang and M. Nandi, “A Short Proof of the PRP/PRF Switch- ing Lemma,” IACR Cryptology ePrint Archive, no. 78, 2008. [35] M. Bellare, J. Kilian, and P. Rogaway, “The Security of The Cipher Block Chaining Message Authentication Code,” J. Comput. Syst. Sci., vol. 61, pp. 362–399, 2000. [36] W. F. Ehrsam, C. H. W. Meyer, J. L. Smith, and W. L. Tuchman, “Message Verification and Transmission Error Detection by Block Chaining,” U.S. Patent 4 074 066, 1976. [37] J. Black and P. Rogaway, “A Block-Cipher Mode of Operation for Parallelizable Message Authentication,” in Proc. EUROCRYPT 2002, 2002, pp. 384–397. [38] A. Berendschot, B. den Boer, J. Boly, A. Bosselaers, J. Brandt, D. Chaum, I. Damgard,˚ M. Dichtl, W. Fumy, M. van der Ham, C. Jansen, P. Landrock, B. Preneel, G. Roelofsen, P. de Rooij, and J. Vandewalle, “Final Report of Race Integrity Primitives,” Lecture Notes in Computer Science, Springer-Verlag, 1995, vol. 1007, 1995. [39] M. Bellare, R. Canetti, and H. Krawczyk, “Keying hash functions for message authentication,” in Proc. CRYPTO 1996, 1996, pp. 1–15. [40] E.´ Jaulmes, A. Joux, and F. Valette, “On the Security of Ran- domized CBC-MAC Beyond the Birthday Paradox Limit: A New

Copyright (c) 2017 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].