<<

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 11, NOVEMBER 2017 2627 Real-Time Digital Signatures for Time-Critical Networks Attila Altay Yavuz, Member, IEEE, Anand Mudgerikar, Ankush Singla, Ioannis Papapanagiotou, Senior Member, IEEE, and Elisa Bertino Fellow, IEEE

Abstract— The secure and efficient operation of time-critical infrastructures possible. Such infrastructures will usher networks, such as vehicular networks, smart-grid, and other automation in a large number of application domains such as smart-infrastructures, is of primary importance in today’s soci- transportation, manufacturing, smart-grid and urban life (e.g. ety. It is crucial to minimize the impact of security mechanisms over such networks so that the safe and reliable operations Smart-city). of time-critical systems are not being interfered. For instance, Because of their control capabilities and pervasive data if the delay introduced by the crypto operations negatively acquisition, securing such smart-infrastructures is a critical affects the time available for braking a car before a collision, requirement. Even though many security techniques are avail- the car may not be able to safely stop in time. In particular, able, their application to smart infrastructures is not straight- as a primary authentication mechanism, existing digital signa- tures introduce a significant computation and communication forward, especially when such infrastructures are based on overhead, and therefore are unable to fully meet the real-time networks that include mobile devices, and for safety reasons, processing requirements of such time-critical networks. In this they have to meet real-time requirements. We refer to such paper, we introduce a new suite of real-time digital signatures networks as time-critical networks. referred to as Structure-free and Compact Real-time Authenti- An example is a vehicular network in which events from cation (SCRA), supported by hardware acceleration, to provide delay-aware authentication in time-critical networks. SCRA is vehicles, such as sudden brake of a vehicle, have to be a novel signature framework that can transform any secure communicated promptly to the other vehicles in the network aggregate signature into a signer efficient signature. We instan- so that they can timely react to the events. Scalability tiate SCRA framework with condensed-RSA, BGLS, and NTRU is also crucial as many envisioned time-critical networks signatures. Our analytical and experimental evaluation validates involve huge numbers of devices and systems. A security the significant performance advantages of SCRA schemes over their base signatures and the state-of-the-art schemes. Moreover, technique for any comprehensive solution is represented we push the performance of SCRA schemes to the edge via highly by authentication as it is critical for establishing trust optimized implementations on vehicular capable system-on-chip and securing communications among parties in a network. as well as server-grade general purpose graphics processing units. Authentication techniques have been widely investigated. We prove that SCRA is secure (in model) and However, to meet the real-time and scalability requirements of show that SCRA can offer an ideal alternative for authentication in time-critical applications. large scale time-critical networks, we need techniques that are Index Terms— Applied , digital signatures, far more efficient than the currently available ones. It is critical real-time authentication, hardware-acceleration. that devices in such a network should be able to respond and/or to initiate a large number of authentications in a small I. INTRODUCTION time-frame. ECHNOLOGICAL advances in sensors and embed- To address such a requirement, in this paper we develop ded systems are making the deployment of “smart” T a series of fast digital signatures, supported by hardware- Manuscript received December 25, 2016; revised April 12, 2017 and acceleration, to enable real-time authentication in time-critical May 29, 2017; accepted June 3, 2017. Date of publication June 19, networks. We introduce a generic signature framework, 2017; date of current version July 26, 2017. This material is partially referred to as Structure-free and Compact Real-time based upon work supported by the Department of Energy under Award DE-OE0000780 and by the NSF CAREER Award CNS-1652389 at Oregon Authentication (SCRA), that can be instantiated with any State University. This work was supported in part by NSF under Award secure aggregate signature. We then develop specific CNS-1719369 and Award ACI-1547390 at Purdue University. The associate SCRA instantiations from Condensed-RSA [30], BGLS [7], editor coordinating the review of this manuscript and approving it for publication was Prof. Qian Wang. (Corresponding author: Attila Altay Yavuz.) NTRU [27] and PASSSign [18], and demonstrate that these A. A. Yavuz is with the School of Electrical Engineering and Computer SCRA schemes are significantly more computationally effi- Science, Oregon State University, Corvallis, OR 97331 USA (e-mail: cient than their counterparts in modern CPUs. We also [email protected]). A. Mudgerikar, A. Singla, and E. Bertino are with the Computer Science computationally parallelize SCRA across thousands light- Department, Purdue University, West Lafayette, IN 47907 USA (e-mail: weight threads commonly supported by modern GPUs. We [email protected]; [email protected]; [email protected]). use several optimizations and show that the performance I. Papapanagiotou is with Netflix Inc., Los Gatos, CA 95032 USA, and also with Purdue University, West Lafayette, IN 47907 USA (e-mail: can be higher compared to the performance obtained when [email protected]). the CPU is used. Finally, we apply similar optimiza- Color versions of one or more of the figures in this paper are available tions to SoCs commonly used by car manufacturers and online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIFS.2017.2716911 IoT deployments. 1556-6013 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. 2628 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 11, NOVEMBER 2017

A. State-of-the-Art Methods and Limitations available pre-defined message structures in certain applications (e.g., smart-grid) to reduce the computational and storage We outline the advantages and limitations of authentication overhead of RSA-type offline-online constructions. Despite its mechanisms that are most relevant to our work. advantages, RA is only suitable for applications that have a pre- 1) Message Authentication Codes and Standard Digital defined message structure with a limited number of message Signatures: Symmetric crypto-based authentication mecha- components. Moreover, RA requires pre-computed tokens (i.e., nisms rely on Message Authentication Code (MAC) [28]. one-time masking signatures) to be stored/renewed per item as Despite their computational efficiency, these methods are in traditional offline-online techniques. Hardware-Accelerated not practical for broadcast authentication in large-scale dis- Authentication (HAA) [39] exploits hardware acceleration to tributed systems, as they require pairwise key distribution speed up RA in various settings. HAA demonstrates the ben- among all signers and verifiers. They also cannot achieve efit of hardware acceleration to reduce the end-to-end delay non-repudiation and public verifiability. Digital signatures of schemes. In particular, HAA shows the (e.g., RSA [35], ECDSA [3]) rely on the Public Key Infrastruc- performance advantages offered by GPUs for offline-online tures (PKIs) [28], which makes them publicly verifiable and signatures to batch regenerate tokens as they are depleted. scalable for large systems. Hence, they are considered as a primary authentication mechanism for large-scale delay- B. Our Contribution aware systems. For instance, the vehicular WAVE architecture We develop a new suite of delay-aware signatures that we mandates the use of PKI mechanisms to sign critical mes- refer to as Structure-Free and Compact Authentication (SCRA) sages [2]. Despite their scalability, standard digital signature to enable fast authentication for time-critical networks. schemes require several expensive operations such as modular 1) Main Idea: SCRA is based on the observation that the exponentiation and pairing (e.g., BLS [8]). Therefore, they are signature aggregation operation of some signature schemes is not suitable for time-critical authentication. It has been shown several magnitudes of times faster than that of their signature that they introduce significant delays, which are unacceptable generation. We leverage this fact to shift the expensive opera- in time-critical networks such as vehicular networks [33]. tions of signature generation phase to the key generation phase. 2) Delayed Key Disclosure and Amortized Signatures: That is, at the key generation (offline), we compute a set of Delayed key disclosure methods [32] are efficient as they signatures on the bit-structures of a hash output domain. Later, introduce an asymmetry between signer and verifier via a we can combine these pre-computed signatures very efficiently time factor. However, these methods require packet buffering, based on the hash of each message without enforcing a and therefore cannot achieve immediate verification (which is message format (e.g., unlike [41]) or storage/regeneration of a vital for delay-aware authentication). Signature amortization token per-message (e.g., unlike offline-online signatures (e.g., (e.g., [25]) computes a signature over a set of messages instead RA [9], [41]) that incurs linear storage and re-computation of individual messages. Hence, the cost of signature generation overhead). This simple but elegant strategy enables SCRA to and verification is amortized over multiple messages. However, achieve very fast signature generation, a low end-to-end cryp- these methods require packet buffering and introduce packet tographic delay, small-constant signature sizes with a constant- loss risk due to the use of hash chains. size private/public key. Figure 1 further outlines our main idea. 3) Specialized Signatures: One-Time Signatures (OTSs) 2) Properties: We outline below the relevant properties of (e.g., [34]) offer fast signature generation and verification. our schemes. However, they incur very large signature and public key sizes, • Generic and Simple Design: SCRA can be instantiated and also public keys must be renewed frequently. Various from any aggregate signature. We prove that SCRA is customizations of traditional signatures (along with crypto- EU-CMA-secure if its base scheme is IA-EU-CMA secure graphic pairing [8]) and OTSs for time-critical systems such (see Section II). We show that SCRA is at least a magnitude as vehicular networks (e.g., [16]) and smart-grids have been times faster than standard signatures as shown in Table I even proposed. However, these schemes still suffer from computa- without optimization. tional inefficiency (due to heavy use of pairings) or public key • Highly Fast Signing, Low Delay and Compactness:We distribution issues (OTSs). develop several instantiations of SCRA offering performance The offline-online signatures (e.g., [31]) pre-compute a trade-offs with different computational overhead, signature and token for each message to be signed at the offline-phase, and key sizes. then use it to compute a signature on a message very effi- - SCRA-C-RSA is constructed from C-RSA [30], which trans- ciently at the online-phase. Despite their merits, offline-online forms the highly costly exponentiation of RSA signing into signatures incur significant storage overhead (i.e., linear with a few modular exponentiations, followed by already efficient respect to the number of messages to be signed). Moreover, signature verification. Therefore, SCRA-C-RSA offers the they require heavy computation for applications with high lowest end-to-end delay among all of its counterparts (e.g., message throughput, since the signer depletes pre-computed 7 and 18 times faster than ECDSA and RSA, respec- tokens rapidly and is forced to regenerate them at the online- tively) with a signature size of standard RSA. This makes phase. Hence, offline-online signatures are not suitable for SCRA-C-RSA an ideal choice for time-critical applications time-critical networks with high message throughput. with a reasonable signature size. Our prior work Rapid Authentication (RA) [41] is an - SCRA-BGLS is constructed from BGLS [7], which reduces efficient offline-online signature, which leverages the already the signing cost from an exponentiation to a few modular YAVUZ et al.: REAL-TIME DIGITAL SIGNATURES FOR TIME-CRITICAL NETWORKS 2629

Fig. 1. The main idea behind our preliminary construction.

TABLE I THE ESTIMATED EXECUTION TIME (IN msec) OF SCRA AND ITS COUNTERPARTS

multiplications. SCRA-BGLS offers the smallest signature A. Notation and Definition size among all counterparts with a minimal signer overhead, S S l | | denotes the cardinality of set . {xi }i=0 denotes (x0,..., making it suitable for resource-limited devices. $ - SCRA-NTRU is based on the NTRU [27] signature scheme. xl). x ← S denotes that variable x is randomly and uniformly ∗ It is important to mention that we use the NTRU scheme selected from set S. ||, |x| and {0, 1} denote the concatenation that is secure against transcript attacks [13]. Signatures are operation, the bit length of variable x and the set of binary aggregated using the lattice based aggregation technique strings of any finite length, respectively. described in [15]. The lattice based sequential aggregate Definition 1: A signature scheme SGN is a tuple of three signature is proven to be secure in the random oracle security algorithms (Kg, Sig, Ver ) defined as follows: κ model [4]. Due to its moderate signature and key sizes and - (sk, PK) ← SGN.Kg(1 ): Given the security κ low end-to-end delay, SCRA-NTRU is ideal for time-critical parameter 1 , the key generation algorithm returns applications. a private/public key pair (sk, PK) as the output. - SCRA-NTRUPASS is based on the PASS [18] signature - s ← SGN.Sig(m, sk): The signing algorithm takes a ∗ scheme. It is also a lattice based cryptographic scheme based message m ∈{0, 1} and a private key sk as the input, on the partial Fourier recovery problem. and returns a signature s as the output. • Performance Enhancements via Hardware-Acceleration: - {0, 1}←SGN.Ver (m, s, PK): The verification algorithm ∗ We improve the performance of SCRA by developing vari- takes a message m ∈{0, 1} , signature σ and public key ous hardware-acceleration and software-optimizations, which PK as the input. It returns a bit: 1 means valid and 0 enable significant speed improvements (see Section VI). means invalid. SCRA relies on aggregate signatures [7], which can aggregate II. DEFINITIONS AND MODELS multiple signatures into a single compact signature. SCRA uses We first introduce our notation and definitions, followed a single-signer aggregate signature (e.g., [30], [43]), which by our system and threat model. We then give our security aggregates signatures computed under the same private key. model, in which we clarify the security properties of the Definition 2: A single-signer aggregate signature ASig is SCRA schemes. defined as follows: 2630 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 11, NOVEMBER 2017

- (sk, PK) ← ASig.Kg(1κ ): Given the security parameter aggregated signatures [29]. To describe Immutable-A-EUCMA 1κ , the key generation algorithm returns a private/public (IA-EU-CMA) [26], [43] security, we first define the aggregate key pair (sk, PK) as the output. signature extraction argument as below. - γi ← ASig.Sig(mi , sk): The signing algorithm takes a 1) Aggregate Signature Extraction: The L-aggregate signa- ∗ message mi ∈{0, 1} and private key sk as the input. ture extraction problem, referred as AE problem, means that It returns a signature γi computed under sk as the output. for a given aggregate signature s ← ASig.Agg(γ1,...,γL ) - s ← ASig.Agg(γ1,...,γL , params): The aggregation computed on L individual data items, it is difficult to extract algorithm takes a set of signatures γ1,...,γL and option- the individual signatures (γ1,...,γL) provided that only s ally some parameters params as the input. It returns is known to the extractor. Moreover, it is difficult to extract a single-compact signature s as the output. Optional any aggregate signature subset s from a given aggregate params may include sk (aggregation under private key) signature s [42]. The AE problem was first introduced by or PK (public aggregation) depending on specific instan- Boneh et al. in [7] for the security of BGLS signatures, tiations. We will omit params for the sake of simplicity. but as an intractability assumption without a proof. Coron - {0, 1}←ASig.Ver(−→m , s, PK): The verification algorithm and Naccache [10] later showed that Boneh’s AE problem −→ takes messages m = (m1,...,m L ), aggregate signa- for BGLS scheme is equivalent to the Computational Diffie ture s and PK as the input. It returns a bit: 1 means valid Hellman Assumption (CDH) [21]. Yavuz et al. [43] analyzed and 0 means invalid. the log truncation problem for forward-secure and aggregate signatures [26], and produced formal proofs with AE argu- B. System and Threat Model ment for only the DLP-based schemes [43]. A related problem Our system model follows the traditional PKC-based broad- in the context of one-way accumulators for RSA have been cast authentication model (e.g., [41]), in which a signer considered in [6], which extends to other aggregate RSA computes a digital signature on a message and broadcasts a variants (e.g., C-RSA [29]). message-signature pair to the verifiers. This model is com- Definition 4: The AE experiment for a ASig is as patible with our target time-critical applications. For instance, follows [42]: in vehicular networks, a vehicle or road infrastructure broad- - Setup. Algorithm B runs (sk, PK) ← ASig.Kg(1κ ) and casts authenticated messages to the surrounding entities as provides PK to the adversary A . described in vehicular communication standards [2]. Our threat - Queries. A queries B on any batch message comprised −→ model reflects how a standard digital signature-based broadcast of L individual messages m j = (m j,1,...,m j,L ) of her authentication works. That is, an adversary A can observe choice for j = 1,...,qs. B replies to each query j with message-signature pairs computed under a private key. A also an aggregate signature s j ← ASig.Agg(γ j,1,...,γj,L), L can actively intercept, modify, inject and replay messages where {γ j,i ← ASig.Sig(m j,i , sk)}i=1. transmitted over the network. A aims at producing existential - Aggregate Extraction. A outputs (−→m ∗,σ),where−→m ∗ = ∗ ∗ forgeries against the digital signatures computed by signers. (m1,...,mk ), 1 ≤ k ≤ L and wins the AE experiment, if ∗  C. Security Model 1. ASig.Ver({m }i∈{1,...,k},σ , PK) = 1, −→ i The security notion for a signature is Existential Unforge- 2. m ∗ is a subset of previously queried or some combi- ability under Chosen Message Attacks (EU-CMA). nation of previously queried batch messages: ∃I  ⊆ −→∗ −→ Definition 3: The EU-CMA experiment for SGN is as {1,...,qs }: m ⊆||k∈I  m k. This implies that −→ follows: m ∗ itself as a batch query never has been queried - Setup. Algorithm B runs (sk, PK) ← SGN.Kg(1κ ) and directly to B (but individual data items in −→m ∗ have provides PK to the adversary A . been queried as an element of different batch queries - Queries. A queries B on any message m j of her before, but not individually), −→∗ choice for j = 1,...,qs. B replies to each query with 3. The extraction is non-trivial: If m is combined with a signature s j ← SGN.Sig(m j , sk). any previously queried or a combination of previously - Forgery. A outputs a forgery (m∗, s∗) and wins the queried batch messages, the combination is not equal EU-CMA experiment, if SGN.Ver (PK, m∗, s∗) = 1and to one of the previously queried batch message itself: ∗ B −→∗ −→ −→ qs m was not queried to . ∀I ⊆{1,...,qs}:[m ||(|| j∈I m j )] ={m l }l=1. SGN is (t, qs,)-EU-CMA secure, if no A in time t making ASig is (t, qs,)-AE secure, if no A in time t making at most at most qs queries has an advantage with probability . qs queries has an advantage with probability . SCRA is constructed from a single-signer aggregate signa- We now provide the definition of Immutable-A-EUCMA ture that achieves the signature immutability (described in (IA-EU-CMA) security [26], [43] as below: detail below). The basic security notion for aggregate sig- Definition 5: The IA-EU-CMA experiment for a ASig is as natures is Aggregate-EU-CMA (A-EU-CMA) [7], [20], which follows [42]: captures the homomorphic properties of aggregate signatures. - Setup. Algorithm B runs (sk, PK) ← ASig.Kg(1κ ) and Later, the security of aggregate signatures has evolved to cap- provides PK to the adversary A . ture improved security properties such as signature immutabil- - Queries. A queries B on any batch message comprised −→ ity. Intuitively, signature immutability refers to the difficulty of of L individual messages m j = (m j,1,...,m j,L ) of her computing new valid aggregated signatures from a set of other choice for j = 1,...,qs. B replies to each query j with YAVUZ et al.: REAL-TIME DIGITAL SIGNATURES FOR TIME-CRITICAL NETWORKS 2631

an aggregate signature s j ← ASig.Agg(γ j,1,...,γj,L), The random padding P is added to ensure that, for practical L where {γ j,i ← ASig.Sig(m j,i , sk)}i=1. applications, the input of hash function remains larger than d - Forgery. A outputs a forgery (−→m ∗,γ∗) and wins the as required. experiment IA-EU-CMA,if We construct a pre-computed sub-message/signature table −→ b 1. The forgery is valid as ASig.Ver( m ∗ ,γ∗, PK) = 1,  ={˜m ,γ }L,2 −1 , which supports very efficient signa- −→ i, j i, j i=1, j=0 2. m ∗ is a subset of previously queried or some com- ture generation.  is constant-size (e.g., unlike [9], [31]) and bination of previously queried batch messages: ∃I  ⊆ imposes no structure/length constraints on the online messages −→∗ −→ {1,...,qs}: m ⊆||k∈I  m k, to be signed (e.g., unlike [41]). −→ 3. Batch query m ∗ has not been queried previously 2) Signature Generation: Given m ∈{0, 1}∗, the signer as −→m ∗ ⊆{−→m }qs . This implies one of the two ∗ ∗ j j=1 computes (M1 ,...,ML ) ← H (m||r), and fetches the cor- ∗ −→∗ $ conditions: (i) At least one item m ⊆ m has never responding signatures γ  of i||M∗||P from ,wherer ← B i i been queried to , or (ii) the AE experiment wining {0, 1}κ, i = 1,...,L. The rest is to combine signatures conditions 2-3 hold as described in Definition 4.   efficiently as s ← ASig.Agg(γ1, ...,γL),whereσ ← (r, s). ASig is (t, qs,)-IA-EU-CMA-secure, if no A in time t 3) Signature Verification: The verifier computes making at most qs queries has an advantage at least with ∗ ∗ (M1 ,...,ML ) ← H (m||r) and verifies σ as probability . ∗ ∗  {0, 1}←ASig.Ver( 1||M1 ||P,..., L||ML ||P , s, PK ).

III. PROPOSED SCHEMES In this section, we present our proposed schemes. We B. Instantiations of SCRA first describe the SCRAdigital signature framework. We An ideal aggregate signature to instantiate SCRA must then provide several instantiations of the generic SCRA, achieve very efficient signature aggregation and each offering a unique performance benefit compared to IA-EU-CMA security. We identified three signatures to the others. instantiate SCRA: Condensed-RSA (C-RSA) [30] based on RSA [35], BGLS [7] based on pairing and aggregate- A. Structure-Free and Compact Real-Time Authentication NTRU signatures [15], [36] based on NTRU [13]. We SCRA can transform any aggregate signature into a signer- summarize important operations of our SCRA instantiations efficient signature scheme, whose signing operation is as in Algorithms 2-5. For the sake of brevity, we only give the fast as just the aggregation (i.e., simple modular addition dominant signature operations that are performed in each or multiplication) of a small set of pre-computed signatures. algorithm. The rest of the SCRA operations are as described SCRA has several advantages over the state-of-the-art sig- in Algorithm 1 and are not repeated. Moreover, we only natures: (i) SCRA is a magnitude(s) of times more efficient give the private/public keys of each instantiation without with respect to signature generation than standard signatures describing key generation steps and parameters in detail. (e.g., RSA [35], ECDSA [3], BGLS [7]). (ii) Unlike message- We refer interested readers to C-RSA [30], BGLS [7] and formatted signature schemes [41], SCRA does not require NTRU [15], [36] for the details. any pre-defined message formats. (iii) Unlike offline-online SCRA-C-RSA is based on Condensed-RSA (C-RSA) [30] signatures [31], [37], [41], SCRA does not require linear- and therefore it obtains the highest computational efficiency sized token storage. (iv) SCRA offers compact signature and benefit from SCRA among all instantiations. That is, C-RSA is public key sizes, and therefore is more scalable than one-time by default a verifier efficient signature scheme but its sig- signatures (e.g., [34]). nature generation is expensive (i.e., an exponentiation under The detailed description of SCRA is given in Algorithm 1. a large modulo). Since the SCRA significantly reduces the We further elaborate as follows: signing cost, SCRA-C-RSA achieves the lowest end-to-end Let (sk, PK) ← ASig.Kg(1κ ) be a ASig key pair and H : delay among all instantiations with a moderate signature size {0, 1}∗ →{0, 1}d be an ideal hash function (i.e., H behaves (e.g., 2KB RSA signature size). SCRA-C-RSA is described in as a Random Oracle (RO) [4]), where d-bit denotes the output Algorithm 2. length of the cryptographic hash function. SCRA-BGLS is based the BGLS signatures [7], and there- 1) Key Generation (Offline): We apply a divide-and- fore has the smallest signature/ among all instantia- conquer strategy over the hash output H :{0, 1}∗ →{0, 1}d. tions (e.g., 320 bits). The SCRA strategy also significantly That is, a d-bit hash output can be interpreted as integers increases the signature efficiency of BGLS. However, since ( j1,..., jL), where each ji is a b-bit integer such that b · L = BGLS has an expensive signature verification due to crypto- d. We then compute a signature on each b-bit integer j graphic pairing operations, SCRA-BGLS has a larger end-to- with its corresponding index i as m˜ i, j ← i|| j||P, γi, j ← end cryptographic delay compared to our other instantiations.  b ASig.Sig(m˜ i, j , sk ), i = 1,...,L, j = 0,...,2 − 1, where SCRA-BGLS is described in Algorithm 3. P is a random padding. The index (i, j) will enable the signer SCRA-NTRU is based on NTRU aggregate signature [15]. to select the corresponding pre-computed signature from the Note that SCRA-NTRU achieves the highest signing effi- table  in the online phase for a given message, and therefore ciency among all instantiations (it is even more efficient than ensure the correctness of the scheme. Moreover, the index i SCRA-C-RSA at the signer side). It also has a low end-to- enforces the order of the bit chunks in the online phase. end delay, which is comparable to SCRA-C-RSA but slightly 2632 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 11, NOVEMBER 2017

Algorithm 1 Structure-Free Compact Real-Time Authentication (SCRA) Scheme (sk, PK) ← SCRA.Kg(1κ ): Executed offline (once). $ 1: (sk, PK) ← ASig.Kg(1κ ), P ←{0, 1}d . 2: Select integers (b, L) such that b · L = d.  b 3: m˜ i, j ← i|| j||P, γi, j ← ASig.Sig(m˜ i, j , sk ), i = 1,...,L, j = 0,...,2 − 1.   b 4: sk ← (sk ,) and PK ← (PK , P),where ← (m˜ i, j ,γi, j ) for i = 1,...,L, j = 0,...,2 − 1.

σ ← SCRA.Sig(m, sk): Given a message m ∈{0, 1}∗, compute its signature as follows: ∗ ∗ κ ∗ b 1: (M1 ,...,ML ) ← H (m||r),wherer ←{0, 1} and Mi ∈[0, 2 − 1], i = 1,...,L.  ∗   2: mi ← i||Mi ||P, and fetch corresponding signature γi of mi from table , i = 1,...,L.   3: s ← ASig.Agg(γ1,...,γL) and σ = (r, s). {0, 1}←SCRA.Ver(m,σ,PK):Givenm ∈{0, 1}∗, verify its signature σ under PK as follows: ∗ ∗ 1: (M1 ,...,ML ) ← H (m||r),  ∗ 2: mi ← i||Mi ||P, i = 1,...,L,    3: {0, 1}←ASig.Ver( m1,...,m L , s, PK ).

Algorithm 2 SCRA Instantiation With Condensed-RSA [30]: SCRA-C-RSA (sk, PK) ← SCRA-C-RSA.Kg(1κ ):Given1κ , generate C-RSA and SCRA-C-RSA parameters as follows: ∗ 1: Randomly generate two large primes (p, q) and computes n = p · q. The public and secret exponents (e, d) ∈ Zn satisfies e · d ≡ 1modφ(n),whereφ(n) = (p − 1)(q − 1).Setsk ← (n, d) and PK ← (n, e).LetH  be a full domain hash  ∗ function (e.g., [5]) defined as H :{0, 1} → Zn.  d b 2: Compute γi, j ← H (m˜ i, j ) mod n, i = 1,...L, j = 0,...,2 − 1, set (, sk, PK) as in Algorithm 1.

  σ ← SCRA-C-RSA.Sig(m, sk): Execute Algorithm 1 SCRA.Sig Step 1-2, and obtain γi of mi from , i = 1,...,L. L Compute s ← i=1(γi ) mod n.Setσ as in Algorithm 1 SCRA.Sig Step 3.  {0, 1}←SCRA-C-RSA.Ver(m,σ,PK): Execute Algorithm 1 SCRA.Ver Step 1-2. In Step 3 (i.e., aggregate signature e L   verification), if s = i=1 H (mi ) mod n, return 1, else return 0.  Algorithm 3 SCRA Instantiation With BGLS [7]: SCRA-BGLS κ (sk, PK) ← SCRA-BGLS.Kg(1 ): G1 and G2 are two (multiplicative) cyclic groups of prime order p. g1 and g2 are generators of G1 and G2, respectively. GT is an additional group such that |G1|=|G2|=|GT |. e is a bilinear pairing a b a·b e : G1 × G2 → GT such that (i) Bilinear:forallu ∈ G1, v ∈ G2, e(u ,v ) = e(u,v) . (ii) Non-degenerate: e(g1, g2) = 1  ∗ (please refer to [7] for details). Finally, H :{0, 1} → G1 is a Full Domain Hash [19] modeled as RO [4].    x G $    1: Set sk = x and PK = g2 ∈ 2 [7], where x ← Z p.  x b 2: Compute γi, j ← H (m˜ i, j ) ∈ G1, i = 1,...L, j = 0,...,2 − 1, set (, sk, PK) as in Algorithm 1.

  1: σ ← SCRA-BGLS.Sig(m, sk): Execute Algorithm 1 SCRA.Sig Step 1-2, and obtain γi of mi from , i = 1,...,L. L G Compute s ← i=1(γi ) ∈ 1.Setσ as in Algorithm 1 SCRA.Sig Step 3.  1: {0, 1}←SCRA-BGLS.Ver(m,σ,PK): Execute Algorithm 1 SCRA.Ver Step 1-2. In Step 3 (i.e., aggregate signature L   x G verification), if e(s, g2) = i=1 e(H (mi ), g2 ∈ 2), return 1, else return 0. Note: We implement SCRA-BGLS on an E, in which modular exponentiation and multiplication correspond point scalar multiplication and point addition on E [17], respectively. less efficient, since NTRU aggregate signature verification the partial Fourier recovery problem rather than the approxi- algorithm in [15] is less efficient than that of SCRA-NTRU and mate CVP problem for SCRA-NTRU. a low end-to-end delay but with a larger signature size. SCRA-NTRUPASS is based on the PASS [18] signature IV. SECURITY ANALYSIS scheme. It provides similar performance to SCRA-NTRUin We now present our security analysis for SCRA schemes. terms of both latency and storage but the lattice based scheme Theorem 1: SCRA is (t, qs,)-EU-CMA secure, if the   is more practical to use. This means that SCRA-NTRUPASS is underlying ASig is (t , qs,)-IA-EU-CMA secure, where t = secure for usage even with smaller parameters as it is based on O(t)+(ASGN +RO(.))·qs.RO(.) and ASGN denote the cost of YAVUZ et al.: REAL-TIME DIGITAL SIGNATURES FOR TIME-CRITICAL NETWORKS 2633

Algorithm 4 SCRA Instantiation With Lattice-Based Sequential Aggregate Signatures [15]: SCRA-NTRU (sk, PK) ← SCRA-NTRU.Kg(1κ ): We use lattice-based sequential aggregate signature schemes AggSign and AggVerify as described in [15], that is secure in the random oracle model [4]. Let AggSign and AggVerify are functions as described in [15].  X 1: Set fA : Bn → Rn as a family of preimage-sampleable NTRUSign [27], where PK ← A = g/f ∈ Rq ,  fg sk ← T = , T is the trapdoor and Bn is the domain of f A. ωi represents the list of i partial aggregate FG signatures. b 2: Compute γi, j ← NT RUSign(sk, H (m˜ i, j )), i = 1,...L, j = 0,...,2 − 1, set (, sk, PK) as in Algorithm 1.

  1: σ ← SCRA-NTRU.Sig(m, sk): Execute Algorithm 1 SCRA.Sig Step 1-2, and obtain γi of mi from , i = 1,...,L. Compute s ← AggSign(T,γi ,ωi−1) for i = 1,...,L.Setσ as in Algorithm 1 SCRA.Sig Step 3.

1: {0, 1}←SCRA-NTRU.Ver(m,ω,PK): Execute Algorithm 1 SCRA.Ver Step 1-2. In Step 3 (i.e., aggregate signature L verification), if AggVerify(A, m, s, {ωi }i ),return1,elsereturn0.

Algorithm 5 SCRA Instantiation With Lattice-Based Sequential Aggregate Signatures [15]: SCRA-NTRUPASS (sk, PK) ← SCRA-NTRUPASS.Kg(1κ ): We again use lattice-based sequential aggregate signature schemes AggSign and AggVerify as described in [15], that is secure in the random oracle model [4]. Let AggSign and AggVerify are functions as described in [15].  ∞ 1: Set fA : Bn → Rn as a family of preimage-sampleable trapdoor function PASSSign [18], where sk is a polynomial L   norm equal to 1(coefficients are chosen independently from the set [−1, 0, 1]), PK ← A = F · sk , T is the trapdoor and Bn is the domain of f A. ωi represents the list of i partial aggregate signatures. b 2: Compute γi, j ← PASSSign(sk, H (m˜ i, j )), i = 1,...L, j = 0,...,2 − 1, set (, sk, PK) as in Algorithm 1.

  1: σ ← SCRA-NTRUPASS.Sig(m, sk): Execute Algorithm 1 SCRA.Sig Step 1-2, and obtain γi of mi from , i = 1,...,L. Compute s ← AggSign(T,γi ,ωi−1) for i = 1,...,L.Setω as in Algorithm 1 SCRA.Sig Step 3.

1: {0, 1}←SCRA-NTRUPASS.Ver(m,σ,PK): Execute Algorithm 1 SCRA.Ver Step 1-2. In Step 3 (i.e., aggregate signature L verification), if AggVerify(A, m, s, { i }i ), return 1, else return 0. 

L random oracle invocation and aggregate signature generation, such that {|m ˜ j,i |=b}i=1 and b · L = d (as in Algorithm respectively. 1 SCRA.Kg Steps 2-3 and SCRA.Sig Step 1). −→ −→ Proof: Suppose that A breaks (t, qs,)-EU-CMA secure 2. F queries s j ← Osk (M j ),whereM j = 1||m ˜ j,1||P,..., SCRA. We construct a simulator F , which breaks L||m ˜ j,L||P. F sends σ j = (s j , r j ) to A (as in Algorithm  (t , qs,)-IA-EU-CMA secure ASig by using A as a subroutine 1 SCRA.Kg Step 3 and SCRA.Sig Steps 2-3). with the experiment below: Forgery: A outputs a forgery (m∗,σ∗ = s∗, r ∗ ) and Setup: F is provided with two oracles as below: wins the EU- CMA experiment, if (i) SCRA.Ver (m∗,σ∗, $ d ∗ A 1. A random oracle h ← RO(m), which returns h ←{0, 1} PK) = 1and(ii) m ∈{/ m1,...,mqs }.If loses if m ∈{0, 1}∗ has not been queried before, else it in the EU-CMA experiment, then B also loses in the returns the same answer h for the given m.Thatis,the IA-EU-CMA experiment and aborts.Otherwise,F returns cryptographic hash function H in SCRA is modeled as a −→∗ ∗ −→∗ ∗ a ASig forgery as (M , s ),whereM = (1||m ˜ 1||P,..., random oracle. L||m ˜ ∗ ||P) such that (m˜ ∗,...,m˜ ∗ ) ← RO(m∗||r ∗) (as −→ L 1 L 2. A signature oracle s ← Osk ( m ) as in in Query phase Step 1). F check the forgery condi- IA-EU-CMA experiment (i.e., Definition 5). That tions for IA-EU-CMA experiments as in Definiton 5 as −→ is, given a query m = (m1,...,m L ), O returns an follows: aggregate signature as s ← ASig.Agg(γ1,...,γL), 1) Validity: Given that SCRA.Ver (m∗,σ∗, PK) = 1 holds,   κ −→ where (sk , PK ) ← ASig.Kg(1 ) and {γi ← ASig.Ver(M ∗, s∗, PK) = 1 also holds. Therefore, ASig m sk }L .Sig( i , ) i=1. ASig forgery is valid. $ −→ −→ −→ 3. F gives PK ← (PK, P) to A ,whereP ←{0, 1}d as F ∗ 2) Non-triviality: checks if M ⊆{M 1,...,M qs } holds. in Algorithm 1 SCRA.Kg Step 4. This implies of the conditions below: ∗ −→ Queries: A queries F on m j ∈{0, 1} for j = 1,...,qs. ∗ ∗ a) At least one data item ( j||m ˜ j ||P) ∈ M has never For each query j, F performs the following operations: been queried to O (i.e., the forgery condition 3.i in F 1. queries RO(.) on (m j ||r j ), and receives an answer as Definition 5). Hence, the IA-EU-CMA-secure ASig is $ κ (m˜ j,1,...,m˜ j,L ) ← RO(m j ||r j ),wherer j ←{0, 1} broken. 2634 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 11, NOVEMBER 2017

b) The signature extraction occurs by Definition 4 condi- 2) Non-triviality: F checks if one these conditions hold:  −→∗ −→ ∗ −→∗ tion 2-3 as ∃I ⊆{1,...,qs}:M ⊆||k∈I  M k (i.e., i) At least one data item ( j||m ˜ j ||P) ∈ M has never the forgery condition 3.ii in Definition 5). This implies been queried to O.ThisimpliesthatIA-EU-CMA- −→ that M ∗ as a batch query has never been queried to O. secure C-RSA is broken, since by the validity condi- ∗ L −→∗ tion, there is a signature as s = H ( j||m ˜ ∗||P)d mod At the same time, each data item { j||m ˜ 1||P} j ∈ M j −→ n, which was not obtained from O. ii) The signature has been queried as a part of a batch query k ∈ I  M , k extraction occurs as defined in Theorem 1 Non-triviality and s∗ is the aggregation of their corresponding indi- condition (b). That is, individual signatures {s∗ = vidual signatures (i.e., individual signatures have been j  ∗ d L extracted and combined as in Definition 4). Finally, H ( j||m ˜ j ||P) mod n} j=1 have never been individually −→ −→ the signature extraction is non-trivial since M ∗ is queried to O, but all were part of a batch query k ∈ I M k. comprised of L data items and therefore it cannot be a This implies IA-EU-CMA-secure C-RSA is broken by the trivial combination of previously asked batch queries. signature extraction argument as in [30] and [42]) (see Hence, the IA-EU-CMA-secure ASig is broken. Section II-C). The non-triviality holds as in Theorem 1. F The success probability is as in Theorem 1 and the proba- If the above conditions hold, wins in the  |n|/2 IA-EU-CMA experiment against ASig.Otherwise,F aborts. bility that H produces a collision is 1/2 . For each query F of A , F performs a query to RO(.) and O, which requires a The probability that wins in the IA-EU-CMA experiment  is identical to that of A winning in the EU-CMA experiment. H computation, followed by an exponentiation/multiplication Note that H is modeled as a random oracle, and therefore under n for each item in (1||m ˜ j,1||P,...,L||m ˜ j,L||P). Hence, F A the probability that H is not target collision-resilient or the execution time of is that of plus [RO(.)+L·(Expn+   subset-resilient [34] is a negligible probability in terms of κ Muln + H )]·qs. d/2 A F Theorem 3: SCRA-BGLS is (t, qs,)-EU-CMA secure, if (i.e., 1/2 ). For each query of , performs a query  to RO(.) and another query to O. Hence, the execution time the underlying BGLS is (t , qs,)-IA-EU-CMA secure, where    t = O(t) +[RO(.) + L · (Exp+ Mul + H )]·qs.RO(.), H , of F is that of A plus (ASGN + RO(.)) · qs.  Exp and Mul denote the cost of random oracle invocation, We now prove that the SCRA-C-RSA and SCRA-BGLS  schemes are secure in Theorem 2 and Theorem 3, respectively. hash function H , modular exponentiation and multiplication G Remark that, for the sake of brevity, we refer to the generic in 1, respectively. A proof in Theorem 1 for common steps, and only emphasize Proof: Suppose that breaks (t, qs,)-EU-CMA secure F the scheme-specific steps in these theorems. SCRA. We construct a simulator , which breaks (t, q ,)-IA-EU-CMA secure BGLS by using A as a sub- Theorem 2: SCRA-C-RSA is (t, qs,)-EU-CMA secure, if s  routine with the experiment below: the underlying C-RSA is (t , qs,)-IA-EU-CMA secure, where    Setup: F is given RO(.) and Osk as in Theorem 1 Setup t = O(t)+[RO(.)+ L ·(Expn+ Muln+ H )]·qs.RO(.), H ,   x  Expn and Muln denote the cost of random oracle invocation, Phase. By Algorithm 3, (sk = x, PK = g2 ∈ G2) and H : ∗ G F  A hash function H , modular exponentiation and multiplication {0, 1} → 1 is a RO. gives PK ← (PK , P) to as in under modulo n, respectively. Algorithm 3. A F ∗ Proof: Suppose that A breaks (t, q ,)-EU-CMA secure Queries: queries on m j ∈{0, 1} for s F O SCRA. We construct a simulator F , which breaks j = 1,...,qs. For each query j, queries on (t, q ,)-IA-EU-CMA secure C-RSA by using A as a sub- (1||m ˜ j,1||P,...,L||m ˜ j,L||P) ← RO(m j ||r j ) as in Theorem 1 s L  x G routine as follows: Query Phase and gets s j ← i=1 H (m˜ i, j ) ∈ 1 as in F = Setup: F is given RO(.) and O  as in Theorem 1 Setup Algorithm 3. returns σ j (sj , r j ). sk A ∗ ∗ ∗ ∗ Phase. By Algorithm 2, (sk = n, d , PK  = n, e ) and hash Forgery: outputs a forgery (m ,σ = s , r ) and  ∗ checks EU- CMA experiment winning conditions (i)-(ii) as function used by O is H :{0, 1} → Zn that behaves as a $ in Theorem 1 Forgery Phase. If they hold then F returns a RO. F gives PK ← (PK, P) to A ,whereP ←{0, 1}d as −→ BGLS forgery for IA-EU-CMA experiment as (M ∗, s∗) as in in Algorithm 2. ∗ Theorem 1 Forgery Phase and proceed as follows: Queries: A queries F on m j ∈{0, 1} for ∗ ∗ j = 1,...,qs . For each query j, F queries O on 1) Validity: SCRA.Ver (m ,σ , PK) = 1 implies that that ∗ L  ∗ x G (1||m ˜ j,1||P,...,L||m ˜ j,L ||P) ← RO(m j ||r j ) as in Theorem 1 e(s , g2) = i=1 e(H (i||m ˜ i ||P), g2 ∈ 2) holds. Query Phase and gets s ← L H (i||m ˜ ||P)d mod n as Therefore, BGLS forgery is valid. j i=1 j,i   in Algorithm 2. F returns σ j= (s j , r j ). 2) Non-triviality: F checks if one of these conditions hold. Forgery: A outputs a forgery (m∗,σ∗ = s∗, r ∗ ) and checks ∗ −→∗ (i) At least one data item ( j||m ˜ j ||P) ∈ M has never the EU- CMA experiment winning conditions (i)-(ii) as in been queried to O.Thatis,IA-EU-CMA-secure BGLS is Theorem 1 Forgery Phase. If they hold, then F returns a −→ broken, since by validity condition, there is a signature C RSA IA EU CMA M ∗ s∗ - forgery for - - experiment as ( , ) as in as e(s, g ) = e(H ( j||m ˜ ∗||P), gx ∈ G ), which was not Theorem 1 Forgery Phase and proceeds as follows: 2 j 2 2 obtained from O. (ii) The signature extraction occurs as 1) Validity: SCRA.Ver (m∗,σ∗, PK) = 1 implies defined in Theorem 1 Non-triviality condition (b). That is, ∗ e L  ∗ ∗  ∗ x (s ) = H (i||m ˜ ||P) mod n holds. Therefore, individual signatures e(s j , g2) = e(H ( j||m ˜ j ||P), g2 ∈ i=1 i G C-RSA forgery is valid. 2), j = 1,...,L have never been individually queried   YAVUZ et al.: REAL-TIME DIGITAL SIGNATURES FOR TIME-CRITICAL NETWORKS 2635

−→ to O, but all were part of a batch query k ∈ I M k. TABLE II This implies IA-EU-CMA-secure BGLS is broken by the THE STORAGE SPACE OF SIGNATURE TABLE AND THE NUMBER signature extraction argument as in [10] (see Section II- OF AGGREGATIONS REQUIRED FOR VARIOUS L AND b VALUES C). The non-triviality of signature extraction holds as in Theorem 1. The success probability analysis is as in Theorem 1 and the G probability that H  produces a collision is 1/2| 1|/2. For each query of A , F performs a query to RO(.) and an another query to O, which requires a H  computation, followed by an exponentiation and multiplication in G1 for each item in (1||m ˜ j,1||P,...,L||m ˜ j,L ||P). Hence, the execution time of F is that of A plus t = O(t) +[RO(.) + L · (Expn +  Muln + H )]·qs.  Remark 1: The formal proof of SCRA-NTRUand In addition to their computational efficiency, the SCRA-NTRUPASS follow a similar logic and therefore SCRA schemes are also compact, since the signature will not be repeated here. At the same time, we note and public key sizes remain the same with their base that despite the existence of an A-EU-CMA analysis, signature scheme (the transmission of |r|=κ is negligible). IA-EU-CMA proof and analysis for signature extraction By comparing to each other, SCRA-C-RSA achieves the argument are not currently available for NTRU signatures. lowest end-to-end delay with a moderate signature size (e.g., Hence, a full formal reduction requires this gap to be filled 256 bytes), while SCRA-BGLS offers the smallest signature first, which is out of the scope of this paper. (20 bytes) but the highest end-to-end delay. SCRA-NTRU has the lowest signing delay (0.0018 msec), low end-to-end delay V. P ERFORMANCE ANALYSIS AND COMPARISON but with large signatures (e.g., 1587 bytes). Note that all In this section, we present the performance results of our SCRA schemes require storing a pre-computed table ,which experiments. We first compare the results of the SCRA with introduces a constant-size extra storage overhead at the signer the state-of-the-art algorithms on a modern powerful CPU. side e.g., 160 KB, 2 MB and 12.33 MB for SCRA-BGLS, We then provide results for the GPU implementations of SCRA-C-RSA and SCRA-NTRU respectively. This signer SCRA-C-RSA and SCRA-NTRU as compared to their CPU storage is plausible even for some embedded devices (e.g., counterparts. For the GPU, we used an Nvidia Tesla K40c Raspberry PI 2 [1]), and negligible for vehicular networks. card, which is comprised of 2880 computing cores with Moreover, recall that, unlike offline-online signatures, the 12GB of GDDR5 device memory and 288GB/sec memory signer overhead of SCRA is constant and it does not require bandwidth. Our base system is equipped with an Intel Core i7- to regenerate tokens. 6700K 4.0GHz Quad-Core Processor and 16GB DDR4 2400 The offline stages of the algorithms take fairly minimal MT/s. This infrastructure represents a datacenter setting. We times of 2.45, 8.65 and 12.83 seconds for SCRA-BGLS, also implemented SCRA on a System-on-Chip (SoC). We SCRA-C-RSA and SCRA-NTRU, respectively. The offline stage used an Nvidia Tegra K1 SoC, which has a 4-Plus-1 quad- will only be required to execute once during the system core ARM Cortex A15 CPU with clock rate of 2.3 Ghz deployment. and an embedded GPU with 192 computing cores. Such SoCs represent smaller scale systems that are widely used in A. Space Versus Execution Time IoT deployments. We open sourced the source code for the We have a trade-off between the space taken to store the research and academic community to use and evaluate.1 signatures and the execution time of the signing and the We summarize the results in Table I. Table I also provides verification stages. As described in 1, an d-bit hash output the implementation details, parameters and key/table sizes. can be interpreted as integers ( j ,..., j ), where each j is Table I shows the clear superiority of SCRA in terms of 1 L i a b-bit integer such that b · L = d. The total number of signature generation efficiency and end-to-end cryptographic signatures that need to be calculated and stored in the offline delay (i.e., the sum of signature generation and verification stage of an algorithm is thus L · 2b. The total storage cost times) using a powerful CPU. That is, the signature gen- is thus L · 2b · S where S is the size of one signature. This eration of SCRA instantiations are 24, 18 and 516 times also implies that the number of aggregations to be performed faster than their non-SCRA counterparts for RSA, BGLS, during the online phase increases linearly with L.TableII and NTRU, respectively. This indicates that SCRA is an provides for the SHA-256 hashing scheme various values of ideal choice for a very high-throughput signature generation, (L, b) parameters and corresponding size of the signature especially for resource-limited devices in IoT deployments. table  for each SCRA instantiation. Similarly, SCRA-C-RSA and SCRA-NTRU offer 18 and 7 times One may observe that the smallest storage overhead can lower end-to-end crypto delay compared to RSA and NTRU, be attained with (L = 256, b = 1), wherein we store a respectively, making them ideal choices for time-critical signature for every bit in the hash output domain. However, authentication. this requires L = 256 signature aggregations in the online 1https://github.com/ipapapa/HWAcccelarated-Crypto signature generation phase (e.g., 256 modular multiplications 2636 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 11, NOVEMBER 2017 for SCRA-C-RSA), which may not be computationally effi- cient. Another end of the trade-off is (L = 16, b = 16), wherein only L = 16 online aggregations are required during the online phase. However, this requires substantially larger table sizes, which may be suitable for some real-life appli- cations. We observed that (L = 32, b = 8) offers a highly favorable overall performance/storage performance, as shown in Table I and Table II. The size of the pre-computed tables for SCRA-BGLS, SCRA-C-RSA and SCRA-NTRUis 160 KB, 2 MB and 12.33 MB, respectively, for signature-sizes of 20, 256 and 1578 bytes, respectively. This will require 32 signatures to be aggregated in the online-signing phase of each SCRA scheme. Fig. 2. SCRA-RSA: Time to sign a message on a server. VI. HARDWARE-ACCELERATION OF SCRA To accelerate SCRA, we leveraged the parallel processing and optimization capabilities of GPUs both on server and embedded in the SoCs. We have introduced several optimiza- tions to parallelize the individual steps of SCRA algorithms. We used optimizations specific to the architecture of the GPU to harness the vast amount of available lightweight cores [11].

A. Accelerating SCRA-C-RSA With GPUs - SCRA RSA - Server: In the offline signature stage, for 8192 messages, we achieve x1.3 times more through- put with our GPU optimizations compared to the CPU only implementations. In the online signature stage, we achieve significantly high throughput gains, which can Fig. 3. SCRA-RSA: Time to verify a message on a server. reach up to x5.3 times. In the verify stage, the gain is around x4.2 times. These results are reported in Figures 2 and 3. In terms of execution time, the GPU can process a message in 0.367, 0.022, 0.031 milliseconds for the the message and (p, q) are the primes used. Then, we use offline, online and verify stages of the algorithm, respec- the mixed radix conversion algorithm [22] to combine the tively. This is approximately x1.31, x5, x4 times faster two parts and recover the signature σ as σ = σ2 +[(σ1 − −1 than the corresponding CPU execution times. The GPU σ2).(q mod p)].q These two parts are processed on separate gives a worse performance than the CPU if a very small threads in the GPU, which is significantly faster than the k-bit number of messages are processed. This is mainly due to modular exponentiation. the low clock speeds of the GPU cores as compared to the 2) Montgomery Multiplication: The modular multiplication CPU and also due to the time to copy the data from the is inefficient in the GPUs since it requires a trial division to CPU memory to the GPU memory and vice-versa. Our determine the result and is not parallelizable. The Montgomery experiments show that the online signature and signature multiplication is suitable for implementation in a GPU, since verification stages are executed faster in the GPU than in it does not require a trial division and can be implemented the CPU for message batches greater than 128 and 256, in parallel on separate words of the message. That is, given −1  respectively. a · b mod n, we first find two integers r and n using the −1  - SCRA RSA - SoC: In the offline signature stage, for 8192 Extended Euclidean Algorithm such that rr − nn = 1. messages, we achieve x3.2 times higher throughput with We then transform a = ar mod n and b = br mod n. Later, our GPU optimizations compared to CPU only imple- we compute a ·b mod n by using Montgomery reduction [28]. mentations. In the online signature stage, we achieve high 3) Batch Processing: The crypto operations for multi- throughput gains up to x5.2 times. In the verify stage, the ple messages are performed concurrently in the GPU. This gain is around x4.8 times. These results are reported in requires that a batch of messages be passed to the GPU, instead Figures 4 and 5. of a single message. Below we describe the techniques we adopted to achieve 4) Breakup of Components Into Words: To optimize the some of the performance speedups shown by above throughput on the GPU, each message component is divided experiments. into words of size 32/64 bits, depending on the GPU capa- 1) Chinese Remainder Theorem (CRT): We leverage bilities. Each operation being run on a single thread is run CRT [28] to accelerate SCRA on GPUs. We split a k-bit over words rather than over entire message components. signature σ into two k/2 bit signatures σ1 and σ2. σ1 = We use standard multi-precision algorithms [12] to represent d mod p−1 d mod q−1 M mod p,σ2 = M mod q,whereM is and perform operations between large integers. YAVUZ et al.: REAL-TIME DIGITAL SIGNATURES FOR TIME-CRITICAL NETWORKS 2637

Fig. 4. SCRA-RSA: Time to sign a message on SoC. Fig. 6. SCRA-NTRU: Time to sign a message.

Fig. 5. SCRA-RSA: Time to verify a message on SoC. Fig. 7. SCRA-NTRU: Time to verify a message. 5) GPU Warp Size Utilization: Warps are set of threads (generally 32) that are considered as one single execution unit stage, we achieve high throughput gains up to x6.5 and inside a CUDA block. To gain maximum throughput from the x30 times respectively. These results are reported in GPU, it is necessary to attain the maximum number of active Figures 6 and 7, respectively. warps per streaming multiprocessor which is 64 in our case. The cryptographic operations for multiple messages are We achieve this by adjusting the number of threads per block performed concurrently on the GPU. This requires that to the optimal value. a batch of messages be passed to the GPU, instead of 6) Memory Latency vs GPU Occupancy: Thesizeofthe a single message for the signing and verification stage. shared memory can limit the number of active warps on the We do not employ the GPU for the online stage of GPU at a particular point in time by reducing the occupancy of SCRA-NTRU because the signature aggregation tech- the Streaming Multiprocessors (SM). The other limiting factor nique is computationally expensive and deploying it on in the performance output is the number of reads and writes a GPU core provides little performance benefit. Due to on the global memory on the device. We identified a balance these reasons, the CPU performs better than the GPU between the SM occupancy and the global memory read/write during the online stage of the protocol. latency by testing various permutations of memory allocations - SCRA NTRU - SoC: In the online signature stage, for among the shared and global memory. 1024 messages, we achieve x0.81 times more throughput 7) Constant Length Non-Zero Window Technique: We scan with our GPU optimizations compared to CPU only the bits of the exponent from the least significant bit to implementations. In the verify sign stage and offline the most significant bit. At each step, we compute a zero stage, we achieve high throughput gains upto x6.65 and window or a non-zero window [23]. With the binary square- x17.7 times respectively. These results are reported in and-multiply method, we can process these windows and Figures 8 and 9 respectively. reduce the number of modular multiplications, making the We summarize below the optimizations that have resulted exponentiation algorithm faster. in the performance gains shown by the previous experiments. 1) Batch Processing: Message components are processed in B. Accelerating SCRA-NTRU With GPUs batches as in Section VI-A. As mentioned before, we do not - SCRA NTRU - Server: In the online signature stage, for use the GPU for the online stage of SCRA-NTRU. 4096 messages, we achieve x0.79 times more throughput 2) Convolution Operations: The convolution operations in with our GPU optimizations compared to CPU only the NTRU in the signing and verification phase are accelerated implementations. In the verify signature stage and offline by employing GPUs. The convolution operation between two 2638 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 12, NO. 11, NOVEMBER 2017

ACKNOWLEDGMENT This work was done in part at Robert Bosch LLC Research and Technology Center at North America (CR/RTC3-NA), Pittsburgh, PA, USA, by Attila A. Yavuz during his employ- ment at Bosch. The authors appreciate and gratefully acknowl- edge the donation of a Tesla K40 GPU and the Tegra K1 System on Chip from the NVIDIA Corporation used for the research described in this paper.

REFERENCES [1] Raspberry Pi 2 Specs, accessed on Jun. 22, 2017. [Online]. Available: https://www.raspberrypi.org/products/raspberry-pi-2-model-b/ Fig. 8. SCRA-NTRU: Time to sign a message. [2] IEEE Guide for Wireless Access in Vehicular Environments (WAVE)— Architecture, IEEE Standard 1609.0-2013, Mar. 2014, pp. 1–78. [3] ANSI X9.62-1998: Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA), Amer. Bankers Assoc., Washington, DC, USA, 1999. [4] M. Bellare and P. Rogaway, “Random oracles are practical: A paradigm for designing efficient protocols,” in Proc. 1st ACM Conf. Comput. Commun. Secur. (CCS), 1993, pp. 62–73. [5] M. Bellare and P. Rogaway, “The exact security of digital signatures- how to sign with RSA and rabin,” in Proc. 15th Int. Conf. Theory Appl. Cryptogr. Techn., 1996, pp. 399–416. [6] J. Benaloh and M. de Mare, “One-way accumulators: A decentralized alternative to digital signatures,” in Proc. Workshop Theory Appl. Cryptogr. Techn., 1994, pp. 274–285. [7] D. Boneh, C. Gentry, B. Lynn, and H. Shacham, “Aggregate and verifiably encrypted signatures from bilinear maps,” in Proc. 22nd Int. Conf. Theory Appl. Cryptogr. Techn. (EUROCRYPT), 2003, pp. 416–432. [8] D. Boneh, B. Lynn, and H. Shacham, “Short signatures from the Weil pairing,” J. Cryptol., vol. 14, no. 4, pp. 297–319, 2004. Fig. 9. SCRA-NTRU: Time to verify a message. [9] D. Catalano, M. D. Raimondo, D. Fiore, and R. Gennaro, “Off-line/on- line signatures: Theoretical aspects and experimental results,” in Proc. Pract. Theory Public Key Cryptogr. (PKC), 2008, pp. 101–120. [10] J. Coron and D. Naccache, “Boneh et al.’s k-element aggregate n bit polynomials is divided into n cores for each operation extraction assumption is equivalent to the Diffie–Hellman assumption,” where each core is responsible for calculating one bit of the in Proc. 9th Int. Conf. Theory Appl. Cryptol. (ASIACRYPT), 2003, resulting polynomial. pp. 392–397. [11] K. Diao, I. Papapanagiotou, and T. J. Hacker, “HARENS: Hardware 3) Fourier Transformations: Implementing the Fourier accelerated redundancy elimination in network systems,” in Proc. IEEE transformation on GPUs further accelerates the signing, verifi- Int. Conf. Cloud Comput. Technol. Sci. (CLOUDCOM), Dec. 2016, cation and offline stages. Due to the use of faster convolution pp. 237–244. [12] E. K. Donald, “The art of computer programming,” in Sorting and and Fourier transformation operations on GPUs, the verify Searching, vol. 3. Redwood City, CA, USA: Addison Wesley, 1999, stage of the protocol on GPUs is significantly faster than on pp. 426–458. CPUs. [13] L. Ducas and P. Q. Nguyen, “Learning a zonotope and more: Crypt- analysis of NTRU sign countermeasures,” in Advances in Cryptology— ASIACRYPT (Lecture Notes in Computer Science), vol. 7658. Berlin, Germany: Springer, 2012, pp. 433–450. VII. CONCLUSION [14] N. P. Smart et al., “Algorithms, key size and parameters report,” Eur. Union Agency Netw. Inf. Secur. (ENISA), Heraklion, Greece, Tech. In this paper, we developed a new series of delay-aware dig- Rep. TP-05-14-084-EN-N, Nov. 2014. ital signatures for time-critical applications, which we refer to [15] R. El Bansarkhani and J. Buchmann, “Towards lattice based aggregate as Structure-Free Compact Authentication (SCRA). SCRA can signatures,” in Proc. Int. Conf. Cryptol. Africa, 2014, pp. 336–355. [16] X. Fan and G. Gong, “Accelerating signature-based broadcast authen- transform any secure aggregate signature into a signer efficient tication for wireless sensor networks,” Ad Hoc Netw., vol. 10, no. 4, signature via a novel constant-size pre-computation strategy. pp. 723–736, Jun. 2012. [17] D. Hankerson, A. Menezes, and S. Vanstone, Guide to Elliptic Curve We proposed several instantiations of SCRA schemes based on Cryptography. New York, NY, USA: Springer-Verlag, 2004. Condensed-RSA, BGLS, and NTRU signatures, each offering [18] J. Hoffstein, J. Pipher, J. M. Schanck, J. H. Silverman, and W. Whyte, a unique computation time, key and signature size trade-offs. “Practical signatures from the partial Fourier recovery problem,” in Proc. Int. Conf. Appl. Cryptogr. Netw. Secur., 2014, pp. 476–493. Our implementations and performance comparison with the [19] J.-S. Coron, “On the exact security of full domain hash,” in Proc. Annu. existing alternatives show that the SCRA schemes achieve Int. Cryptol. Conf. (CRYPTO), 2000, pp. 229–235. significantly faster signature generation and lower end-to- [20] R. Johnson, D. Molnar, D. X. Song, and D. Wagner, “Homomorphic signature schemes,” in Proc. CT-RSA, 2002, pp. 244–262. end delay. We also formally proved that SCRA schemes are [21] A. Joux and K. Nguyen, “Separating decision Diffie–Hellman from secure (in ROM). Finally, we pushed the performance of computational Diffie–Hellman in cryptographic groups,” J. Cryptol., vol. 16, no. 4, pp. 239–247, 2003. SCRA schemes to their edge by fully implementing them on [22] C. K. Koc, “High-speed RSA implementation,” RSA Lab., Bedford, MA, server-grade GPUs and SoCs, which indicated significant per- USA, Tech. Rep. TR 201, 1994. formance gains. All these properties make the SCRA schemes [23] C. K. Koç, “Analysis of sliding window techniques for exponentiation,” Comput. Math. Appl., vol. 30, no. 10, pp. 17–24, 1995. a suitable alternative for delay-aware authentication for time- [24] B. Lynn. The Pairing-Based Cryptography (PBC) Library, accessed on critical applications. Jun. 22, 2017. [Online] Available: http://crypto.stanford.edu/pbc YAVUZ et al.: REAL-TIME DIGITAL SIGNATURES FOR TIME-CRITICAL NETWORKS 2639

[25] A. Lysyanskaya, R. Tamassia, and N. Triandopoulos, “Multicast authen- Anand Mudgerikar received the bachelor’s degree tication in fully adversarial networks,” in Proc. IEEE Symp. Secur. in information and communication technology from Privacy, May 2004, pp. 241–253. the Dhirubhai Ambani Institute of Information and [26] D. Ma and G. Tsudik, “A new approach to secure logging,” ACM Trans. Communication Technology, India, and the mas- Storage, vol. 5, no. 1, p. 2, 2009. ter’s degree in information security from CERIAS, [27] C. A. Melchor, X. Boyen, J.-C. Deneuville, and P. Gaborit, “Sealing Purdue University, West Lafayette, IN, USA, where the leak on classical NTRU signatures,” in Post-Quantum Cryptogra- he is currently pursuing the Ph.D. degree in informa- phy (Lecture Notes in Computer Science), vol. 8772, M. Mosca, Ed. tion security with the Computer Science Department. Springer, 2014, pp. 1–21. His current research interests include cryptography, [28] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of intrusion detection systems, and network security. Applied Cryptography. Boca Raton, FL, USA: CRC Press, 1996, [29] E. Mykletun, M. Narasimha, and G. Tsudik, “Signature bouquets: Immutability for aggregated/condensed signatures,” in Proc. Eur. Symp. Res. Comput. Secur., Sep. 2004, pp. 160–176. [30] E. Mykletun and G. Tsudik, “Aggregation queries in the database-as-a- service model,” in Proc. IFIP Annu. Conf. Data Appl. Secur. Privacy, 2006, pp. 89–103. [31] D. Naccache, D. M’Raïhi, S. Vaudenay, and D. Raphaeli, “Can D.S.A. be improved: Complexity trade-offs with the digital signature standard,” in Proc. Workshop Theory Appl. Cryptogr. Techn., 1994, pp. 77–85. [32] A. Perrig, R. Canetti, D. Song, and J. D. Tygar, “Efficient authentication Ankush Singla is currently pursuing the Ph.D. degree in information and signing of multicast streams over lossy channels,” in Proc. IEEE security and assurance with the Computer Science Department, Purdue Symp. Secur. Privacy, May 2000, pp. 56–73. University. He was involved in projects ranging from Hardware Accelerated [33] J. Petit and Z. Mammeri, “Authentication and consensus overhead Authentication and Centralized Lighting management for Energy Saving. His in vehicular ad hoc networks,” Telecommun. Syst., vol. 52, no. 4, current research interests include usage for Internet of Things pp. 2699–2712, 2013. authentication and certificateless cryptography. [34] L. Reyzin and N. Reyzin, “Better than BiBa: Short one-time signatures with fast signing and verifying,” in Proc. 7th Austral. Conf. Inf. Secur. Privacy (ACIPS), 2002, pp. 144–153. [35] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital signatures and public-key ,” Commun. ACM, vol. 21, no. 2, pp. 120–126, Feb. 1978. [36] M. Rückert, “Lattice-based signature schemes with additional features,” Ph.D. dissertation, Dept. Comput. Sci., Technische Univ. Darmstadt, Darmstadt, Germany, 2010. Ioannis Papapanagiotou (SM’15) received the dual [37] A. Shamir and Y. Tauman, “Improved online/offline signature schemes,” major Ph.D. degree in computer engineering and in Proc. 21st Annu. Int. Cryptol. Conf., 2001, pp. 355–367. operations research from North Carolina State Uni- [38] Shamus. Multiprecision Integer and Rational Arithmetic C/C++ versity. He has served in the faculty ranks of Purdue Library (MIRACL), accessed on Sep. 2014. [Online]. Available: University (tenure-track) and North Carolina State http://www.certivox.com/miracl/miracl-download/ University. From 2010 to 2013, he was with IBM’s [39] A. Singla, A. Mudgerikar, I. Papapanagiotou, and A. A. Yavuz, “HAA: CTO Office. He is currently an Architect at Netflix Hardware-accelerated authentication for Internet of Things in mis- Inc., a Research Assistant Professor with the Univer- sion critical vehicular networks,” in Proc. IEEE Int. Conf. Military sity of New Mexico, and a Graduate Faculty with Commun. (MILCOM), Oct. 2015, pp. 1–7. Purdue University. He has authored approximately [40] W. Whyte, M. Etzel, and P. Jenney. (2013). Open Source NTRU Public 40 research articles and ten patent disclosures. He is Key Cryptography Algorithm and Reference Code. [Online]. Available: a Senior Member of ACM. He has also served as a TPC Chair in a number of https://github.com/NTRUOpenSourceProject/ntrucrypto IEEE conferences. He has been awarded the NetApp Faculty Fellowship and [41] A. A. Yavuz, “An efficient real-time broadcast authentication scheme for established an Nvidia CUDA Research Center, Purdue University. He has also command and control messages,” IEEE Trans. Inf. Forensics Security, received the IBM Ph.D. Fellowship, the Academy of Athens Ph.D. Fellowship vol. 9, no. 10, pp. 1733–1742, Oct. 2014. for his Ph.D. research, and best paper awards in several IEEE conferences for [42] A. A. Yavuz, “Immutable authentication and integrity schemes for out- his academic contributions. sourced databases,” IEEE Trans. Depend. Sec. Comput., to be published. [43] A. A. Yavuz, P. Ning, and M. K. Reiter, “BAF and FI-BAF: Efficient and publicly verifiable cryptographic schemes for secure logging in resource- constrained systems,” ACM Trans. Inf. Syst. Secur., vol. 15, no. 2, p. 9, 2012.

Attila Altay Yavuz (M’11) received the M.S. degree Elisa Bertino (F’02) is currently a Professor of in computer science from Bogazici University, Computer Science with Purdue University, and Istanbul, Turkey, in 2006, and the Ph.D. degree serves as the Director of the CyberSpace Security in computer science from North Carolina State Laboratory (Cyber2SLab). Prior to joining Purdue University in 2011. He is currently an Assistant University in 2004, she was a Professor and the Professor with the School of Electrical Engineering Department Head with the Department of Computer and Computer Science, Oregon State University. He Science and Communication, University of Milan. is broadly interested in design, analysis, and applica- She has been a Visiting Researcher with the IBM tion of cryptographic tools and protocols to enhance Research Laboratory (currently Almaden), San Jose, the security of computer networks and systems. with the Microelectronics and Computer Technol- He has authored over 40 research articles in top ogy Corporation, with Rutgers University, and with conferences and journals along with several patents. His research on privacy Telcordia Technologies. Her recent research focuses on database security, enhancing technologies (searchable ) and intra-vehicular network digital identity management, policy systems, and security for web services. security are in the process of technology transfer with potential world-wide She is a fellow of ACM and AAAS. She received the IEEE Computer deployments. He is a member of ACM. He was a member of the Security Society 2002 Technical Achievement Award, the IEEE Computer Society and Privacy Research Group with the Robert Bosch Research and Technology 2005 Kanai Award, and the ACM SIGSAC Outstanding Contributions Award. Center, North America (2011–2014). He was a recipient of the NSF CAREER She is currently serving as Editor-in-Chief of IEEE TRANSACTIONS ON Award (2017). DEPENDABLE AND SECURE COMPUTING.