Sharing Is Caring—On the Protection of Arithmetic Logic Units Against Passive Physical Attacks

Sharing is Caring—On the Protection of Arithmetic Logic Units against Passive Physical Attacks Hannes Gross Institute for Applied Information Processing and Communications (IAIK), Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria [email protected] Abstract. Embedded systems are often used in security-critical scenarios where physical access of an adversary cannot be prevented. An attacker with unrestricted physical access to an embedded device could thus use observation-based attacks like power analysis or chip probing techniques to extract chip-internal secrets. In this work, we investigate how to counteract first-order passive physical attacks on an embedded microcontroller. In particular, we focus on the protection of the central point of data processing in the microcontroller design—the arithmetic logic unit (ALU)—with the provably secure threshold implementation (TI) masking scheme. Our results show that the amount of required fresh random bits—a problem many masked implementations suffer from—can be reduced to only one bit per ALU access and clock cycle. The total chip area overhead for implementing the whole microcontroller of our case study as a three-share TI is about a factor of 2.8. Keywords: secure ALU, threshold implementation, masking, side-channel analysis, DPA, chip probing 1 Introduction Throughout the last decades, our understanding of the term “security” became much broader. In the beginning of cryptographic research almost the whole security prob- lematic was somehow narrowed down on finding mathematical problems that are hard to solve without the knowledge of some secret information or trapdoor function. As it turned out, physical systems are usually easier to attack through the back door than by attacking the mathematical approach that protects the front door. In particular, when an attacker has unrestricted access to a device—for example, because she is the holder of the device or the device is operated in an area that can hardly be secured—physical attacks become a serious threat. An unprotected hardware implementation reveals its secrets through, e.g., its power consumption [8], the elec- tromagnetic emanation [13], or an attacker could even use needles to eavesdrop on the chip internal data exchange [6]. Protecting cryptographic hardware against physical attacks is now for more than 15 years an ongoing research topic. Many different masking schemes were proposed, but because of the possible occurrence of glitches caused by the combinatorial logic in hardware, the security of the masking schemes were jeopardized. Since the introduction of threshold implementations (TI) by Nikova et al. [10] in 2006, a big step towards provable protection of hardware implementations was taken. However, until now this approach was only used to protect cryptographic hardware implementations against passive physical attacks. In order to protect a whole system against these observation-based attacks, all com- ponents of the system that touch the assets of the system need to be considered. In the case of an embedded system, the data is usually processed by a microcontroller. More precisely, every information that is processed has to pass the arithmetic logic unit (ALU) of the microcontroller. The ALU is thus the most attractive and also the most vulnerable point to be attacked by an adversary that can physically observe a device. It is therefore crucial for the security of such a system to have an ALU protected against these kind of attacks. Our contribution In this work, we show how a general purpose ALU design can be secured against first-order passive physical attacks. Therefore, the functionality of a typical ALU is considered and its functionality is transformed step-by-step in order to fulfill the requirements of the threshold implementation scheme. Considering each function separately is of course not enough. By carefully bringing these functions to- gether, the chip area and the number of required fresh random bits per clock cycle can be reduced. The final ALU TI design requires about 1.1 kGE of chip area and only one random bit per clock cycle. The protected ALU is then integrated in a case study microcontroller and the required changes to the microcontroller design are discussed. Finally, the protected and unprotected hardware implementations are compared on three different levels: (1) on gate level—by comparing standard cell library gates to the TI of the gates required by the functionality of the ALU—, (2) the resulting costs of the stand- alone ALU TI are considered in relation to the unshared ALU, and (3) the comparison is done for the case study microcontroller. It is shown that the area overhead in the latter case is about a factor of 2.8, and the power consumption is increased by a factor of 2.3. The rest of the paper is organized as follows: In Section 2, an overview of the threshold implementation scheme is given. In Section 3, the considered attack scenario is ex- plained briefly. Afterwards, in Section 4, a look into the functionality provided by a general purpose ALU is given, and the functionality is then transformed in accordance with the requirements of the threshold implementation scheme. In a case study (Sec- tion 5), the integration of the protected ALU into a microcontroller design is described before the results are discussed in Section 6. Closing remarks are given in Section 7. 2 Threshold Implementations The threshold implementation (TI) masking scheme was introduced by Nikova et al. in 2006 [10]. In contrast to many existing masking schemes, TIs are provably secure against first-order passive physical attacks—like differential power analysis attacks or chip probing—even in the presence of glitches. In the past, the TI scheme was mainly used to protect cryptographic circuits in order to make them side-channel resistant (e.g. Keccak [2], AES [9], Present [12]). The basis of TI is a technique well known from 2 Evaluate and Update A Az' A 7 y 0 7 0 7 x 0 F3 A 6 0 x F1 ? r A B 1 B ' y B 7 y 0 7 z 0 x_0 B 7 x 0 x F2 F2 F(x, y)_0 6 0 y_0 By ? r2 Cy C ' C C 7 0 7 z 0 x 7 x 0 C F1 y F3 Fig. 1. Example for a shared function Fig. 2. Resharing of a nonuniform function multi party computations named secret sharing which breaks the data dependency of the computation. As it was shown by Barak et al. [1] a calculation can never be totally independent from its underlying data, but the complexity—the order of the attack—can be increased. In the subsequent sections the term sharing and threshold implementations are used interchangeably. The secret sharing principle is illustrated in Figure 1 for a bivariate function based on the TI scheme with three shares. Before the shared function F is applied to the two variables, the variables them- selves need to be shared. In this paper we use the same notations as Bilgin et al. [2]. Accordingly, a variable x is shared with three shares A, B, and C so that Equation 1 is valid at any time. x = Ax ⊕ Bx ⊕ Cx (1) In order to have a shared implementation that is called a valid TI, the sharing must fulfill the following three requirements: correctness, non-completeness, and uniformity. A sharing of a function F is said to be correct if the sum of the output shares equals F applied to the sum of the input shares (see Equation 2). F(Ax⊕Bx⊕Cx) , F 1(Bx;Cx) ⊕ F 2(Ax;Cx) ⊕ F 3(Ax;Bx) (2) The non-completeness property requires that each of the component functions of a function F is independent from at least one input share like it is shown in Figure 1 and Equation 2. Fulfilling the uniformity property is usually the hardest task, because a uniform sharing of F requires that under the assumption of a uniformly distributed input sharing, the resulting output shares of the component functions are again uniformly distributed. Even for a simple boolean function, like AND or OR, no uniform sharing exists for a three-share TI as it was shown by Nikova et al. In this case, additional random bits are required to recover the uniformity of the output shares. As an alternative, a higher amount of shares could be used, but this has the drawback of a higher resource footprint. However, the requirement of many high-quality fresh random bits with a high entropy at each clock cycle is also not quite practical. In 2013, Bilgin et al. [2] showed that the majority of the required random bits for the Keccak TI can be taken from independent shares by using Lemma 1. Lemma 1. (Simplified version from [2]). Let (A, B, C) be n-bit shares (not necessarily uniform). Let (D, E, F) be uniform n-bit shares statistically independent of (A, B, C). Then, (A+D, B+E, C+F) are uniform n-bit shares. 3 Figure 2 shows how the uniformity of a nonuniformly shared function F—where each of the component functions are not uniform—can be repaired according to Lemma 1, when F is a bit-wise function. Since each of the bits of the shares A, B, and C are independent from the bits with a different index, the bits from the input shares of F can be used to reshare the output of the component functions of F. This is done by adding uniformly distributed and statistically independent bits—from the input shares and the two random bits—to the resulting output shares. The random bits are necessary to avoid dependencies between the bits.

Sharing Is Caring—On the Protection of Arithmetic Logic Units Against Passive Physical Attacks

Binary Counter

Cross Architectural Power Modelling

Experiment No

1 DIGITAL COUNTER and APPLICATIONS a Digital Counter Is

Design and Implementation of High Speed Counters Using “MUX Based Full Adder (MFA)”

Hardware-Assisted Rootkits: Abusing Performance Counters on the ARM and X86 Architectures

Central Processing Unit and Microprocessor Video

The Central Processing Unit (CPU)

Processor Hardware Counter Statistics As a First-Class System Resource

Designing Digital Sequential Logic Circuits © N

Tutorial on Adder and Subtractor Logic Circuits Digital Adder: 1. Half Adder 2. Full Adder. Half Adder- Full Adder

Arithmetic Logic Unit Architectures with Dynamically Defined Precision