
A Tutorial on the Implementation of Block Ciphers: Software and Hardware Applications

Howard M. Heys

Memorial University of Newfoundland, St. John’s, Canada email: [email protected]

Dec. 10, 2020

Abstract

In this article, we discuss basic strategies that can be used to implement block ciphers in both software and hardware environments. As models for discussion, we use substitution-permutation networks, which form the basis for many practical block cipher structures. For software implementation, we discuss approaches such as table lookups and bit-slicing, while for hardware implementation, we examine a broad range of architectures, from high speed structures like pipelining to compact structures based on serialization. To illustrate different implementation concepts, we present example data associated with specific methods and discuss sample designs that can be employed to realize different implementation strategies. We expect that the article will be of particular interest to researchers, scientists, and engineers who are new to the field of cryptographic implementation.

Terminology and Notation

Abbreviation  Definition
SPN           substitution-permutation network
IoT           Internet of Things
AES           Advanced Encryption Standard
ECB           electronic codebook mode
CBC           cipher block chaining mode
CTR           counter mode
CMOS          complementary metal-oxide semiconductor
ASIC          application-specific integrated circuit
FPGA          field-programmable gate array

Table 1: Abbreviations Used in Article


Variable                                    Definition
B                                           block size (also, size of cipher state)
κ                                           number of cipher key bits (also, size of key state)
R                                           number of rounds
n                                           S-box size (number of input/output bits)
P = [p_{B-1} ... p_1 p_0]                   plaintext block of B bits
C = [c_{B-1} ... c_1 c_0]                   ciphertext block of B bits
K = [k_{κ-1} ... k_1 k_0]                   κ-bit cipher key
D = [d_{B-1} ... d_1 d_0]                   B-bit cipher state
D* = [d*_{B-1} ... d*_1 d*_0]               B-bit next cipher state
X = [x_{B-1} ... x_1 x_0]                   B-bit cipher state at input to substitution layer and output of key mixing layer
Y = [y_{B-1} ... y_1 y_0]                   B-bit cipher state at output of substitution layer and input to permutation layer
Z = [z_{B-1} ... z_1 z_0]                   B-bit cipher state at output of permutation layer and input to key mixing layer
RK_r = [rk_{r,B-1} ... rk_{r,1} rk_{r,0}]   B-bit round key of round r
K' = [k'_{κ-1} ... k'_1 k'_0]               κ-bit key state
P† = [P†_{B-1} ... P†_1 P†_0]^T             bit-slice plaintext structure: B × B matrix of bits consisting of B rows of B-bit plaintext blocks P†_i, i ∈ {0, 1, ..., B-1}
p†_{i,j}                                    element at row i, column j of P†, i, j ∈ {0, 1, ..., B-1}
C† = [C†_{B-1} ... C†_1 C†_0]^T             bit-slice ciphertext structure: B × B matrix of bits consisting of B rows of B-bit ciphertext blocks C†_i, i ∈ {0, 1, ..., B-1}
D† = [D†_{B-1} ... D†_1 D†_0]^T             bit-slice cipher state: B × B matrix of bits consisting of B columns of B-bit cipher states D†_i, i ∈ {0, 1, ..., B-1}
d†_{i,j}                                    element at row i, column j of D†, i, j ∈ {0, 1, ..., B-1}

Table 2: Notation Used in Article

Contents

1 Introduction to Block Ciphers
  1.1 Target Implementation Environments
  1.2 Basic Cipher Principles
  1.3 Substitution Permutation Networks
    1.3.1 16-bit SPN
    1.3.2 64-bit SPN
  1.4 Modes of Operation
    1.4.1 Electronic CodeBook Mode
    1.4.2 Cipher Block Chaining Mode
    1.4.3 Counter Mode
  1.5 General Implementation Structures
  1.6 Summary

2 Software Implementation
  2.1 Structure of Encryption
  2.2 Structure of Decryption
  2.3 Direct Implementation of an SPN
    2.3.1 Key Mixing Layer Implementation
    2.3.2 Substitution Layer Implementation
    2.3.3 Permutation Layer Implementation
  2.4 Table Lookup Implementations
    2.4.1 Permutation in a Table
    2.4.2 Combined Substitution/Permutation Table
  2.5 Time/Memory Tradeoffs of Lookup Tables
  2.6 Bit-slice Implementations
    2.6.1 Bit-slicing the S-box
    2.6.2 Restructuring the Data for Bit-Slicing
    2.6.3 Bit-slicing the Permutation and Key Mixing
    2.6.4 Example of Bit-Slicing
    2.6.5 Other Structures for Bit-slicing
  2.7 Software Implementation of Cipher Modes
  2.8 Summary


3 Hardware Implementation
  3.1 Iterative Design
    3.1.1 Basic Iterative Architecture
    3.1.2 Round Function Hardware
    3.1.3 Loop Unrolling
  3.2 Parallelization - A High Speed Implementation
  3.3 Pipelining - A High Speed Implementation
    3.3.1 Description of Pipeline Architecture
    3.3.2 Selecting Hardware for a Pipeline Stage
    3.3.3 Timing Issues for Pipeline Designs
    3.3.4 Example of Pipelining
  3.4 Serial Design - A Compact Implementation
    3.4.1 The Concept of Serialization
    3.4.2 A Sample Design
    3.4.3 Selectable Register
  3.5 Hardware Implementation of Decryption
  3.6 Hardware Implementation of Cipher Modes
  3.7 Tradeoffs Between Architectures
  3.8 Summary

4 Conclusion

References

Appendix: Key Scheduling

Chapter 1

Introduction to Block Ciphers

Block ciphers are the workhorse of security applications. Using a symmetric key approach, they encrypt a block of plaintext bits (typically, 64 or 128 bits) to produce an equally-sized block of ciphertext bits. These algorithms can be found in a broad range of application environments and it is almost certainly true that the best known block cipher, the Advanced Encryption Standard (AES) [1][2], is the most applied cipher in the world today.

1.1 Target Implementation Environments

Implementations of block ciphers can be divided into two broad categories: software and hardware. Software implementations make use of the instructions available on general purpose processors, while hardware implementations focus on designs at the level of logic gates and registers. Of course, within these two categories, there are many sub-categories dependent on the processor characteristics for software (e.g., instruction set, word size, memory availability) and the technology for hardware (e.g., CMOS application-specific integrated circuits (ASICs) and field-programmable gate arrays (FPGAs)). Implementation choices are invariably influenced by the objectives of the application and the features of the targeted environment. This results in the consideration of factors such as speed and resource constraints, including chip area, system memory, and device power or energy limitations. As examples of the diversity of target environments, consider that block ciphers could be implemented for the following three sample scenarios:

1. a general purpose computer using a processor with a word size of 64 bits (that is, instructions use 64-bit operands and data is stored as 64-bit words),

2. a device targeted to an application for the Internet of Things (IoT), making use of either (a) a small processor (eg. a microcontroller) or (b) custom dedicated hardware of limited area, and

3. a high speed communications gateway targeted to service many high speed data connections.


Now, consider briefly, for these different environments, the resources needed and the influence of those resource requirements on the implementation of the block cipher. For the 64-bit general purpose computer, there is likely to be plenty of memory and a comprehensive instruction set with a large word size for operands, and in such implementations the speed of encryption (or throughput) is probably of most interest. In contrast, the implementation targeted to the IoT device would be fundamentally concerned with constrained resources, such as memory and word size for a small processor, or circuit area and power/energy limitations for a hardware implementation. Hence, for the IoT device, it is necessary to consider lightweight cryptography [3] and, for the applications to which such a device is targeted, speed of the encryption process is usually not an important consideration. Lastly, for the high speed communications gateway, speed of encryption is so important that it is often desirable to utilize custom designed hardware implementations that give the ability to process many different connections at high speeds.

Clearly, there is a breadth of potential applications and, as a result, there are many different considerations when implementing ciphers. In many scenarios, different types of implementations could be employed within the same application. For example, an IoT device might interact with a software application on a general purpose computer. In some circumstances, specially-targeted ciphers are proposed that are intended specifically for one or more of these environments. For example, many recently proposed block cipher algorithms are considered lightweight, targeted specifically to implementation on constrained devices. In such cipher proposals, implementation approaches are often discussed by the cipher proponents. In practice, however, it is often desirable to have a cipher that can be applied to many different environments, which may vary dramatically in the nature of their requirements. This is the case, for example, for AES, which can be found in all manner of targeted environments, from high speed software to compact hardware architectures. The near-ubiquity of AES has led to processors with special instructions to facilitate encryption speedup [4] and the development of specialized hardware structures suitable for the specific components defined for AES (for example, see [5]).

In this article, we choose to focus our discussions on the general principles that can be applied to the implementations of ciphers on many different platforms. We do not get into details which are targeted to one particular cipher and, instead, describe approaches that can generally be applied to the majority of proposed block ciphers. The main objective of the article, then, is not to provide a detailed description of the implementation for a particular cipher or ciphers, but instead to present the principles that form the foundation for understanding how ciphers can be implemented. This is accomplished by using illustrative examples and sample designs of cipher implementations. This article is a starting point for the cryptographic engineer wanting to have an understanding of the general structures and methods found in implementations before delving more deeply into the possible architectures suitable to their cipher of interest. To this end, we present a basic form of block cipher, referred to as the substitution-permutation network (SPN), which we then use as the basis of our discussion of implementation approaches.

1.2 Basic Cipher Principles

We first discuss some of the fundamental principles with which all block ciphers are designed.1 Block ciphers encrypt a block of plaintext, P, of size B bits by applying a key-dependent transformation to produce a B-bit block of ciphertext, C. The key, K, is defined to be κ bits in size, giving a key space with a total of 2^κ possible keys. Block ciphers fall into the category of symmetric key cryptography: the same key is applied for both encryption and decryption.

Typically, B ≥ 64. Compact, lightweight ciphers, such as the PRESENT block cipher [6], are targeted to constrained systems environments (such as some devices for IoT), which require implementation efficiency but can typically accommodate lower security levels; such ciphers have smaller block sizes of 64 bits. Ciphers like AES, used for a broad range of applications, are usually expected to have higher security levels and larger block sizes (e.g., 128 bits). Block ciphers with very small block sizes (such as 32 bits) would not be acceptable in any context, because a small block size may make a dictionary attack practical, where the attacker is able to acquire some known plaintext and corresponding ciphertext and build a table of plaintext-ciphertext mappings for a system with a given secret key.

The key size, κ, must be large enough to ensure that a brute force or exhaustive key search attack is not possible. In such an attack, knowledge of a small number of plaintext blocks and corresponding ciphertext blocks is utilized by the attacker, who encrypts the plaintext with all possible keys to determine which key results in the corresponding known ciphertext. Using only a modest number of plaintext/ciphertext pairs, the key found can be confirmed to be the correct one. To prevent this, generally, κ ≥ 80, with the lower bound being suitable for low security lightweight applications, while values like κ = 128 or 256 would be used for higher security general applications.
For example, the PRESENT cipher is defined to operate with 80 or 128 bits of key and AES allows keys of 128, 192, and 256 bits in size.

Cryptography has been around for millennia, but in the 1940s, Claude Shannon was the first to propose practical structures for the modern ciphers in use today. In [7], he describes the concept of a product cipher, where a cipher can be constructed as a composition of functions, with the functions consisting of simple cryptographic operations. Hence, he proposed constructing a block cipher by iterating over a number of rounds of operations, which, while not in themselves secure, have properties which provide security after many repetitions of the operations. Specifically, he proposed the concepts of confusion and diffusion as necessary properties of a block cipher, which should be realized by the cipher's operations. Confusion is defined as the property of creating a complex mathematical relationship between input bits and output bits; diffusion reflects that any grouping of a small number of input bits has an influence across all output bits.
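To make the exhaustive key search described above concrete, here is a minimal sketch in Python. The stand-in 16-bit "cipher" `toy_encrypt` (a single key mixing followed by one S-box layer, borrowing the 4-bit S-box discussed later in Section 1.3) is our own invention, chosen purely to keep the example self-contained and the key space small enough to search; it is far too weak for any real use.

```python
# 4-bit S-box (the PRESENT S-box of Section 1.3), used only to give the
# toy cipher some nonlinearity.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def toy_encrypt(p, k):
    """Stand-in 16-bit 'cipher': key mixing followed by one S-box layer."""
    x = p ^ k
    return sum(SBOX[(x >> (4 * i)) & 0xF] << (4 * i) for i in range(4))

def exhaustive_search(pairs):
    """Return every 16-bit key consistent with the known (P, C) pairs."""
    return [k for k in range(2**16)
            if all(toy_encrypt(p, k) == c for p, c in pairs)]

# For this toy cipher a single pair already pins down the key; for a real
# cipher a few pairs are needed to rule out false positives.
secret = 0xBEEF
pairs = [(p, toy_encrypt(p, secret)) for p in (0x0123, 0x4567)]
assert secret in exhaustive_search(pairs)
```

With κ = 16 the loop finishes instantly; the same loop over 2^80 or 2^128 keys is what the lower bounds on κ are meant to rule out.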

1 Note that we use the word "design" in two contexts in this article. We sometimes refer to cipher design when referring to the functionality and security of the cipher algorithm, and, in this case, the design is often described using mathematical and algorithmic notation. Alternatively, we shall also refer to design when discussing the implementation of the cipher and, in this case, we are implying the software functions or hardware structures that are used to realize the cipher.

1.3 Substitution Permutation Networks

A substitution-permutation network (referred to as an SP network, or simply an SPN) is a well-known structure for realizing the characteristics of Shannon's product cipher. Numerous cipher proposals over the years have used the concepts found in SPNs. For example, the PRESENT cipher is an SPN and, most notably, AES has a structure that is very similar to an SPN.

1.3.1 16-bit SPN

Let's consider a simple realization of an SPN in the form of the toy cipher structure shown in Figure 1.1. This cipher is impractically small, with only a 16-bit block size, but it is useful in understanding the concepts of an SPN and forms a good foundation for discussing many of the implementation methods that we present in this article.

The SPN illustrated in Figure 1.1 consists of 4 rounds. Each round (with the exception of the last round) consists of 3 cryptographic operations or layers: (1) round key mixing (or round key addition), (2) substitution, and (3) permutation. The last round has the permutation layer replaced by another key mixing layer. The block size is 16 bits and the plaintext input at the top of the diagram is labelled P = [p15 p14 ... p0]. At the output of the 4th round, the ciphertext is presented as C = [c15 c14 ... c0], at the bottom of the diagram.2 Data blocks flow through the cipher and are acted upon by each layer. We shall refer to the 16-bit data block within the cipher as the cipher state. In this article, we shall generally use D to represent the state.3

To decrypt, the process is reversed. Essentially, at the bottom of the diagram, the ciphertext is provided as input to the decryption process, data flows up the structure (backwards through all the layers) and comes out as plaintext output, at the top of the diagram. We discuss decryption in more detail later in this section.

Encryption Process

The structure illustrated presents the path that data follows during the encryption process. To produce the bits mixed with the state bits during the key mixing operation, a key scheduling algorithm generates round keys. (The key scheduling process is not illustrated in the figure.) For round r, the 16-bit round key is labelled RKr. Usually, these bits are produced by simple operations on the original cipher key bits, parameterized by some unique information related to the round number. We discuss the key schedule in detail in the Appendix.

During the key mixing operation, the round key, which is 16 bits in size, is XORed on a bit-by-bit basis with the cipher state. The key mixing operations in PRESENT and AES are similar, except that in those cases the round key sizes are the same as the corresponding block sizes and the XOR is across the block of 64 bits or 128 bits, respectively.

Following the key mixing, the state is conceptually broken up into 4 sub-blocks, each of 4 bits (that is, one nibble). In the substitution layer, each 4-bit sub-block is processed with

2 We present the discussion of our ciphers using language that presumes the cipher is used in electronic codebook mode, one of several modes discussed in Section 1.4. For other cipher modes, it is perhaps more appropriate to refer to the forward (reverse) process, rather than the encryption (decryption) process.
3 Note that, in subsequent sections, we shall also use notation in some contexts where X, Y, and Z represent the state.

[Figure 1.1: 16-bit SPN (4 Rounds). Plaintext p15 ... p0 enters at the top; rounds 1 through 3 each apply round key mixing (with RK1, RK2, RK3), a substitution layer of four S-boxes S1-S4, and the permutation; round 4 applies key mixing with RK4, the substitution layer, and a final key mixing with RK5; ciphertext c15 ... c0 leaves at the bottom.]

a substitution box or S-box. A 4-bit S-box takes a 4-bit input and maps it to a 4-bit output, as shown in Figure 1.2. The 4-bit S-box operation can be thought of as a table lookup (and can be implemented as such) where the table consists of 2^4 nibbles. The input is used as an index into the table and the output is selected from the position pointed to by the index. An S-box is a fixed mapping (i.e., it is not key dependent) and an example of a 4-bit S-box is shown in Table 1.1. This is the S-box found in the PRESENT cipher [6]. PRESENT is an SPN and, as with many lightweight ciphers, it uses a small S-box mapping. In contrast, AES uses an 8-bit S-box, which is represented by a table of 2^8 8-bit values, indexed by the 8-bit input. In the substitution layer, it is conceivable that all S-boxes are defined to be different mappings (this was the case, for example, with the Data Encryption Standard (DES) [8]), but it is more typical for ciphers to use only one mapping for all S-boxes. This is the case for the PRESENT cipher (with one 4-bit S-box defined) and AES (with one 8-bit S-box

14 CHAPTER 1. INTRODUCTION TO BLOCK CIPHERS

[Figure 1.2: S-box Notation. Input bits x3 x2 x1 x0 enter the S-box S; output bits y3 y2 y1 y0 emerge.]

input   0 1 2 3 4 5 6 7
output  C 5 6 B 9 0 A D
input   8 9 A B C D E F
output  3 E F 8 4 7 1 2

Table 1.1: PRESENT 4-bit S-box Mapping (All values in hexadecimal.)

defined). In our discussion, we shall implicitly assume that all S-boxes in a cipher use the same mapping.

The properties of the S-box are critical to the proper operation and security of the cipher. In order to be able to map, for a given key, a plaintext to a unique ciphertext (and vice versa), the S-box must be bijective (that is, one-to-one). Bijectivity guarantees that each output bit of the S-box is balanced, meaning that for half of the input values the output bit is "0" and for half it is "1". Another important property that the mapping must possess is that the output bits are nonlinear functions of the input bits. This is necessary to provide Shannon's confusion property and to prevent successful linear cryptanalysis of the cipher [9]. Many other properties of the S-box are also desirable to prevent other cryptanalytic attacks, such as differential cryptanalysis [10]. The study of the construction of S-boxes to ensure certain cryptographic properties is an extensive field of research and a discussion of this topic is beyond the scope of this article.

The last operation of a round in the SPN is the permutation. The permutation is simply a transposition of the bit positions of the state, shown as wirings in Figure 1.1. This can be easily described in a table, as shown by the example in Table 1.2. Let D = [d15 d14 ... d0] represent the state bits at the input to the permutation and D* = [d*15 d*14 ... d*0] represent the state bits at the output of the permutation. Hence, the bits from the leftmost S-box, S1, entering the permutation result in the following assignments to the state bits at the output of the permutation: d*15 ← d15, d*11 ← d14, d*7 ← d13, and d*3 ← d12. This means that the output bits of the leftmost S-box, S1, are connected by the permutation to the inputs of all 4 S-boxes in the next round. This property of the permutation, together with the fact that any one input bit to an S-box has an effect on all S-box output bits, ensures that isolated effects in the leftmost 4 bits will be spread across the block in the next round and can affect all

input   15 14 13 12 11 10  9  8
output  15 11  7  3 14 10  6  2
input    7  6  5  4  3  2  1  0
output  13  9  5  1 12  8  4  0

Table 1.2: 16-bit Permutation (Bit 15 is leftmost bit.)

Operation     Input  Output  Comment
Key Mixing    DEBE   B0C7    Round 1 with RK1 = 6E79
Substitution  B0C7   8C4D
Permutation   8C4D   D701
Key Mixing    D701   B7DC    Round 2 with RK2 = 60DD
Substitution  B7DC   8D74
Permutation   8D74   C726
Key Mixing    C726   49E5    Round 3 with RK3 = 8EC3
Substitution  49E5   9E10
Permutation   9E10   C44A
Key Mixing    C44A   1354    Round 4 with RK4 = D71E
Substitution  1354   5B09
Key Mixing    5B09   2AA3    Replace permutation with key mixing using RK5 = 71AA

Table 1.3: 16-bit SPN Encryption Example (All values in hexadecimal.)

output bits of the next round. This is exactly the diffusion concept of Shannon's product cipher. For AES, the diffusion in the cipher is not accomplished by a permutation, but by a linear transformation comprised of the ShiftRows and MixColumns operations [2]. However, a permutation is a very specific form of linear transformation, so in a sense AES belongs to a generalized class of SPNs, which replaces the permutation with the more general concept of a linear transformation. Many other block ciphers also fall into this class.

Note in Figure 1.1 that the last round of encryption uses a key mixing operation in place of the permutation. If there were no key mixing after the last layer of S-boxes, there would be no cryptographic purpose for the last round S-boxes, since it would be a trivial matter for an attacker to go backwards through the S-boxes, knowing their output. Effectively, the last layer of S-boxes could therefore be stripped off the cipher by an attacker with virtually no effort. A layer of key mixing after the last layer of S-boxes ensures that the outputs of the S-boxes are unknown, as they are obscured to an attacker by the unknown round key bits.

In Table 1.3, we present example state values associated with encryption using the 16-bit SPN with the S-box of Table 1.1 and the permutation of Table 1.2. All data in the example is in hexadecimal format. The input plaintext, P, is DEBE, and we use the round keys presented in the table, which are derived in the example of Table A.1 discussed in the Appendix. The resulting ciphertext, C, is 2AA3.
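The 16-bit SPN is small enough to sketch directly. The following Python sketch (function and variable names are our own) implements the S-box of Table 1.1 and the permutation of Table 1.2, and reproduces the round-by-round trace of Table 1.3:

```python
# S-box of Table 1.1 (PRESENT) and permutation of Table 1.2.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
# PERM[i] is the output position of input bit i (bit 0 is rightmost).
PERM = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]

def substitute(state):
    # Apply the 4-bit S-box to each of the four nibbles of the state.
    out = 0
    for i in range(4):
        out |= SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out

def permute(state):
    # Transpose bit positions according to PERM.
    out = 0
    for i in range(16):
        out |= ((state >> i) & 1) << PERM[i]
    return out

def encrypt(p, round_keys):
    # Rounds 1-3: key mixing, substitution, permutation.
    state = p
    for rk in round_keys[:3]:
        state = permute(substitute(state ^ rk))
    # Round 4: key mixing and substitution, then final mixing with RK5.
    state = substitute(state ^ round_keys[3])
    return state ^ round_keys[4]

# Round keys of Table 1.3 (derived in the Appendix).
rks = [0x6E79, 0x60DD, 0x8EC3, 0xD71E, 0x71AA]
assert encrypt(0xDEBE, rks) == 0x2AA3  # matches Table 1.3
```

This is the "direct" style of implementation discussed in Chapter 2; the table lookup and bit-slice techniques presented there restructure exactly these three layers.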

input   0 1 2 3 4 5 6 7
output  5 E F 8 C 1 2 D
input   8 9 A B C D E F
output  B 4 6 3 0 7 9 A

Table 1.4: 4-bit Inverse S-box (All values in hexadecimal.)

Decryption Process

As mentioned, decryption involves going backwards through the network. This has 3 implications:

1. decryption uses the inverses of the components used for encryption,
2. the round keys used in decryption are the same as for encryption but applied in reverse order, and
3. with the adjustment of round key bit positions, decryption can be viewed as similar in structure to encryption.

Consider the 3 layers and their inverses. The XOR function of the key mixing is trivially reversed. Assume input state bit d is XORed with round key bit rk to produce a new state bit, d*: d* = d ⊕ rk. The inverse of this operation produces output d given input d*, and this is trivially possible using the operation d = d* ⊕ rk, based on the properties of XOR. The inverse S-box is easily derived by reversing the roles of input and output. Hence, for an S-box which maps input 0011₂ (3₁₆) to output 1011₂ (B₁₆), the inverse S-box maps input 1011₂ (B₁₆) to output 0011₂ (3₁₆). Hence, the S-box defined by Table 1.1 has the inverse S-box given in Table 1.4. Finally, the permutation is easily invertible by reversing the wiring associated with the permutation. For example, input bit 4 leads to output bit 1, implying that, for the inverse permutation, input bit 1 leads to output bit 4. For the permutation of Table 1.2, the inverse permutation is identical to the permutation.

It is clear that, since decryption is equivalent to processing from the bottom of Figure 1.1 to the top, the round key RK5 must be applied first and round key RK1 last in decryption. Hence, it is necessary to run through the complete key schedule to derive the last round key before decrypting any ciphertext. In suitable environments where the required memory is available, the round keys can be stored during this process and then used later during decryption.
However, for many environments, particularly for hardware implementations, it is not possible to store all round keys and instead the key schedule will need to be run in reverse from the last round key, concurrently with the processing in the decryption datapath.

Lastly, decryption can be restructured to look like encryption, which could be important for implementations where similar structures between encryption and decryption might lead to efficient, easily understood designs. Going backwards through Figure 1.1, we can see that we process a key mixing of RK5 and (inverse) substitution first, then follow with a key mixing of RK4, after which the (inverse) permutation is applied. But if we take the bits of RK4 and reorder them based on the inverse permutation, we can perform the inverse permutation before the key mixing and follow the inverse permutation with a key mixing

with the reordered RK4. Similarly, we can reorder RK3 and RK2 and apply them after the inverse permutation in the decryption process. In doing this, we have the same order of cryptographic layers in the decryption process as in the encryption process.

[Figure 1.3: 64-bit SPN (3 rounds).]
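Continuing the Python sketch of the 16-bit SPN (names are our own), decryption runs the layers in reverse order with the inverse S-box of Table 1.4; the permutation of Table 1.2 happens to be its own inverse:

```python
# S-box of Table 1.1 and its inverse (Table 1.4), derived by reversing
# the roles of input and output.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]
# Permutation of Table 1.2; PERM[i] is the output position of input bit i.
PERM = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]

def inv_substitute(state):
    out = 0
    for i in range(4):
        out |= INV_SBOX[(state >> (4 * i)) & 0xF] << (4 * i)
    return out

def inv_permute(state):
    # Reverse the wiring: output bit PERM[i] came from input bit i.
    out = 0
    for i in range(16):
        out |= ((state >> PERM[i]) & 1) << i
    return out

def decrypt(c, round_keys):
    # Undo the final key mixing, the last-round substitution, and the
    # RK4 mixing, then unwind rounds 3, 2, and 1.
    state = inv_substitute(c ^ round_keys[4]) ^ round_keys[3]
    for rk in (round_keys[2], round_keys[1], round_keys[0]):
        state = inv_substitute(inv_permute(state)) ^ rk
    return state

rks = [0x6E79, 0x60DD, 0x8EC3, 0xD71E, 0x71AA]
assert decrypt(0x2AA3, rks) == 0xDEBE  # recovers the plaintext of Table 1.3
```

Note how the round keys are consumed in reverse order (implication 2 above); reordering RK2-RK4 through the inverse permutation, as described in the text, would let `decrypt` use the same layer ordering as `encrypt`.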

1.3.2 64-bit SPN

In the previous section, we presented and discussed the components of a toy 16-bit cipher. While the block size of this cipher is not at all practical, the components described, including the 4-bit S-box, are realistic, and ciphers with larger block sizes exist which are similar in structure. In this section, we present a realistically-sized 64-bit SPN block cipher. This cipher structure is, in fact, equivalent to the structure of the PRESENT cipher [6]. PRESENT is an important, foundational cipher in the area of lightweight cryptography and is a recommended ISO standard [11]. The architecture of the 64-bit SPN consisting of 3 rounds is given in Figure 1.3. The figure illustrates the placement of the key mixing layer, but does not show the key scheduling

input   63 62 61 60 59 58 57 56
output  63 47 31 15 62 46 30 14
input   55 54 53 52 51 50 49 48
output  61 45 29 13 60 44 28 12
input   47 46 45 44 43 42 41 40
output  59 43 27 11 58 42 26 10
input   39 38 37 36 35 34 33 32
output  57 41 25  9 56 40 24  8
input   31 30 29 28 27 26 25 24
output  55 39 23  7 54 38 22  6
input   23 22 21 20 19 18 17 16
output  53 37 21  5 52 36 20  4
input   15 14 13 12 11 10  9  8
output  51 35 19  3 50 34 18  2
input    7  6  5  4  3  2  1  0
output  49 33 17  1 48 32 16  0

Table 1.5: 64-bit Permutation (Bit 63 is leftmost bit.)

algorithm which generates the round keys to be mixed. Although the block size and potential key size for this cipher are practical, such a cipher with only 3 rounds would not be secure and would be susceptible to many cryptanalytic attacks. Decryption would be accomplished by effectively going backwards (bottom to top) through the structure shown. The PRESENT cipher, while similar in structure, consists of 31 rounds, chosen to ensure security for an 80-bit or 128-bit key. For PRESENT, one mapping is used for all S-boxes and it is given by the mapping presented in Table 1.1. The permutation illustrated in the figure is summarized in Table 1.5. The key mixing uses bit-by-bit XOR, which in this case is done across the 64-bit state and makes use of a 64-bit round key generated by a key schedule applied to the full cipher key (which is either 80 or 128 bits for PRESENT).
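Table 1.5 has a compact closed form; assuming this is PRESENT's bit permutation (as the surrounding text suggests), bit i of the state moves to position 16·i mod 63 for i < 63, with bit 63 fixed. A quick Python check against the table:

```python
def perm64(i):
    # Bit i of the 64-bit state moves to position 16*i mod 63;
    # the leftmost bit (63) stays in place.
    return 63 if i == 63 else (16 * i) % 63

# Spot-check some entries of Table 1.5.
assert perm64(63) == 63 and perm64(62) == 47
assert perm64(8) == 2 and perm64(0) == 0
```

A closed form like this matters for implementation: hardware realizes it as pure wiring at zero gate cost, while software must move the bits explicitly (Chapter 2 discusses how to fold this into table lookups).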

1.4 Modes of Operation

In order to efficiently and securely use a block cipher, one must use the cipher in an appropriate mode of operation. There are many different modes that have been defined with different objectives in mind. Fundamentally, a mode must be secure and must provide the characteristics of significance for the targeted application. The mode employed can have an important impact on the implementation selected. Alternatively, the desired implementation structure (based on the application requirements) can impact the selected mode. We briefly describe three well-known modes [12], but note that many more are proposed and applied in practice.

[Figure 1.4: Electronic CodeBook Mode. (a) Encryption: plaintext Pi enters the block cipher keyed with K, producing ciphertext Ci. (b) Decryption: ciphertext Ci enters the block cipher operating in decryption mode with key K, recovering plaintext Pi.]

1.4.1 Electronic CodeBook Mode

Electronic CodeBook (ECB) mode is likely the natural mode one thinks of for block ciphers. To encrypt the i-th block of plaintext, one applies the block to the input of the block cipher and, using the cipher key as a parameter, produces the i-th block of ciphertext at the output. This is illustrated in Figure 1.4. Note that Figures 1.1 and 1.3 have their inputs and outputs labelled with the presumption that they are used in ECB mode. That is, the input to the block cipher operation is the plaintext and the output is the ciphertext. As we shall see, this is not the case for other modes.

In practice, it is not generally advisable to use ECB for the encryption of large amounts of data, because it is not considered to be semantically secure (that is, it is possible to determine some information about the plaintext by observing ciphertext). Consider, for example, that a known plaintext block Pi is encrypted in ECB mode using a particular key to produce the known observed ciphertext block Ci. If we later observe a second ciphertext Cj (produced using the same key) such that Cj = Ci, we can then determine that the plaintext Pj used to produce Cj must be the same as Pi, that is, Pj = Pi. This is a potentially significant source of leakage of plaintext information based on observing the ciphertext and having knowledge of some plaintext/ciphertext pairs, and could be a serious problem if large amounts of data are encrypted. ECB mode can be useful for encrypting small amounts of random data, such as might be the case when protecting keys by encrypting them so that they can be transferred confidentially between parties.

Note that, generally, our discussion in this article is presented using language which implies the use of ECB mode. However, the implementation methods can be clearly translated into other modes as appropriate and we do provide some discussion on the suitability of implementations for different modes.
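The equal-blocks leak is easy to demonstrate. The sketch below (names our own) reuses the toy 16-bit SPN of Section 1.3.1 as the block cipher, standing in for any block cipher:

```python
# Toy 16-bit SPN from Section 1.3.1, reused as the block cipher.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
PERM = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]

def encrypt(p, rks):
    state = p
    for rk in rks[:3]:
        state ^= rk
        state = sum(SBOX[(state >> (4*i)) & 0xF] << (4*i) for i in range(4))
        state = sum(((state >> i) & 1) << PERM[i] for i in range(16))
    state ^= rks[3]
    state = sum(SBOX[(state >> (4*i)) & 0xF] << (4*i) for i in range(4))
    return state ^ rks[4]

def ecb_encrypt(blocks, rks):
    # ECB: every block is encrypted independently under the same key.
    return [encrypt(p, rks) for p in blocks]

rks = [0x6E79, 0x60DD, 0x8EC3, 0xD71E, 0x71AA]
ct = ecb_encrypt([0xDEBE, 0x1234, 0xDEBE], rks)
# Repeated plaintext blocks leak: positions 0 and 2 match.
assert ct[0] == ct[2] and ct[0] != ct[1]
```

Because each block is processed independently, ECB also parallelizes trivially, a point that matters for the pipelined and parallel hardware architectures of Chapter 3.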

[Figure 1.5: Cipher Block Chaining Mode - (a) Encryption: plaintext Pi is XORed with the previous ciphertext Ci-1 before entering the block cipher (keyed with cipher key K) to produce ciphertext Ci; (b) Decryption: ciphertext Ci enters the block cipher decryption (keyed with K) and the output is XORed with Ci-1 to recover plaintext Pi.]

1.4.2 Cipher Block Chaining Mode

Cipher Block Chaining (CBC) is a mode of operation which can be effectively used to encrypt large amounts of data using a block cipher without the issues associated with ECB mode. Encryption and decryption using CBC mode are illustrated in Figure 1.5. To encrypt the i-th block in a sequence of many plaintext blocks, CBC mode XORs the plaintext block, Pi, with the ciphertext produced for the previous block, Ci-1. The resulting block is then fed to the input of the block cipher (operating as an encryption process), with the resulting output considered to be the ciphertext Ci. The first plaintext block, P1, uses an initialization vector (IV) as C0, since no ciphertext block yet exists. Decryption in CBC mode takes the ciphertext, Ci, as input to the block cipher (operating as a decryption process), from which the plaintext Pi is derived following the XOR of the result with the previous ciphertext block, Ci-1. The chaining aspect ensures that the encryption of a block is dependent on the encryption results of previous blocks, thereby preventing the problem of ECB mode where two identical plaintext blocks result in identical ciphertext blocks. In typical encryption applications, the IV should be a nonce, that is, a value used only once. Hence, for CBC mode, different sessions will start the chain using IVs which should be unique, to avoid semantic security issues with the first block. However, in many applications it is not necessary to keep IVs secret. CBC can be used as a general method to encrypt large amounts of data, consisting of many plaintext blocks, and it can be found applied in many contexts.
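The chaining just described can be sketched in a few lines. The toy 16-bit block cipher below (key XOR plus the S-box of Table 2.2, with its inverse for decryption) is a hypothetical stand-in; any real block cipher E and its inverse could be substituted.

```python
# Sketch of CBC mode over a toy 16-bit block cipher (placeholder cipher).

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]

def E(block, key):                  # toy encryption: key XOR + S-boxes
    x = block ^ key
    return sum(SBOX[(x >> 4 * i) & 0xF] << 4 * i for i in range(4))

def E_inv(block, key):              # corresponding toy decryption
    x = sum(INV_SBOX[(block >> 4 * i) & 0xF] << 4 * i for i in range(4))
    return x ^ key

def cbc_encrypt(blocks, key, iv):
    prev, out = iv, []
    for p in blocks:
        prev = E(p ^ prev, key)     # chain: XOR with previous ciphertext
        out.append(prev)
    return out

def cbc_decrypt(blocks, key, iv):
    prev, out = iv, []
    for c in blocks:
        out.append(E_inv(c, key) ^ prev)
        prev = c
    return out

pt = [0x1234, 0x1234, 0xBEEF]       # first two blocks identical
ct = cbc_encrypt(pt, key=0x3A94, iv=0x0F0F)
print(ct[0] != ct[1])               # True: chaining hides the repetition
print(cbc_decrypt(ct, key=0x3A94, iv=0x0F0F) == pt)  # True
```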

1.4.3 Counter Mode

Counter (CTR) mode is another block cipher mode of operation suitable for encrypting a large amount of plaintext data. In CTR mode, the block cipher is configured to operate as a stream cipher, which produces ciphertext bits by XORing plaintext bits with keystream bits. In CTR mode, illustrated in Figure 1.6, the block cipher takes as input a B-bit counter


[Figure 1.6: Counter Mode - (a) Encryption and (b) Decryption: the B-bit counter value counti enters the block cipher encryption (keyed with cipher key K); B − δ keystream bits are discarded and the remaining δ keystream bits are XORed with plaintext Pi to give ciphertext Ci (and vice versa for decryption).]

value, labelled as “counti”, with the resulting output of the block cipher used as keystream to be XORed on a bitwise basis with data from the plaintext block. Although the block cipher can be used to produce a full block of B bits of keystream, in general terms, δ bits can be used as keystream (with the other B − δ bits being discarded) to be XORed with a δ-bit block of plaintext, Pi, where δ ≤ B. The counter values, counti, can be straightforwardly incremented for every block cipher operation or can follow any simple, predictable sequence of unique values. For decryption, the operation is identical to encryption, except that the roles of plaintext and ciphertext are reversed. The counter values at both sides of the communication must be synchronized in order for ciphertext Ci to decrypt properly to plaintext Pi. Decryption is possible in this way because of the properties of XOR: for bits C (ciphertext), P (plaintext), and Q (keystream),

C = P ⊕ Q ⇒ P = C ⊕ Q.

The initial count value should be unique for the first block of a session, but generally need not be kept secret. Note that both encryption and decryption use the encryption process of the block cipher to produce the keystream. Hence, an implementation of CTR mode does not require an implementation of the block cipher decryption process, which may result in significant memory and area savings in software and hardware implementations, respectively.
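The keystream generation and truncation to δ bits can be sketched as follows, again using a hypothetical toy 16-bit block cipher as a placeholder for the encryption process; note that the same function serves for both encryption and decryption.

```python
# Sketch of CTR mode with a toy 16-bit block cipher (placeholder cipher).
# Only the cipher's encryption direction is needed; delta <= B keystream
# bits are used per block and the remaining bits are discarded.

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def E(block, key):                      # toy encryption: key XOR + S-boxes
    x = block ^ key
    return sum(SBOX[(x >> 4 * i) & 0xF] << 4 * i for i in range(4))

def ctr_xcrypt(blocks, key, count0, delta=16):
    out = []
    for i, p in enumerate(blocks):
        ks = E((count0 + i) & 0xFFFF, key)  # keystream from counter value
        ks &= (1 << delta) - 1              # keep delta bits, discard rest
        out.append(p ^ ks)
    return out

pt = [0x1111, 0x2222, 0x3333]
ct = ctr_xcrypt(pt, key=0x3A94, count0=7)
print(ctr_xcrypt(ct, key=0x3A94, count0=7) == pt)  # same operation decrypts: True
```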

1.5 General Implementation Structures

For SPNs and many other block ciphers, the structure of the cipher is iterative, comprising a number of rounds, R, of simple cryptographic operations. For SPNs, these simple operations form the round function, which consists of round key mixing, substitution, and permutation. As per Shannon’s product cipher proposal, the round function is executed iteratively an appropriate number of times to provide confidence in the security of the cipher. Typical structures of SPN block ciphers are illustrated in Figure 1.7 and Figure 1.8. The difference between the two diagrams is the nature of the application of the key schedule. In Figure 1.7, the key schedule algorithm is applied, by the key schedule operation, before the execution of the cipher rounds.

[Figure 1.7: Encryption Structure with Round Key Setup - from plaintext P and cipher key K, the key schedule runs first; then, for the first R − 1 rounds, key mixing, substitution, and permutation are applied; in the R-th round the permutation is skipped and a final key mixing produces ciphertext C.]

Hence, in this case, all round keys are generated and stored for use during the processing of the cipher data. We refer to this approach as the round key setup approach. In contrast, Figure 1.8 illustrates the on-the-fly approach to the key scheduling algorithm, where the steps of the key schedule are executed within the cipher round by the round key generation operation, which produces the round key for use within the round. In this case, only one round key needs to be generated at a time and no storage of round keys is necessary. Instead, the key state is stored (from which the round key is computed) and is updated in each step of the algorithm. The on-the-fly approach is of particular interest in hardware implementations, where memory to store all round keys may be too costly and where it is possible to execute the key scheduling step in each round concurrently with the data processing operations, thereby improving the speed of the cipher execution.

[Figure 1.8: Encryption Structure with On-the-Fly Round Key Generation - from plaintext P and cipher key K, each of the first R − 1 rounds performs round key generation, key mixing, substitution, and permutation; in the R-th round the permutation is skipped and a final round key generation and key mixing produce ciphertext C.]

For the decryption of ciphertext, very similar structures can be used. For example, Figure 1.9 illustrates the strategy of round key setup before execution of the rounds. The decrypt key schedule operation is similar to the key schedule used for encryption, except that some adjustments need to be made to the round key bit positions and the round keys must be applied in the reverse order to encryption, as discussed in Section 1.3.1, to ensure that the same structure is used for decryption as for encryption. Also, the inverses of the substitution and permutation layers, inverse substitution and inverse permutation, must be used. For some ciphers, like the 16-bit and 64-bit SPNs of Figures 1.1 and 1.3, the inverse permutation is identical to the original permutation. The key mixing layer, due to the property of the XOR function, is the same for decryption as for encryption. We use these two general structures of (1) round key setup and (2) on-the-fly round key generation as the basis for our discussions of the implementation of ciphers in software and hardware.

1.6 Summary

In this chapter, we have presented the basic notions associated with block ciphers. Specifically, we have described the concept of the substitution-permutation network, one of the basic architectures used to implement block ciphers. We have presented concrete examples of a toy 16-bit SPN and a practical 64-bit SPN. The 64-bit SPN is, in fact, the same architecture as the lightweight cipher PRESENT. We have also given the S-box from PRESENT as an example component used in the cipher.

[Figure 1.9: Decryption Structure with Round Key Setup - from ciphertext C and cipher key K, the decrypt key schedule runs first; then, for the first R − 1 rounds, key mixing, inverse substitution, and inverse permutation are applied; in the R-th round the inverse permutation is skipped and a final key mixing produces plaintext P.]

This chapter has also introduced the notion of the mode of operation of a block cipher, and we shall see that often the mode applied to a cipher influences the selection of the implementation method (and possibly vice versa). Finally, we have illustrated the general iterative structure of the SPN and discussed how the key schedule can be implemented either in a setup phase or on-the-fly during the processing of the cipher rounds. In the next chapter, we will delve into the practical implementation of block ciphers by considering the software implementation of the SPN, focusing on general methodologies such as table lookups and bit-slicing structures.

Chapter 2

Software Implementation

In this chapter, we outline some of the principles and methods with which a designer could implement a block cipher in software. We focus on general concepts and do not delve into presenting any specific coding examples. Instead, we use pseudocode, descriptions, and examples to illustrate concepts. In doing so, we presume the availability of typical instructions that are found in all processors and refrain from discussing implementation issues related to specific processor instructions which might be helpful in cipher implementation but which are not ubiquitously found in computing environments.

2.1 Structure of Encryption

Since block ciphers, such as SPNs, are iterative in structure, they can be easily structured in a software program using “for” loops, as shown in the pseudocode presented in Algorithm 1, where D represents the cipher state. In the pseudocode, the round key setup approach is used for key scheduling, so that round key values are generated prior to the processing of the plaintext data. This is done by calling KEYSCHED, which produces and stores the data for the round key array, [RK_1, RK_2, ..., RK_R, RK_{R+1}], based on the initial cipher key. This pseudocode is analogous to the structure of Figure 1.7. The pseudocode developed from the structure of Figure 1.8, with on-the-fly round key generation, is given in Algorithm 2. Here the steps of the key scheduling algorithm are placed within the body of the loop by calling the function ROUNDKEY_GENERATE. Hence, it is not necessary to store the complete set of round keys, but only to produce the round key required for the current round based on the key state. For an R round cipher, the body of the loop in both Algorithm 1 and Algorithm 2 performs the round operations and the loop is iterated R − 1 times, with the R-th round (which replaces the permutation with a key mixing) following the loop. In software, the functions KEYMIX, SUBSTITUTION, and PERMUTATION can be implemented using various methods, with the method selected based on an objective associated with the implementation, such as maximizing speed or minimizing storage. The implementation of these operations will be discussed in upcoming sections.

Algorithm 1 Pseudocode for Encryption with Round Key Setup
  function ENCRYPT(P, K)                  ▷ inputs: plaintext P and cipher key K
      [RK_1, RK_2, ..., RK_R, RK_{R+1}] ← KEYSCHED(K)    ▷ generate round keys
      D ← P                               ▷ load P into cipher state D
      for r = 1 to R − 1 do
          D ← KEYMIX(D, RK_r)
          D ← SUBSTITUTION(D)
          D ← PERMUTATION(D)
      end for
      D ← KEYMIX(D, RK_R)
      D ← SUBSTITUTION(D)
      D ← KEYMIX(D, RK_{R+1})             ▷ last round replaces permutation with key mixing
      return D                            ▷ output: ciphertext C
  end function

The key setup strategy of Algorithm 1 could be used when high speeds for software implementations are required, since the key schedule algorithm only needs to be executed once, before the encryption of all plaintexts to be encrypted under the key. The tradeoff is that all round keys must be stored for use, which requires more memory; this may be an issue in a tightly constrained system such as those found in some IoT devices. Using the on-the-fly key scheduling strategy of Algorithm 2 will clearly be slower in software, since the extra operations in ROUNDKEY_GENERATE must be executed in every pass of the loop for the encryption of every plaintext. However, in this case it is not necessary to store the full set of round keys and, hence, there may be advantages in environments with tight memory constraints where the speed of the encryption process is not an issue.

In subsequent sections, we discuss detailed characteristics of various implementation methods for the operations - key mixing, substitution, and permutation - within the round function. Although we have referred to the cipher state as D in the pseudocode described in this section, for convenience, in our description of the layers we shall often use different variables to represent the state, based on which layer is taking the state as input or producing the state as output. In particular, we shall use the following labels. The state at the input of the substitution (and output of the key mixing) we shall label as X = [xB−1xB−2...x0].
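Algorithm 1 can be rendered directly in Python for the 16-bit SPN. The sketch below uses the S-box of Table 2.2 and the permutation of Table 1.2 (which sends input bit position 4q + r to output position 4r + q); the key schedule is a hypothetical placeholder (a simple per-round rotation of the cipher key), since no specific key schedule is fixed here.

```python
# Sketch of Algorithm 1 (round key setup) for the 16-bit SPN.
# The key schedule is a hypothetical placeholder, not from the article.

R = 4                                    # assumed number of rounds for the demo
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def substitution(d):
    # apply the 4-bit S-box to each nibble of the 16-bit state
    return sum(SBOX[(d >> 4 * i) & 0xF] << 4 * i for i in range(4))

def permutation(d):
    # Table 1.2: input bit 4q + r moves to output bit 4r + q
    z = 0
    for j in range(16):
        if (d >> j) & 1:
            z |= 1 << (4 * (j % 4) + j // 4)
    return z

def keysched(k):
    # placeholder schedule: round key r is the key rotated left by r bits
    return [((k << r) | (k >> (16 - r))) & 0xFFFF for r in range(R + 1)]

def encrypt(p, k):
    rk = keysched(k)                     # round key setup, done once
    d = p
    for r in range(R - 1):               # first R - 1 full rounds
        d = permutation(substitution(d ^ rk[r]))
    d = substitution(d ^ rk[R - 1])
    return d ^ rk[R]                     # last round: key mixing only

print(hex(encrypt(0x7AF8, 0x3A94)))
```

As a sanity check, the first-round intermediates for X = 7AF8 match the worked example of Table 2.3: substitution gives DF23 and permutation gives CC7D.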
The state at the output of the substitution and input to the permutation is labelled as Y = [yB−1yB−2...y0], while the state at the output of the permutation (and input of the key mixing) shall be referred to as Z = [zB−1zB−2...z0]. For clarity, this is illustrated for one round of the 16-bit SPN in Figure 2.1.

2.2 Structure of Decryption

One of the advantages of the SPN architecture is that the decryption process is similar in structure to encryption, as illustrated in Figure 1.9. In Algorithm 3, we present the pseudocode for the decryption process, and it can be seen that the layers use inverse operations (specifically, the S-box and permutation operations INV_SUBSTITUTION and INV_PERMUTATION) as appropriate. Also notably, the round keys for the decryption process, represented as RK*_r, require the application of the round keys from encryption in reverse order and with some adjustments made in the bit positions. For example, the round key for round R + 1 of encryption, RK_{R+1}, is used as the round key for round 1 of decryption, RK*_1. Round key RK_R of encryption is used as the round key of round 2 of decryption, RK*_2 (with adjustments to bit ordering); RK_{R−1} with reordered bits is used for RK*_3, etc. We now turn our focus to the operations involved in both encryption and decryption and discuss their software implementation.

Algorithm 2 Pseudocode for Encryption with On-the-Fly Key Generation
  function ENCRYPT(P, K)                  ▷ inputs: plaintext P and cipher key K
      D ← P                               ▷ load P into cipher state D
      K′ ← K                              ▷ load K into key state K′
      for r = 1 to R − 1 do
          RK_r ← ROUNDKEY_GENERATE(K′, r) ▷ update K′ in function
          D ← KEYMIX(D, RK_r)
          D ← SUBSTITUTION(D)
          D ← PERMUTATION(D)
      end for
      RK_R ← ROUNDKEY_GENERATE(K′, R)
      D ← KEYMIX(D, RK_R)
      D ← SUBSTITUTION(D)
      RK_{R+1} ← ROUNDKEY_GENERATE(K′, R + 1)
      D ← KEYMIX(D, RK_{R+1})             ▷ last round replaces permutation with key mixing
      return D                            ▷ output: ciphertext C
  end function

[Figure 2.1: SPN Round with Notation - one round of the 16-bit SPN, labelling the state Z = [z15 ... z0] entering the key mixing, X = [x15 ... x0] entering the four S-boxes, Y = [y15 ... y0] entering the permutation, and the permuted state Z = [z15 ... z0] at the round output.]

2.3 Direct Implementation of an SPN

We begin by considering the straightforward, and largely impractical, approach of directly implementing an SPN in software. By direct implementation, we refer to the concept of executing the functionality of the operations explicitly in software. In the following sections, we will discuss much more practical and efficient approaches to software implementation using wide table lookups and bit-slicing.

Algorithm 3 Pseudocode for Decryption with Round Key Setup
  function DECRYPT(C, K)                  ▷ inputs: ciphertext C and cipher key K
      [RK*_1, RK*_2, ..., RK*_R, RK*_{R+1}] ← DEC_KEYSCHED(K)    ▷ generate round keys
      D ← C                               ▷ load C into cipher state D
      for r = 1 to R − 1 do
          D ← KEYMIX(D, RK*_r)
          D ← INV_SUBSTITUTION(D)
          D ← INV_PERMUTATION(D)
      end for
      D ← KEYMIX(D, RK*_R)
      D ← INV_SUBSTITUTION(D)
      D ← KEYMIX(D, RK*_{R+1})            ▷ last round replaces permutation with key mixing
      return D                            ▷ output: plaintext P
  end function
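To check that the same-structure decryption of Algorithm 3 really inverts encryption, the sketch below implements both for the 16-bit SPN. The S-box and permutation are those of the example cipher; the key schedule is again a hypothetical placeholder, and DEC_KEYSCHED derives the decryption round keys by reversing the order and permuting the bit positions of the middle round keys, as described above. The permutation of Table 1.2 is its own inverse, so it doubles as INV_PERMUTATION.

```python
# Encryption (Algorithm 1) and same-structure decryption (Algorithm 3)
# for the 16-bit SPN; the key schedule is a hypothetical placeholder.

R = 4
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]

def substitution(d, box=SBOX):
    return sum(box[(d >> 4 * i) & 0xF] << 4 * i for i in range(4))

def permutation(d):                       # self-inverse: bit 4q+r <-> bit 4r+q
    z = 0
    for j in range(16):
        if (d >> j) & 1:
            z |= 1 << (4 * (j % 4) + j // 4)
    return z

def keysched(k):                          # placeholder key schedule
    return [((k << r) | (k >> (16 - r))) & 0xFFFF for r in range(R + 1)]

def dec_keysched(k):                      # reverse order; permute middle keys
    rk = keysched(k)
    return [rk[R]] + [permutation(rk[R - i]) for i in range(1, R)] + [rk[0]]

def encrypt(p, k):
    rk, d = keysched(k), p
    for r in range(R - 1):
        d = permutation(substitution(d ^ rk[r]))
    return substitution(d ^ rk[R - 1]) ^ rk[R]

def decrypt(c, k):
    rk, d = dec_keysched(k), c
    for r in range(R - 1):
        d = permutation(substitution(d ^ rk[r], INV_SBOX))
    return substitution(d ^ rk[R - 1], INV_SBOX) ^ rk[R]

print(all(decrypt(encrypt(p, 0x3A94), 0x3A94) == p
          for p in range(0, 0x10000, 997)))  # True
```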

2.3.1 Key Mixing Layer Implementation

The key mixing layer is typically the bitwise XOR of round key bits (derived by the key schedule) with the cipher state bits. In software this can be done very efficiently, since processors have a bitwise XOR instruction that can be executed with two word inputs. If the cipher block size exceeds the processor word size, then multiple XOR operations are needed. For example, for a 64-bit block size on an 8-bit processor, the XOR operation would need to be executed 8 times to mix a 64-bit round key with the 64-bit cipher state.
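The word-by-word mixing of the example above can be sketched as follows, holding a 64-bit state as eight 8-bit words (the state and key values are arbitrary demonstration data).

```python
# Key mixing on an 8-bit processor: a 64-bit state held as eight bytes
# is XORed word by word with a 64-bit round key (demo values only).

state = [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF]      # 64-bit state
round_key = [0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00]  # 64-bit round key

mixed = [s ^ k for s, k in zip(state, round_key)]  # eight XOR operations
print([hex(b) for b in mixed])
```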

2.3.2 Substitution Layer Implementation

Direct implementation of the S-box in software implies the use of a lookup table to mimic an S-box like the one of Table 1.1. For the 4-bit S-box discussed in this article, a direct implementation would require a table of 2^4 4-bit values, where a 4-bit value in the table represents the 4-bit output of the S-box indexed by the 4-bit input to the S-box. We refer to this approach as the narrow table lookup. It is straightforward in software to represent the data in the system using integers. However, because the S-boxes work on 4-bit sub-blocks of the larger block, care must be taken to extract the 4-bit inputs and move data into the correct position within the block for each of the S-box lookups. We describe the process of table lookups in more detail in Section 2.4.
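The narrow table lookup can be sketched as below: the S-box (the middle column of Table 2.2) is stored as 16 nibble values, and each S-box of the 16-bit state is applied by extracting a nibble, indexing the table, and placing the result back in position.

```python
# Narrow table lookup: a 4-bit S-box stored as 16 nibbles, applied to
# each nibble of the 16-bit state via extract / index / replace.

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # S-box of Table 2.2

def substitution(x):
    y = 0
    for i in range(4):                 # four 4-bit S-boxes
        nib = (x >> 4 * i) & 0xF       # extract nibble i
        y |= SBOX[nib] << 4 * i        # place output back in position
    return y

print(hex(substitution(0x7AF8)))       # nibbles 7, A, F, 8 -> D, F, 2, 3
```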

2.3.3 Permutation Layer Implementation

The permutation operation does not obviously lend itself well to direct implementation in software. In a straightforward approach, an implementation could process each bit by using a mask to isolate the bit, shifting it to the appropriate position, and then combining it back into the output being constructed. We refer to this as the bit rotation method for permutation implementation. For example, consider the leftmost 2 bits entering the 16-bit permutation of Table 1.2. Assume the input to the permutation is the 16-bit state Y and the 16-bit output state is Z. The two bits, y15 and y14, are assigned to outputs as follows: z15 ← y15 and z11 ← y14. This can be done by initializing Z to all zeroes. Then, assign Y to a temporary 16-bit variable, W, and mask the leftmost bit by ANDing W with 8000₁₆. XOR the result with Z to produce Z = [y15 0 0 ... 0 0]. Next, assign Y again to W, mask the second bit using 4000₁₆, and shift 3 bits to the right to generate W = [0 0 0 0 y14 0 ... 0 0], which, when XORed with Z, produces Z = [y15 0 0 0 y14 ... 0 0].¹ This can be repeated to move, very inefficiently, all 16 bits of the state according to the permutation. The bit rotation method is generally a poor choice to implement the permutation, although some modern processors have data manipulation instructions that may allow an improvement in the efficiency of the approach.
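The mask-and-shift process generalizes to all 16 bits as sketched below, using the fact that the permutation of Table 1.2 sends input bit position 4q + r to output position 4r + q (consistent with z15 ← y15 and z11 ← y14 above).

```python
# Bit rotation method: one mask / shift / XOR per state bit, using the
# 16-bit permutation of Table 1.2 (input bit 4q + r -> output bit 4r + q).

def permutation(y):
    z = 0
    for j in range(16):
        bit = (y >> j) & 1           # isolate input bit j with a mask
        q, r = j // 4, j % 4
        z ^= bit << (4 * r + q)      # move it to output position 4r + q
    return z

print(hex(permutation(0xDF23)))
```

Applied to the substitution-layer output DF23 of the worked example in Table 2.3, this yields CC7D, matching the combined-lookup result there.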

2.4 Table Lookup Implementations

In order to improve the efficiency of the software implementation of an SPN, one could make much more extensive use of table lookups. As we discussed in the previous section, S-boxes are naturally implemented as table lookups and, in this section, we will discuss how they can be conveniently combined with a lookup for the permutation. The other layer in a round, key mixing, is efficiently done using bitwise XOR on data blocks and does not need to involve table operations.

2.4.1 Permutation in a Table

Generally, since the bit rotation implementation approach requires operations on all bits of the block, it is not efficient for the large blocks (typically 64 or 128 bits) used in a practical block cipher. A much more efficient method for the permutation is a table lookup approach. For the 16-bit SPN, the table lookup approach makes use of four tables, each consisting of 2^4 16-bit values, where each value represents the output of the permutation with the appropriate input bits moved to the correct locations within the 16-bit output. For example, one of the four tables corresponds to the leftmost nibble of the input Y, that is, [y15 y14 y13 y12]. The table associated with the permutation of the bits in this nibble is presented in Table 2.1. A lookup in this table returns a 16-bit result, Z1, of the form Z1 = [y15 0 0 0 y14 0 0 0 y13 0 0 0 y12 0 0 0]. Similarly, three other results are produced using the other three input nibbles as indices into the three other tables, giving outputs Z2 = [0 y11 0 0 0 y10 0 0 0 y9 0 0 0 y8 0 0], Z3 = [0 0 y7 0 0 0 y6 0 0 0 y5 0 0 0 y4 0], and Z4 = [0 0 0 y3 0 0 0 y2 0 0 0 y1 0 0 0 y0]. The output of the permutation is produced by isolating the 4 nibbles of Y (using shifts and masks as appropriate), using the nibbles as indices to look up values in the 4 tables to retrieve Z1, Z2, Z3, and Z4, and generating the permutation output by bitwise XORing the retrieved words: Z = Z1 ⊕ Z2 ⊕ Z3 ⊕ Z4. This is much more efficient than individually moving the bits around, particularly for realistic block sizes.

Now, considering that the most efficient approaches to implementing the S-box operation and the permutation both involve table lookups, it perhaps makes sense to combine the tables into one table and complete the combination of the substitution and permutation operations with only one table lookup. This is described in the next section.

¹At many points within this article, we shall refer to the mixing of data in different bit positions together using the bitwise XOR operation on two words. However, this can also be done using a bitwise OR operation.

Input             Output
[y15 y14 y13 y12]   Z1
0000              0000000000000000
0001              0000000000001000
0010              0000000010000000
0011              0000000010001000
0100              0000100000000000
0101              0000100000001000
0110              0000100010000000
0111              0000100010001000
1000              1000000000000000
1001              1000000000001000
1010              1000000010000000
1011              1000000010001000
1100              1000100000000000
1101              1000100000001000
1110              1000100010000000
1111              1000100010001000

Table 2.1: Permutation Table for Leftmost Nibble (All values in binary.)
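The four tables, including Table 2.1 for the leftmost nibble, can be generated programmatically from the bit permutation and then combined with XOR, as sketched below.

```python
# Permutation via table lookup: build the four 16-entry tables from the
# bit permutation of Table 1.2 (bit 4q + r -> bit 4r + q), then combine
# the four retrieved 16-bit words with XOR.

def perm_bit(j):
    return 4 * (j % 4) + j // 4

# TABLE[t][v] = 16-bit word with the bits of input nibble t (t = 0 is the
# leftmost nibble) moved to their permuted positions; other bits are zero.
TABLE = [[0] * 16 for _ in range(4)]
for t in range(4):
    lo = 12 - 4 * t                    # bit position of the nibble's low bit
    for v in range(16):
        word = 0
        for b in range(4):
            if (v >> b) & 1:
                word |= 1 << perm_bit(lo + b)
        TABLE[t][v] = word

def permutation(y):
    z = 0
    for t in range(4):                 # four lookups, combined with XOR
        z ^= TABLE[t][(y >> (12 - 4 * t)) & 0xF]
    return z

print(hex(TABLE[0][0b0001]))           # y12 -> z3, cf. Table 2.1 row 0001
```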

2.4.2 Combined Substitution/Permutation Table

An efficient implementation of both S-box and permutation is accomplished by combining the S-box and permutation operations into table lookups where the values stored in the table are not the size of the S-box output, but the size of the block. We refer to this as the wide table lookup method.² Consider, for example, the leftmost bits of the 16-bit state at the input to the substitution layer, [x15 x14 x13 x12]. The S-box operation results in a 4-bit output, [y15 y14 y13 y12], which the permutation (from Table 1.2) would move into the following positions in the block:

[y15 - - - y14 - - - y13 - - - y12 - - -]

Hence, we construct a table (which we call a substitution/permutation table or SP table) of all 16 outputs of the S-box, with values of size equal to the block size, where the S-box outputs are moved into the appropriate positions within the block according to the permutation. Such a table for the leftmost nibble is given in Table 2.2. A lookup in this table returns a 16-bit result, Z1, of the form Z1 = [y15 0 0 0 y14 0 0 0 y13 0 0 0 y12 0 0 0], where the yi values represent the outputs of the leftmost S-box (S1 in Figure 1.1) for input [x15 x14 x13 x12]. (The middle column in the table is the 4-bit output of the S-box.) Similarly, tables can be constructed for the second, third, and fourth S-boxes (nibbles [x11 x10 x9 x8], [x7 x6 x5 x4], and [x3 x2 x1 x0]).

Note that, although the S-box outputs are identical for identical inputs, four SP tables are used because the permutation results in different values in the table depending on which S-box is receiving the input. For example, if the leftmost S-box, S1, has input “0000”, the SP table lookup results in an output of “1000100000000000” (as can be seen in Table 2.2), while input “0000” to the S-box second from the left, S2 in Figure 1.1, results in an output of “0100010000000000” from the SP table lookup (not illustrated). Determining the output of the combined substitution/permutation layers can be achieved by using the four S-box inputs to complete four table lookups similar to Table 2.2. The 4 blocks corresponding to the outputs of these lookups can then be combined, by XORing them together, to produce the output block of the combined operation. Since the blocks in the tables have “0”s in the bits which are not directly affected by the corresponding S-box output, these bits have no effect on the XOR outcome and only the bits produced by the S-box output end up affecting the appropriate output block bits. The pseudocode representing the combined substitution/permutation operation based on the wide table lookup using multiple lookup tables (4 in this example) is given in Algorithm 4. In the pseudocode, the notations “⊕”, “·”, and “>> i” represent bitwise XOR, bitwise AND, and right rotation by i bits, respectively. For right rotations, bits shifted out of the rightmost end of the variable are shifted into the leftmost end of the variable. Function SP_LOOKUP(i, ·) represents a lookup in the SP table corresponding to S-box i. An example of the application of the pseudocode is given in Table 2.3.

²For AES, a similar approach is referred to as the T-table approach by the cipher’s proponents [2].

Input              S-box Output        Table Output
[x15 x14 x13 x12]  [y15 y14 y13 y12]   Z1
(Table Index)      (Intermediate Value)
0000               1100                1000100000000000
0001               0101                0000100000001000
0010               0110                0000100010000000
0011               1011                1000000010001000
0100               1001                1000000000001000
0101               0000                0000000000000000
0110               1010                1000000010000000
0111               1101                1000100000001000
1000               0011                0000000010001000
1001               1110                1000100010000000
1010               1111                1000100010001000
1011               1000                1000000000000000
1100               0100                0000100000000000
1101               0111                0000100010001000
1110               0001                0000000000001000
1111               0010                0000000010000000

Table 2.2: SP Table for Leftmost S-box (S1) Input (All values in binary.)

Algorithm 4 Pseudocode for SP Wide Table Lookup (Multiple Tables)
  function SUB_PERM(X)                    ▷ input: 16-bit state X
      Z ← 0000₁₆
      for i = 1 to 4 do
          W ← [X >> 4(4 − i)] · 000F₁₆    ▷ extract 4-bit index
          V ← SP_LOOKUP(i, W)             ▷ perform 4-bit lookup in table i
          Z ← Z ⊕ V                       ▷ combine 16-bit result into 16-bit state
      end for
      return Z                            ▷ output: 16-bit state Z
  end function

Input:  X = 7AF8
        Z ← 0000
i = 1:  W ← 7    V ← SP_LOOKUP(1, 7) = 8808    Z ← 0000 ⊕ 8808 = 8808
i = 2:  W ← A    V ← SP_LOOKUP(2, A) = 4444    Z ← 8808 ⊕ 4444 = CC4C
i = 3:  W ← F    V ← SP_LOOKUP(3, F) = 0020    Z ← CC4C ⊕ 0020 = CC6C
i = 4:  W ← 8    V ← SP_LOOKUP(4, 8) = 0011    Z ← CC6C ⊕ 0011 = CC7D
Output: Z = CC7D

Table 2.3: Example of Wide Table Lookup (Multiple Tables) (All values in hexadecimal.)
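Algorithm 4 can be sketched in Python as follows: the four SP tables are generated from the S-box of Table 2.2 and the bit permutation of Table 1.2 (input bit 4q + r to output bit 4r + q), and the worked example of Table 2.3 (X = 7AF8 giving Z = CC7D) is reproduced.

```python
# Wide table lookup with four SP tables (Algorithm 4), built from the
# S-box of Table 2.2 and the bit permutation 4q + r -> 4r + q.

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def perm_bit(j):
    return 4 * (j % 4) + j // 4

# SP[t][x] = permuted S-box output for nibble t (t = 0 is the leftmost).
SP = [[0] * 16 for _ in range(4)]
for t in range(4):
    lo = 12 - 4 * t                     # position of the nibble's low bit
    for x in range(16):
        word = 0
        for b in range(4):
            if (SBOX[x] >> b) & 1:
                word |= 1 << perm_bit(lo + b)
        SP[t][x] = word

def sub_perm(x):
    z = 0
    for i in range(4):                  # one lookup + XOR per S-box
        w = (x >> 4 * (3 - i)) & 0x000F # extract 4-bit index
        z ^= SP[i][w]                   # wide (16-bit) table value
    return z

print(hex(sub_perm(0x7AF8)))            # reproduces the Table 2.3 example
```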

For the case discussed above, 4 tables are required, each consisting of 16 values of size 16 bits. Hence, a minimum total of 4 × 16 = 64 16-bit words must be stored.³ This compares to the memory requirement of 16 nibbles if simply the S-box is stored for the narrow table lookup approach mentioned in Section 2.3.2.

In some cases, such as for the 16-bit SPN, the memory requirement for the combined substitution/permutation method can be reduced further by noting that the values in the 4 different SP tables associated with each S-box position are simply rotations of the table for the leftmost S-box. For example, all values in the table for the S-box that is second from the left, S2, are the same as the values for the leftmost S-box shifted right by one bit. Hence, for input “0000” to S1, the table output is “1000100000000000”, while for input “0000” to S2, the table output is “0100010000000000”. As a result, we do not need to store 4 tables, but can store one table and then, with a rotation operation, produce the appropriate 16-bit value to be used in the production of the round output block. The memory size of the required table is now 16 16-bit words (or, equivalently, 64 nibbles or 32 bytes). The penalty for this saving of memory is the additional rotation operation required for each lookup.

The process for the wide table lookup using a single table is illustrated in Figure 2.2. The figure shows how the nibble that is 2nd from the right is used as an index into the table, following which the retrieved value is rotated right by 2 positions. The resulting output, combined using XOR with the lookups/rotations from the other 3 nibbles of the state, is the new state value at the output of the round.

³Of course, this total could also be enumerated as 256 nibbles or 128 bytes. It should also be noted that how much memory is allocated for a table depends on how the table is stored in memory. For example, a 16-bit table value might be conveniently stored as a 32-bit integer, resulting in twice as many nibbles (bytes) being used.

[Figure 2.2: SP Wide Table Lookup Using Single Table - the nibble [x7 x6 x5 x4] of the state input X = [x15 ... x0] is used as an index into the single SP table of 16 16-bit values; the retrieved table value [v15 ... v0] is rotated right by 2 and then XORed with the table values obtained from the other nibbles to produce the state output Z = [z15 ... z0].]
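The single-table variant can be sketched as below: only the SP table for the leftmost S-box is stored, and the value retrieved for nibble i (counting i = 0 for the leftmost) is rotated right by i positions before being XORed into the state, matching the rotate-by-2 step shown in Figure 2.2 for the nibble 2nd from the right.

```python
# Single-table wide lookup: store only the leftmost S-box's SP table and
# rotate each retrieved value into position for the other nibbles.

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def perm_bit(j):
    return 4 * (j % 4) + j // 4

SP0 = [0] * 16                            # 16 16-bit words, leftmost S-box only
for x in range(16):
    word = 0
    for b in range(4):
        if (SBOX[x] >> b) & 1:
            word |= 1 << perm_bit(12 + b)
    SP0[x] = word

def ror16(v, s):                          # 16-bit right rotation by s bits
    return ((v >> s) | (v << (16 - s))) & 0xFFFF

def sub_perm(x):
    z = 0
    for i in range(4):
        w = (x >> 4 * (3 - i)) & 0xF      # nibble i, i = 0 is leftmost
        z ^= ror16(SP0[w], i)             # rotation compensates table position
    return z

print(hex(sub_perm(0x7AF8)))              # same result as the 4-table method
```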

2.5 Time/Memory Tradeoffs of Lookup Tables

Consider now a generalized perspective on the memory requirements and speed associated with the combined substitution/permutation wide table lookup approach. Assume that an SPN is constructed using n-bit S-boxes and that the block size is B. Also, let ω represent the size of the word used in the processor for instruction operands and storage of data.

Consider first the direct implementation of the SPN using the narrow table lookup discussed in Section 2.3.2 for the S-box, followed by the bit rotation approach for the permutation implementation. Such an implementation would use one table for the S-box of size 2^n words, where we assume that each value in the table takes a full word, even though typically ω > n.⁴ Using the bit rotation method for the permutation operation would require operations on all B bits of the block.

Consider now the wide table lookup of the combined substitution/permutation layers using multiple SP tables. For convenience, assume that the word size is equal to the block size, i.e., ω = B. (The discussion presented can be easily adjusted for cases where ω ≠ B.) Now, B/n tables are required, for a total of (B/n) · 2^n words of size B. There are a total of B/n lookups (and associated XORs) required to determine the output of the round. With an appropriate permutation, it may be possible to reduce the number of tables from B/n to 1 by adding a rotation following each table lookup. That is, the combined substitution/permutation layer can be accomplished using a single SP table, rather than multiple SP tables. Note that, while this is possible for some permutations (such as the ones for the SPNs of Figures 1.1 and 1.3), permutations exist for which this is not possible.

Let’s now look at a realistically sized SPN, specifically the 64-bit SPN of Figure 1.3, based on 4-bit S-boxes. We assume an implementation based on a word size of ω = B = 64 bits. In this case, a wide table lookup approach can be taken using 64/4 = 16 tables (one for each S-box position), for a total size of 16 · 2^4 = 2^8 = 256 words.⁵ There are 16 lookup operations (with 15 XORs) to execute the round.
Considering the permutation in Figure 1.3, it is possible to reduce the memory requirements from 16 tables to a single table, requiring only 16 words, but the small penalty is the need for an extra rotation operation following each table lookup. In order to speed up the execution of the cipher, it is possible to build a larger lookup table with more values in it and consequently having fewer look up operations. We could, for example, increase the size of the index into the table from 4 bits to 8 bits by grouping together two adjacent 4-bit S-boxes into effectively one 8-bit S-box, which could then be combined with the permutation layer using the wide table lookup approach. In this case, for the 64-bit SPN, using the multiple table approach requires (64/8) · 28 = 211 = 2048 words (of size 64 bits) in the tables and 64/8 = 8 lookup operations or, if the single table approach is used, 28 = 256 words and 8 lookups along with appropriate rotations are required. Hence, using 8-bit indices, for both the multiple table and single table approaches, there are half the lookups needed (8 vs. 16) when compared to lookup tables with 4-bit indices, while in terms of memory, the 8-bit indices approach requires several times more memory words when compared to lookup tables with 4-bit indices: 2048 vs. 256 for the multiple tables method and 256 vs.16 for the single table method. Decreasing the number of lookups at the expense of larger memory use, can be extended even further. We could 4 S-boxes together, effectively creating a 16-bit S-box for a

⁴ A compact implementation could be realized which would only require 2ⁿ · n/ω words in cases where ω/n S-box outputs can be saved in one word.
⁵ In most contexts, the sensible minimum memory unit size is the processor word size and, hence, it makes most sense to discuss memory requirements in terms of the word size. We leave it to the reader to determine which base unit is appropriate for their context of interest.

Index Size   Multiple Tables Memory (64-bit words)   Single Table Memory (64-bit words)   # Lookups
4            2⁸ = 256                                2⁴ = 16                              16
8            2¹¹ = 2048                              2⁸ = 256                             8
16           2¹⁸ = 262144                            2¹⁶ = 65536                          4

Table 2.4: Tradeoffs for Wide Table Lookup (64-bit SPN using 4-bit S-boxes)

process requiring only 4 lookup operations, resulting in table requirements of (64/16) · 2¹⁶ = 2¹⁸ = 262,144 words (of size 64 bits) for 4 separate S-box tables or 2¹⁶ = 65,536 words for the single table approach. Although this might seem like a way to speed up the encryption process, in fact, as the tables grow larger, table lookup operations can take longer due to memory access phenomena such as cache misses. Hence, there is a limit to the value of increasing the memory size to increase cipher speed by decreasing the number of lookups. Further grouping of S-boxes for the table lookups would clearly become impractical: a grouping of 8 4-bit S-boxes, implying 32-bit indices, would require the size of a single table to be an impractically large 2³² words. A comparison of the tradeoffs for selections of the index size for a wide table lookup approach for a 64-bit SPN using 4-bit S-boxes is presented in Table 2.4. The number of XOR operations is equal to the number of table lookups minus 1, while the number of rotations needed after the lookups is zero for the multiple table approach and equal to the number of lookups minus 1 for the single table approach. Note that a tradeoff comparison of lookup table sizes for x86 architectures is detailed in [13], including a discussion of the PRESENT cipher.
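To make the wide table lookup concrete, the following Python sketch builds the B/n = 4 SP tables for a toy 16-bit SPN and executes one substitution/permutation layer with 4 lookups and 3 XORs. The S-box is the one of Table 1.1 (the PRESENT S-box); the bit permutation used here is an assumed PRESENT-style mapping (bit i moves to position 4i mod 15, bit 15 fixed), chosen for illustration and not necessarily the permutation of Figure 1.3.

```python
# Toy 16-bit SPN round layer via the wide (combined SP) table lookup.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def permute(x):
    """Apply the assumed bit permutation to a 16-bit word."""
    y = 0
    for i in range(16):
        if (x >> i) & 1:
            j = 15 if i == 15 else (4 * i) % 15
            y |= 1 << j
    return y

# One SP table per S-box position: entry = the 16-bit contribution of that
# S-box's permuted output (B/n = 4 tables of 2^n = 16 words each).
SP = [[permute(SBOX[v] << (4 * s)) for v in range(16)] for s in range(4)]

def round_sp_lookup(x):
    """Substitution + permutation via 4 table lookups and 3 XORs."""
    out = 0
    for s in range(4):
        out ^= SP[s][(x >> (4 * s)) & 0xF]
    return out

def round_direct(x):
    """Reference: substitute each nibble, then permute the whole block."""
    y = 0
    for s in range(4):
        y |= SBOX[(x >> (4 * s)) & 0xF] << (4 * s)
    return permute(y)
```

Because the permutation only moves bits, the XOR of the permuted per-S-box contributions equals the permutation of the full substituted block, which is what makes the SP-table decomposition work.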

2.6 Bit-slice Implementations

A dramatically different approach to a software implementation of an SPN is to use what is referred to as bit-slicing. The application of bit-slicing to encryption was first proposed for an implementation of DES by Biham in [14]. A bit-slice implementation makes use of the representation of the S-box output bits as Boolean functions and typically structures a description of the cipher based on the parallel encryption of a number of blocks. In our discussion, we shall assume that B blocks are to be encrypted in parallel and that the size of the cipher block, B, is the same as the word size ω. Towards the end of our discussion, we address the issues involved if B ≠ ω. Essentially, the bit-slice approach can be thought of as a three-phase process:

1. Structure plaintext blocks into a format for bit-slicing,

2. Process data using bitwise logical instructions, and

3. Restructure bit-slice data into ciphertext blocks.

We begin our discussion by examining the middle phase, which implements the cipher operations using logical instructions based on the Boolean function representation of the cipher operations. It is useful for the reader of this section to have a good knowledge of

Boolean functions, Boolean algebra, and/or combinational logic design techniques. At the end of this section, we present an example using the 16-bit SPN to clarify the approach.

2.6.1 Bit-slicing the S-box

Consider the S-box of Figure 1.2, given in Table 1.1. It is possible to consider each output bit as a Boolean function of the 4 input bits and, as shown in Table 2.5, the S-box can be described as truth tables, one for each of the 4 output bits.⁶ As an example, using the canonical sum-of-products representation, Y0 can be written as:

Y0 = X̄3·X̄2·X̄1·X0 + X̄3·X̄2·X1·X0 + X̄3·X2·X̄1·X̄0 + X̄3·X2·X1·X0 + X3·X̄2·X̄1·X̄0 + X3·X̄2·X1·X̄0 + X3·X2·X̄1·X0 + X3·X2·X1·X̄0

where the Boolean operators AND, OR, and NOT are represented by "·", "+", and an overbar, respectively. In subsequent expressions, the "·" is generally omitted where AND is implied. Using logical operations of only one or two inputs, one could produce Y0 in the following way. First, generate temporary variables Gi, Hi, and Mi as follows:

G0 ← X̄0     G1 ← X̄1     G2 ← X̄2     G3 ← X̄3
H0 ← G3G2   H1 ← G3X2   H2 ← X3G2   H3 ← X3X2
H4 ← G1G0   H5 ← G1X0   H6 ← X1G0   H7 ← X1X0
M0 ← H0H4   M1 ← H0H5   M2 ← H0H6   M3 ← H0H7
M4 ← H1H4   M5 ← H1H5   M6 ← H1H6   M7 ← H1H7
M8 ← H2H4   M9 ← H2H5   M10 ← H2H6  M11 ← H2H7
M12 ← H3H4  M13 ← H3H5  M14 ← H3H6  M15 ← H3H7

Each G variable represents the inverse of an input bit and takes one logical operation to produce. The H variables are generated by the AND of two X and/or G variables, and the M variables are produced by the AND of two H variables. Each M variable represents a minterm, which is a four-input AND of all 4 input bits or their inverses. This takes a total of 4 + 8 + 16 = 28 two- (or one-) input logical operations. Now, Y0 is produced by selecting the minterms corresponding to the locations of the "1"s in the truth table for Y0. Since

Y0 = M1 + M3 + M4 + M7 + M8 + M10 + M13 + M14,

S-box output Y0 can be computed using the following sequence of two-input operations:

Y0 ← M1 + M3
Y0 ← Y0 + M4
Y0 ← Y0 + M7
Y0 ← Y0 + M8
Y0 ← Y0 + M10
Y0 ← Y0 + M13
Y0 ← Y0 + M14.
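The minterm construction above can be sketched in Python as follows (the function name is ours; the S-box values are those of Table 1.1, and each variable holds a single bit in {0, 1}):

```python
# Building Y0 of the Table 2.5 S-box from one- and two-input operations only.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def y0_minterms(x3, x2, x1, x0):
    """Compute output bit Y0 from input bits (each 0 or 1) via minterms."""
    # G variables: inverses of the input bits (4 NOT operations).
    g3, g2, g1, g0 = 1 - x3, 1 - x2, 1 - x1, 1 - x0
    # H variables: 8 two-input ANDs of input pairs.
    h = [g3 & g2, g3 & x2, x3 & g2, x3 & x2,   # H0..H3
         g1 & g0, g1 & x0, x1 & g0, x1 & x0]   # H4..H7
    # M variables: the 16 minterms; m[v] = 1 iff the input equals v.
    m = [h[i] & h[4 + j] for i in range(4) for j in range(4)]
    # Y0 is the OR of the minterms at the "1" positions of its truth table.
    return m[1] | m[3] | m[4] | m[7] | m[8] | m[10] | m[13] | m[14]
```

Checking the function against the truth table confirms that the selected minterms match the Y0 column of Table 2.5.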

⁶ Note that we now use the notation where Xi and Yi represent the S-box input and output bits, respectively, rather than xi and yi as previously. We do this to emphasize that, as we shall see, bit-slicing actually operates on Xi and Yi as words, rather than bits.

X3X2X1X0   Y3 Y2 Y1 Y0
0000       1  1  0  0
0001       0  1  0  1
0010       0  1  1  0
0011       1  0  1  1
0100       1  0  0  1
0101       0  0  0  0
0110       1  0  1  0
0111       1  1  0  1
1000       0  0  1  1
1001       1  1  1  0
1010       1  1  1  1
1011       1  0  0  0
1100       0  1  0  0
1101       0  1  1  1
1110       0  0  0  1
1111       0  0  1  0

Table 2.5: Truth Table for S-box of Table 1.1 (All values in binary.)

To generate one output bit of the S-box, Y0, in this way would require a total of 35 operations: 28 to produce the G, H, and M variables and 7 more to generate Y0 from the M variables.⁷ This is a lot of operations. However, consider that the canonical expressions for the remaining 3 outputs of the S-box, Y1, Y2, and Y3, are:

Y1 = M2 + M3 + M6 + M8 + M9 + M10 + M13 + M15,

Y2 = M0 + M1 + M2 + M7 + M9 + M10 + M12 + M13, and

Y3 = M0 + M3 + M4 + M6 + M7 + M9 + M10 + M11.

Since the M variables only need to be produced once, generating all the output bits of the S-box takes a total of 28 + 4 · 7 = 56 operations, or 56/4 = 14 operations on average per output bit.⁸ It is apparent that 14 operations per bit is not efficient. As we shall see, a major challenge for bit-slicing is finding a minimized form of the Boolean functions which ensures that the maximum efficiency for the software implementation of the Boolean function is achieved. For our example, can we reduce the average number of Boolean operations to produce an output bit to something substantially less than 14? In fact, for the PRESENT S-box of Table 1.1, it

⁷ Since Y0 only needs 8 of the 16 M values, strictly speaking only 20 + 7 = 27 operations are needed for Y0. However, all 16 M values must be calculated for the remaining outputs in any case.
⁸ The number of operations can be further reduced to a small degree by calculating temporary variables representing the sum of pairs of M values which are used to produce multiple output bits. For example, calculating M9 + M10 and saving the result in a variable could reduce the number of operations by two, since the variable can be used to generate Y1, Y2, and Y3.

can be shown that the following simple Boolean function can be used to produce the output bit Y0 [15]:⁹

Y0 = X0 ⊕ X3 ⊕ X2 · (X1 ⊕ X2)

which can be structured as the following sequence of two-input logic operations, making use of temporary variables Ti:

T1 ← X1 ⊕ X2
T2 ← X2 · T1
T3 ← X3 ⊕ T2                    (2.1)
Y0 ← X0 ⊕ T3.

The remaining outputs can be subsequently generated using the following process [15]:

T2 ← T1 · T3
T1 ← T1 ⊕ Y0
T2 ← T2 ⊕ X2
T4 ← X0 + T2
Y1 ← T1 ⊕ T4                    (2.2)
T4 ← X̄0
T2 ← T2 ⊕ T4
Y3 ← Y1 ⊕ T2

T2 ← T2 + T1
Y2 ← T2 ⊕ T3.

The total number of logical operations is therefore only 14 to produce the 4 output bits! Hence, the average number of operations per bit is only 3.5, substantially reduced from the previous analysis' average of 14 operations per bit. Let us now consider how we can make use of Boolean expressions to efficiently implement the S-box in software. For convenience, we shall initially assume that the block size of the cipher, B, is equal to the size of the words used as operands in processor instructions, ω, and let ω = B = 64. Consider that we have four 64-bit words, X3, X2, X1, and X0, with each word corresponding to an input bit of the S-box. That is, Xi is a word containing information for input bit xi of an S-box (as illustrated in Figure 1.2), with each of the 64 bits of Xi corresponding to the same S-box input bit associated with the encryption of 64 different plaintext blocks. More specifically, consider that an S-box in a particular round of the cipher has its four input bits represented by the words Xi = [xi,63xi,62...xi,0], i ∈ {0, 1, 2, 3}, where, within each word, bit j, 0 ≤ j ≤ 63, represented as xi,j, is input i to the S-box for the j-th plaintext block. Similarly, Y0 = [y0,63y0,62...y0,0] is a 64-bit word that represents S-box output bit y0 for all 64 blocks. It is possible to produce all 4 S-box output bits for all 64 blocks in parallel by using (2.1) and (2.2), where the logical operations are performed in parallel on all bits of the words using the processor's bitwise logical instructions. However,

⁹ Methods used to determine simple Boolean functions made up of two-input operations to compute S-box output bits are beyond the scope of this article.

executing these instructions produces S-box output bits for 64 different blocks. Hence, although it may seem costly to implement the Boolean function in software instructions, the parallel nature of the process gives the potential for improved speed compared to other approaches such as the wide table lookup approach. Using this process for our sample S-box, from (2.1) and (2.2), 14 instructions produce 4 output bits of an S-box for 64 parallel encryptions, meaning the average number of instructions to produce one S-box output bit value across all encryptions is (14/4)/64 ≈ 0.055. For the 64-bit cipher, it would take 14 · 16 = 224 instructions to produce the output of the substitution layer (16 S-boxes in total) for 64 parallel encryptions, which is equivalent to an average of 224/64 = 3.5 instructions to produce the output of a substitution layer for the encryption of one block. This can be compared to the multiple table lookup approach using a 4-bit index, which would require 16 table lookups and 15 XOR instructions for the substitution layer applied to each block. Hence, using bit-slicing for the substitution layer, when compared to the wide table lookup method, it appears that there is a potential for speed-up on the order of (16 + 15)/3.5 ≈ 9.¹⁰ Ignoring any memory requirements for the key schedule, bit-slicing has modest memory requirements, needing B = 64 words to store the words representing each bit of the state for the parallel encryptions. Depending on the nature of the key schedule and its implementation, the memory requirements to store and produce round keys could be significantly more.
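As a sketch, the sequences (2.1) and (2.2) translate directly into bitwise instructions. Here Python integers masked to 64 bits stand in for processor words, and the function name is ours; each input word carries one S-box input bit from up to 64 independent blocks:

```python
# Bit-sliced S-box layer for the S-box of Table 1.1, following (2.1)/(2.2):
# 14 bitwise operations compute all four output words, for every block lane
# in the words simultaneously.
MASK = (1 << 64) - 1  # emulate 64-bit processor words

def sbox_bitslice(x3, x2, x1, x0):
    t1 = x1 ^ x2
    t2 = x2 & t1
    t3 = x3 ^ t2
    y0 = x0 ^ t3          # end of (2.1)
    t2 = t1 & t3
    t1 = t1 ^ y0
    t2 = t2 ^ x2
    t4 = x0 | t2
    y1 = t1 ^ t4
    t4 = ~x0 & MASK       # NOT, truncated to the word size
    t2 = t2 ^ t4
    y3 = y1 ^ t2
    t2 = t2 | t1          # end of (2.2)
    y2 = t2 ^ t3
    return y3, y2, y1, y0
```

Because every operation is bitwise, each bit position (lane) of the words is processed independently, which is exactly the source of the parallelism discussed above.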
Although there seems to be great potential in using bit-slicing, there are two significant challenges: (1) an efficient Boolean function representation of the S-box must be found, and (2) the plaintext blocks from 64 different encryptions must be restructured into the data structure appropriate for bit-slicing, where words should contain information from the same bit position of numerous blocks. In the next section, we discuss the nature of this data restructuring. For small S-boxes, such as the 4-bit S-boxes found in lightweight ciphers like PRESENT, simple 4-bit Boolean functions may be found (such as illustrated previously) that can lead to potentially very efficient bit-slice implementations. However, for larger S-boxes, such as the 8-bit S-boxes found in AES, the 8-bit Boolean functions are substantially more complex than for smaller functions. Consider that the canonical form of the Boolean function representing an AES S-box output bit consists of 128 minterms, each with 8 inputs (taken from the S-box input bits or their inverses). Of course, this can be reduced using logic minimization techniques, but clearly many two-operand logic instructions will still be needed to produce the 8-bit output of an S-box. In fact, the average number of instructions per bit for the AES S-box is much larger than for 4-bit S-boxes and, in general, bit-slicing is best suited to ciphers comprised of small S-boxes.

2.6.2 Restructuring the Data for Bit-Slicing

Since the idea behind bit-slicing is to have multiple blocks of plaintext encrypted simultaneously by executing the logical operations representing the cipher components using bitwise logical instructions on words, it is necessary to restructure the data representing the

¹⁰ This is a gross simplification. This speed-up only applies to the substitution portion of the processing and presumes the convenience of data being structured for bit-slicing. Also, the timing cost of different instructions is not the same. For example, a table lookup operation (which includes the setup steps for the index, as well as the actual memory access) is certainly not equivalent in timing cost to a simple XOR of 2 operands.

plaintext into an appropriate format. Let us continue assuming ω = B = 64. At the input of the system, for the 64-bit SPN, we shall assume that we have 64 plaintext blocks to encrypt, with each block stored in a word. We can visualize the data as organized into a matrix of plaintext bits, where the words containing blocks map onto rows of 64 elements, each element consisting of 1 bit of a plaintext block, and each column of the matrix represents a position of bits within the block. For example, the leftmost column is bit 63 of the plaintext and the rightmost is bit 0 of the plaintext. We label this plaintext matrix as P† and let the rows of P†, from top to bottom, be P†63, P†62, ..., P†0, each representing a different plaintext block. Each block maps to a 64-bit word, with the j-th bit of block P†i labelled as p†i,j, j ∈ {0, 1, ..., 63}. Hence, we have the following representation:

 †   † † † † †  P63 p63,63 p63,62 p63,61 ... p63,1 p63,0  P †   p† p† p† ... p† p†   62   62,63 62,62 62,61 62,1 62,0   †   † † † † †  †  P61   p61,63 p61,62 p61,61 ... p61,1 p61,0  P =   =   .  ...   ...   †   † † † † †   P1   p1,63 p1,62 p1,61 ... p1,1 p1,0  † † † † † † P0 p0,63 p0,62 p0,61 ... p0,1 p0,0 To encrypt using bit-slicing, the bits of the plaintext matrix must be reassigned to the bit-slicing data structure, an 64 × 64 matrix of bits (represented by 64 rows of 64-bit words), † † which we refer to as the bit-slicing state, D , comprised of elements di,j, i, j ∈ {0, 1, ..., 63}, where i represents the row (starting with 63 at the top) and j represents the column (starting with 63 at the left). The resulting matrix representing the bit-slice state thus has the format:

 †   † † † † †  D63 d63,63 d63,62 d63,61 ... d63,1 d63,0  D†   d† d† d† ... d† d†   62   62,63 62,62 62,61 62,1 62,0   †   † † † † †  †  D61   d61,63 d61,62 d61,61 ... d61,1 d61,0  D =   =   .  ...   ...   †   † † † † †   D1   d1,63 d1,62 d1,61 ... d1,1 d1,0  † † † † † † D0 d0,63 d0,62 d0,61 ... d0,1 d0,0

The elements of D† are derived from the original plaintext matrix, P†, as follows:

d†i,j ← p†j,i  for all i, j ∈ {0, 1, ..., 63}.

That is, the j-th bit of the i-th block of plaintext should be put into the i-th bit of the j-th word of D†. This is equivalent to saying that D† is the transpose of matrix P†. Similarly, after the completion of the encryption of all 64 blocks to produce the ciphertext blocks, the data in D† must be restructured back so that one word (row) represents the bits of one block. In other words, we need to transpose D† to get the ciphertext matrix, C†, following the encryption process. The flow of the encryption process for bit-slicing can thus be summarized as follows:

{(P†)T → D†} =⇒ {Update D† using bit-slice encryption} =⇒ {(D†)T → C†}

where (·)T represents the transpose of the argument matrix.

For the encryption of 64 blocks, the two transposition operations on the 64 × 64 matrices result in 2 · 64 · 64 = 2¹³ variable assignments, for an average of 128 such assignments per plaintext block. These operations are not needed for other encryption methods such as the table lookup approaches; hence, they represent an overhead associated with bit-slicing. However, if the core Boolean function operations can be implemented efficiently, then this overhead may be small enough to ensure that bit-slicing is more efficient than methods such as table lookup. Also, it should be noted that, in practice, assembly language instructions on modern processors can allow some efficiencies in the movement of data bits within the words representing the cipher data blocks. For example, in [16], assembly code is given for an efficient data restructuring. To this point, we have discussed how to structure the implementation to determine the output of an S-box by executing the appropriate Boolean operators of NOT, AND, OR, and XOR using bitwise logical instructions in software. We have also discussed the need to transform data at both the input (plaintext) and output (ciphertext) to ensure data is in an appropriate format for bit-slicing during the processing of the data. We emphasize again that, unless an effort is made to minimize the number of logical operations to realize the S-box, this approach will clearly not be as efficient as a table lookup approach. We need to finish our explanation by clarifying how the bit-slice data processing can be done for the other round function layers, the permutation and the key mixing.
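A minimal (unoptimized) sketch of the bit-matrix transposition, shown here for 16-bit words for brevity; faster in-register algorithms exist, but this makes the d†i,j ← p†j,i mapping explicit:

```python
# Transpose a square bit matrix stored as a list of words:
# output word j gets, in bit position i, bit j of input word i.
def transpose(rows, width=16):
    """rows: list of `width` integers, each a `width`-bit word."""
    out = []
    for j in range(width):
        word = 0
        for i in range(width):
            word |= ((rows[i] >> j) & 1) << i
        out.append(word)
    return out
```

Transposing twice returns the original data, which corresponds to the restructuring at the input (plaintext to bit-slice state) being undone at the output (bit-slice state to ciphertext).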

2.6.3 Bit-slicing the Permutation and Key Mixing

We have focused on discussing the Boolean function representation of S-boxes and its relationship to bit-slicing. However, before we present a detailed example of bit-slicing, we should comment on the other operations that comprise the round function of the cipher. The permutation operation is a simple reordering of the bit positions and, hence, can be easily incorporated into bit-slicing at negligible cost: instead of assigning the output of the Boolean operations to the word representing the state bit at the output of the substitution layer, the output can be assigned to the word representing the bit position to which the permutation will move the S-box output bit. For example, for the 64-bit SPN using the permutation in Table 1.5, the word representing state bit 19 (at the output of the substitution) will be assigned to the word representing state bit 52, which is the position to which bit 19 is moved by the permutation. For the key mixing operation, incorporation into the bit-slicing process is also trivial. Since most applications will use the same key for a large number of plaintext data blocks, it is reasonable to assume that all blocks being encrypted by one pass through the bit-sliced cipher implementation are using the same key and, hence, the round keys are the same for each block. In general, for a B-bit SPN, the bit-slice round key data structure should also be a B × B matrix, stored in the system as B words of size B bits, where each word represents one bit position within the cipher round key and each bit within the word represents the round key bit value for each of the B blocks being encrypted. However, since the round keys are the same for all encryptions, the words (which are the rows of the B × B matrix) of the bit-slice round key structure must each be all "0"s or all "1"s.
Using bit-slicing, it is possible to prepare and store the words representing the round key bits prior to processing the encryption of any plaintext blocks using a particular cipher key. In order to execute key mixing using XOR of the state bits and round key bits, during the round key setup, the bit-slice round key structure must be created by replicating each round key bit to every bit in one word of the structure. Then, during the bit-slice encryption process, round key words are bitwise XORed with the words representing the corresponding state bits of all the different blocks. The necessary setup of the round keys will not have a significant impact on the speed of encryption, assuming that large amounts of plaintext are encrypted under one key, which is the case if keys change infrequently. The memory storage required will be a number of words equal to the total number of bits found in all round keys. If bit-slicing is used to encrypt plaintext blocks using different keys, then the formulation of the round key matrices used in bit-slicing would be more complex, as the rows of the round key matrices would need to reflect different round key bits for the different blocks being encrypted under different keys.
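The round key replication can be sketched as follows (the function name is ours; a 16-bit word size is assumed for brevity). Each round key bit is expanded into an all-"0"s or all-"1"s word, so that key mixing becomes one bitwise XOR per state word:

```python
# Expand a round key into the bit-slice round key structure.
def expand_round_key(rk, width=16):
    """Return `width` words, ordered from the most significant round key bit
    down to bit 0; each word is all ones iff the corresponding key bit is 1."""
    ones = (1 << width) - 1
    return [ones if (rk >> i) & 1 else 0 for i in range(width - 1, -1, -1)]
```

For instance, expanding the round key 0110111001111001 used in the example of Section 2.6.4 gives a word of all "1"s for each "1" bit of the key and a zero word for each "0" bit.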

2.6.4 Example of Bit-Slicing

Let us now consider an example of the application of bit-slicing for the 16-bit SPN using the PRESENT S-box of Table 1.1. We use the 16-bit, rather than the 64-bit SPN for our example so that the data presentation is more manageable. For convenience, we shall focus our discussion on only the first round and assume that the first round key used for all encryptions is

RK1 = 0110111001111001.

Assume that 16 plaintext blocks to be encrypted are represented as:

 1101111010111110   1011110010101000     1100110010000101   0110011000101010     1000110100001011     0000100000111110     0101011100101000    †  0001111001110101  P =    1101010011010100     1001111100100011     0100011100000010     0001111001111111     1111001101001001     1000011110111010   0110110000011011  0010001001111010 where each row is a plaintext of 16 bits and, for illustration, we have highlighted the plaintext block of the first row to illustrate the transposition operation to produce D†. To process using bit-slicing, the data is restructured to become the bit-slice state, D†, 2.6. BIT-SLICE IMPLEMENTATIONS 43 as follows:

 1 110100011001100   1 011001010101010     0 101000000001011   1 100001111011000     1 110110101010010     1 111101111110110     1 001001101111101    † † T  0 000101001101100  D = (P ) =   .  1 110000010000100     0 000000110011001     1 101011101010101     1 000010110010111     1 101111000011111     1 010010110010000   1 001110001110111  0 010100101011010

At the input to the first round, the state bits are mixed with the round key bits, where the bit-slice round key structure has each round key bit replicated across all positions of one of the 16 words, resulting in the following updating of the state:

 0000000000000000   1111111111111111     1111111111111111   0000000000000000     1111111111111111     1111111111111111     1111111111111111    † †  0000000000000000  D ← D ⊕   .  0000000000000000     1111111111111111     1111111111111111     1111111111111111     1111111111111111     0000000000000000   0000000000000000  1111111111111111

The updated state is now:

 1110100011001100   0100110101010101     1010111111110100   1100001111011000     0001001010101101     0000010000001001     0110110010000010    †  0000101001101100  D =   .  1110000010000100     1111111001100110     0010100010101010     0111101001101000     0010000111100000     1010010110010000   1001110001110111  1101011010100101

This state is now the input to the substitution layer. Let's now focus on just the 4 bits entering the leftmost S-box. These bits correspond to the top 4 rows of D†, represented as D†15, D†14, D†13, and D†12, from the top row to the 4th row from the top. We can use the Boolean functions in (2.1) and (2.2) to determine the output of this S-box and, by incorporating the permutation, place these 4 bits into the correct new positions within the state. We now let Yi, i ∈ {0, 1, ..., 15}, represent the words for each output bit of the substitution layer (rather than of just an individual S-box). Hence, Y15, Y14, Y13, and Y12 represent the output bits of the leftmost S-box. The following operations must occur:

T1 ← D†13 ⊕ D†14 = [AFF4] ⊕ [4D55] = [E2A1]
T2 ← D†14 · T1 = [4D55] · [E2A1] = [4001]
T3 ← D†15 ⊕ T2 = [E8CC] ⊕ [4001] = [A8CD]
Y12 ← D†12 ⊕ T3 = [C3D8] ⊕ [A8CD] = [6B15]

T2 ← T1 · T3 = [E2A1] · [A8CD] = [A081]
T1 ← T1 ⊕ Y12 = [E2A1] ⊕ [6B15] = [89B4]
T2 ← T2 ⊕ D†14 = [A081] ⊕ [4D55] = [EDD4]
T4 ← D†12 + T2 = [C3D8] + [EDD4] = [EFDC]
Y13 ← T1 ⊕ T4 = [89B4] ⊕ [EFDC] = [6668]

T4 ← D̄†12 = [3C27] (the complement of [C3D8])
T2 ← T2 ⊕ T4 = [EDD4] ⊕ [3C27] = [D1F3]
Y15 ← Y13 ⊕ T2 = [6668] ⊕ [D1F3] = [B79B]

T2 ← T2 + T1 = [D1F3] + [89B4] = [D9F7]
Y14 ← T2 ⊕ T3 = [D9F7] ⊕ [A8CD] = [713A]
...
D†15 ← Y15 = [B79B] ... D†11 ← Y14 = [713A] ... D†7 ← Y13 = [6668] ... D†3 ← Y12 = [6B15]

Note that we have used hexadecimal notation to represent the 16-bit words for convenience of presentation. The last four assignments update state bits based on the permutation and should only be done after similar logical operations are applied to produce words Y11 to Y0, with the resulting assignments to the appropriate D†i words finalizing the output of the substitution/permutation combination. After the application of the Boolean functions and permutation to all bits, the resulting state matrix becomes

D† =
 1 011011110011011
 ...
 0 111000100111010
 ...
 0 110011001101000
 ...
 0 110101100010101
 ...

where we only illustrate the four rows D†15, D†11, D†7, and D†3 and leave the remaining rows as an exercise for the reader. Subsequent rounds would proceed similarly. After the last round (which, recall, would have the permutation replaced with an extra key mixing), the bit-slice state must be restructured so that the resulting data is formatted as 16 rows of 16 bits, where each row contains a ciphertext block (rather than a particular bit from all blocks). This is similar to the data restructuring operation at the start of encryption and is equivalent to C† = (D†)T.
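The leftmost-S-box computation of the example can be reproduced with a short Python script (variable names are ours; the four input words are rows D†15 to D†12 of the state after key mixing, as given above):

```python
# Reproduce the bit-sliced S-box computation for the leftmost S-box of the
# 16-bit SPN example, following the sequences (2.1) and (2.2).
D15, D14, D13, D12 = 0xE8CC, 0x4D55, 0xAFF4, 0xC3D8
MASK = 0xFFFF  # 16-bit words

t1 = D13 ^ D14            # [E2A1]
t2 = D14 & t1             # [4001]
t3 = D15 ^ t2             # [A8CD]
y12 = D12 ^ t3            # [6B15]
t2 = t1 & t3              # [A081]
t1 = t1 ^ y12             # [89B4]
t2 = t2 ^ D14             # [EDD4]
t4 = D12 | t2             # [EFDC]
y13 = t1 ^ t4             # [6668]
t4 = ~D12 & MASK          # [3C27]
t2 = t2 ^ t4              # [D1F3]
y15 = y13 ^ t2            # [B79B]
t2 = t2 | t1              # [D9F7]
y14 = t2 ^ t3             # [713A]
```

Each lane of the four output words holds one S-box output bit for one of the 16 parallel encryptions, matching the hexadecimal values worked out above.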

2.6.5 Other Structures for Bit-slicing

We have described constructing a bit-slice implementation assuming that B blocks are to be encrypted in parallel in an implementation where the word size ω is the same as the block size of the cipher, that is, ω = B. But clearly it is possible that, in some circumstances, the word size will not equal the block size. How do we structure a bit-slice implementation when B ≠ ω? Consider the two cases: (1) ω > B and (2) B > ω. For the first case, ω > B, let ω = 128 and B = 64 for convenience in our discussion. In this case, it is possible to apply bit-slicing by treating two 64-bit blocks as one 128-bit superblock and constructing the appropriate bit-slice process with 128 simultaneous encryptions. In this scenario, the permutation does not mix bits between the 2 blocks making up the superblock. Since 128 superblocks are processed in parallel, there will in fact be 256 64-bit blocks processed with one pass through the bit-slice structure. For the second case, B > ω, let ω = 32 and B = 128. One possible approach is to structure the bit-slicing around a virtual 128-bit word, which is made up of four 32-bit words. In this case, 128 encryptions will be done in parallel. Each logical operation on the bits from the 128 encryptions will need 4 instructions, one for each word making up the 128-bit virtual word; essentially, each operation on the 128-bit virtual word requires the equivalent of 4 instructions on 32-bit operands. To this point, we have assumed that the number of blocks encrypted in parallel using bit-slicing is the same as the block size (or superblock size). Although this can be particularly efficient, it presents challenges in environments where data is to be encrypted in small batches, since a large number of blocks must be collected before the data can be restructured and processed using bit-slicing. However, it is also possible to encrypt fewer than B blocks in parallel.
Consider, for example, the 64-bit SPN where 16 blocks are encrypted in parallel by using a bit-slice state, D†, of 64 rows and 16 columns. In this case, each bitwise logic instruction is only operating on 16 bits of the ω bits of a word. If ω is 64, for example, this would give only 1/4 of the efficiency of operating on 64 parallel blocks. It is also possible to pack data into words (corresponding to rows of D†) so that more than one bit of a block occurs in each row. For example, rather than 64 rows with 16 columns to implement the encryption of 16 blocks, one could pack the data into a matrix of 16 rows and 64 columns, where row i would contain bits i, i + 16, i + 32, and i + 48 from all 16 blocks. This is possible because each output bit of an S-box corresponding to a particular position is produced using the same Boolean function for all S-boxes. In this approach, instructions available in the processor would have to be exploited to efficiently move data around to achieve the permutation operation. This is done for PRESENT in [17]. In some cases, ciphers have been designed to be specifically suited to implementation using bit-slicing [18][19][20]. Such designs, in addition to being efficiently implementable using a software bit-slicing method, are also very compact in hardware implementation and can be implemented to be resistant to side channel attacks. It has even been found that, by applying bit-slicing to CTR mode and taking advantage of available processor instructions, it is possible to develop a high throughput implementation of AES [21].

2.7 Software Implementation of Cipher Modes

As presented in Section 1.4, block ciphers are applied using one of many possible modes of operation, such as ECB, CBC, and CTR. In all cases, a table lookup implementation is a useful approach to achieving an efficient encryption. The input to the block cipher "encryption" operation would be plaintext for ECB, plaintext XORed with the previous ciphertext for CBC, and a counter value for CTR mode. Table lookup approaches represent a fast implementation that can be used effectively for modes which must process plaintext sequentially, that is, modes for which the previous ciphertext must be generated before the current plaintext is processed. For example, CBC relies on using the previous ciphertext as part of the next block input and can only be used in applications which sequentially process plaintext data. ECB and CTR modes can be structured to execute one encryption at a time, and such applications can also benefit from an efficient table lookup approach. Bit-slicing can be more efficient than table lookups in circumstances where the Boolean functions representing the S-boxes can be implemented using a small number of two-operand logical instructions. However, if efficient bit-slicing requires the parallel processing of B input blocks, issues can arise. In ECB mode, plaintext blocks could be buffered and then encrypted when B plaintext blocks become available. In the case of CTR mode, the count is easily predictable and, hence, it is quite possible to process B blocks in parallel to produce keystream, which can then be stored until B plaintext blocks are available to encrypt. However, using a bit-slice implementation is particularly problematic if the block cipher is used in CBC mode.
Since CBC mode relies on the previous ciphertext to prepare the next block input, only one block at a time can be encrypted and parallelizing encryption as required for bit-slicing is not possible.11 Other feedback modes, such as output feedback (OFB) mode and cipher feedback (CFB) mode [12], also require linkages between consecutive encryptions, making the parallelization needed for bit-slicing challenging. Consider now CTR mode. The nature of the mode presents an opportunity to make particularly efficient use of bit-slicing by eliminating some data restructuring. Normally, at the output of a bit-slice process, the data is restructured from the bit-slice state format back to the general data format where words become ciphertext blocks. However, in CTR mode, the block cipher is used in encryption mode to produce a sequence of keystream blocks, not ciphertext directly. If we followed the normal method and restructured the data at the output of the bit-slice process, CTR mode would result in a conventional bit order in the keystream of:

bit 1 of block 1 ⇒ bit 2 of block 1 ⇒ ... ⇒ bit B of block 1 ⇒ bit 1 of block 2 ⇒ bit 2 of block 2 ⇒ ... However, for an application of CTR mode, this restructuring operation at the output is not necessary. The result of not restructuring the data format would be that keystream will be non-conventionally ordered as follows:

bit 1 of block 1 ⇒ bit 1 of block 2 ⇒ ... ⇒ bit 1 of block B ⇒ bit 2 of block 1 ⇒ bit 2 of block 2 ⇒ ...

11One could apply bit-slicing in CBC mode, if one could encrypt B independent, interleaved streams. However, it is possible in such a scenario that the streams could be using different keys and this would have to be considered in the setup of the round key structure for bit-slicing.

This ordering of keystream creates no more predictability in the keystream and, hence, can be considered as secure as the conventional ordering. However, since this is a non-conventional way to generate the keystream in CTR mode, both the transmitter and receiver would have to be in agreement on how to implement the mode. For example, it would not be possible for the transmitter to use a bit-slice implementation and then, for efficiency, use the non-conventional ordering, while the receiver uses a table lookup implementation with the conventional ordering. Of course, the receiver could use the table lookup method and restructure the keystream to use the non-conventional ordering, thereby making itself compatible with the transmitter.
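The two orderings can be made concrete with a small Python sketch. The block size and block count here are small illustrative values, not the B × B structure of a real bit-slice state; the point is that XORing plaintext with keystream decrypts correctly as long as both ends use the same ordering.

```python
def conventional_order(ks_blocks, B):
    """Keystream bits in conventional order: all B bits of block 1,
    then all bits of block 2, and so on."""
    bits = []
    for blk in ks_blocks:
        bits.extend((blk >> j) & 1 for j in range(B))
    return bits

def bitslice_order(ks_blocks, B):
    """Keystream bits as a bit-slice implementation would emit them
    without restructuring: bit 0 of every block, then bit 1 of every
    block, and so on."""
    bits = []
    for j in range(B):
        bits.extend((blk >> j) & 1 for blk in ks_blocks)
    return bits

def xor_stream(data_bits, ks_bits):
    """Encrypt (or decrypt) a bit list against a keystream bit list."""
    return [d ^ k for d, k in zip(data_bits, ks_bits)]
```

The non-conventional ordering is simply a fixed permutation (a transpose) of the conventional one, which is why it leaks nothing extra.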

2.8 Summary

In this chapter, we have discussed software implementation approaches to block ciphers, using the 16-bit and 64-bit SPNs as case studies. Approaches have included table lookup methods and bit-slicing. The wide table lookup approach can be used to combine the S-box and permutation operations into table lookups, where the values stored in the table represent the full block width with the S-box output bits moved into the appropriate positions within the block according to the permutation. The width of the table index (and the associated size of the table) is somewhat adjustable, allowing for tradeoffs between the speed of the implementation and the amount of memory required. Bit-slicing is a dramatically different approach to implementation and its value is highly dependent on finding a logical representation of the S-box which requires a small number of 2-operand logic instructions. Due to the necessary structuring of data, typically multiple blocks are encrypted in parallel and, hence, bit-slicing is most appropriate for applications which employ compatible modes of operation, such as counter mode. Finally, we note that other techniques exist for fast software implementations which make explicit use of SIMD instructions found on modern processors. For example, the vector permutation (vperm) technique described in [22] [13] relies on the existence of shuffling instructions. Most significantly, processors now support an AES instruction set (AES-NI) designed to efficiently accelerate AES operations [4].

Chapter 3

Hardware Implementation

We now turn our attention to hardware implementations of block ciphers, again focusing on SPNs. We discuss synchronous sequential logic designs that can be used for target technologies such as field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) realized for different CMOS technologies. The architectures that we discuss in this section presume that the reader has a basic knowledge of digital hardware design, including concepts such as logic gates, registers, multiplexers, and synchronous sequential logic design aspects such as datapaths, controllers, and timing issues.1 Cryptographic applications fall on a spectrum with the two extremes being (1) high speed/throughput (e.g., core network routers) and (2) compact and often low power or energy (e.g., wireless sensor nodes or other IoT devices). We shall discuss hardware structures which can be used to target these two extremes, as well as applications falling somewhere in between. In synchronous sequential logic designs, the speed or throughput of the block cipher hardware implementation is determined by (1) the average number of ciphertexts produced per clock cycle and (2) the frequency of the clock (which can be as high as the inverse of the minimum possible clock period). The complexity of a hardware design is often characterized by (1) the number of logic gates required for a synthesized design, (2) the number of logic blocks used in an FPGA implementation, or (3) the area required in an ASIC implementation. Often, the design complexity is characterized as the number of equivalent gates, where one gate corresponds to a 2-input NAND gate. In our discussions, we shall often use “area” when referring to a design’s complexity, where smaller area implies a more compact design. In the following sections, we discuss various design strategies for encryption by describing sample designs. By necessity, these descriptions leave out some details so that we can focus on the main aspects of interest.
We leave it to the reader to envision the potential structures for the components of the designs not specifically outlined.

1For convenience, we shall assume in our discussion that registers used to store data such as the cipher state are loaded on the rising edge of the clock.


Figure 3.1: Basic Iterative Architecture for Encryption

3.1 Iterative Design

3.1.1 Basic Iterative Architecture

The most straightforward approach to structuring a hardware design for encryption using an SPN is to implement the combinational logic required for one round function and then, using a register to store the state, iterate through the round function a number of times to achieve the appropriate number of rounds. This approach is illustrated in the design of Figure 3.1, which is the block diagram of a synchronous sequential logic circuit where the clock, while not shown, is an implicit component of the system. The design consists of component hardware such as a controller (on the left) and datapath (on the right) using multiplexers, register bits, and logic gates to implement the operations found in the round function. In the figure, we present the basic iterative structure for processing the cipher state and leave the hardware associated with round key generation as a separate component, key schedule, which is not described in detail. For simplicity, at this point, we simply assume that this component generates keys for all rounds and presents the round key as RK to the datapath as appropriate. We briefly discuss hardware implementation issues for the key schedule in the Appendix. Also, at this point we focus our discussion on the encryption process and defer our comments on decryption until Section 3.5.

We now provide a rough description of the flow of the data through the encryption system of Figure 3.1. The hardware for the basic iterative design will operate as follows. The controller will be comprised of a state machine that moves from control state to control state as appropriate based on inputs to the controller and provides output control signals based on the control state. Upon assertion of the K flag, the controller will use control signals in the K control interface to indicate to the key schedule module that the cipher key is available at the κ-bit input K. Subsequent to the key input, plaintext will be available at the B-bit input P as signalled by input P flag. With the arrival of a plaintext block, the controller will set control signal sel to ensure that the data of P is selected by the multiplexer and passed into the combinational logic of the round function, such that the output will be loaded into the B-bit state register on the next rising clock edge. The value stored in the register represents the state of the cipher and, in subsequent clock cycles, sel will be set for the multiplexer to select the feedback from the register as the input to the round function logic. Each clock cycle with the feedback selection represents the processing of a round and the appropriate round key RK must be presented by the key schedule module. After the R rounds of the cipher, the C flag signal is asserted to indicate to an external device that the B-bit output signal C (coming from the state register output and fed through the combinational logic module last round correction) is the valid ciphertext produced during the encryption operation. Hence, for the basic iterative architecture, a ciphertext block is produced every R clock cycles and the throughput is 1/(R · tclock) blocks/second, where tclock is the clock period.
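The control flow just described can be modelled behaviourally in a few lines of Python, with the round function, last round correction, and round keys supplied by the caller. This is a sketch of the datapath's behaviour under an interface of our own choosing, not a hardware description.

```python
def iterative_encrypt(plaintext, round_keys, round_function, last_round_correction):
    """Software model of the basic iterative datapath of Figure 3.1.

    One round of combinational logic is re-used R times, with the state
    register modelled by the variable `state`.  round_keys has R + 1
    entries; the extra key feeds the last round correction module
    (a hypothetical interface chosen for this sketch).
    """
    R = len(round_keys) - 1
    state = plaintext                                  # mux selects P in the first cycle
    for r in range(R):                                 # each iteration = one clock cycle
        state = round_function(state, round_keys[r])   # mux selects the feedback path
    return last_round_correction(state, round_keys[R])
```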
Two factors in determining the clock timing requirements for the iterative architecture are (1) the propagation delay between register loads (including both the delays of the register and combinational logic of the round hardware) and (2) the setup time of the register. The propagation delay of the register is the time from the rising clock edge to when the output of the register becomes valid. The setup time of the register is the window of time before a clock edge during which the inputs to the register should be stable to avoid problems when loading at the clock edge. In general, the circuit timing must satisfy the following expression:

tclock > tpd reg + tpd comb + tsetup (3.1)

where tpd reg is the propagation delay of the register, tpd comb is the propagation delay of the combinational logic in the round, and tsetup is the setup time of the register. Note that tpd comb is determined by the propagation delay through the multiplexer, key mixing, and S-box layers, with the permutation having no gates and, hence, negligible propagation delay. Note that in Figure 3.1 the round function hardware is re-used for all rounds, including the last round. However, since the actual structure of the SPN requires a last round that is modified from all other rounds, we have added at the output the last round correction module. This module can correct the output of the last round function by applying the inverse permutation (and thereby removing the permutation from the last round) and adding a new key mixing operation with RK now set as the final round key value. We leave details on how the key schedule block will generate and manage the necessary round keys to the reader to resolve.
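As a simple sketch, the constraint (3.1) and the resulting throughput of the basic iterative design can be expressed directly; all timing values below are illustrative parameters, not measurements.

```python
def min_clock_period(t_pd_reg, t_pd_comb, t_setup):
    """Minimum clock period satisfying (3.1); any t_clock above this
    avoids register timing violations."""
    return t_pd_reg + t_pd_comb + t_setup

def iterative_throughput(R, t_clock):
    """Throughput (blocks/second) of the basic iterative design:
    one ciphertext block every R clock cycles."""
    return 1.0 / (R * t_clock)
```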

Figure 3.2: Round Function Logic for 16-bit SPN

3.1.2 Round Function Hardware

Let's now consider the combinational logic that would comprise the round function. For the 16-bit SPN, this is illustrated in Figure 3.2, where di, di∗, and rki are used to represent input bit i of the round function, output bit i of the round function, and bit i of the current round key. The round function combinational logic can simply consist of the 3 layers corresponding to the operations: key mixing, substitution, and permutation. The key mixing operation is comprised of B 2-input XOR gates, where each gate has as inputs a state bit and the corresponding round key bit. The permutation layer represents a routing of bit positions and is simply a matter of wiring, requiring no logic gates. The substitution logic will be multi-level logic. It could be two-level sum-of-products (AND-OR) logic or product-of-sums (OR-AND) logic, where each bit represents the output bit of an S-box and has as input the 4 bits of the corresponding S-box. The logic for each output can be minimized using appropriate combinational logic minimization techniques (e.g., Boolean algebra, Karnaugh maps, computer-based minimization tools). If the logic for one S-box output is in the sum-of-products form, the logic can use as few as 2 levels (excluding inverters), where the first level involves AND gates with as many as 4 inputs and the second level will be an OR gate with as many as 8 inputs.2 Alternatively, the logic of the outputs of the S-box can be minimized jointly to allow for the sharing of gates, although this could result in more gate levels. For example, in Section 2.6.1, a logic description of the PRESENT S-box is presented with a small number of two-input gates to compute the S-box outputs. The corresponding logic is illustrated in Figure 3.3. Note, this logic requires 8 levels of logic, with each level involving 2-input gates or inverters. For convenience, assume that the propagation delay through each gate is tprop, regardless of the number of inputs. Hence, the propagation delay through the simplified PRESENT S-box of Figure 3.3 is 8 · tprop. In comparison, the 2-level AND-OR logic would have a propagation delay of 3 · tprop (including the propagation delay through the necessary inverters, in addition to the delay through the 2 levels of ANDs and ORs). The allowable clock period of the circuit, tclock, is dependent on the delay through the combinational logic of the round function, including the layers of the key mixing (one level of 2-input XOR gates) and the

2The canonical sum-of-products form will have exactly 8 AND terms, since the S-boxes are constructed from balanced Boolean functions. However, a minimized sum-of-products form could have fewer AND terms and those AND terms could have fewer than 4 inputs.

Figure 3.3: Compact S-box Logic

Figure 3.4: Last Round Correction Logic for 16-bit SPN

S-box layer (multiple levels of gates). Hence, while the simplified S-box implementation of Figure 3.3 might allow for a more compact implementation, the speed at which the circuit can be run will be lower, since the period of the clock must be large enough to ensure that no timing violations occur for the registers as per (3.1). As previously mentioned, the last round correction combinational module is necessary between the state register and the output C to correct for the differences between the last round and the other round functions. For the 16-bit SPN, the circuit for the last round correction module is illustrated in Figure 3.4, where di, di∗, and rki are used to represent input bit i of the state, output bit i of the state, and bit i of the current round key.
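A behavioural Python model of the round function logic of Figure 3.2 is sketched below. The S-box table and the transpose-style permutation are placeholders chosen for illustration; they are not necessarily the S-box of Table 1.1 or the permutation of Figure 1.1.

```python
# Round function of a 16-bit SPN: key mixing, four 4-bit S-boxes, and a
# permutation (pure wiring).  SBOX and the permutation rule below are
# illustrative placeholders, not taken from Table 1.1 / Figure 1.1.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

def round_function(d, rk):
    x = d ^ rk                                   # key mixing: 16 XOR gates
    y = 0
    for s in range(4):                           # substitution: 4 identical S-boxes
        nib = (x >> (4 * s)) & 0xF
        y |= SBOX[nib] << (4 * s)
    z = 0
    for i in range(16):                          # permutation: wiring only, no gates
        bit = (y >> i) & 1
        z |= bit << (4 * (i % 4) + i // 4)       # bit i -> bit 4*(i mod 4) + (i div 4)
    return z
```

The transpose-style permutation used here is its own inverse, which is convenient when modelling the last round correction as well.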

3.1.3 Loop Unrolling

For the basic iterative implementation, we assume that every clock cycle would process one round, with a new state being loaded into the register on every rising clock edge. More generally, a design could be structured to have multiple rounds processed for every clock cycle.3 When multiple rounds are processed per clock cycle, we refer to this as loop unrolling and this is illustrated in Figure 3.5. In the figure, we have unrolled 4 rounds, resulting in a combinational logic module, 4-round function, that executes the equivalent of 4 consecutive rounds in each clock cycle. In general, it is possible to unroll m rounds, up to m = R, as long as R is a multiple of m. In our design, inputs to the 4-round function module are the cipher state and the 4 round keys, RKA, RKB, RKC, and RKD, which are provided by the

3Indeed, it is also possible to have a partial round (that is, less than one full round) executed every clock cycle. This will be discussed in Section 3.4.

Figure 3.5: Encryption with Loop Unrolling

key schedule module. As a result, the state register is loaded with the output of the combinational logic needed to realize 4 rounds. The cipher states between every 4th round processed during a clock cycle occur implicitly within the combinational logic and are never stored in the register. Since the round function combinational logic has 4 times as many levels as the 1-round basic iterative design, we would expect that the clock of the unrolled 4-round design would have a period that is 4 times the period of the basic iterative design. In compensation, ciphertext is produced every R/4 clock cycles so that the throughput is now given by 4/(R · tclock) blocks/second, with the expectation that tclock is roughly 4 times larger than for the basic iterative design. In fact, it might be possible to decrease tclock to something less than 4 times that of the basic iterative design. Consider the timing constraint given in (3.1). It is reasonable to expect tpd comb will be a little less than 4 times larger, since, although there are 4 layers of the round function operations, there is still only one layer of input multiplexer, and tpd reg and tsetup will remain unchanged from the basic iterative design. Hence, if the timing values of the register, tpd reg and tsetup, and the propagation delay through the multiplexer are not negligible, then the new tclock for the 4-round unrolled design could be significantly smaller than 4 times the 1-round basic iterative clock period. As a result, it might be possible to increase the speed of encryption by unrolling the iterative design.

Figure 3.6: Parallel Encryption (Datapath Only)

Although improvement in the speed of the cipher is possible with loop unrolling, it is difficult to predict whether such improvement will be significant, as it will be highly dependent on the target technology and the associated abilities of the design tools. However, it is clear that, since the combinational logic of 4 rounds (rather than one round) is required for our design, we do expect that the area of the unrolled design will be substantially larger.
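The timing argument above can be sketched numerically. The function below models the clock period of an m-round unrolled design as a single multiplexer and register overhead plus m copies of the round logic delay; all timing values are illustrative parameters.

```python
def unrolled_throughput(R, m, t_pd_reg, t_mux, t_round, t_setup):
    """Throughput (blocks/s) of an m-round unrolled iterative design.

    The register and multiplexer overheads are paid once per clock
    cycle, while the round logic delay scales with m.  Assumes R is a
    multiple of m, as required for unrolling.
    """
    t_clock = t_pd_reg + t_mux + m * t_round + t_setup
    cycles_per_block = R // m
    return 1.0 / (cycles_per_block * t_clock)
```

With zero overhead the throughput is independent of m, matching the intuition that unrolling only helps when the per-cycle register and multiplexer overheads are not negligible.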

3.2 Parallelization - A High Speed Implementation

An approach to hardware implementation that is of interest for high speed applications is the concept of parallelization. In Figure 3.6, a parallelization method is used based on the underlying structure of a basic iterative design. Note that, for simplicity, the diagram does not illustrate the controller module and any necessary control signals. The key schedule and last round correction modules perform roles as previously described for the basic iterative architecture. The idea with parallelization is to replicate hardware so that the processing of multiple blocks can be undertaken concurrently. For example, in the figure, m blocks are processed by simultaneously accepting m plaintext blocks, P1 to Pm, into m different datapaths. Within each datapath is an iterative design consisting of the hardware for one round and appropriate input selected by a multiplexer. Hence, each plaintext block is processed independently on the m different sets of hardware. After the appropriate number of rounds, the output is made available for all ciphertexts as C1 to Cm as shown. Since m iterative datapaths are operating concurrently, it is reasonable to assume that this design produces ciphertext blocks at a speed that is m times the speed of the iterative design of Section 3.1.1. Obviously, there is a significant cost to parallelization in terms of the area of the resulting implementation. For example, the area is increased by a factor of about m over the basic iterative approach, if we ignore the area of the controller (which is usually negligible) and the key schedule module. Another drawback of the approach is that the I/O requirements are also increased, since the architecture is expected to accept m plaintext blocks simultaneously and present m ciphertext blocks simultaneously, and this may not suit some applications.
This can be mitigated by building a hardware interface to accept input blocks sequentially, store the blocks, and present the blocks simultaneously to the parallel processing. The output blocks can then be simultaneously stored, followed by outputting the ciphertext blocks sequentially. Alternatively, the m parallel processing modules can have their processing staggered, so that only one block begins and one block finishes in a clock cycle. This would involve careful design of the flow of data in and out of the datapaths. Since the target application for this architecture is focused on speed and, hence, the implementation accepts the tradeoff of the increased hardware complexity of multiple datapaths, it would be desirable to optimize the iterative design within each datapath for speed, rather than area. Hence, the round function component implementation is most likely to be based on a minimized 2-level logic form (such as sum-of-products) and not the logic of Figure 3.3, which, although compact in area, would be dramatically slower than 2-level logic. A 2-level implementation would have a lower propagation delay through the round function and, therefore, would be able to operate with a smaller clock period and larger clock frequency. The result would be a high throughput for each iterative datapath. It should also be noted that Figure 3.6 implies that all datapaths have the same key (and round keys) applied. A more complicated design could be realized where the controller and key schedule components generate and apply different round keys to the different encryption datapaths.
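Behaviourally, parallelization is simply m independent encryptions per batch; the sketch below models this, with encrypt_one standing in for a full iterative datapath (a name of our own choosing).

```python
def parallel_encrypt(plaintext_blocks, encrypt_one):
    """Model of Figure 3.6: m copies of the iterative datapath operate
    concurrently on m independent blocks, all under the same round keys
    (folded here into the caller-supplied encrypt_one function).  In
    hardware the m encryptions proceed simultaneously; a software model
    simply applies the same datapath to each block."""
    return [encrypt_one(p) for p in plaintext_blocks]

def parallel_throughput(m, R, t_clock):
    """m datapaths together finish m blocks every R clock cycles."""
    return m / (R * t_clock)
```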

3.3 Pipelining - A High Speed Implementation

Another approach to the design of a high speed hardware implementation is pipelining. Pipelining is a method where sequential modules of hardware are organized into stages and then data from different blocks is sent through the hardware stages in a staggered manner so that at any one time different stages are processing different parts of different block encryptions. As we shall see, there is significant potential for increasing the speed of the encryption process, with the obvious drawback of increased area required for the hardware.

3.3.1 Description of Pipeline Architecture

A pipeline architecture is illustrated in Figure 3.7. The figure illustrates the case where the number of pipeline stages is equal to the number of rounds, R. For simplicity, we do not illustrate the controller module and associated control signals. The last round correction module is required as previously described, while, in this case, the key schedule module provides all round keys simultaneously to the round function modules. During the first clock cycle of operation, the first plaintext block, P1, is processed by the first stage (while the other stages are operating on invalid data) and on the subsequent rising clock edge, the state at the output of round 1 is stored in the state register D1. In the second clock cycle, the state associated with P1 at the output of round 1 is processed in

Figure 3.7: Encryption Using Pipelining (Datapath Only)

the second stage hardware and stored in the register D2 at the output of this stage on the next rising clock edge. Also, during this cycle, a second plaintext block, P2, is processed by the first stage and stored in register D1 at the next clock edge. During the 3rd clock cycle, the 3rd stage processes data from P1, the 2nd stage processes data from P2, and now the 1st stage processes a third plaintext block, P3. Data is stored in the associated 3 registers at the subsequent rising clock edge. Note that during this time, the data in all the registers which follow stages not processing proper data is not meaningful and should be ignored. We refer to the first R clock cycles as the period during which the pipeline is being primed and the data output C will not be valid while the pipeline is being primed. After R clock cycles, the first plaintext has travelled through all R round functions and the output data has now been loaded into the register following the last stage, DR, and, hence, is now available as ciphertext at the output (through the last round correction module). In each subsequent clock cycle, the results from the following plaintexts are put into the output register DR and presented as ciphertext. In other words, after the pipeline is primed, each clock cycle will produce one ciphertext. As data flows through the pipeline stages, the data of plaintext Pi will be followed by the data of plaintext Pi+1. Note that, although it takes R clock cycles for the first ciphertext to come out, once the pipeline is primed and all stages are processing legitimate data, ciphertext will be available every clock cycle (assuming plaintext has been fed to the input in every clock cycle). Since the propagation delay through the round function can be the same as for the round function of the basic iterative architecture, the speed at which the clock can be run is similar. However, instead of producing a ciphertext block every R cycles, pipelining can produce a ciphertext block every clock cycle. Hence, for this pipeline structure of one stage for every round, pipelining is R times faster than the basic iterative design and this comes at the expense of roughly R times more area. Note that, although ciphertext can be produced every clock cycle, the latency of the system is still R clock periods, where we define latency to be the delay from when a plaintext enters the system until when its ciphertext is produced at the output. As with the parallel design, since pipelining is targeted to high speed applications, the round function hardware is best implemented as a low propagation delay 2-level logic circuit, thereby allowing for minimizing the period of the clock and maximizing the frequency of the clock. Note that, unlike other hardware architectures, a pipeline design has no feedback in the circuit.
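The staggered flow can be captured in a cycle-accurate Python sketch, where None models the invalid data present while the pipeline is being primed. The round keys are folded into the caller-supplied stage functions; the interface is our own illustrative choice.

```python
def pipeline_encrypt(plaintexts, stage_functions, last_round_correction):
    """Cycle-accurate sketch of the Figure 3.7 pipeline.

    One register per stage, all loaded on the same clock edge.  Returns
    the value at output C in each cycle (None while the data is invalid).
    Stage r applies round function f_{r+1}.
    """
    R = len(stage_functions)
    regs = [None] * R                       # state registers D1..DR
    out = []
    for p in list(plaintexts) + [None] * R:  # extra cycles drain the pipe
        # Compute all stage outputs from the current register contents...
        new = [stage_functions[0](p) if p is not None else None]
        for r in range(1, R):
            prev = regs[r - 1]
            new.append(stage_functions[r](prev) if prev is not None else None)
        # ...then load every register on the same rising clock edge.
        regs = new
        # C is combinational from DR through the last round correction.
        out.append(last_round_correction(regs[-1]) if regs[-1] is not None else None)
    return out
```

With toy 2-stage functions, a plaintext entered in cycle t emerges R − 1 cycles after its register D1 load, illustrating both the priming period and the one-block-per-cycle steady state.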

3.3.2 Selecting Hardware for a Pipeline Stage

Just as the basic iterative design can be unrolled, it is also possible to construct a pipeline architecture which has several consecutive rounds in a stage, rather than just one. For example, if R = 16, an 8 stage pipeline architecture could have 2 consecutive round functions in one stage. (Registers would be placed between stages and not between rounds.) In this case, the pipeline will be processing 8 blocks simultaneously (but staggered) and, when it is primed, a ciphertext block will be available every clock cycle. However, since the clock cycle must be long enough in duration to ensure that signals can flow through the combinational logic of 2 rounds without creating any timing violations, the clock frequency of the implementation will be lower than for the pipeline architecture with 1 round per stage. For some ciphers, it is possible to construct a pipeline architecture where a pipeline stage has the logic of a partial round. For example, if a round is complex enough that it is possible to slice the combinational logic of the round function into two halves (somewhat balanced in terms of their propagation delay) with a register between them, then a structure can be implemented with 2R stages and it may be possible that the clock period can be cut roughly in half. The result is an even faster implementation which produces a ciphertext block every clock cycle, while processing data from 2R plaintext blocks simultaneously (but staggered) in the pipeline. Since the round function of AES is somewhat complex, using 8-bit S-boxes and a linear transformation (of ShiftRows and MixColumns) rather than a simple permutation, it is practical to break up the round function into different stages and this has led to some very fast architectures for AES [23].
For SPNs like PRESENT that use 4-bit S-boxes, for which the realization of a round function requires much smaller logic circuits, using stages which process a partial round is less practical.

3.3.3 Timing Issues for Pipeline Designs

In general, there is limited value in breaking up rounds in an attempt to increase the number of stages, increase the clock frequency, and increase the throughput of the cipher. Firstly, in breaking up round functions into stages, it is important to have each stage reasonably well balanced in terms of the propagation delay, since the clock period must be as large as the worst case propagation delay. For example, breaking down a round function with a propagation delay of tround into two stages, one with a propagation delay of 0.1 × tround and

one with a propagation delay of 0.9 × tround, gives little value in speeding up the cipher, since the clock can only be sped up by a factor of 1/0.9 = 1.11 at most. However, if the propagation delays of both stages are 0.5 × tround, then the clock speed-up and the resulting increase in encryption throughput has the potential to be about 1/0.5 = 2. Another issue that limits the value in reducing the logic contained in any stage is that the register associated with a stage creates a certain fixed overhead. Clearly, increasing the number of registers is going to increase the hardware complexity and the area of the final realized design. But there are also timing issues associated with loading a register which could be significant. As previously noted in Section 3.1.1, the period of the clock is affected by the propagation delay of the register, tpd reg, and the setup time of the register, tsetup. These times are fixed and, as the amount of combinational hardware in a stage decreases, become an increasingly significant factor. Considering (3.1), since tpd reg and tsetup are fixed for a particular technology, reducing tpd comb (which we now take to represent the propagation delay of the combinational logic of a stage) by a factor of 2 does not cut tclock in half unless tpd reg, tsetup ≪ tpd comb. So there is a limit, imposed by tpd reg and tsetup, to the value of trying to reduce tpd comb by dividing a stage into multiple stages.
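The limit can be illustrated with a short calculation; the fractions argument describes how the stage's logic delay is split, and all timing values are illustrative.

```python
def stage_split_speedup(t_pd_reg, t_setup, t_comb, fractions):
    """Clock speed-up from splitting a stage whose logic delay is t_comb
    into sub-stages with the given delay fractions.

    The register overhead t_pd_reg + t_setup is paid in every sub-stage,
    and the clock period is set by the slowest sub-stage, as per (3.1).
    """
    t_old = t_pd_reg + t_comb + t_setup
    t_new = t_pd_reg + max(fractions) * t_comb + t_setup
    return t_old / t_new
```

With negligible register overhead, a 0.5/0.5 split approaches a speed-up of 2 while a 0.1/0.9 split caps it near 1.11; a fixed register overhead lowers both.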

3.3.4 Example of Pipelining

To reinforce the concepts of pipelining, we present an example of pipelining for a simple 16-bit SPN. The SPN is similar to the SPN in Figure 1.1 and will use the S-box mapping in Table 1.1. We shall assume a cipher with R = 4 rounds and let the number of stages be equal to 4. The round keys will be generated for cipher key 01101110011110010000₂ = 6E790₁₆ using the algorithm described in the Appendix with parameters κ = 20, α = 13, and γ = 8. These are the round keys that are found in the example summarized in Table A.1. The resulting values for a pipelined implementation are given in Table 3.1. The following notation is used in the table to represent the relationships between values. Variables P and C represent the plaintext and ciphertext at the input and output of the system. Note that the current value of C does not represent the ciphertext for the current plaintext P, since there are several clock cycles before the current value of P affects the ciphertext output. The 4 registers used for the stages are labelled as Di, i ∈ {1, 2, 3, 4}. The values at clock cycle i are loaded at the rising clock edge at the start of clock cycle i and the values loaded are driven by the values given for clock cycle i − 1. Function fi represents the round function for round i (that is, the round function using round key i found in Table A.1). Function g represents the last round correction necessary to accommodate that the last round, round 4, is different from rounds 1 to 3, as explained in Sections 3.1.1 and 3.1.2.4 The superscript “−” is used to indicate the value of the associated variable in the previous clock cycle. For example, in clock cycle 3, the value of D2 is produced by f2(D1−), which means that D2 is given by the round function operating on the value of D1 in clock cycle 2 and round key 2. Now, in Table 3.1, consider specifically the encryption of plaintext P = CC85, which is presented to the cipher during clock cycle 2.
The values produced in processing this plaintext

Footnote 4: Note that for pipelining, we could just construct f4 to have the permutation replaced by key mixing with round key 5. However, there is virtually no extra hardware cost to the approach of adding last round correction hardware, since it only involves a permutation, which requires no logic gates, and a round key mixing.

60 CHAPTER 3. HARDWARE IMPLEMENTATION

Clock   P      D1        D2         D3         D4         C
Cycle          f1(P⁻)    f2(D1⁻)    f3(D2⁻)    f4(D3⁻)    g(D4)
  0     DEBE   XXXX      XXXX       XXXX       XXXX       XXXX
  1     BCA8   D701      XXXX       XXXX       XXXX       XXXX
  2     CC85   0FEB      C726       XXXX       XXXX       XXXX
  3     662A   8DE8      B0F2       C44A       XXXX       XXXX
  4     8D0B   9855      246E       A1AF       584D       2AA3
  5     083E   635E      00F7       EFFF       E949       AB2F
  6     5728   F1C3      D877       30AF       81CF       C2BF
  7     1E75   C5C9      8E87       7041       650D       6C2F
  8     D4D4   EF08      9A8B       FC03       CC9C       8CA8

Table 3.1: Pipelining Example (All values in hexadecimal.)

are the values bolded in the table. At the clock edge at the end of clock cycle 2, the value 8DE8, representing the output of round 1 for this plaintext, is loaded into register D1 for clock cycle 3. At the next clock edge, value 246E, produced by round function f2, is loaded into D2. Similarly, value EFFF is loaded into D3 for clock cycle 5 and then value 81CF is loaded into D4 for clock cycle 6. Recall that this is the output of the round function with key mixing (using round key 4), substitution, and permutation. Function g contains the inverse permutation (to negate the permutation of round function f4) and the key mixing of round key 5. The output of g is the ciphertext value C2BF, which is available in clock cycle 6, since there is no register in the last round correction hardware.
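The register behavior of Table 3.1 can be sketched behaviorally. The real round functions f1 to f4 and the last-round correction g depend on the S-box, permutation, and round keys of the example cipher, which are not reproduced here; in the sketch below, XORs with made-up constants stand in for them, purely to show how each stage register delays a block by one clock cycle.

```python
# Behavioral sketch of a 4-stage pipeline in the style of Table 3.1.
# ROUND_CONST and the XORs are placeholders for the actual round functions.

ROUND_CONST = [0x1111, 0x2222, 0x3333, 0x4444]   # stand-ins for f1..f4
f = [lambda x, c=c: x ^ c for c in ROUND_CONST]
g = lambda x: x ^ 0x5555                          # stand-in for last round correction

def run_pipeline(plaintexts):
    """Clock the pipeline; return the value appearing at output C each cycle."""
    D = [None] * 4                       # stage registers D1..D4, initially unknown
    outputs = []
    for p in plaintexts + [0] * 4:       # 4 extra cycles to flush the pipeline
        # All registers load simultaneously, each from the previous cycle's values
        # (the comprehension reads the old D before D is rebound).
        D = [f[0](p)] + [None if D[i - 1] is None else f[i](D[i - 1])
                         for i in range(1, 4)]
        outputs.append(None if D[3] is None else g(D[3]))
    return outputs

C = run_pipeline([0xDEBE, 0xBCA8, 0xCC85])
# Each block emerges at C three loop iterations after the one in which it is
# presented here (one traversal of the four stages; the indexing differs from
# Table 3.1 by one because registers load within the same loop iteration).
```

A new block is accepted every cycle, so after the initial fill latency the pipeline sustains one ciphertext block per clock cycle.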

3.4 Serial Design - A Compact Implementation

Often the requirements of a hardware implementation demand that the cipher take up only a small area of a device and this requirement often coincides with an expectation of low power consumption as well. Such compact implementations are necessary for applications such as wireless sensor nodes and some IoT devices. One typical method to minimize hardware area is to reduce the footprint caused by the most complex layer, substitution. Consider, for example, an SPN based on the S-boxes of Table 1.1. A compact hardware realization of the S-box is illustrated in Figure 3.3, which uses only 14 2-input gates to implement the complete 4-bit S-box, where the gates include inverters, ANDs, ORs, and XORs. This is equivalent to 3.5 gates per bit of the state. In comparison, the key mixing layer requires only 1 gate per state bit, an XOR of the data bit and round key bit, and the permutation layer involves no gates, only wiring, to implement. The S-box complexity with 3.5 gates per bit is very low and many other 4-bit S-boxes may not simplify to such a small number. Notably, when large S-boxes are used, such as the 8-bit S-boxes of AES, there is a much larger number of gates per bit. The most compact implementations of the AES S-box require about 250-300 gates [24], which results in about 35 gates per bit.

3.4.1 The Concept of Serialization

One way to reduce the footprint of the substitution layer is to process a partial block or sub-block at one time, rather than the full B-bit block. We refer to this approach as serialization. Serialization presumes that every S-box mapping is the same and, hence, it is not necessary to replicate the n-bit S-box hardware for every n-bit sub-block. For example, for the 64-bit SPN based on 4-bit S-boxes, the basic iterative approach processes a full block of 16 S-boxes concurrently per clock cycle. However, using a serial design, the 64-bit SPN could be constructed to process a 16-bit sub-block per clock cycle, with 4 S-boxes being processed concurrently. It would then take 4 clock cycles to process the 4 sub-blocks of a round. Hence, such a design is naturally going to produce ciphertext at a slower rate. For many applications, a lower throughput is an acceptable price to be paid for a smaller area and/or power taken by the implementation of the cipher.

In the most extreme case of serialization, the 64-bit SPN could operate on a sub-block of 4 bits (that is, one S-box) for each clock cycle. The resulting operation of a round would take 16 clock cycles and, hence, such a design could be expected to be substantially slower than the basic iterative design and much slower than a high speed architecture like parallelization or pipelining. Since this approach uses the smallest possible sub-block size in serialization (4 bits), we refer to this case as full serialization, while the previous example with a 16-bit sub-block size we refer to as partial serialization.
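The cycle counts above follow directly from the ratio of block size to sub-block size, which a short sketch makes explicit:

```python
# Clock cycles needed for the substitution layer of one round, as a function
# of how many S-boxes are instantiated in the serialized datapath.

def substitution_cycles(block_bits, sbox_bits, sboxes_instantiated):
    sub_block_bits = sbox_bits * sboxes_instantiated   # bits processed per cycle
    assert block_bits % sub_block_bits == 0
    return block_bits // sub_block_bits

basic = substitution_cycles(64, 4, 16)    # basic iterative: all 16 S-boxes, 1 cycle
partial = substitution_cycles(64, 4, 4)   # partial serialization: 4 cycles
full = substitution_cycles(64, 4, 1)      # full serialization: 16 cycles
```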

3.4.2 A Sample Design

In order to explain thoroughly the concept of a serial design, we use, as the basis for our discussion, the full serialization of the 16-bit SPN, based on a 4-bit sub-block. A serial encryption architecture is shown in Figure 3.8. The concepts presented for this simple example can be extrapolated to more practical designs with larger block sizes and larger S-boxes. In the figure, there are 3 general modules: the key schedule module, the controller module, and the datapath (including the S-box, selectable register, XOR gates, and the multiplexer). As in our discussions of the previous architectures, we will not discuss the key schedule module except to note that we presume it provides the appropriate round key bits at the appropriate time. In this design, 4 bits of the round key, labelled RK, are provided to XOR with the state bits coming from the register and feeding into the S-box input.

The controller is responsible for managing the input of the key and plaintext data and for controlling the flow of data in the datapath. The K_flag and P_flag signals are used by the external device to signal the input of the key K and plaintext block P. The key need only be loaded once for the subsequent processing of a large number of plaintext blocks. The controller also uses the 3-bit control signal, sel, to set the selection inputs of the register and of the multiplexer at the output of the register.

Consider now the datapath. With the block size of 16 bits, it is necessary to have a 16-bit register to store the cipher state. The 16-bit selectable register used to hold the state has the ability to select from four inputs based on sel, including an input feeding from the plaintext input to the system, an input from the S-box output, and two other internal inputs (not shown in Figure 3.8) which are fed back from bits within the register.
The special feedback features of the selectable register and how sel could be used to select inputs will be discussed in the next section.

[Figure: the serial datapath consists of the key schedule module (taking the key K and producing the 4-bit round key RK), the controller (taking K_flag and P_flag from the external device and producing the 3-bit sel signal and the C_flag output indicator), and the datapath itself: the 16-bit selectable register loaded from the plaintext P, a 4-bit multiplexer selecting a register sub-block, XOR gates mixing in RK, and the single 4-bit S-box feeding back into the register; the 16-bit ciphertext C is taken from the register output.]

Figure 3.8: Serial Encryption Design for 16-bit SPN

When the initial plaintext block is presented to the cipher, it is selected and loaded into the register by the appropriate setting of sel. In the figure, the 16 bits of plaintext are loaded into the corresponding 16 bits of the register by dividing the 16-bit block into four 4-bit sub-blocks and feeding each sub-block into 4 bits of the register as shown. Subsequently, the cipher will process the substitution layer (as well as the key mixing layer) of the round using 4 clock cycles, where in each clock cycle the appropriate sub-block is selected by the 4-bit multiplexer at the output of the register based on the sel signal provided by the controller. The sub-blocks will be selected sequentially, XORed with the 4 bits of round key coming from the key schedule module, fed through the S-box, and into the register. The appropriate sub-block of the register will be updated with the output from the S-box, while the other sub-blocks will remain unchanged based on the setting of the sel bits.

Following the updating of the register based on the processing of 4 inputs to the S-box, requiring 4 clock cycles, the permutation layer of the round can be implemented using an appropriate selection of register output bits fed back into the appropriate register input bits (internal to the register and not shown in Figure 3.8); thus, in one clock cycle, the register bits can be moved around based on the permutation wiring. As a result, with this design, 5 clock cycles are required to process a round. After the required R rounds, the register output can be presented to the external device as the ciphertext, indicated by control signal C_flag.
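The round schedule of this serial datapath can be sketched cycle by cycle. Since the S-box of Table 1.1 and the permutation of Figure 1.1 are not reproduced in this section, the PRESENT S-box stands in for the S-box and a typical SPN "transpose" wiring stands in for the permutation; both substitutions, and the all-zero round keys, are illustrative assumptions, and the final key mixing of the last round is omitted for brevity.

```python
# Cycle-accurate sketch of the fully serialized round: four clock cycles for
# key mixing plus substitution (one 4-bit sub-block per cycle), then one cycle
# for the permutation feedback inside the selectable register.

PRESENT_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]   # stand-in S-box

def permute(state):
    # Assumed wiring: bit k of S-box j moves to bit j of S-box k (an involution).
    out = 0
    for pos in range(16):
        j, k = pos // 4, pos % 4
        out |= ((state >> pos) & 1) << (k * 4 + j)
    return out

def serial_round(state, round_key, last_round=False):
    """Run one round on the 16-bit state; return (new_state, cycles_used)."""
    cycles = 0
    for i in range(4):                         # one S-box lookup per clock cycle
        nibble = (state >> (4 * i)) & 0xF
        rk = (round_key >> (4 * i)) & 0xF
        nibble = PRESENT_SBOX[nibble ^ rk]     # key mixing, then substitution
        state = (state & ~(0xF << (4 * i))) | (nibble << (4 * i))
        cycles += 1
    if not last_round:                         # the last round has no permutation
        state = permute(state)
        cycles += 1
    return state & 0xFFFF, cycles

state, total_cycles = 0xCC85, 0
for r in range(4):                             # R = 4 rounds, all-zero round keys
    state, c = serial_round(state, 0x0000, last_round=(r == 3))
    total_cycles += c
# 5 cycles for each of the first three rounds and 4 for the last: 19 in all.
```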
If we assume the propagation delay of the combinational logic of a round is similar to the expected propagation delay of the basic iterative block cipher (which is reasonable if the same logic realization is used for the S-box and ignoring the extra multiplexer delay), it can be expected that it will take about 5 times longer to produce ciphertext for this serial

design compared to the basic iterative design (assuming the register timing overheads are insignificant). Note that this design does not properly deal with the difference in the last round (where the permutation is replaced by a key mixing). The permutation is easy to reverse with simple wiring of the inverse permutation between the register and the output C, but careful modification would need to be done to efficiently mix in bits from the last round key. This could be done by adding a layer of key mixing at the S-box output, where in the first R − 1 rounds, the post-S-box key bits mixed would be all zeroes and then in round R, the post-S-box key bits mixed would be the last round key bits.

[Figure: each 4-bit sub-block of the selectable register (shown for the leftmost sub-block, bits d15 d14 d13 d12) is fed by a 4-input multiplexer selecting, via inputs 00 to 11, among the S-box output, the sub-block's own feedback, the plaintext bits p15 p14 p13 p12, and the permutation feedback d15 d11 d7 d3 from other register sub-blocks; selection logic maps sel2 sel1 sel0 to the 2-bit multiplexer select, and the register output feeds the output multiplexer toward the S-box and the ciphertext bits c15 c14 c13 c12.]

Figure 3.9: Selectable Register Design (for Leftmost Sub-block)

3.4.3 Selectable Register

To better understand the selectable nature of the register, consider Figure 3.9, which illustrates a potential design of the selection for the leftmost sub-block comprised of 4 bits of the register, labelled as d15d14d13d12. Control signal sel from the controller in Figure 3.8 can be implemented as a 3-bit signal (sel2sel1sel0) and used to control both the output multiplexer and the input multiplexers (one for each sub-block of 4 bits). As shown in Figure 3.9, the input multiplexer selection bits are set to "10" to select the leftmost bits of the plaintext block, p15p14p13p12, when the register is initially loaded. The remaining 12 plaintext bits are loaded simultaneously into the remaining register bits.

sel2 sel1 sel0   Mux 1             Mux 2             Mux 3             Mux 4
                 (Select Bits)     (Select Bits)     (Select Bits)     (Select Bits)
0    0    0      S-box (00)        feedback (01)     feedback (01)     feedback (01)
0    0    1      feedback (01)     S-box (00)        feedback (01)     feedback (01)
0    1    0      feedback (01)     feedback (01)     S-box (00)        feedback (01)
0    1    1      feedback (01)     feedback (01)     feedback (01)     S-box (00)
1    0    0      plaintext (10)    plaintext (10)    plaintext (10)    plaintext (10)
1    0    1      permutation (11)  permutation (11)  permutation (11)  permutation (11)
1    1    0      don't care (XX)   don't care (XX)   don't care (XX)   don't care (XX)
1    1    1      don't care (XX)   don't care (XX)   don't care (XX)   don't care (XX)

Table 3.2: Control of Selectable Register

Once substitution layer processing begins, the leftmost sub-block is selected by the register output multiplexer in Figure 3.8 based on two bits which can be mapped directly from sel1 and sel0. To feed the leftmost sub-block to the S-box input, sel1 = 0 and sel0 = 0 are used. The S-box output is then selected to feed into the leftmost 4-bit sub-block by setting the input multiplexer selection to "00". During the processing of the other sub-blocks by the S-box, the leftmost 4-bit sub-block of the register should remain unaltered and this can be done by setting the selection of the input multiplexer to "01", which simply allows the sub-block output of the register to feed back directly into the same sub-block input bits of the register. Lastly, to accomplish the permutation, updating the leftmost sub-block of the register requires selection from the appropriate bits of the register by setting the input multiplexer selection bits to "11". In this case, d15d11d7d3 are used to update the register bits d15d14d13d12. Other 4-bit sub-blocks would have their input multiplexers for input "11" fed by the appropriate inputs to achieve the permutation in one clock cycle, since all the register bits can be updated simultaneously.

Table 3.2 summarizes the mapping of the 3 bits of sel to the selection bits of the input multiplexer for all 4 input sub-blocks. In the table, Mux 1 refers to the input multiplexer at the input of the leftmost sub-block, Mux 2 is the input multiplexer for the sub-block second from the left, etc. Note that for all sub-blocks, the output multiplexer is controlled directly by bits sel1 and sel0. Table 3.2 can be used to determine the appropriate selection logic mapping the 3 bits of sel to the 2-bit selection of the input multiplexers.
This logic will vary based on the sub-block and is shown in Figure 3.9 for the leftmost sub-block where the most significant bit of the input multiplexer selection is fed directly from sel2 and the least significant bit is derived by the OR of sel1 and sel0.
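The selection logic for Mux 1 described above can be checked directly against Table 3.2:

```python
# Gate-level check of the Mux 1 selection logic for the leftmost sub-block:
# msb = sel2, lsb = sel1 OR sel0, as derived from Table 3.2.

def mux1_select(sel2, sel1, sel0):
    return (sel2, sel1 | sel0)

# Expected Mux 1 selections from Table 3.2 (don't-care rows 110, 111 omitted).
expected = {
    (0, 0, 0): (0, 0),   # S-box output
    (0, 0, 1): (0, 1),   # feedback (hold value)
    (0, 1, 0): (0, 1),   # feedback
    (0, 1, 1): (0, 1),   # feedback
    (1, 0, 0): (1, 0),   # plaintext load
    (1, 0, 1): (1, 1),   # permutation feedback
}
assert all(mux1_select(*s) == m for s, m in expected.items())
```

The other sub-blocks need different two-level logic for the low select bit, but the same exhaustive-check approach applies.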

In practice, there is likely to be little gained by the serialization of the 16-bit SPN of Figure 3.8. While the hardware required to implement multiple copies of the S-box has been reduced, other hardware has been added in order to manage the appropriate flow of data in the datapath. For example, five 4-bit 4 × 1 multiplexers (one at the output and four at the inputs of the register) are required in a serial design, while, in the basic iterative design, one 16-bit 2 × 1 multiplexer at the input is needed. Since the complexity of a 4-bit S-box is not large, with only a few gates per bit, serializing the design is not likely to significantly decrease the area of the design and may even increase it. Also, the penalty of slower operation is a definite drawback of the architecture that may make any minor saving in hardware not worthwhile for some applications.

Serialization will have more value for ciphers with larger block sizes, such as a 64-bit SPN like PRESENT, and for ciphers such as AES, which use large 8-bit S-boxes with a large area cost per bit for the S-box hardware. In such cases, reducing the number of S-box modules by serialization can result in savings that are much more significant than the cost of the added multiplexers needed to control the flow of data in the serial design. Serial designs for PRESENT [25] and AES [26] have been proposed.

3.5 Hardware Implementation of Decryption

Our discussion of hardware implementation has focussed on encryption (or the forward direction) in the processing of data in a cipher. In an SPN, as previously noted, decryption involves going in the reverse direction through the network and using the inverse components. Because the datapath for decryption is similar to the encryption datapath, generally, the same architectures, from serial designs to pipeline designs, can be applied as appropriate for the target application. We have already noted that the inverse key mixing is the same as the forward key mixing and that permutations can be used for which the inverse permutation is identical to the forward permutation; this is the case for the SPNs in this article. In AES, however, the inverse linear transformation (specifically, the inverse MixColumns operation) is quite different from the forward operation [2]. Most significantly, for decryption the inverse S-boxes must be used in place of the S-boxes, and this may be of consequence because of their potentially large complexity.

Some cipher proposals use S-boxes for which the forward and inverse mappings are the same. Such S-boxes are called involution S-boxes and can be found, for example, in the KLEIN cipher [27] and the Midori cipher [28]. Using involution S-boxes allows the sharing of the S-box hardware between the encryption and decryption structures. Such hardware re-use can reduce the overall area required by the cipher if both the encryption and decryption operations are required for an implementation.

One other significant aspect of decryption that may influence the design is the need to apply the round keys in reverse order (that is, the round key of the last round in encryption is the first round key applied in decryption). This may be problematic for an implementation of decryption that wishes to use an on-the-fly approach to the key schedule.
Since, typically, the only way to generate the last round key from encryption is to apply the full key schedule algorithm, the last round key state may need to be pre-computed. From this last key state, the key schedule can possibly proceed with an on-the-fly approach going backwards through the key schedule. Of course, in some applications, where plentiful memory resources are available for the implementation, it may instead be desirable to simply store all the pre-computed round keys and then make use of them at the appropriate time and location within the decryption process.
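The backwards on-the-fly approach can be sketched with a toy key schedule. The rotate-and-XOR step below is an invented placeholder (the article's actual key schedule is described in the Appendix), chosen only because it is trivially invertible:

```python
# Sketch of backwards on-the-fly round key generation for decryption.
# next_key is a toy, invertible 16-bit key-schedule step (an assumption);
# prev_key is its exact inverse, applied while stepping backwards.

def next_key(k):
    """Toy forward step: rotate left by 3, then XOR a constant."""
    return ((((k << 3) | (k >> 13)) & 0xFFFF) ^ 0x00A5)

def prev_key(k):
    """Inverse step: undo the XOR, then rotate right by 3."""
    k ^= 0x00A5
    return ((k >> 3) | (k << 13)) & 0xFFFF

def round_keys(k0, rounds):
    """Pre-compute and store all round keys (the memory-rich alternative)."""
    keys = [k0]
    for _ in range(rounds):
        keys.append(next_key(keys[-1]))
    return keys

keys = round_keys(0x6E79, 4)
# Decryption applies the keys in reverse order; stepping backwards on the fly
# from the (pre-computed) last key state must reproduce the same sequence.
back = [keys[-1]]
for _ in range(4):
    back.append(prev_key(back[-1]))
assert back == keys[::-1]
```

Real key schedules are not always invertible this cleanly, which is exactly why the last key state may need to be pre-computed and stored, as noted above.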

3.6 Hardware Implementation of Cipher Modes

All modes could make use of an implementation which processes an individual block before moving on to the next. This means the basic iterative design, loop unrolled design, and the serial design are all suitable for all three modes introduced in Section 1.4. However, as we noted in our discussion of the implementation of block cipher modes in software, some modes are well suited to implementations which involve concurrent processing of plaintext blocks, while other modes are not.

CBC mode is not suited for implementation using parallel and pipeline structures since, with these architectures, several plaintext blocks are processed at the same time and CBC mode relies on the generation of the previous ciphertext block before the encryption of a new plaintext block. Conversely, CTR mode is very well suited to implementations like parallelization and pipelining since the input to the block cipher system is not plaintext, but counter values, which are easily predicted. Hence, for pipelining it is very practical to input, in successive clock cycles, incrementing count values, which are processed in the pipeline before coming out as keystream to be XORed with plaintext blocks. Because each keystream block can be produced independently, there is no issue with working on several successive count values at once. For ECB, the use of a parallel or pipeline architecture may be practical, as long as it is possible to collect enough blocks of plaintext to process concurrently as required by the implementation structure.

Another important factor in the application of a mode to hardware implementation is that some modes, such as CTR mode, only require the encryption (forward) process of the block cipher. This means it is possible that a smaller hardware area will be necessary, since encryption of plaintext and decryption of ciphertext can share the same hardware structure and components.
Also, not needing the decryption (reverse) process in the hardware can eliminate the complexities that might arise for an implementation of the key schedule for decryption. Both ECB and CBC modes require both encryption and decryption processing of the block cipher and the resulting key schedule issues would need to be considered.
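The property that makes CTR mode pipeline-friendly can be shown in a few lines: each keystream block depends only on its counter value, so the blocks can be computed independently and in any order. The block cipher E below is a keyed XOR-and-rotate placeholder, purely illustrative:

```python
# CTR mode keystream generation: each keystream block E(key, nonce + i) is
# independent of the plaintext and of the other blocks, so a pipeline can
# work on several successive counter values at once.

def E(key, block):
    """Placeholder 16-bit 'block cipher' (keyed XOR and rotation), not a real cipher."""
    x = (block ^ key) & 0xFFFF
    return ((x << 7) | (x >> 9)) & 0xFFFF

def ctr_encrypt(key, nonce, blocks):
    # The keystream list could be computed in any order, or concurrently.
    keystream = [E(key, (nonce + i) & 0xFFFF) for i in range(len(blocks))]
    return [b ^ z for b, z in zip(blocks, keystream)]

pt = [0xDEBE, 0xBCA8, 0xCC85]
ct = ctr_encrypt(0x6E79, 0x0100, pt)
# Decryption is the same operation; only the forward cipher E is ever needed.
assert ctr_encrypt(0x6E79, 0x0100, ct) == pt
```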

3.7 Tradeoffs Between Architectures

In Figure 3.10, we present a rough characterization of the tradeoffs between hardware architectures that we have discussed. As you move to the right in the figure, you can expect architectures to require more area in a hardware realization. Moving up in the figure represents an increase in the throughput of the architecture. Hence, a general expectation is that the most compact, yet slowest implementation architecture is the full serial design. On the other extreme, in the top right corner, the parallel and pipelined implementations represent the highest speed structures, but at the penalty of the highest area for the realized designs. As expected, the basic iterative design represents a tradeoff that is somewhat in the middle between the extremes. Note that the scales of the graph are not labelled and the placement of the designs in the space does not accurately reflect the relative scale of the


area and throughput. The figure is meant to be a rough characterization only. It does not include loop unrolling, which is likely to be slightly faster, but with significantly increased area over a basic iterative design. Also, the graph does not characterize the many levels of parameterization that may occur for an architecture. For example, increasing the stages in a pipeline design will increase the area required and likely increase the throughput.

[Figure: qualitative plot of increasing throughput (vertical axis) versus increasing area (horizontal axis), with full serial in the lower left, partial serial above it, basic iterative near the middle, and parallel/pipeline in the upper right.]

Figure 3.10: Characterization of Architecture Tradeoffs

A more detailed comparison of the characteristics of the various hardware architectures of a 64-bit SPN block cipher is given in Table 3.3. In the table, the throughput and area requirements of the various architectures are considered. For simplicity in the comparison, the timing and area implications of any necessary multiplexers are not included. For a more precise comparison, since the various architectures have different requirements for the multiplexing of data, the effect of multiplexers should be considered. The propagation delay through the combinational logic of the round function (specifically, the XOR gate of the key mixing and the logic of the S-box) is defined to be tfunc (in seconds) and, for simplicity, it is assumed that the S-box is the same implementation in all cases. For example, the S-boxes may be assumed to be a 2-level minimized AND-OR structure. Of course, in practice, different architectures are likely to make use of different S-box circuits: a high-speed implementation (such as pipelining) will probably use a low delay, 2-level structure for the S-box, while a compact implementation (such as a serial design) is likely to use a compact S-box structure (such as the one for the PRESENT S-box in Figure 3.3) which may have a significantly larger propagation delay since more levels of logic may be used.
In more precise timing comparisons, differences between the propagation delay through the combinational hardware used in a round may occur due to the presence or absence of multiplexers in the datapath. As well, it is possible the minimum clock period is also significantly affected by the propagation delay of the register and the setup time of the register, but this is not considered in the table.

Architecture         Minimum         Throughput        Combinational    Sequential
                     Clock Period    (blocks/sec)      Area (nm²)       Area (nm²)
                     (sec)
Basic Iterative      tfunc           1/(R·tfunc)       Afunc            Areg
Loop Unrolled        ≤ 4·tfunc       ≥ 1/(R·tfunc)     4·Afunc          Areg
(4 rounds)
Parallel             tfunc           m/(R·tfunc)       m·Afunc          m·Areg
(m copies)
Pipeline             tfunc           1/tfunc           R·Afunc          R·Areg
(R stages)
Full Serial          tfunc           1/(16·R·tfunc)    ≥ (1/16)·Afunc   Areg
(1 4-bit S-box)
Partial Serial       tfunc           1/(4·R·tfunc)     ≥ (1/4)·Afunc    Areg
(4 4-bit S-boxes)

Table 3.3: Hardware Tradeoffs for R-Round 64-bit SPN (Ignoring effects of multiplexers.)

In the table, the variable Afunc represents the area requirement (in nanometers squared) of the combinational logic of the round function consisting of 16 parallel 4-bit S-boxes and the XOR gates of the key mixing layer. Although we do not consider it, in some architectures, the presence or absence of multiplexers affects the area required in a round. Again, it is assumed that one type of S-box implementation is used in the comparison across the architectures. The variable Areg represents the area required for the implementation of a 64-bit register. Note that the area requirements of the key schedule module and the controller are not considered in the analysis. In most cases, it is probably reasonable to assume that the controller area is negligible compared to the datapath. However, depending on the implementation structure, which is affected by the details of the key scheduling algorithm, the key schedule module could require a significant amount of hardware area (although not likely as much as the cipher datapath) and/or a large amount of register memory (perhaps significantly more than the datapath if the round keys are pre-computed and stored). These issues are not considered in the comparison presented in the table. In the table, the cipher is assumed to have R rounds and, for the parallel implementation, a general number, m, of parallel implementations are considered.
In Table 3.3, the nominal design can be considered to be the basic iterative design with a clock period of tfunc (ignoring the register timings), resulting in a throughput of 1/(R·tfunc) blocks/sec, and areas of Afunc and Areg for the combinational logic of the round function and the register used to store the state, respectively.

For the unrolled architecture, the propagation delay must be at least tfunc but is certainly ≤ 4·tfunc for 4 unrolled rounds. In practice, the delay may be less than 4 times the delay of the 1-round basic iterative architecture, since the delay due to the multiplexer at the input of the combinational hardware only affects the round hardware once (not 4 times) and the register timing values are possibly not negligible, as discussed in Section 3.1.3. This results in a throughput which is at least as large as the throughput for the basic iterative architecture, but it may be larger if these timing factors are significant. So the unrolled architecture may be an improvement on the speed of the basic iterative design, but will certainly have a larger combinational logic area requirement, while having the same area needed for sequential logic registers.

Both the parallel and pipelined structures can improve the speed of encryption by operating on multiple blocks simultaneously. For the parallel structure, if the I/O interface allows as many as m simultaneous inputs and outputs, then a corresponding speed-up occurs in the throughput and a corresponding increase in the area costs occurs. For pipelining, with the number of stages equal to R, the throughput and area increase by a factor of roughly R. For R stages, the I/O can be structured to present and receive one plaintext and ciphertext block each clock cycle. This can be adjusted to structures with fewer than R stages with corresponding effects on the throughput and area.
Since there are no multiplexers in the round hardware of the pipeline, in practice, the clock could be marginally faster than for the basic iterative design. As well, if a pipelining architecture can split one round into multiple stages, there will be a corresponding increase in throughput and area as the number of stages increases. It should be noted that, as previously discussed in Section 3.3.3, as the number of stages increases, the propagation delay through the combinational logic will decrease and the register propagation delay and setup time might become significant.

Lastly, the serial implementation decreases the throughput by a factor of at least the number of clock cycles needed to finish a round. Because multiplexers are required at both the input and output of the register, the effect on the combinational logic propagation delay could be significant (although this is not reflected in the table). Further, the combinational area is reduced by possibly as much as the same factor, although, practically, this is not likely since the reduced S-box hardware area will be offset by the extra multiplexer circuitry needed to control data flow. It should be noted that the sequential logic area still requires as many bits as are needed to store the state; hence, no reduction in register bits can occur relative to the basic iterative architecture. Note also that the decrease in throughput should actually have a factor equal to "block size / sub-block size + 1" (that is, one greater than the value in the table) if the strategy used is the same as the sample hardware design discussed in Section 3.4, which required one more clock cycle in a round to complete the permutation. For example, using this approach, the full serialization of the 64-bit SPN with 4-bit S-boxes takes 17, not 16, clock cycles to complete a round.
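The throughput column of Table 3.3 can be expressed directly, which makes the relative factors between architectures easy to check. The round delay below is an illustrative assumption; register timing and multiplexer effects are ignored, as in the table.

```python
# Throughput formulas from Table 3.3. t_func is the round-function delay in
# seconds; results are in blocks per second.

def throughput(arch, R, t_func, m=1):
    return {
        "basic_iterative": 1 / (R * t_func),
        "parallel":        m / (R * t_func),       # m parallel copies
        "pipeline":        1 / t_func,             # R stages, one block per cycle
        "full_serial":     1 / (16 * R * t_func),  # one 4-bit S-box
        "partial_serial":  1 / (4 * R * t_func),   # four 4-bit S-boxes
    }[arch]

R, t_func = 16, 2e-9     # e.g. 16 rounds, 2 ns round delay (illustrative values)
base = throughput("basic_iterative", R, t_func)
# Pipelining with R stages gains a factor of R over the basic iterative design;
# full serialization loses a factor of 16, the number of sub-blocks per block.
```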

3.8 Summary

In this chapter, we have discussed different architectures for implementing SPN block ciphers in a synchronous, sequential logic design targeted to common hardware technologies like FPGAs and CMOS ASICs. The most basic approach is an iterative design using the round structure of the cipher to guide the architecture. To improve cipher throughput, parallelization and pipelining are discussed as options. As well, serialization is presented as a method that can be used to decrease the area requirement of a cipher implementation. For all architectures, sample designs are used to illustrate the concepts. The chapter also briefly discusses issues associated with the implementation of decryption and the three block cipher modes introduced in Section 1.4. To wrap up the discussion, a brief overview of the tradeoffs that exist between the various implementation strategies is presented. It should be noted that a comprehensive introduction to block cipher hardware implementation architectures, with a focus on AES, can be found in Chapters 10-12 of [29].

Chapter 4

Conclusion

In this article, we have presented a tutorial on several basic implementation strategies for block ciphers. Our implementations span software applications, where we discuss table lookup approaches and bit-slicing, to hardware applications, where we describe basic iterative approaches, as well as high-speed designs like pipelining and compact designs involving serialization. As the basis for our descriptions, we have used the basic SPN as the model cipher. This represents a practical structure found in many proposed ciphers and is the foundation for the most widely applied block cipher, AES. However, many distinct implementation structures have been proposed for different ciphers and such considerations must be made for any cipher targeted for implementation. AES, in particular, has been extremely well studied and discussed for its software and hardware implementation approaches, and the reader is encouraged to investigate further the many clever implementation approaches applied to AES.

Finally, it should be noted that a very important consideration for the implementation of ciphers is the notion of resistance to side channel analysis (SCA) attacks [30]. SCA attacks are a category of attacks that take advantage of implementation characteristics to relate side-channel information to the cipher key. Side-channel information exploited in attacks includes power traces, timing of encryption, and response to injected faults, among others. All implementers should be aware of, and mitigate, the susceptibility of a system to leakage of information due to side channels. This is an extremely rich area of research, with many hundreds of papers published. Alas, we choose to leave this discussion to others, but encourage all cipher implementers to pursue these issues once their knowledge of the basic implementation structures discussed in this article is solidified.

References

[1] National Institute of Standards and Technology. Federal Information Processing Standards Publication 197: Advanced Encryption Standard (AES), November 2001. csrc.nist.gov/publications/detail/fips/197/final.

[2] Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES - The Advanced Encryption Standard. Information Security and Cryptography. Springer, 2002.

[3] Alex Biryukov and Léo Perrin. State of the art in lightweight symmetric cryptography. IACR Cryptology ePrint Archive, 2017:511, 2017.

[4] Shay Gueron. Intel’s new AES instructions for enhanced performance and security. In Orr Dunkelman, editor, Fast Software Encryption, 16th International Workshop, FSE 2009, Leuven, Belgium, February 22-25, 2009, Revised Selected Papers, volume 5665 of Lecture Notes in Computer Science, pages 51–66. Springer, 2009.

[5] Akashi Satoh, Sumio Morioka, Kohji Takano, and Seiji Munetoh. A compact Rijndael hardware architecture with S-box optimization. In Colin Boyd, editor, Advances in Cryptology - ASIACRYPT 2001, 7th International Conference on the Theory and Application of Cryptology and Information Security, Gold Coast, Australia, December 9-13, 2001, Proceedings, volume 2248 of Lecture Notes in Computer Science, pages 239–254. Springer, 2001.

[6] Andrey Bogdanov, Lars R. Knudsen, Gregor Leander, Christof Paar, Axel Poschmann, Matthew J. B. Robshaw, Yannick Seurin, and C. Vikkelsoe. PRESENT: an ultra-lightweight block cipher. In Pascal Paillier and Ingrid Verbauwhede, editors, Cryptographic Hardware and Embedded Systems - CHES 2007, 9th International Workshop, Vienna, Austria, September 10-13, 2007, Proceedings, volume 4727 of Lecture Notes in Computer Science, pages 450–466. Springer, 2007.

[7] Claude E. Shannon. Communication theory of secrecy systems. Bell Syst. Tech. J., 28(4):656–715, 1949.

[8] U.S. Department of Commerce. National Bureau of Standards Federal Information Processing Standard 46: Data Encryption Standard (DES), January 1977.

[9] Mitsuru Matsui. Linear cryptanalysis method for DES cipher. In Tor Helleseth, editor, Advances in Cryptology - EUROCRYPT '93, Workshop on the Theory and Application of Cryptographic Techniques, Lofthus, Norway, May 23-27, 1993, Proceedings, volume 765 of Lecture Notes in Computer Science, pages 386–397. Springer, 1993.

[10] Eli Biham and Adi Shamir. Differential cryptanalysis of DES-like cryptosystems. In Alfred Menezes and Scott A. Vanstone, editors, Advances in Cryptology - CRYPTO '90, 10th Annual International Cryptology Conference, Santa Barbara, California, USA, August 11-15, 1990, Proceedings, volume 537 of Lecture Notes in Computer Science, pages 2–21. Springer, 1990.

[11] International Organization for Standardization. ISO/IEC 29192-2:2019 Information security — Lightweight cryptography — Part 2: Block ciphers, November 2019. available at iso.org.

[12] National Institute of Standards and Technology. NIST Special Publication 800-38A: Recommendation for Block Cipher Modes of Operation, December 2001. csrc.nist.gov/publications/detail/sp/800-38a/final.

[13] Ryad Benadjila, Jian Guo, Victor Lomn´e, and Thomas Peyrin. Implementing lightweight block ciphers on x86 architectures. In Tanja Lange, Kristin E. Lauter, and Petr Lisonek, editors, Selected Areas in Cryptography - SAC 2013 - 20th International Conference, Burnaby, BC, Canada, August 14-16, 2013, Revised Selected Papers, volume 8282 of Lecture Notes in Computer Science, pages 324–351. Springer, 2013.

[14] Eli Biham. A fast new DES implementation in software. In Eli Biham, editor, Fast Software Encryption, 4th International Workshop, FSE '97, Haifa, Israel, January 20-22, 1997, Proceedings, volume 1267 of Lecture Notes in Computer Science, pages 260–272. Springer, 1997.

[15] Kostas Papapagiannopoulos. High throughput in slices: The case of PRESENT, PRINCE and KATAN64 ciphers. In Nitesh Saxena and Ahmad-Reza Sadeghi, editors, Radio Frequency Identification: Security and Privacy Issues - 10th International Workshop, RFIDSec 2014, Oxford, UK, July 21-23, 2014, Revised Selected Papers, volume 8651 of Lecture Notes in Computer Science, pages 137–155. Springer, 2014.

[16] Mitsuru Matsui and Junko Nakajima. On the power of bitslice implementation on Intel Core2 processor. In Pascal Paillier and Ingrid Verbauwhede, editors, Cryptographic Hardware and Embedded Systems - CHES 2007, 9th International Workshop, Vienna, Austria, September 10-13, 2007, Proceedings, volume 4727 of Lecture Notes in Computer Science, pages 121–134. Springer, 2007.

[17] Seiichi Matsuda and Shiho Moriai. Lightweight cryptography for the cloud: Exploit the power of bitslice implementation. In Emmanuel Prouff and Patrick Schaumont, editors, Cryptographic Hardware and Embedded Systems - CHES 2012 - 14th International Workshop, Leuven, Belgium, September 9-12, 2012. Proceedings, volume 7428 of Lecture Notes in Computer Science, pages 408–425. Springer, 2012.

[18] Ross Anderson, Eli Biham, and Lars Knudsen. Serpent: A Proposal for the Advanced Encryption Standard. AES submission to the National Institute of Standards and Technology, January 1998.

[19] Wentao Zhang, Zhenzhen Bao, Dongdai Lin, Vincent Rijmen, Bohan Yang, and Ingrid Verbauwhede. RECTANGLE: A bit-slice ultra-lightweight block cipher suitable for multiple platforms. IACR Cryptol. ePrint Arch., 2014:84, 2014.

[20] Vincent Grosso, Gaëtan Leurent, François-Xavier Standaert, and Kerem Varici. LS-designs: Bitslice encryption for efficient masked software implementations. In Carlos Cid and Christian Rechberger, editors, Fast Software Encryption - 21st International Workshop, FSE 2014, London, UK, March 3-5, 2014. Revised Selected Papers, volume 8540 of Lecture Notes in Computer Science, pages 18–37. Springer, 2014.

[21] Emilia Käsper and Peter Schwabe. Faster and timing-attack resistant AES-GCM. In Christophe Clavier and Kris Gaj, editors, Cryptographic Hardware and Embedded Systems - CHES 2009, 11th International Workshop, Lausanne, Switzerland, September 6-9, 2009, Proceedings, volume 5747 of Lecture Notes in Computer Science, pages 1–17. Springer, 2009.

[22] Mike Hamburg. Accelerating AES with vector permute instructions. In Christophe Clavier and Kris Gaj, editors, Cryptographic Hardware and Embedded Systems - CHES 2009, 11th International Workshop, Lausanne, Switzerland, September 6-9, 2009, Proceedings, volume 5747 of Lecture Notes in Computer Science, pages 18–32. Springer, 2009.

[23] Alireza Hodjat and Ingrid Verbauwhede. Area-throughput trade-offs for fully pipelined 30 to 70 Gbits/s AES processors. IEEE Trans. Computers, 55(4):366–372, 2006.

[24] David Canright. A very compact S-box for AES. In Josyula R. Rao and Berk Sunar, editors, Cryptographic Hardware and Embedded Systems - CHES 2005, 7th International Workshop, Edinburgh, UK, August 29 - September 1, 2005, Proceedings, volume 3659 of Lecture Notes in Computer Science, pages 441–455. Springer, 2005.

[25] Carsten Rolfes, Axel Poschmann, Gregor Leander, and Christof Paar. Ultra-lightweight implementations for smart devices - security for 1000 gate equivalents. In Gilles Grimaud and François-Xavier Standaert, editors, Smart Card Research and Advanced Applications, 8th IFIP WG 8.8/11.2 International Conference, CARDIS 2008, London, UK, September 8-11, 2008. Proceedings, volume 5189 of Lecture Notes in Computer Science, pages 89–103. Springer, 2008.

[26] M. Feldhofer, J. Wolkerstorfer, and V. Rijmen. AES implementation on a grain of sand. IEE Proceedings - Information Security, 152(1):13–20, Oct 2005.

[27] Zheng Gong, Svetla Nikova, and Yee Wei Law. KLEIN: A new family of lightweight block ciphers. In Ari Juels and Christof Paar, editors, RFID. Security and Privacy - 7th International Workshop, RFIDSec 2011, Amherst, USA, June 26-28, 2011, Revised Selected Papers, volume 7055 of Lecture Notes in Computer Science, pages 1–18. Springer, 2011.

[28] Subhadeep Banik, Andrey Bogdanov, Takanori Isobe, Kyoji Shibutani, Harunaga Hiwatari, Toru Akishita, and Francesco Regazzoni. Midori: A block cipher for low energy. In Tetsu Iwata and Jung Hee Cheon, editors, Advances in Cryptology - ASIACRYPT 2015 - 21st International Conference on the Theory and Application of Cryptology and Information Security, Auckland, New Zealand, November 29 - December 3, 2015, Proceedings, Part II, volume 9453 of Lecture Notes in Computer Science, pages 411–436. Springer, 2015.

[29] Çetin Kaya Koç, editor. Cryptographic Engineering. Springer, 2009.

[30] François-Xavier Standaert. Introduction to side-channel attacks. In Ingrid M. R. Verbauwhede, editor, Secure Integrated Circuits and Systems, Integrated Circuits and Systems, pages 27–42. Springer, 2010.

[31] Eli Biham. New types of cryptanalytic attacks using related keys. J. Cryptology, 7(4):229–246, 1994.

[32] Céline Blondeau and Benoît Gérard. Links between theoretical and effective differential probabilities: Experiments on PRESENT. IACR Cryptology ePrint Archive, 2010:261, 2010.

Appendix: Key Scheduling

Generalization of the PRESENT Key Schedule

In the main body of this article, we described the data flow associated with the plaintext-to-ciphertext processing and only alluded to the key schedule, which is responsible for the generation of round keys from the original cipher key. Key scheduling algorithms tend to be uniquely defined for each proposed cipher and, hence, it is harder to identify a general structure that represents the concepts typically found in a key schedule. So, here, we describe the key scheduling algorithm of PRESENT, but generalize it to various parameter sizes.

Consider a cipher key K = [kκ−1kκ−2...k1k0] of size κ bits, used for an SPN cipher with a block size of B bits and n-bit S-boxes. It is assumed that κ ≥ B. Let K′ = [k′κ−1k′κ−2...k′1k′0] represent the key state bits that are processed during the key schedule. Let R represent the number of rounds and assume that R < 32 and, hence, the round number can be represented in binary by 5 bits.¹ To represent the round number, we use the variable r, referred to as the round count, where the 5 bits of r are given by [r4r3r2r1r0]. The variable α represents a rotation amount used during the key schedule. A typical key schedule (derived from the PRESENT key schedule), illustrated in Figure A.1, might consist of the following steps:

1. Let r = 1 and assign the bits of the cipher key K to the key state K′:

   [k′κ−1k′κ−2...k′1k′0] ← [kκ−1kκ−2...k1k0]

2. If r > R + 1, then stop.

3. Use the leftmost B bits of K′, [k′κ−1k′κ−2...k′κ−B+1k′κ−B], as the round key RKr.

4. Update K′ as follows:

(a) Rotate K′ left by α positions:

   [k′κ−1k′κ−2...k′1k′0] ← [k′κ−α−1k′κ−α−2...k′κ−α+1k′κ−α]

(b) Process the leftmost n bits with the n-bit S-box:

   [k′κ−1k′κ−2...k′κ−n] ← S(k′κ−1, k′κ−2, ..., k′κ−n)

¹The limit of 32 rounds is a reasonable limit satisfied by practical ciphers. However, it is a trivial matter to extend the size of the round count to more bits if necessary.


Figure A.1: Sample Key Schedule Structure. (The cipher key is loaded into the key register; on each iteration, the leftmost B bits are taken as the round key RK, the register is rotated left by α, the S-box is applied to the leftmost n bits, and the round count [r4r3r2r1r0] is XORed in; this repeats until all round keys are generated.)

(c) XOR the round count into the key state bits from position γ down to γ − 4:

   [k′γk′γ−1k′γ−2k′γ−3k′γ−4] ← [k′γk′γ−1k′γ−2k′γ−3k′γ−4] ⊕ [r4r3r2r1r0]

5. Increment r and return to step 2.

Note that the operation "⊕" represents the bitwise XOR and S(·) represents the S-box mapping. For the version of the PRESENT cipher with an 80-bit key, κ = 80, B = 64, n = 4, R = 31, α = 61, and γ = 19. Overall, this key schedule derives informational content in a balanced way from all cipher key bits, mixes the bits through the complex nonlinear operation of the S-box, and ensures that each round has a distinct operation by adding a constant derived from the round number (thereby preventing attacks such as related-key attacks [31]).

In Table A.1, we present an example of the key schedule for a simple 16-bit SPN. The SPN is similar to the SPN in Figure 1.1 and uses the S-box mapping in Table 1.1. We assume that there are R = 4 rounds and, therefore, 5 round keys must be generated. The round keys are generated for the 20-bit cipher key 01101110011110010000₂ (6E790₁₆) using the algorithm described above with parameters κ = 20, α = 13, and γ = 8.

 r | Key Register | Round Key | Rotate Left 13 | Apply S-box | XOR with r
---+--------------+-----------+----------------+-------------+-----------
 1 | 6E790        | 6E79      | 20DCF          | 60DCF       | 60DDF
 2 | 60DDF        | 60DD      | BEC1B          | 8EC1B       | 8EC3B
 3 | 8EC3B        | 8EC3      | 771D8          | D71D8       | D71E8
 4 | D71E8        | D71E      | D1AE3          | 71AE3       | 71AA3
 5 | 71AA3        | 71AA      | -              | -           | -

Table A.1: Key Schedule Example (all values in hexadecimal)

Many other key scheduling algorithms have been proposed for different ciphers. We do not discuss these in detail here; the algorithm above simply serves as a typical example of the operations found within a key schedule.

Software Implementation of the Key Schedule

Consider now the software implementation of the key schedule. Typically, the key scheduling algorithm contains operations such as rotations, S-box lookups, and mixing of the round count, as described above. These operations are conceptually straightforward, but may require some care in the details of implementation. Let's consider the key schedule described in the previous section and, for ease of description, assume that B = 16, κ = 20, α = 13, and the processor word size is 16 bits.² Left rotation of the key state, K′ = [k′19k′18...k′0], by α = 13 bits is equivalent to the following assignment:

[k′19k′18k′17k′16 || k′15k′14k′13k′12k′11k′10k′9k′8k′7k′6k′5k′4k′3k′2k′1k′0]
← [k′6k′5k′4k′3 || k′2k′1k′0k′19k′18k′17k′16k′15k′14k′13k′12k′11k′10k′9k′8k′7]

where "||" indicates a word boundary in the storage of the key state. Hence, the 20-bit key state is stored with the 4 bits to the left of "||" in one word and the 16 bits to the right of "||" in another word. To produce the new 4 bits of the left word, [k′19k′18k′17k′16], the old right word must be rotated right by 3 bit positions and then masked to retrieve the 4 bits to be assigned to the new left word. The new right word can be prepared by the XOR combination of 3 words produced as follows: (1) rotate the old right word right by 7 positions and mask to retrieve the rightmost 9 bits, (2) rotate the old left word left by 9 positions and mask to retrieve the 4 bits destined for positions 12 down to 9, and (3) rotate the old right word right by 3 positions and mask to retrieve the leftmost 3 bits. Of course, this is a contrived, simple example, and real ciphers (such as PRESENT) will have different, realistic parameters, but may still need to spread the key state used during the key schedule across multiple words. For example, an implementation of PRESENT with an 80-bit key on a processor using 32-bit words would require storing the 80-bit key state in at least 3 words (e.g., two with 32 bits and one with 16 bits).

²The B, κ, and α parameters are the same values chosen for the experimental study of a scaled-down version of PRESENT found in [32]. This cipher has the same structure as the 16-bit SPN of Figure 1.1. The 16-bit word size for the processor is simply selected for the purposes of easy illustration.

For the implementation of the S-box, a table lookup approach can easily be used. Appropriate shifting and masking must be used to prepare the index for the lookup and to move the result into the appropriate position within the words used to store the key state. The mixing (i.e., XOR) of the round count is also straightforward to realize using appropriate masking. For example, for the key schedule parameters of the 16-bit SPN above, and assuming that the key state bits to be XORed with the round count are [k′8k′7k′6k′5k′4], the XOR of the round count with the key state can be achieved as follows:

[k′15k′14k′13k′12k′11k′10k′9k′8k′7k′6k′5k′4k′3k′2k′1k′0] ⊕ [0000000 r4r3r2r1r0 0000]

where r4r3r2r1r0 are the bits of the round count r.

Hardware Implementation of Key Schedule

In our discussion of the various architectures that could be employed for the hardware implementation of block ciphers, we did not deal with the hardware necessary to implement the key schedule. The key schedule can vary quite dramatically between ciphers, but if we consider the sample key schedule described above, we can conclude that the necessary structure is somewhat simpler than the datapath processing the cipher state. For example, the operations involved in the key schedule include storage of the key state (generally the same size as the original cipher key), the application of one S-box mapping per round, the XOR of the round count value, and the rearrangement of the bit positions within the key register. These are not difficult to implement and, for an iterative design, will not take up much area in the targeted technology. The most significant decision to be made for the implementation of the key schedule is whether to use (1) a round key setup approach, which pre-computes and stores all round keys prior to processing any plaintext data, or (2) an on-the-fly round key generation approach, where the key schedule is applied concurrently with the round function of the cipher datapath. These are the two approaches highlighted in Figures 1.7 and 1.8. Unlike in software implementations, where the memory needed to store all round keys for subsequent use during data processing comes at small cost, in hardware implementations, storing a full set of round keys (in registers or possibly in on-board RAM in an FPGA) would typically add dramatically to the resource and area requirements of the design. For example, an implementation of the key schedule of the 64-bit SPN PRESENT which stores all round keys would require (R + 1) · B = 32 · 64 = 2048 bits, which dwarfs the 64 register bits needed to store the cipher state in a basic iterative implementation.
For this reason, most hardware implementations are likely to choose an on-the-fly approach to the key schedule. Such an approach generates the round keys concurrently with the round data processing and, hence, has little impact on the cipher throughput.