
This document is downloaded from DR‑NTU (https://dr.ntu.edu.sg) Nanyang Technological University, Singapore.

Cryptanalysis and design of lightweight symmetric‑key cryptography

Sim, Siang Meng

2018

Sim, S. M. (2018). Cryptanalysis and design of lightweight symmetric‑key cryptography. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/74160 https://doi.org/10.32657/10356/74160

Downloaded on 26 Sep 2021 17:05:46 SGT

CRYPTANALYSIS AND DESIGN OF LIGHTWEIGHT SYMMETRIC-KEY CRYPTOGRAPHY

SIM SIANG MENG

School of Physical and Mathematical Sciences

A thesis submitted to the Nanyang Technological University in partial fulfilment of the requirement for the degree of Doctor of Philosophy

2017

To my wife.

Acknowledgements

First of all, I would like to express my deepest gratitude to my supervisor Prof Thomas Peyrin for his great guidance and mentoring. His insightful thoughts, endless research ideas and passion for symmetric-key cryptography are a true inspiration. There has not been a single day on which I was bored of doing cryptographic research. Despite his busy schedule, he always tries his best to make time for discussion, even at very short notice. I would also like to thank him for his unconditional support for me to attend conferences, workshops, a summer school and an overseas attachment. He is a great supervisor, mentor, friend and tennis rival.

I would not have started my doctoral research in NTU if not for Prof KHOO Khoongming, the supervisor of my undergraduate final year project (FYP). His guidance and enthusiasm in cryptography motivated me to pursue my doctoral research. It was also he who introduced me to Thomas. I would like to thank Prof WANG Huaxiong for being my FYP co-supervisor and giving valuable advice; and Prof WU Hongjun for introducing me to cryptography during my undergraduate studies and guiding me through my first undergraduate research work on cryptography.

Many thanks to Yu Sasaki for several reasons. During his visit to NTU, he initiated countless discussions and research topics, and I learnt and benefited a lot from his vast knowledge and experience. I greatly appreciate his help in arranging my overseas attachment and mentoring me at Nippon Telegraph and Telephone Corporation (NTT). My stay in Japan was a pleasant experience thanks to him being a great mentor, host and translator.

I am grateful to have had GUO Jian, WANG Lei, Jérémy Jean, Ivica Nikolić, LIU Meicheng, Pierre Karpman, Sumit Kumar Pandey, Mohona Ghosh, Mustafa Khairallah, WANG Haoyang, Vesselin Velichkov, Zakaria Najm and many other past and present researchers sharing lab SPMS-MAS-04-01 as labmates. Discussing research works, going for lunch together, and sharing and learning about each other’s home cultures are priceless memories for me. I am delighted to have met Kan Yasuda, Yosuke Todo, Akinori Hosoyamada and many other people in NTT during my overseas attachment in September 2016. I truly enjoyed the experience and am looking forward to visiting NTT again in the future.

I would like to thank all my co-authors, Ralph Ankele, Subhadeep Banik, Christof Beierle, Avik Chakraborti, GUO Jian, Jérémy Jean, KHOO Khoongming, Stefan Kölbl, Gregor Leander, Eugene Lee, Eik List, LIU Meicheng, Florian Mendel, Amir Moradi, Ivica Nikolić, Frédérique Oggier, Sumit Kumar Pandey, Thomas Peyrin, QIAO Kexin, Sumanta Sarkar, Yu Sasaki, Pascal Sasdrich, Jacob Teo, Yosuke Todo, Dylan Toh, Jade Tourteaux, WANG Gaoli, WANG Lei and ZHANG Guoyan. I have learnt a lot from working and collaborating with them.

My gratitude goes to my parents and my brother for their support and for believing in me. Sincere thanks to my wonderful wife for her understanding and undivided love.

Last but not least, special thanks to the Singapore National Research Foundation Fellowship 2012 (NRF-NRFF2012-06) for partially funding my doctoral research.

SIM Siang Meng
March, 2018

Contents

Acknowledgements

Abstract

List of Works

List of Symbols

List of Abbreviations

Background

1 Introduction
  1.1 Cryptology
    1.1.1 What is cryptology?
    1.1.2 Symmetric-key and public-key cryptography
    1.1.3 Types of security analysis
  1.2 Symmetric-key Cryptography
    1.2.1 Overview of block ciphers
    1.2.2 Cryptanalysis of encryption algorithms
    1.2.3 Other symmetric-key cryptographic primitives
  1.3 Lightweight Cryptography
    1.3.1 Brief history on conventional cryptography
    1.3.2 Rise of lightweight cryptography
  1.4 About this Thesis
    1.4.1 Overview
    1.4.2 Organisation of this thesis

2 Preliminaries
  2.1 Mathematical Background
    2.1.1 Matrix notation and properties
    2.1.2 Finite field notation and properties
    2.1.3 Vector subspace notation
    2.1.4 Permutation notation
    2.1.5 Probability: application to ...
  2.2 Components of SPN Cipher
    2.2.1 Substitution layer
    2.2.2 Permutation layer
  2.3 Cryptanalysis Techniques
    2.3.1 Differential cryptanalysis
    2.3.2 Linear cryptanalysis
    2.3.3 Other cryptanalysis techniques

Cryptanalysis of Symmetric-key Cryptography

3 Practical Differential Attack on JAMBU
  3.1 The JAMBU Authenticated Encryption Scheme
    3.1.1 Description of JAMBU
    3.1.2 Security claims
  3.2 Attack on JAMBU in Nonce-Misuse Scenario
    3.2.1 Attack overview
    3.2.2 ...
    3.2.3 Extension to a plaintext-recovery attack
    3.2.4 Discussion on trivial attacks
  3.3 Attack on JAMBU in Nonce-Respecting Scenario
    3.3.1 Distinguishing attack
    3.3.2 Extension to a plaintext-recovery attack
    3.3.3 Discussion on trivial attacks
  3.4 Implementation of the Attack
    3.4.1 Results of the attack
    3.4.2 Running time of the attack
  3.5 Conclusion

4 Invariant Subspace Attack on Midori64
  4.1 Description of Midori
  4.2 Invariant Subspace Attack on Midori64
    4.2.1 Distinguisher with invariant subspace attack
    4.2.2 Key-recovery with invariant subspace attack
  4.3 Extended Analysis: Weaker Constant
  4.4 Concluding Remarks

Diffusion and Substitution Layers

5 Lightweight MDS Diffusion Matrices
  5.1 Introduction
    5.1.1 Motivation
    5.1.2 Matrix types
  5.2 Matrix Properties
    5.2.1 Properties of Hadamard matrices
    5.2.2 Properties of cyclic matrices
  5.3 Compact Equivalence Classes of Matrices
    5.3.1 CEC of Hadamard matrices
    5.3.2 CEC of cyclic matrices
  5.4 Search and Results
    5.4.1 Search methodology
    5.4.2 Survey of lightweight (I)MDS matrices
  5.5 Summary

6 Security of S-boxes
  6.1 Preliminaries
  6.2 DDT and Affine Subspace of an S-box
    6.2.1 Deriving the DDT from low dimension affine subspace
    6.2.2 Deriving affine subspace from the DDT
    6.2.3 Recovering all affine subspace transitions from the DDT
    6.2.4 Remarks on affine subspace with higher dimension
  6.3 Case Study and Search for Strong S-boxes
    6.3.1 Classification for case analysis
    6.3.2 Searching for strong involutory S-boxes
    6.3.3 Searching for strong non-involutory S-boxes
  6.4 Summary

Encryption Algorithm Design

7 Beyond Ultra-Lightweight Block Ciphers
  7.1 On Lightweight Block Ciphers
    7.1.1 Design choices of lightweight block ciphers
    7.1.2 Challenges in lightweight primitive designs
  7.2 Designing BULB Ciphers
    7.2.1 Going beyond ultra-lightweight block ciphers
    7.2.2 Design strategies
    7.2.3 Design approaches
  7.3 New Framework for Block Ciphers
    7.3.1 Classification of block ciphers
    7.3.2 Ideal rate of influence of block ciphers

8 The SKINNY Family of Block Ciphers
  8.1 Specifications of SKINNY
  8.2 Rationale of SKINNY
    8.2.1 The designing of SKINNY
    8.2.2 General design and components rationale
    8.2.3 Comparing differential bounds
  8.3 Security Analysis
    8.3.1 Differential and linear cryptanalysis
    8.3.2 Impossible differential cryptanalysis
    8.3.3 Invariant subspace attacks
    8.3.4 Algebraic attacks
    8.3.5 Third-party security analysis of SKINNY
  8.4 Performance and Comparison
    8.4.1 Hardware implementations
    8.4.2 Software implementations
  8.5 Conclusion

9 GIFT: A Small Present
  9.1 Specifications
    9.1.1 Specification
    9.1.2 GIFT in 2-dimensional array
  9.2 Design Rationale
    9.2.1 The Designing of GIFT
    9.2.2 Design of the GIFT bit-permutation
    9.2.3 Selection of the GIFT S-box
    9.2.4 Design of the GIFT
  9.3 Security Analysis
    9.3.1 Differential and linear cryptanalysis
    9.3.2 Impossible differential cryptanalysis
    9.3.3 Invariant subspace attacks
    9.3.4 Algebraic attacks
  9.4 Performance and Comparison
    9.4.1 Hardware implementations
    9.4.2 Software implementations
  9.5 Conclusion

Further Discussion

10 Conclusion
  10.1 Summary
  10.2 Other Works
  10.3 Future Research

Bibliography

Abstract

Lightweight cryptography has been a rising topic along with the global development of very constrained computing devices, for which conventional cryptographic algorithms are often too resource-consuming to fit use-cases such as RFID tags.

Conducting cryptanalysis gives us a better understanding of how to protect against adversaries. In the first part of this thesis, we analyse two lightweight symmetric-key primitives, JAMBU and Midori. For both primitives, we found practical attacks.

Studying the building blocks of ciphers is essential for designing cryptographic primitives. In the second part of this thesis, we look into the components of Substitution-Permutation Network based block ciphers, namely the diffusion matrix and the S-box. We found new efficient components that ensure strong security properties.

Combining our experience acquired in cryptanalysis and our knowledge of the cryptographic components, we are ready to design new ciphers. In the final part of this thesis, we propose two new block ciphers, SKINNY and GIFT, aiming at lightweight applications. They offer excellent performance and security against state-of-the-art cryptanalysis techniques.

List of Works

Below is the list of works done during my doctoral research in NTU.

Publications

In reverse chronological order:

1. Dylan Toh, Jacob Teo, Khoongming Khoo, Siang Meng Sim. Lightweight MDS Serial-type Matrices with Minimal Fixed XOR Count. Africacrypt 2018.

2. Jérémy Jean, Thomas Peyrin, Siang Meng Sim, Jade Tourteaux. Optimizing Implementations of Lightweight Building Blocks. IACR Transactions on Symmetric Cryptology (ToSC) 2017 / Fast Software Encryption (FSE) 2018.

3. Khoongming Khoo, Eugene Lee, Thomas Peyrin, Siang Meng Sim. Human-readable Proof of the Related-Key Security of AES-128. IACR Transactions on Symmetric Cryptology (ToSC) 2017 / Fast Software Encryption (FSE) 2018.

4. Subhadeep Banik, Sumit Kumar Pandey, Thomas Peyrin, Yu Sasaki, Siang Meng Sim, Yosuke Todo. GIFT: A Small Present. Cryptographic Hardware and Embedded Systems (CHES) 2017.

5. Ralph Ankele, Subhadeep Banik, Avik Chakraborti, Eik List, Florian Mendel, Siang Meng Sim, Gaoli Wang. Related-Key Impossible-Differential Attack on Reduced-Round SKINNY. Applied Cryptography and Network Security (ACNS) 2017.

6. Jian Guo, Jérémy Jean, Ivica Nikolić, Kexin Qiao, Yu Sasaki, Siang Meng Sim. Invariant Subspace Attack Against Midori64 and The Resistance Criteria for S-box Designs. IACR Transactions on Symmetric Cryptology (ToSC) 2017 / Fast Software Encryption (FSE) 2017.

7. Christof Beierle, Jérémy Jean, Stefan Kölbl, Gregor Leander, Amir Moradi, Thomas Peyrin, Yu Sasaki, Pascal Sasdrich, Siang Meng Sim. The SKINNY Family of Block Ciphers and its Low-Latency Variant MANTIS. International Cryptology Conference (CRYPTO) 2016.

8. Sumanta Sarkar, Siang Meng Sim. A deeper understanding of the XOR count distribution in the context of lightweight cryptography. Africacrypt 2016.

9. Meicheng Liu, Siang Meng Sim. Lightweight MDS Generalized Circulant Matrices. Fast Software Encryption (FSE) 2016.

10. Thomas Peyrin, Siang Meng Sim, Lei Wang, Guoyan Zhang. Cryptanalysis of JAMBU. Fast Software Encryption (FSE) 2015.

11. Siang Meng Sim, Khoongming Khoo, Frédérique Oggier, Thomas Peyrin. Lightweight MDS Involution Matrices. Fast Software Encryption (FSE) 2015.

This thesis consists of works from above except [1,2,3,8].

Miscellaneous

12. Siang Meng Sim, Lei Wang. Practical Forgery Attacks on SCREAM and iSCREAM. SYLLAB website 2014. http://www1.spms.ntu.edu.sg/~syllab/m/images/b/b3/ForgeryAttackonSCREAM.pdf

I played a lead/active role in finding and writing the results in [1-5,8-11], while [6] and [12] were led by Yu Sasaki and Lei Wang respectively. My main contributions to [7] include the design and analysis of the SKINNY S-boxes and the key schedule.

List of Symbols

$* \cdots *_2$: Binary string where $*$ is either 0 or 1
$\langle i \rangle$: Integer $i$ in binary representation
$\wedge$: AND operation
$\vee$: OR operation
$\oplus$: Exclusive-OR (XOR) operation
$\overline{b}$: NOT operation on bit $b$
$\bullet$: Scalar product operation
$\Delta$: Bitwise XOR-difference between a pair of binary strings
$\Gamma$: Linear mask
$\lll x$: Left bit rotation by $x$ positions
$\ggg x$: Right bit rotation by $x$ positions
$\|$: Concatenation
$\perp$: null output
$\$$: Random selection
$M^{\top}$: Transpose of matrix $M$
$S_x$: Symmetric group on a finite set of $x$ objects
$\mathbb{Z}_x$: Integers modulo $x$
$O()$: Big O notation

List of Abbreviations

AE: Affine equivalence
AES: Advanced Encryption Standard
BOGI: Bad Output must go to Good Input
BULB: Beyond ultra-lightweight block (cipher)
CAESAR: Competition for Authenticated Encryption: Security, Applicability, and Robustness
CEC: Compact equivalence classes
DC: Differential cryptanalysis
DDT: Differential distribution table
DES: Data Encryption Standard
GE: Gate equivalent
GFN: Generalised Feistel Network
IND-CCA: Indistinguishability under chosen ciphertext attack
IND-CCA2: Indistinguishability under adaptive chosen ciphertext attack
IND-CPA: Indistinguishability under chosen plaintext attack
INT-CTXT: Integrity of ciphertexts
LAT: Linear approximation table
LC: Linear cryptanalysis
LSB: Least significant bit
m.d.p.: Maximal differential probability
MDS: Maximum distance separable
MILP: Mixed-integer linear programming
m.l.a.: Maximal absolute linear bias approximation
MSB: Most significant bit
NIST: National Institute of Standards and Technology
NSA: National Security Agency
PE: Permutation-xor equivalence
SPN: Substitution-Permutation Network
TBC: Tweakable block cipher
XOR: Exclusive-OR

Background

Chapter 1

Introduction

In this chapter, we first give an introduction to cryptology in Section 1.1. Next, we elaborate on symmetric-key cryptography in Section 1.2, followed by a discussion on lightweight cryptography in Section 1.3. Lastly, we give an overview and outline of this thesis in Section 1.4.

1.1 Cryptology

1.1.1 What is cryptology?

Communication is the act of conveying information between entities. It allows us to share information, understand one another and live together as a community. While there is information worth forwarding and sharing with the rest of the world, for instance cute kitten pictures¹, there is also sensitive information, such as medical history or even military operations, that needs to be protected from unintended parties. However, the transmission of information can easily be intercepted or observed by potential adversaries, hence private communication is not guaranteed. Instead, establishing a secure communication becomes the best option for protecting the information.

Cryptology is the study of secure communications between two parties in the presence of unauthorised third parties. It can be divided into two sub-branches: cryptography, the art of designing and performing secure communications, and cryptanalysis, the art of analysing and breaking them. One of the roles of cryptography is to protect the confidentiality of information. The basic idea is to encrypt the information with a cryptographic key before sending it through an untrusted channel to the intended party, who has the corresponding cryptographic key to decrypt and recover the original information. To any unauthorised observer, the encrypted information should look like unintelligible gibberish. Cryptanalysis, on the other hand, involves analysing the cryptosystem in an attempt to decipher encrypted information without knowledge of the cryptographic key.

¹ By courtesy of Google search, Figure 1.10.

Designing a cryptographic primitive requires vast knowledge of cryptanalysis in order to protect against various attacks. Conversely, attacking a cryptographic primitive requires a deep understanding of its components and their properties in order to find weaknesses in them. Therefore, cryptography and cryptanalysis are like two sides of the same coin, inextricably intertwined with each other.

1.1.2 Symmetric-key and public-key cryptography

One can generally divide cryptography into two main areas: symmetric-key cryptography and public-key cryptography. The former uses a single cryptographic key (secret key) for both the encryption and decryption processes, and thus the same secret key has to be shared by both communicating parties. The latter uses a public key and a private key per entity to set up a cryptosystem where the public key allows encryption of messages intended for the corresponding entity, and the private key is used for decryption. Although public-key cryptography makes the complex and sensitive key distribution problem easier, symmetric-key cryptography is generally more computationally efficient. Therefore, a common practice is to use a public-key cryptosystem to exchange the secret key and a symmetric-key cryptosystem for the subsequent communication, as illustrated in Figure 1.1.

Figure 1.1: Public-key and Symmetric-key cryptosystem

Imagine a scenario where Alice and Bob want to communicate through a public domain in the presence of Eve, who is trying to eavesdrop². In order to prevent Eve from knowing the content of their conversation, Alice first selects a secret key and encrypts it with Bob’s public key. Next, she sends this encrypted secret key to Bob through an authenticated channel³, and Bob decrypts it with his private key to retrieve the secret key. After that, Alice and Bob can communicate with each other through an insecure channel, using the same secret key for encryption and decryption. Throughout the entire process, Eve can only observe encrypted information, which is unintelligible to her.
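The exchange between Alice and Bob can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: it uses textbook RSA with tiny primes and a hash-derived keystream, neither of which is secure, and every name in it (`keystream_xor`, `bob_public`, `secret_key`, and so on) is hypothetical rather than taken from the thesis.

```python
import hashlib
import secrets


def keystream_xor(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher (assumption, insecure): XOR the data with a
    keystream derived from SHA-256 of the key and a block counter."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(f"{key}:{i // 32}".encode()).digest()
        out.extend(b ^ s for b, s in zip(data[i:i + 32], pad))
    return bytes(out)


# Bob's toy public/private key pair (textbook RSA with tiny primes).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+
bob_public, bob_private = (n, e), (n, d)

# Alice selects a secret key and encrypts it with Bob's public key.
secret_key = secrets.randbelow(n - 2) + 2
encrypted_key = pow(secret_key, bob_public[1], bob_public[0])

# Bob decrypts with his private key to retrieve the same secret key.
recovered_key = pow(encrypted_key, bob_private[1], bob_private[0])

# Both parties now use the shared secret key for the actual conversation.
message = b"meet me at noon"
ciphertext = keystream_xor(secret_key, message)
plaintext = keystream_xor(recovered_key, ciphertext)
```

The public-key operation is used exactly once, to transport the secret key; all message traffic afterwards goes through the cheaper symmetric operation.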

1.1.3 Types of security analysis

Given a cryptographic primitive, there are various ways to analyse its security. Here, we categorise three types of security analysis as described in [118]:

Computational security. A cryptographic primitive is considered to be computationally secure if the best algorithm for breaking it requires at least N computational resources, where N is some specified very large number that is impractical in real-life situations. However, in many cases, the “best algorithm” is not known. Thus, in practice, people analyse the computational security of a primitive under specific types of attacks. Since showing security against one specific attack does not imply security against other attacks, security analysis is usually conducted by checking many known attack techniques.

Provable security. This is typically done by reducing the cryptographic property of a primitive to some well-studied mathematical problem that is believed to be difficult to solve. That is to say, a primitive remains “provably secure” as long as the underlying mathematical problem remains hard to solve. For instance, the security of the public-key cryptosystem RSA [96] is related to the difficulty of factorising very large (hundreds of digits) numbers that are products of two very large prime numbers.

Unconditional security. The holy grail for designing secure cryptographic primitives. A primitive is unconditionally secure if it cannot be broken even when the adversary has infinite computational resources. The one-time pad [111] (OTP) is an example cryptosystem that is unconditionally secure (assuming that the key is truly random and never reused).
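As a small illustration of the one-time pad mentioned above, the following sketch XORs a message with a truly random key of the same length. The function names are hypothetical. The last lines show the intuition behind unconditional security: for any other plaintext of the same length there exists a key that explains the same ciphertext, so the ciphertext alone reveals nothing beyond the message length.

```python
import secrets


def otp_encrypt(message: bytes, key: bytes) -> bytes:
    """One-time pad: XOR each message byte with the matching key byte."""
    if len(key) != len(message):
        raise ValueError("the key must be exactly as long as the message")
    return bytes(m ^ k for m, k in zip(message, key))


# XOR is its own inverse, so decryption is the same operation.
otp_decrypt = otp_encrypt

message = b"ATTACK AT DAWN"
key = secrets.token_bytes(len(message))  # random key, must never be reused
ciphertext = otp_encrypt(message, key)
recovered = otp_decrypt(ciphertext, key)

# Any other 14-byte plaintext is consistent with the same ciphertext
# under some other key, so an observer cannot single out the true message.
alternative = b"RETREAT AT TEN"
alternative_key = bytes(c ^ a for c, a in zip(ciphertext, alternative))
alternative_recovered = otp_decrypt(ciphertext, alternative_key)
```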

² The fictional characters Alice and Bob were first mentioned in [96] and are commonly used in cryptographic scenarios; their names originate from the first two letters of the English alphabet, ‘A’ and ‘B’. The adversary, on the other hand, goes by many different names, for instance Charlie (from the next letter ‘C’), Eve (representing eavesdropping), or Oscar (representing an observer).

³ A way of transmitting information that is resistant to any tampering of information. In practice, one uses key encapsulation mechanism/data encapsulation mechanism (KEM/DEM) schemes.

1.2 Symmetric-key Cryptography

A cryptographic algorithm that performs the encryption and decryption processes is called a cipher or an encryption algorithm. The encryption process takes a message (plaintext P) and a secret key K as inputs, and outputs an encrypted message (ciphertext C). The decryption process, on the other hand, takes the same secret key and a ciphertext, and returns the original plaintext.

1.2.1 Overview of block ciphers

In this digital era, information is systematically encoded into binary strings, where each binary unit (bit) holds a value of either 0 or 1. The size or length is the total number of bits in the binary string⁴. In general, we can distinguish two big families of encryption algorithms: block ciphers and stream ciphers.

In a nutshell, a block cipher works on a fixed-length message block and secret key. The encryption process (Figure 1.2) takes an n-bit plaintext and a k-bit key as inputs, and outputs an n-bit ciphertext. The decryption process (Figure 1.3), on the other hand, takes the same key and a ciphertext and returns the original plaintext. For a fixed secret key, a block cipher can be perceived as a bijective function mapping n-bit strings to n-bit strings, that is, a permutation, and changing the key generates a different permutation instance.

Figure 1.2: Block cipher encryption Figure 1.3: Block cipher decryption
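To make the view of a block cipher as a family of keyed permutations concrete, here is a minimal sketch with n = 8 and k = 8. The cipher `toy_encrypt` is entirely hypothetical and offers no security; it only serves to check by exhaustive enumeration that, for every fixed key, the mapping is a bijection on the 256 possible blocks and that decryption inverts it.

```python
MASK = 0xFF  # 8-bit blocks


def rotl8(x: int, r: int) -> int:
    """Rotate an 8-bit value left by r positions."""
    return ((x << r) | (x >> (8 - r))) & MASK


def toy_encrypt(key: int, block: int) -> int:
    """Hypothetical 8-bit toy cipher: four rounds of invertible steps."""
    x = block
    for rnd in range(4):
        x ^= key ^ rnd            # key and round-constant addition
        x = (5 * x + 1) & MASK    # invertible affine map modulo 256
        x = rotl8(x, 3)           # bit rotation
    return x


def toy_decrypt(key: int, block: int) -> int:
    """Undo the rounds of toy_encrypt in reverse order."""
    inv5 = pow(5, -1, 256)        # multiplicative inverse of 5 modulo 256
    x = block
    for rnd in reversed(range(4)):
        x = rotl8(x, 5)           # rotating left by 5 undoes rotating left by 3
        x = (inv5 * (x - 1)) & MASK
        x ^= key ^ rnd
    return x


def permutation_for(key: int) -> tuple:
    """Tabulate the permutation instance selected by one key."""
    return tuple(toy_encrypt(key, p) for p in range(256))


# Every key selects a bijection on the set of 8-bit blocks.
all_bijective = all(sorted(permutation_for(k)) == list(range(256)) for k in range(256))
# Number of distinct permutation instances over the whole key space.
distinct_instances = len({permutation_for(k) for k in range(256)})
```

With an n-bit block there are (2^n)! possible permutations but only 2^k keys, so a block cipher selects a very small subset of them.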

In contrast to block ciphers, a stream cipher uses a k-bit secret key to generate a key stream for masking an arbitrary-length plaintext into a ciphertext⁵. The study of stream ciphers, however, is not the main focus of this thesis.
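The key-stream idea can be sketched as follows, assuming a 16-bit linear feedback shift register as a stand-in key-stream generator. This choice is an assumption made for illustration only: a bare LFSR is not a secure stream cipher, and the names `lfsr_keystream` and `stream_xor` are hypothetical.

```python
def lfsr_keystream(key: int, n_bytes: int) -> bytes:
    """Generate n_bytes of key stream from a 16-bit LFSR seeded with the key.
    Toy generator (assumption): linear and therefore insecure on its own."""
    if not 0 < key < (1 << 16):
        raise ValueError("key must be a non-zero 16-bit value")
    state = key
    out = bytearray()
    for _ in range(n_bytes):
        byte = 0
        for _ in range(8):
            byte = (byte << 1) | (state & 1)  # output the lowest state bit
            feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
            state = (state >> 1) | (feedback << 15)
        out.append(byte)
    return bytes(out)


def stream_xor(key: int, data: bytes) -> bytes:
    """Mask data of arbitrary length with the key stream (encrypt or decrypt)."""
    stream = lfsr_keystream(key, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


plaintext = b"a message of arbitrary length, not a multiple of any block size"
ciphertext = stream_xor(0xACE1, plaintext)
decrypted = stream_xor(0xACE1, ciphertext)
```

Because the key stream depends only on the key and not on the data, this sketch corresponds to the synchronous type of stream cipher.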

⁴ Because of this, large numbers are usually expressed as powers of 2.

⁵ These are known as synchronous stream ciphers. Another type of stream cipher, the so-called self-synchronizing stream cipher, additionally uses some of the previous ciphertext bits to generate the key stream.

Design philosophy

While there is no fixed rule on how a block cipher is constructed, it is typically designed such that the operations can be performed iteratively, giving a so-called iterative block cipher. As illustrated in Figure 1.4, it first initialises an internal state IS with the plaintext. Next, the encryption process is realised by iteratively applying a round function to the internal state for r rounds. Concurrently, a key schedule is used to generate round keys or subkeys from the secret key, which are inserted into the internal state to produce the ciphertext.

Figure 1.4: Iterative block cipher overview

Iterative block cipher design keeps the description of the algorithm concise and comprehensive. In hardware implementations, the encryption can be performed by implementing and reusing the same circuit (describing one round function) iteratively; such an implementation is called a round-based implementation and it is the most commonly used architecture. Analysing the security of the cipher is also simpler: by proving a certain level of security for a small number of rounds, the result can be extrapolated to estimate the security of the entire cipher.

Having said that, the construction of iterative block ciphers boils down to designing a round function, for which one can generally distinguish two main frameworks.

Substitution-Permutation Network (SPN). Many block ciphers such as AES [39] (the current worldwide standard), PRESENT [28], LED [51], and Midori [5] adopted this design structure. An SPN round function consists of a substitution layer (S-layer) and a permutation layer (P-layer) as illustrated in Figure 1.5. The S-layer usually consists of several s-bit substitution boxes (S-boxes) applied in parallel, substituting every s-bit word for another word. The P-layer, on the other hand, mixes several s-bit words together. Both the S-layer and the P-layer must be invertible, as the decryption process requires inverting them.

Feistel Network. Named after the cryptographer Horst Feistel, this construction was adopted by many block ciphers such as DES [42], CLEFIA [114], Piccolo [112], and [12]. Its round function, as illustrated in Figure 1.6, splits the internal state into two parts, alternately taking one part as input to an F-function and inserting its output into the other part. Due to the nature of its construction, it requires at least two round function calls before the entire internal state is masked by the key. Another distinctive difference from SPN is that the F-function does not need to be invertible, as the decryption process uses the F-function in the same forward direction. Yet, the construction of the F-function can itself be based on SPN design principles.
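The claim that the F-function need not be invertible can be checked with a short sketch. The F-function below is a truncated SHA-256 hash, chosen here only because it is clearly not invertible; it, the round keys and the function names are assumptions for illustration and do not correspond to any cipher cited above.

```python
import hashlib


def f_function(round_key: bytes, half: bytes) -> bytes:
    """Toy, non-invertible F-function: truncated hash of round key and input."""
    return hashlib.sha256(round_key + half).digest()[:len(half)]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def feistel_encrypt(round_keys, block: bytes) -> bytes:
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for rk in round_keys:
        # One round: (L, R) -> (R, L xor F(R))
        left, right = right, xor_bytes(left, f_function(rk, right))
    return left + right


def feistel_decrypt(round_keys, block: bytes) -> bytes:
    half = len(block) // 2
    left, right = block[:half], block[half:]
    for rk in reversed(round_keys):
        # Inverse round, still calling F in the forward direction.
        left, right = xor_bytes(right, f_function(rk, left)), left
    return left + right


ROUND_KEYS = [b"rk0", b"rk1", b"rk2", b"rk3"]  # hypothetical round keys
block = b"8 bytes!"
ciphertext = feistel_encrypt(ROUND_KEYS, block)
recovered = feistel_decrypt(ROUND_KEYS, ciphertext)

# After a single round, one output half is still an unmasked input half,
# which is why at least two rounds are needed to cover the whole state.
one_round = feistel_encrypt(ROUND_KEYS[:1], block)
```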

Figure 1.5: 2-round SPN Figure 1.6: 2-round Feistel Network
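For comparison, an SPN can be sketched on a 16-bit state. The sketch below borrows the 4-bit S-box of PRESENT as an illustrative choice and pairs it with a simple bit permutation; the overall toy cipher, its round keys and its function names are assumptions and not a cipher from the literature. It shows why both layers must be invertible: decryption applies the inverse layers in reverse order.

```python
# 4-bit S-box of PRESENT, used here only as an example of an invertible S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV_SBOX = [SBOX.index(v) for v in range(16)]

# Bit permutation on 16 bits: bit i moves to position PERM[i].
PERM = [(4 * i) % 15 if i < 15 else 15 for i in range(16)]
INV_PERM = [PERM.index(i) for i in range(16)]


def s_layer(state: int, sbox) -> int:
    """Apply the S-box to each of the four 4-bit words in parallel."""
    return sum(sbox[(state >> (4 * j)) & 0xF] << (4 * j) for j in range(4))


def p_layer(state: int, perm) -> int:
    """Move every bit of the state to its new position."""
    out = 0
    for i in range(16):
        out |= ((state >> i) & 1) << perm[i]
    return out


def spn_encrypt(round_keys, block: int) -> int:
    state = block
    for rk in round_keys[:-1]:
        state ^= rk                      # round key addition
        state = s_layer(state, SBOX)     # substitution layer
        state = p_layer(state, PERM)     # permutation layer
    return state ^ round_keys[-1]        # final key addition


def spn_decrypt(round_keys, block: int) -> int:
    state = block ^ round_keys[-1]
    for rk in reversed(round_keys[:-1]):
        state = p_layer(state, INV_PERM)
        state = s_layer(state, INV_SBOX)
        state ^= rk
    return state


ROUND_KEYS = [0x1A2B, 0x3C4D, 0x5E6F, 0x7081]  # hypothetical round keys
ciphertext = spn_encrypt(ROUND_KEYS, 0x1234)
recovered = spn_decrypt(ROUND_KEYS, ciphertext)
```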

1.2.2 Cryptanalysis of encryption algorithms

In this section, we give a short overview of cryptanalysis of ciphers. We recall some settings for an adversary to mount an attack and the types of attacks on ciphers.

Kerckhoffs’ principle

In 1883, the Dutch cryptographer Auguste Kerckhoffs stated six design principles for military ciphers [66]. The most famous of them is the second principle, which states (translated from French):

It must not require secrecy, and that it is not compromised when it falls into the hands of the enemy.

In other words, the details of a cipher can be made known to the public, but as long as the secret key is not known, it should remain secure. This principle is now commonly known as Kerckhoffs’ principle and is generally followed by cryptographic designers.

Attack models

Besides knowing the description of a target cipher, an attack model defines the amount of information and resources that an adversary has access to. We define an attack model to be stronger (resp. weaker) than another if the adversary is given access to more (resp. less) information and resources. The attack models below are roughly arranged in order of increasing strength.

Ciphertext-only attack (COA): The adversary can only observe the ciphertexts.

Known-plaintext attack (KPA): The adversary can observe a limited number of plaintexts and their corresponding ciphertexts.

Chosen-plaintext attack (CPA): The adversary has temporary access to the encryption oracle, and is thus able to choose the plaintexts and query for their corresponding ciphertexts.

Adaptive chosen-plaintext attack (CPA2): In addition to the temporary access to the encryption oracle, the adversary can analyse the previous query responses before choosing the next plaintext to query.

Chosen-ciphertext attack (CCA): The adversary has temporary access to the decryption oracle, and is thus able to choose the ciphertexts and query for their corresponding plaintexts.

Adaptive chosen-ciphertext attack (CCA2): In addition to the temporary access to the decryption oracle, the adversary can analyse the previous query responses before choosing the next ciphertext to query.

Although stronger attack models seem a bit artificial for some practical scenarios, they usually cover weaker attack models. For instance, if a cipher can be shown to be secure under the CPA2 model, it implies security under CPA, KPA and COA. Various attack models may be combined to form new scenarios for more complex cryptanalysis. For instance, the so-called boomerang cryptanalysis technique [127] uses both CPA2 and CCA2.

Cryptographic key setting. Besides the attack models, we can also consider how a secret key for the target cipher plays a part in an attack model.

Single-key attack: A fixed secret key for the target cipher is randomly selected from the entire key space.

Chosen related-key attack: Given a target secret key, the adversary can choose other keys that have some mathematical relation to the target key, and query encryptions of plaintexts using the target key and the other keys.

Weak-key attack: A weak key is a cryptographic key that, when used in a specific cipher, generates a permutation instance with some undesirable property. The adversary usually has to first identify a set of weak keys, then assume that the target key belongs to this set.

Types of cryptanalysis

Here, we introduce some common terminology in cryptanalysis. Broadly speaking, a generic attack (or trivial attack) does not exploit the properties of the internal components of a cipher; instead, the cipher is viewed as a black box and the generic attack provides a security upper bound based on its parameters. This is often associated with the concept of brute force, which is to exhaust all possible combinations rather than applying any strategy to reduce the search space. Therefore, for an attack to be meaningful, its computational complexity should be significantly lower than that of the generic attacks. Following Kerckhoffs’ principle, the secret key is the only unknown factor of a cipher to an adversary. Thus, the primary goal of an adversary is to recover the secret key, but there are also other forms of attack.

Key-recovery. The goal of a key-recovery attack is to recover the secret key used for the encryption. Note that once the key is obtained, any ciphertext can easily be decrypted. A generic key-recovery method is to brute force all possible keys until the correct key is identified. However, the computational complexity is $O(2^k)$ (for trying all possible k-bit key candidates), which is impractical for large k. An attack is considered a key-recovery attack if the adversary can recover some key bit information, say l bits, with complexity significantly lower than $2^k$ (indeed, the remaining key bits can then be brute forced with complexity $2^{k-l}$).

Plaintext-recovery. Similar to a key-recovery attack, the goal of a plaintext-recovery attack is to recover the plaintext corresponding to a target ciphertext. This is, of course, assuming that the adversary cannot simply access the decryption oracle and make a decryption query for the target ciphertext directly. It is a weaker attack notion than key-recovery, as the adversary does not learn the secret key and the attack has to be repeated for every new ciphertext.

Distinguishing attack. This is another form of attack, where the adversary’s goal is to distinguish a target cipher from a random permutation. Imagine a game where, with equal chance, either the target cipher with a random key or a random permutation is selected as the oracle, and the adversary then interacts with this oracle. The goal of the adversary is to determine whether the oracle is the target cipher. If the adversary is able to determine the identity of the oracle (being the target cipher or not) faster than a brute force search and with probability significantly higher than a random guess, the adversary wins the game, and the algorithm of the distinguishing attack is called a distinguisher. A distinguisher is often used as the building block for constructing a key-recovery attack.

Ciphers that are secure against distinguishing attacks are said to be indistinguishable from a random permutation, and the level of security depends on the attack model. Some common definitions are

• indistinguishability under chosen plaintext attack (IND-CPA),
• indistinguishability under chosen ciphertext attack (IND-CCA), and
• indistinguishability under adaptive chosen ciphertext attack (IND-CCA2).
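As a rough illustration of the generic key-recovery attack described above, the following Python sketch exhausts all $k$-bit keys against a few known plaintext-ciphertext pairs. The cipher `toy_encrypt` is a made-up 16-bit keyed map introduced purely for this example (it is neither a real design nor secure), and the 12-bit key size is chosen only so that the $2^k$ search finishes instantly.

```python
def rotl16(x, r):
    """Rotate a 16-bit word left by r positions."""
    return ((x << r) | (x >> (16 - r))) & 0xFFFF


def toy_encrypt(key, block):
    """Hypothetical toy cipher on 16-bit blocks (illustration only, not secure)."""
    x = block & 0xFFFF
    for rnd in range(4):
        x ^= (key + rnd) & 0xFFFF      # key mixing
        x = (x * 0x9E37) & 0xFFFF      # odd multiplier, invertible mod 2^16
        x = rotl16(x, 5)               # bit rotation
    return x


def brute_force_keys(pairs, key_bits):
    """Generic attack: try all 2^key_bits keys, keep those matching every pair."""
    return [key for key in range(1 << key_bits)
            if all(toy_encrypt(key, p) == c for p, c in pairs)]


KEY_BITS = 12
secret_key = 0xA5C
known_pairs = [(p, toy_encrypt(secret_key, p)) for p in (0x0000, 0x1234, 0xBEEF)]
candidates = brute_force_keys(known_pairs, KEY_BITS)
print([hex(k) for k in candidates])
```

The loop performs at most $2^{12}$ trial encryptions per pair here; the same procedure with a 128-bit key would need about $2^{128}$ trials, which is why a dedicated attack is only interesting when it is significantly cheaper than this search.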

Shannon’s properties of secure cipher

The evolution of cryptology was deeply influenced by the 1949 paper entitled “Communication Theory of Secrecy Systems” [111] by Claude Shannon, commonly known as “the father of information theory”. In his paper, Shannon identified two fundamental properties for designing a secure cipher: confusion and diffusion. In his original words:

“The method of confusion is to make the relation between the simple statistics of (encrypted message) and the simple description of (key) a very complex and involved one.”

This refers to making the relation between the secret key and the ciphertext complex, so that it is difficult to isolate the key information from the ciphertext.

“In the method of diffusion the statistical structure of (message) which leads to its redundancy is “dissipated” into long range statistics.”

This refers to the spreading and hiding of the plaintext information in the ciphertext, making it difficult to identify any statistical structure of the plaintext from the ciphertext. For SPN-based block ciphers, the confusion and diffusion properties are achieved by the S-layer and P-layer respectively; see Section 2.2 for details.

Classical cryptanalysis techniques

Cryptanalysis techniques exploit the properties and potential weaknesses of the components of a specific cipher to launch attacks with complexity lower than the generic attack. In the following, we give a quick introduction to the two most widely used cryptanalysis techniques on block ciphers: differential cryptanalysis and linear cryptanalysis. Details of these two (and other) techniques are presented in Section 2.3.

Differential cryptanalysis. Introduced by Biham and Shamir [24], differential cryptanalysis is a chosen-plaintext attack that involves choosing the exclusive-or (XOR) difference (difference for short) of two plaintexts and observing the difference of their corresponding ciphertexts. The difference of two $n$-bit strings is obtained by bitwise XORing the two binary strings. For instance, the difference of $0011_2$ and $1010_2$ is $1001_2 = 0011_2 \oplus 1010_2$. Any non-zero input difference propagates to an output difference with some probability, and such a transition is called a differential. The general idea is to pre-define an input and output difference pair with relatively high differential probability, and to make encryption queries for many plaintext pairs with this input difference in the hope of detecting a high frequency of the corresponding output difference among the ciphertext pairs. This leads to a distinguishing and even a key-recovery attack.

Linear cryptanalysis. Developed by Matsui [85], linear cryptanalysis is a known-plaintext attack that exploits a potential linear bias (more details in Section 2.1.5) in the cipher. A linear Boolean expression over some of the plaintext and ciphertext bits holds true (equals 0) with some probability $p$. The linear bias of the linear expression is stronger (and in the adversary’s favour) the further $p$ is from 1/2. A linear mask is used to denote the extraction of specific bits from the plaintext or ciphertext to form the linear expression. For instance, the linear mask $0011_2$ is used to extract bits $b_0$ and $b_1$ from a 4-bit string $b_3b_2b_1b_0$, i.e. $b_1 \oplus b_0 = 0011_2 \bullet b_3b_2b_1b_0$, where $\bullet$ represents the scalar product operation. The general idea is to pre-define an input and output linear mask pair that form a linear expression with a relatively strong bias, and to collect many plaintext-ciphertext pairs in the hope of detecting this bias. This leads to a distinguishing and even a key-recovery attack.
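The two small examples above (the XOR difference of two strings and the scalar product with a linear mask) can be checked with a few lines of Python. This is only a sketch; the helper names `difference` and `masked_parity` are chosen here for illustration and bit strings are represented as integers.

```python
def difference(a, b):
    """XOR difference of two n-bit strings represented as integers."""
    return a ^ b


def masked_parity(mask, value):
    """Scalar product over GF(2): XOR of the bits of value selected by mask."""
    return bin(mask & value).count("1") % 2


# Difference of 0011 and 1010 as in the text.
print(format(difference(0b0011, 0b1010), "04b"))

# The mask 0011 extracts b1 XOR b0 from b3 b2 b1 b0.
for b in range(16):
    b1, b0 = (b >> 1) & 1, b & 1
    assert masked_parity(0b0011, b) == b1 ^ b0
```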

1.2.3 Other symmetric-key cryptographic primitives

Besides block ciphers, there are many other symmetric-key cryptographic primitives. For conciseness, we introduce only those that are relevant to this thesis.

Modes of operation

Since a block cipher can only handle a fixed-size plaintext, a mode of operation (mode for short) specifies how a message of arbitrary length is processed. In general, it involves partitioning a message into blocks of suitable length for the underlying block cipher, padding the message block to the required block length when necessary, and iteratively processing the message blocks using a block cipher encryption. In most cases, the mode uses a block cipher as a black box for processing the message blocks and is independent of the underlying block cipher apart from its block size. Here we state a few of the well-known modes:

Electronic Codebook (ECB). Each plaintext block is encrypted independently to produce the ciphertext block. It is the simplest mode and both the encryption and decryption processes can be parallelised, see Figure 1.7.

Figure 1.7: ECB mode encryption and decryption

Cipher Block Chaining (CBC). In this mode, the previous ciphertext block is XORed to the next plaintext block before encryption. For the very first plaintext block, an initialisation vector (IV) is XORed to the plaintext, where the IV should be unpredictable and random [16]. Unlike ECB mode, the encryption cannot be parallelised as each call needs to wait for the previous encryption to be completed, as shown in Figure 1.8. The decryption, on the other hand, can be parallelised.

Figure 1.8: CBC mode encryption and decryption

Output Feedback (OFB). In this mode, the output of a block cipher encryption is used as both the input to the next block cipher encryption and for XORing to the plaintext to produce the ciphertext. An IV is used as the input to the first block cipher encryption. By the nature of the construction, it does not require the block cipher decryption for the decryption process, see Figure 1.9.

Figure 1.9: OFB mode encryption and decryption

Although ECB mode can be parallelised, identical plaintext blocks are encrypted to identical ciphertext blocks, thus some information about the content may be revealed. For instance, in Figure 1.11 the white pixels of the image are encrypted to the same ciphertext, giving a distinctive outline of the original image (Figure 1.10). On the other hand, because of the chaining effect, CBC and OFB modes do not have this issue (see Figure 1.12).
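The three modes described above can be sketched in Python on top of a toy block cipher. Everything below is an assumption made for illustration: the "block cipher" is simply a key-seeded random permutation of the 256 byte values (an 8-bit block, far too small to be secure), messages are lists of byte-sized blocks, and padding is left out.

```python
import random


def make_toy_cipher(key):
    """Hypothetical 8-bit block cipher: a key-seeded permutation of 0..255."""
    table = list(range(256))
    random.Random(key).shuffle(table)
    inverse = [0] * 256
    for x, y in enumerate(table):
        inverse[y] = x

    def enc(block):
        return table[block]

    def dec(block):
        return inverse[block]

    return enc, dec


def ecb_encrypt(enc, blocks):
    return [enc(p) for p in blocks]          # every block independently


def ecb_decrypt(dec, blocks):
    return [dec(c) for c in blocks]


def cbc_encrypt(enc, iv, blocks):
    out, prev = [], iv
    for p in blocks:
        prev = enc(p ^ prev)                 # chain with previous ciphertext
        out.append(prev)
    return out


def cbc_decrypt(dec, iv, blocks):
    out, prev = [], iv
    for c in blocks:
        out.append(dec(c) ^ prev)
        prev = c
    return out


def ofb_crypt(enc, iv, blocks):
    """OFB encryption and decryption are the same operation."""
    out, state = [], iv
    for b in blocks:
        state = enc(state)                   # only block cipher encryption is used
        out.append(b ^ state)
    return out


enc, dec = make_toy_cipher(key=2018)
message = [0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x12]
iv = 0x3C

ecb_ct = ecb_encrypt(enc, message)
cbc_ct = cbc_encrypt(enc, iv, message)
ofb_ct = ofb_crypt(enc, iv, message)
print(ecb_ct, cbc_ct, ofb_ct)
```

In the ECB output the four repeated plaintext blocks map to four repeated ciphertext blocks, which is the leakage visible in the image example; in CBC and OFB each block additionally depends on the chaining value or the keystream.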

Figure 1.10: Kitten. Figure 1.11: ECB mode. Figure 1.12: CBC mode.

Authenticated cipher

In the scenario illustrated in Figure 1.1, the cryptosystem protects primarily the confidentiality of the transmitted information, which is to prevent disclosing of the content to Eve. Besides confidentiality, there are two other properties that are important in practice, the integrity and authenticity of the information. The former ensures that the information that Bob received was not intentionally or unintentionally modified. The latter ensures that the information that Bob received is indeed from Alice and not from Eve. In general, the procedure of encryption protects the confidentiality (which is the main focus of this thesis) and authentication ensures the integrity and authenticity of the information.

Authentication is another well-studied area in symmetric-key cryptography: cryptographic primitives such as hash functions and message authentication code (MAC) algorithms are used for protecting the integrity and authenticity of the information. This, however, falls outside the scope of this thesis.

In 2000, Bellare et al. [17] proposed authenticated encryption schemes which integrate the authentication (protecting both authenticity and integrity) and encryption (protecting confidentiality) procedures together. Two years later, the concept was extended to authenticated-encryption with associated-data (AEAD) by Rogaway [98]. The main difference is that part of the message, the associated-data, only needs to be authenticated but not encrypted. Since then, many new authenticated cipher designs have adopted the AEAD framework, for instance the Galois/Counter Mode of operation [86] (GCM).

In a nutshell, the encryption process takes in an associated-data, plaintext and secret key as inputs and produces a ciphertext (encrypted plaintext) and a tag (generated from the associated-data and plaintext/ciphertext using the key). The decryption process takes in an associated-data, ciphertext, tag and the same secret key. If it generates a tag that does not match the input tag, the authentication fails and it outputs null, ⊥. Otherwise the authentication is valid and it outputs the original plaintext. Some schemes also include a public initialisation vector (or nonce) as an input to the encryption and decryption process.
In most cases, the same nonce should not be reused for encryption under the same secret key, otherwise the security of the scheme might be compromised. Attack models that abide by this rule are nonce-respecting, otherwise nonce-misuse.

Nonce-respecting scenario: Assume that the adversary has temporary access to the encryption oracle; every encryption is performed with a different nonce.

Nonce-misuse scenario: Assume that the adversary has temporary access to the encryption oracle; several encryptions can be performed with the same nonce.

In both scenarios, the adversary can query the decryption oracle with the same nonce (assuming that he has access to the decryption oracle), but will obtain the corresponding plaintext only if the authentication tag is valid.

The Competition for Authenticated Encryption: Security, Applicability, and Robustness [33] (CAESAR) is an ongoing competition that aims to select a portfolio of authenticated ciphers suitable for various use-cases. The CAESAR competition began in 2014 with 57 proposals submitted from all around the world. The competition announced its 7 finalists in March 2018 and the final portfolio is expected to be out about a year later.
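The AEAD interface described earlier in this section (associated-data, plaintext, key and nonce in; ciphertext and tag out; ⊥ on a tag mismatch) can be sketched as follows. This is a toy encrypt-then-MAC composition assembled from Python's standard `hashlib` and `hmac` modules purely to show the data flow; it is an assumption of this example, not any CAESAR candidate, not GCM, and not meant for real use. The function names are hypothetical, and `None` plays the role of ⊥.

```python
import hashlib
import hmac


def _subkey(key, label):
    """Derive separate encryption and authentication keys (toy derivation)."""
    return hashlib.sha256(label + key).digest()


def _keystream(enc_key, nonce, length):
    """Toy keystream: SHA-256 over key, nonce and a block counter."""
    stream, counter = b"", 0
    while len(stream) < length:
        stream += hashlib.sha256(enc_key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]


def _tag(mac_key, nonce, ad, ciphertext):
    """Tag over nonce, associated-data and ciphertext (lengths prefixed)."""
    header = len(nonce).to_bytes(8, "big") + len(ad).to_bytes(8, "big")
    return hmac.new(mac_key, header + nonce + ad + ciphertext, hashlib.sha256).digest()


def ae_encrypt(key, nonce, ad, plaintext):
    ks = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, ks))
    return ciphertext, _tag(_subkey(key, b"mac"), nonce, ad, ciphertext)


def ae_decrypt(key, nonce, ad, ciphertext, tag):
    expected = _tag(_subkey(key, b"mac"), nonce, ad, ciphertext)
    if not hmac.compare_digest(tag, expected):
        return None                          # authentication failed: output ⊥
    ks = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))


key, nonce = b"k" * 16, b"n" * 12
ad, plaintext = b"header", b"attack at dawn"
ciphertext, tag = ae_encrypt(key, nonce, ad, plaintext)
print(ae_decrypt(key, nonce, ad, ciphertext, tag))
print(ae_decrypt(key, nonce, b"other header", ciphertext, tag))
```

Because the keystream here depends only on the key and the nonce, encrypting two messages with the same nonce would reveal the XOR of the two plaintexts, which is one concrete way in which nonce reuse can compromise a scheme.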

Tweakable block cipher

Introduced by Liskov et al. [80] in 2002, the concept of tweakable block cipher (TBC) introduces a new input (the so-called tweak) to block ciphers (Figure 1.13), where the cost of changing the tweak should be lower than the cost of changing the key. The goal is to allow users to generate different permutation instances under the same key at a lower cost than changing the key itself. In addition, the tweak can be disclosed to the public. One can build a TBC from block ciphers, but there is also a trend of designing ad-hoc TBCs. In 2014, Jean et al. [57] proposed a new framework, TWEAKEY, which unifies the usage of tweak and key into one element called the tweakey. It provides a simple method to design a tweakable block cipher with any tweak and key size (Figure 1.14). Deoxys [59] and Joltik [58], proposals submitted to the CAESAR competition, are examples that adopt the TWEAKEY framework for their internal block cipher.

Figure 1.13: Tweakable block cipher Figure 1.14: TWEAKEY framework

1.3 Lightweight Cryptography

Pervasive computing is becoming increasingly important in many applications of our daily lives. The Internet of Things (IoT) is the ever-increasing collection of devices inter-connected with each other through the Internet. This includes countless small and constrained devices such as Radio-Frequency IDentification (RFID) tags and wireless sensor nodes which might manipulate sensitive data and thus require some form of security. For instance, fitness tracker watches collect personal health data and transmit this information through the Internet. This gives rise to a new subfield of cryptography: lightweight cryptography. In this section, we first give a brief history of conventional symmetric-key cryptography followed by a discussion on lightweight cryptography.

1.3.1 Brief history on conventional cryptography

Developed by IBM in the early 1970s, the Data Encryption Standard (DES), with a 64-bit block and a 56-bit key, was used worldwide for the encryption of electronic data. Over the years, advances in electronic technology and the analysis of DES led to its downfall. Some of the notable events are as follows. In 1977 (not long after its adoption), Diffie and Hellman proposed the first exhaustive key search engine on DES [43], with an estimated cost of US$20 million and able to find a key in 12 hours on average. In 1992, Biham et al. presented the first theoretical attack on full-round DES [25]. A year later, Matsui presented the linear cryptanalysis of full-round DES [85] under the known-plaintext attack model. In the same year, Wiener provided a design for an exhaustive DES key search machine with an estimated cost of US$1 million that can find a key in 3.5 hours on average [128]. In 1997, the DESCHALL Project group publicly broke a message encrypted using DES in 39 days. To show that cracking DES was actually a very practical proposition, the Electronic Frontier Foundation (EFF) built a machine, known as the EFF DES Cracker, in 1998 for less than US$250 thousand and broke DES after only 56 hours of work.

This series of events triggered the need for a new secure encryption algorithm with larger key and block sizes. In 1997, the United States National Institute of Standards and Technology (NIST) announced an open competition to select a successor to DES, the Advanced Encryption Standard (AES). Proposals were submitted from around the world and thorough analyses of their security and performance were conducted. In 2001, the block cipher design Rijndael (now commonly known as AES) was selected as the winner and new standard. To this day, AES remains secure and has become the most deployed block cipher. AES has a simple and neat structure allowing one to prove its security and resistance against classical cryptanalysis, especially differential and linear cryptanalysis.
As AES has provided a detailed security analysis and substantial security margin, it became a prominent reference for researchers working on symmetric-key primitives. One of the research trends related to AES is improving the implementation of AES and making it more efficient on various platforms. One of the notable works is the compact implementation of the AES S-box by Canright [34] which reduces the ASIC implementation area by 20%.

Another research direction is designing new ciphers with comparable security but with better performance than AES. Over the years, cryptographers gained good experience in designing secure primitives and turned their attention to their implementation and performance. One of the advancements is that designers now understand that it is not necessary to have very complex components for block ciphers. For instance, instead of having seemingly random round constant values, it is sufficient to use a simple round counter to generate distinct round constant values for each round. This is one of the main reasons that in recent years new primitive designs have simpler round functions and key schedules (but with more iterations to achieve the target security). In some sense, the study of cryptography has undergone a paradigm shift from security-oriented to performance-oriented research. Nevertheless, ensuring adequate security remains a fundamental criterion for cryptographic primitive design.

1.3.2 Rise of lightweight cryptography

On a global scale, there is a shift from using general-purpose computers to using dedicated resource-constrained devices. Unfortunately, conventional cryptographic primitives such as AES may not be suitable for such devices as they can only dedicate a small portion of the already constrained hardware for security purposes. Even when it can be implemented, the performance may not be acceptable. “Lightweight cryptography aims to provide solutions tailored for resource-constrained devices.”⁶ In March 2017, NIST published a report [87], entitled “Report on Lightweight Cryptography”, summarising the findings of NIST’s lightweight cryptography project and outlining NIST’s plans for the standardisation of lightweight cryptographic primitives. In the following, we highlight some of the key points brought up in the report.

Target devices

Nowadays, there is a wide range of devices that may manipulate sensitive data and require some form of security; an overview of the device spectrum⁷ is illustrated in Figure 1.15. While conventional cryptographic primitives are designed to perform well on high-end devices, low-end devices have resource constraints and conventional primitives such as AES may be impossible or unsuitable to implement on these constrained devices.

Lightweight cryptographic primitives primarily target these low-end devices such as embedded systems, RFID devices and sensor networks. At the same time, these low-end devices may interact with high-end devices (for instance a server). Thus, it is important that these lightweight primitives also have adequate performance on high-end devices.

Figure 1.15: Device spectrum

⁶ Quoted from [87].
⁷ Note that embedded systems fall in the grey area between conventional and lightweight cryptography. For simplicity, we categorise them under lightweight cryptography as in [87].

Performance metrics

The performance of a cryptographic primitive can be quantified with many different metrics; here we list the common metrics and the associated concerns for lightweight cryptography:

Hardware metric. In hardware applications, in particular for ASICs, the physical area requirement is typically measured in terms of gate equivalents (GE), where one GE is the area (in µm²) required by a two-input NAND gate; the area of any other gate is expressed in GE by dividing its area by that of a two-input NAND gate⁸. Juels and Weis [62] pointed out that a low-cost RFID tag may allow only 200 to 2000 GE for security purposes⁹.

Software metric. In software applications, the common resource requirements include the number of registers, the amount of random access memory (RAM) and read-only memory (ROM) space. These resources are usually scarce and limited in constrained devices.

Power usage. For devices that do not have their own power source and harvest power from their surroundings, it is essential to have a low power consumption. An example is a passive RFID chip that powers its internal circuit using the electromagnetic field transmitted by a reader.

Energy consumption. Battery-operated devices usually have a limited amount of stored energy, and it may be difficult or impossible to recharge or replace their batteries once they are depleted. Therefore, in these situations it is crucial to keep the energy consumption to a minimum.

Latency. While in most cases a small delay in response is acceptable, there are also real-time applications that require very fast response times, for instance the steering, airbag or brake controls of an automotive system.

Throughput. This is the amount of data output over a period of time. While high throughput is not a necessary design goal in most lightweight applications, moderate throughput is still required for lightweight designs.

⁸ The area of the two-input NAND gate differs between libraries, thus each library has a different set of GE values for the various gates.
⁹ We omit the discussion on FPGAs where the area requirement and implementation are quite different from ASIC.

NIST’s lightweight cryptography project

Currently, the ISO/IEC 29192-2:2012 specifies two block ciphers suitable for lightweight cryptography:

PRESENT [28]: a lightweight block cipher with a block size of 64 bits and a key size of 80 or 128 bits;

CLEFIA [114]: a lightweight block cipher with a block size of 128 bits and a key size of 128, 192 or 256 bits.

An amendment was proposed in 2014 to include the block ciphers SIMON and SPECK [12] with various block and key size combinations. In 2015, the first working drafts of the amendments with SIMON and SPECK were initiated and the inclusion discussion is still ongoing.

The scope of NIST’s lightweight cryptography project (initiated in 2014) includes all cryptographic primitives that are needed in constrained environments, with an initial focus on symmetric-key cryptography. Instead of holding an open competition, like the AES competition, to select a new standard for lightweight cryptography, NIST’s approach is to have an open call for proposals to develop and maintain a portfolio of lightweight cryptographic primitives approved for limited use. This is because the features and requirements for lightweight cryptography are constantly changing with the rapid advancement of technology. Thus, there is a need to constantly update this lightweight portfolio, provide new standards for specific usage and phase out those that are no longer suitable.

All in all, lightweight symmetric-key cryptography is a rising topic and it is of great interest to us to study the efficiency and security of lightweight symmetric-key cryptographic primitives.

1.4 About this Thesis

1.4.1 Overview

In this thesis, we conduct both breadth and depth studies on lightweight symmetric-key cryptography, from analysing the entire mode structure to studying the components at bit-level. New results are found in several aspects, including cryptanalysis of new lightweight primitive proposals, creating new lightweight components, and designing new state-of-the-art lightweight symmetric-key encryption algorithms.

One should know how to attack in order to defend better. In the first part of our trilogy (Chapters 3 and 4), we perform cryptanalysis on JAMBU (an authenticated encryption mode submitted to the ongoing CAESAR competition) and on the low energy consumption block cipher Midori. Our attacks have been implemented and verified.

In the second part (Chapters 5 and 6), we focus on the building blocks of cryptographic primitives, namely the diffusion matrix and the S-box, which are commonly used in block cipher designs. Regarding the former, we propose equivalence classes of diffusion matrices in terms of branch number to significantly reduce the search space and perform an exhaustive search for lightweight MDS matrices that was previously not feasible. We found several new lightweight diffusion matrices that outperform those in existing primitives. Regarding S-boxes, we discuss their resistance against a cryptanalysis technique called invariant subspace attack under different frameworks and search for strong candidates in various cases.

In the finale of our trilogy (Chapters 7–9), we utilise our experience in cryptanalysis and knowledge of the building blocks of symmetric-key primitives to design two new lightweight symmetric-key encryption algorithms. From two opposite design approaches, prioritising high security and then reducing the area (high-security-reduce-area) and prioritising small area and then improving the security (small-area-increase-security), we constructed two state-of-the-art symmetric-key primitives, SKINNY and GIFT, which offer excellent performance for lightweight applications. We also conduct extensive security analysis on them to show that they are both efficient and secure against state-of-the-art cryptanalysis techniques.

1.4.2 Organisation of this thesis

The chapters in this thesis are organised into alphabetically ordered parts:

Background

Chapter 1. We give an introduction to symmetric-key cryptography and the outline of this thesis. (You are near the end of this chapter.)

Chapter 2. We provide a quick recapitulation of several mathematical and cryptographic topics that are used throughout this thesis.

Cryptanalysis of Symmetric-key Cryptography

Chapter 3. We break some of the confidentiality claims of JAMBU, one of the submissions to the CAESAR competition, by showing a practical attack on JAMBU. This work was published in [95].

Chapter 4. We exploit the properties of the building blocks of Midori and perform distinguishing and key-recovery attacks on Midori64. This work was published in [49].

Diffusion and Substitution Layers

Chapter 5. We analyse two types of diffusion matrices and propose equivalence classes of those matrices to reduce the search space and conduct exhaustive search for new lightweight candidates. These results were published in [82,115].

Chapter 6. We look into the security properties of S-boxes and study their resistance against invariant subspace attack under different scenarios. This work was published as part of [49].

Encryption Algorithm Design

Chapter 7. We discuss the challenges and design strategies for designing lightweight symmetric-key encryption algorithms.

Chapter 8. We design a new lightweight tweakable block cipher family SKINNY which is comparable with NSA’s 2013 design SIMON in terms of performance, while in addition providing much stronger security guarantees. Results from this chapter were published in [4,14].

Chapter 9. We revisit the design strategy of PRESENT, leveraging advancements in construction and cryptanalysis since its publication, and obtain an improved lightweight block cipher, named GIFT, that has better performance while correcting some well-known weaknesses of PRESENT. This work was published in [7].

Further Discussion

Chapter 10. We conclude this thesis and state some possible directions for future research.

Chapter 2

Preliminaries

In this chapter, we provide some mathematical background (Section 2.1) that is used in this thesis, elaborate more on the components of SPN ciphers (Section 2.2) and recapitulate some cryptanalysis techniques (Section 2.3).

2.1 Mathematical Background

2.1.1 Matrix notation and properties

In this section, we give the common notation and some matrix properties used throughout this thesis. We assume matrices to be square matrices of order $m$ unless otherwise specified. We denote matrices with bold letters in uppercase and vectors with bold letters in lowercase.

Entries of matrices. Indexing starts from 0: the first row (resp. first column) is denoted as row 0 (resp. column 0). $\mathbf{M}[i, j]$ denotes the $(i, j)$-entry of matrix $\mathbf{M}$ ($i$ for row, $j$ for column). The $(i, j)$-entry of the product matrix $\mathbf{AB}$ can be expressed as $\sum_{x=0}^{m-1} \mathbf{A}[i, x]\mathbf{B}[x, j]$.

Involution. A matrix is involutive if it is self-inverse, i.e. $\mathbf{M}^{-1} = \mathbf{M}$. It is also called an involutory matrix.

Transpose. The transpose of a matrix is an operation that flips a matrix over its diagonal. The entries of the transposed matrix $\mathbf{M}^\top$ can be expressed in terms of the original matrix $\mathbf{M}$ by swapping the row and column indices, i.e. $\mathbf{M}^\top[i, j] = \mathbf{M}[j, i]$.

Orthogonal. A matrix is orthogonal if its transpose is also its inverse, i.e. $\mathbf{M}^\top = \mathbf{M}^{-1}$.

Permutation matrix. A permutation matrix $\mathbf{P}$ is a binary matrix with exactly one ‘1’ in each row and column and ‘0’s everywhere else. The transpose of a permutation matrix is also its inverse, i.e. $\mathbf{P}^{-1} = \mathbf{P}^\top$. This is because the $(i, j)$-entry of the product matrix $\mathbf{P}\mathbf{P}^\top$ is $\sum_{x=0}^{m-1} \mathbf{P}[i, x]\mathbf{P}^\top[x, j] = \sum_{x=0}^{m-1} \mathbf{P}[i, x]\mathbf{P}[j, x]$, which equals 1 if and only if $i = j$, and 0 otherwise.
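The properties above can be checked numerically. The following Python sketch uses plain nested lists as matrices; the helper names and the convention that row $i$ of the permutation matrix has its ‘1’ in column `sigma[i]` are assumptions made for this example.

```python
def mat_mul(A, B):
    """(i, j)-entry of AB is the sum over x of A[i][x] * B[x][j]."""
    m = len(A)
    return [[sum(A[i][x] * B[x][j] for x in range(m)) for j in range(m)]
            for i in range(m)]


def transpose(M):
    """Swap row and column indices: result[i][j] = M[j][i]."""
    return [list(row) for row in zip(*M)]


def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def permutation_matrix(sigma):
    """Binary matrix with a single 1 in row i, placed at column sigma[i]."""
    m = len(sigma)
    return [[1 if sigma[i] == j else 0 for j in range(m)] for i in range(m)]


P = permutation_matrix([2, 0, 3, 1])      # an arbitrary permutation of 4 indices
Q = permutation_matrix([3, 2, 1, 0])      # a permutation that is its own inverse

print(mat_mul(P, transpose(P)) == identity(4))   # orthogonal: P P^T = I
print(mat_mul(Q, Q) == identity(4))              # involutive: Q Q = I
```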


2.1.2 Finite field notation and properties

In symmetric-key cryptography, finite field arithmetic is often used to describe and understand the transformations within the encryption algorithms. In most cases, characteristic 2 finite fields are used because the field addition behaves as a bitwise XOR operation. We denote by $GF(2^s)$ the finite field with $2^s$ elements, where $s \geq 1$. While there are many different choices of basis to express the finite field elements, in this thesis we only consider the conventional polynomial basis¹.

Field element representation. Finite field elements of $GF(2^s)$ can be expressed as $s$-tuple binary vectors or polynomials of the form $\sum_{i=0}^{s-1} b_i X^i$, where $b_i \in \{0, 1\}$. The coefficients $b_{s-1}b_{s-2} \ldots b_0$ can also be written in hexadecimal form. When the context is clear, we omit the classical 0x prefix to be concise.

Field addition. The addition over $GF(2^s)$ can be realised by component-wise addition modulo 2 of the binary input vectors. The XOR symbol $\oplus$ and $+$ are used interchangeably for the finite field addition. Since we have addition modulo 2, we can deduce that $(a + b)^2 = a^2 + b^2$, where $a, b \in GF(2^s)$, and this can be extended to arbitrarily many terms.

Field multiplication. The multiplication of two finite field elements is defined by some irreducible polynomial of degree $s$, denoted by $p(X)$. The operation, denoted by $\times$ or $\cdot$, is computed by reducing the product of the polynomials representing the two field elements modulo $p(X)$. When necessary, we append the irreducible polynomial $p(X)$ in hexadecimal form to the finite field: $GF(2^s)/p(X)$. The multiplication by any non-zero element $\alpha \in GF(2^s)$ can be viewed as a multiplication matrix of order $s$ over $GF(2)$, denoted by $M_\alpha$. One can quickly infer that multiplication matrices of non-zero elements are invertible and pairwise-commutative, since non-zero elements in $GF(2^s)$ are invertible and field multiplication is commutative.

Example 2.1. Given a finite field defined by the irreducible polynomial $X^3 + X + 1$, which we denote as $GF(2^3)/\text{0xb}$, the multiplication by $\alpha = 7$ seen as $(1, 1, 1) \in [GF(2)]^3$ can be computed as follows:

$$(1, 1, 1) \cdot (b_2, b_1, b_0) = (b_2 \oplus b_0,\ b_2 \oplus b_1,\ b_1) \oplus (b_1,\ b_2 \oplus b_0,\ b_2) \oplus (b_2,\ b_1,\ b_0) = (b_1 \oplus b_0,\ b_0,\ b_2 \oplus b_1 \oplus b_0),$$

where $(b_2, b_1, b_0)$ is an arbitrary element of $GF(2^3) \cong [GF(2)]^3$. The multiplication can also be expressed as a matrix multiplication:

$$\begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} b_2 \\ b_1 \\ b_0 \end{pmatrix} = \begin{pmatrix} b_1 \oplus b_0 \\ b_0 \\ b_2 \oplus b_1 \oplus b_0 \end{pmatrix},$$

where the matrix is the multiplication matrix of 7. △

¹ We have conducted detailed analysis of the choice of basis and its impact on lightweight cryptography in [106].
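Example 2.1 can be reproduced with a short Python sketch. The functions `gf_mul` and `mul_matrix` are names introduced here for illustration; field elements are integers whose bits are the polynomial coefficients, and the matrix uses the same ordering as the example (row 0 and column 0 correspond to the most significant bit $b_2$).

```python
def gf_mul(a, b, poly=0xB, s=3):
    """Multiply a and b in GF(2^s) defined by the irreducible polynomial poly."""
    result = 0
    for _ in range(s):
        if b & 1:
            result ^= a          # add a * X^i when coefficient i of b is set
        b >>= 1
        a <<= 1                  # multiply a by X
        if a & (1 << s):
            a ^= poly            # reduce modulo p(X)
    return result


def mul_matrix(alpha, poly=0xB, s=3):
    """Multiplication matrix of alpha over GF(2), most significant bit first."""
    columns = [gf_mul(alpha, 1 << (s - 1 - j), poly, s) for j in range(s)]
    return [[(columns[j] >> (s - 1 - i)) & 1 for j in range(s)] for i in range(s)]


print(mul_matrix(7))

# Compare with the closed form (b1 ^ b0, b0, b2 ^ b1 ^ b0) from the example.
for b in range(8):
    b2, b1, b0 = (b >> 2) & 1, (b >> 1) & 1, b & 1
    expected = ((b1 ^ b0) << 2) | (b0 << 1) | (b2 ^ b1 ^ b0)
    assert gf_mul(7, b) == expected
```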

Linear transformation over $GF(2^s)$. A function $\mathcal{L} : [GF(2^s)]^m \to [GF(2^s)]^m$ is a linear transformation if it satisfies the following two conditions:

• $\mathcal{L}(u + v) = \mathcal{L}(u) + \mathcal{L}(v)$;
• $\mathcal{L}(a \cdot u) = a \cdot \mathcal{L}(u)$,

$\forall u, v \in [GF(2^s)]^m$ and $\forall a \in GF(2^s)$. A direct consequence is that $\mathcal{L}(0) = 0$, where $0$ is the all-zero vector.

2.1.3 Vector subspace notation

We recall some definitions related to linear and affine subspaces.

Linear subspace. Let $\mathbb{F}_2^s$ be the $s$-tuple vector space over $GF(2)$. A subset $A \subseteq \mathbb{F}_2^s$ is a linear subspace if:

• the zero vector is in $A$;
• $\forall u, v \in A$, we have $u + v \in A$.

Affine subspace. Let $A \subseteq \mathbb{F}_2^s$ be a linear subspace and $c \in \mathbb{F}_2^s \setminus A$, i.e. $c$ is in $\mathbb{F}_2^s$ but not in $A$. Then $c + A \triangleq \{c + x \mid x \in A\}$ is an affine subspace.

Dimension of subspace. The cardinality of any subspace $A \subseteq \mathbb{F}_2^n$ is of the form $2^d$, where $d$ is the dimension of the subspace $A$.
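A minimal Python sketch of these definitions, assuming vectors of $\mathbb{F}_2^s$ are stored as integers so that vector addition is the XOR of two integers. The helper names `span`, `is_linear_subspace` and `coset` are introduced here for illustration only.

```python
def span(generators):
    """Linear subspace generated by the given vectors."""
    space = {0}
    for g in generators:
        space |= {x ^ g for x in space}
    return space


def is_linear_subspace(subset):
    """Contains the zero vector and is closed under addition."""
    return 0 in subset and all((u ^ v) in subset for u in subset for v in subset)


def coset(c, subspace):
    """Affine subspace c + A = {c + x | x in A}."""
    return {c ^ x for x in subspace}


A = span([0b0001, 0b0010])        # dimension 2, so 2^2 = 4 elements
C = coset(0b0100, A)              # c = 0100 is not in A

print(sorted(A), sorted(C))
print(is_linear_subspace(A), is_linear_subspace(C))
```

The coset `C` fails the linear subspace test because it does not contain the zero vector, while it has the same cardinality $2^d$ as `A`.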

2.1.4 Permutation notation

For the convenience of our discussion, we use three different notations for describing permutations.

Cycle notation. Denoted by $\sigma = (i_0\ i_1\ i_2\ \ldots\ i_{x-1})$, the cycle notation describes the mapping $\sigma(i_j) = i_{(j-1) \bmod x}$ for $0 \leq j \leq x - 1$; fixed position indexes are omitted.

Array notation. The array notation, denoted by $\sigma = [i_0, i_1, \ldots, i_{m-1}]$, describes the resultant position of each element after the permutation, expressed as $\sigma[j] = i_j$.

Formula notation. Denoted by $\sigma = f(i)$, where $f : \mathbb{Z}_m \to \mathbb{Z}_m$ is a bijective function. The formula notation is a common way to represent a permutation, which, depending on the permutation, can be very compact or complicated.

Example 2.2. Let the formula notation of permutation $\sigma$ be $f(i) = 3 - i$, where $i \in \mathbb{Z}_4$. It can also be expressed in a cycle form $\sigma = (0\ 3)(1\ 2)$ or an array form $\sigma = [3, 2, 1, 0]$. △
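The three notations of Example 2.2 can be converted into one another with a small Python sketch. The function names are hypothetical, and the cycle conversion follows the mapping as stated in the cycle notation paragraph of this section (each listed index is sent to the index written before it in the cycle).

```python
def formula_to_array(f, m):
    """Array notation sigma[j] = f(j) for j in Z_m."""
    return [f(j) for j in range(m)]


def cycles_to_array(cycles, m):
    """Array notation from cycles, using sigma(i_j) = i_{(j-1) mod x}."""
    sigma = list(range(m))                     # omitted indexes are fixed points
    for cycle in cycles:
        x = len(cycle)
        for j in range(x):
            sigma[cycle[j]] = cycle[(j - 1) % x]
    return sigma


from_formula = formula_to_array(lambda i: 3 - i, 4)
from_cycles = cycles_to_array([(0, 3), (1, 2)], 4)
print(from_formula, from_cycles)
```

For this particular permutation all cycles have length 2, so the result does not depend on the direction in which a cycle is read.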

2.1.5 Probability: application to linear cryptanalysis

In linear cryptanalysis, one needs to find a linear expression with a strong linear bias, but what exactly is a linear bias?

Linear mask. Suppose we have a 4-bit binary string $b = b_3 b_2 b_1 b_0$, where $b_i \in \mathbb{F}_2$. We can extract specific bits to form a linear Boolean expression, say $b_1 \oplus b_0$, using a linear mask $\Gamma = 0011_2$. The linear expression can be written as $\Gamma \bullet b = b_1 \oplus b_0$, where $\bullet$ denotes the scalar product.

Given a function $S : \mathbb{F}_2^s \to \mathbb{F}_2^s$, we are interested in how closely a linear expression of specific input and output bits (extracted using $\Gamma_I, \Gamma_O \in \mathbb{F}_2^s$) behaves like a linear (or affine) function. An expression is linear if $\forall x,\ \Gamma_I \bullet x + \Gamma_O \bullet S(x) = 0$, or affine if $\forall x,\ \Gamma_I \bullet x + \Gamma_O \bullet S(x) = 1$. Since in most cases the expression is neither linear nor affine, we denote the number of inputs for which it equals 0 by $e_=$ and the number for which it equals 1 by $e_{\neq} = 2^s - e_=$, and the probability that the expression equals 0 by $p = e_= / 2^s$.

Linear bias. The linear bias of a Boolean expression is given as $\varepsilon = p - 1/2$. It measures how closely the expression behaves like a linear (or affine) function. The further away $\varepsilon$ is from 0, the stronger the bias is, and an expression behaves like a linear function if $\varepsilon$ is positive or like an affine function if it is negative. On the other hand, an expression is unbiased if $\varepsilon = 0$. In other words, the expression equals 0 and 1 equally often, i.e. $e_= = e_{\neq}$. In [85], Matsui described the effectiveness of an expression as

$$\left| p - \frac{1}{2} \right| ,$$

which is simply the absolute value of $\varepsilon$, i.e. $|\varepsilon|$. Since the sign only determines whether the expression is linear or affine, what matters most is the magnitude of the linear bias; the higher it is, the more effective the linear cryptanalysis will be.
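As a small illustration of the definitions above, the following Python sketch computes the scalar product with a mask and the bias ε of a masked expression for a function given as a lookup table. The two toy functions (the identity and the map that flips the least significant bit) are chosen here only to show the linear and affine extremes; they do not come from the thesis.

```python
from fractions import Fraction


def dot(mask, value):
    """Scalar product over F_2: parity of the bits of value selected by mask."""
    return bin(mask & value).count("1") & 1


def linear_bias(sbox, gamma_in, gamma_out):
    """Bias eps = p - 1/2, where p is the fraction of inputs x for which
    gamma_in . x + gamma_out . S(x) equals 0."""
    e_eq = sum(1 for x, y in enumerate(sbox)
               if dot(gamma_in, x) ^ dot(gamma_out, y) == 0)
    return Fraction(e_eq, len(sbox)) - Fraction(1, 2)


identity = list(range(16))             # S(x) = x, a linear function
flip_lsb = [x ^ 1 for x in range(16)]  # S(x) = x xor 0001, an affine function

print(dot(0b0011, 0b1010))                    # b1 xor b0 of 1010, i.e. 1
print(linear_bias(identity, 0b0001, 0b0001))  # 1/2  (always 0: linear)
print(linear_bias(flip_lsb, 0b0001, 0b0001))  # -1/2 (always 1: affine)
print(linear_bias(identity, 0b0001, 0b0010))  # 0    (unbiased)
```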

2.2 Components of SPN Cipher

In an n-bit SPN cipher, the internal state IS can typically be divided into n/s cells of s bits each; we also call a cell a nibble (resp. byte) if s = 4 (resp. s = 8). In a round function, the S-layer usually consists of s-bit S-boxes updating all the cells in parallel, and the P-layer applies linear transformations on every m cells (ms-bit words).

As mentioned in Section 1.2.2, confusion and diffusion are two important design principles for a secure cipher. For the confusion property, a linear relation between the key and the ciphertext is not sufficient, as solving linear equations is a well-studied problem in mathematics. The S-layer is a crucial component of an SPN cipher for introducing non-linearity to the system. The diffusion property is mostly achieved by the P-layer and, to some extent, by the S-layer as well.

2.2.1 Substitution layer

Since we require the S-layer to be invertible, we focus in this thesis on invertible S-boxes and denote an s-bit S-box by $S : \mathbb{F}_2^s \to \mathbb{F}_2^s$. While most primitives adopt 4-bit or 8-bit S-boxes, there are other instances such as PRINTcipher [70] and KECCAK [21], which use 3-bit and 5-bit S-boxes respectively.

Properties of S-boxes

We recall some of the S-box properties that are commonly considered. We use the 4-bit S-box in PRESENT, denoted as SP , as an example for illustration purposes. The truth table of SP is presented in Table 2.1.

Table 2.1: Specifications of PRESENT S-box

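| x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SP(x) | c | 5 | 6 | b | 9 | 0 | a | d | 3 | e | f | 8 | 4 | 7 | 1 | 2 |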

Definition 2.3 (Differential). Let $\Delta_I, \Delta_O \in \mathbb{F}_2^s$ be the input and output differences of S; we define the differential probability from $\Delta_I$ to $\Delta_O$ under S as

$$D_S(\Delta_I, \Delta_O) = \frac{\#\{x \in \mathbb{F}_2^s \mid S(x) \oplus S(x \oplus \Delta_I) = \Delta_O\}}{2^s} ,$$

and the maximal differential probability (m.d.p.) as

$$D_{\max}(S) = \max_{\Delta_I \neq 0,\ \Delta_O \neq 0} D_S(\Delta_I, \Delta_O) .$$

This property is the main line of defence against differential cryptanalysis: the lower the differential probability is, the more resources (plaintext pairs) an adversary would need to query to mount an attack. Note that the differential uniformity of an S-box S can easily be computed from the m.d.p., namely $2^s \cdot D_{\max}(S)$.

The differential transitions of an S-box are often represented in a differential distribution table (DDT), where each $(\Delta_I, \Delta_O)$-entry shows the number of occurrences for which the differential transition holds, i.e. $\#\{x \in \mathbb{F}_2^s \mid S(x) \oplus S(x \oplus \Delta_I) = \Delta_O\}$.

Example 2.4. The DDT of the PRESENT S-box is presented in Table 2.2, where an empty cell represents a transition that is not possible, i.e. $\#\{x \in \mathbb{F}_2^4 \mid S_P(x) \oplus S_P(x \oplus \Delta_I) = \Delta_O\} = 0$. From Table 2.2, one can see that the largest value (excluding the (0, 0)-entry) is 4, which means $D_{\max}(S_P) = 2^{-2}$.

Table 2.2: DDT of PRESENT S-box

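| ∆I \ ∆O | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 1 |  |  |  | 4 |  |  |  | 4 |  | 4 |  |  |  | 4 |  |  |
| 2 |  |  |  | 2 |  | 4 | 2 |  |  |  | 2 |  | 2 | 2 | 2 |  |
| 3 |  | 2 |  | 2 | 2 |  | 4 | 2 |  |  | 2 | 2 |  |  |  |  |
| 4 |  |  |  |  |  | 4 | 2 | 2 |  | 2 | 2 |  | 2 |  | 2 |  |
| 5 |  | 2 |  |  | 2 |  |  |  |  | 2 | 2 | 2 | 4 | 2 |  |  |
| 6 |  |  | 2 |  |  |  | 2 |  | 2 |  |  | 4 | 2 |  |  | 4 |
| 7 |  | 4 | 2 |  |  |  | 2 |  | 2 |  |  |  | 2 |  |  | 4 |
| 8 |  |  |  | 2 |  |  |  | 2 |  | 2 |  | 4 |  | 2 |  | 4 |
| 9 |  |  | 2 |  | 4 |  | 2 |  | 2 |  |  |  | 2 |  | 4 |  |
| a |  |  | 2 | 2 |  | 4 |  |  | 2 |  | 2 |  |  | 2 | 2 |  |
| b |  | 2 |  |  | 2 |  |  |  | 4 | 2 | 2 | 2 |  | 2 |  |  |
| c |  |  | 2 |  |  | 4 |  | 2 | 2 | 2 | 2 |  |  |  | 2 |  |
| d |  | 2 | 4 | 2 | 2 |  |  | 2 |  |  | 2 | 2 |  |  |  |  |
| e |  |  | 2 | 2 |  |  | 2 | 2 | 2 | 2 |  |  | 2 | 2 |  |  |
| f |  | 4 |  |  | 4 |  |  |  |  |  |  |  |  |  | 4 | 4 |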
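The entries of the DDT follow directly from the truth table of the S-box. The Python sketch below recomputes them for the PRESENT S-box of Table 2.1 and derives the m.d.p.; the function names are illustrative and not taken from any library.

```python
# PRESENT S-box, as listed in Table 2.1.
SP = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
      0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def ddt(sbox):
    """table[d_in][d_out] = #{x : S(x) xor S(x xor d_in) = d_out}."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for d_in in range(n):
        for x in range(n):
            table[d_in][sbox[x] ^ sbox[x ^ d_in]] += 1
    return table


def max_differential_probability(sbox):
    """m.d.p.: largest DDT entry over non-zero differences, divided by 2^s."""
    n = len(sbox)
    table = ddt(sbox)
    best = max(table[di][do] for di in range(1, n) for do in range(1, n))
    return best / n


table = ddt(SP)
print(table[1])                          # row for input difference 1
print(max_differential_probability(SP))  # 0.25, i.e. 2^-2
```

Every row of the table sums to 2^s, since each input x contributes to exactly one output difference.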

Definition 2.5 (Branch number of an S-box). Let $\Delta_I, \Delta_O \in \mathbb{F}_2^s$ be the input and output differences of S; we define the branch number of an S-box as

$$B_S = \min_{\Delta_I \neq 0,\ D_S(\Delta_I, \Delta_O) > 0} \big( \mathrm{wt}(\Delta_I) + \mathrm{wt}(\Delta_O) \big) ,$$

where $\mathrm{wt}(\cdot)$ is the Hamming weight.

Since we are considering invertible S-boxes, it is trivial to see that for any non-zero input difference, any possible output difference must be non-zero too. Thus the lower bound of the branch number of an S-box is 2, which is tight if there exists some differential transition from an input difference with Hamming weight 1 to some output difference with Hamming weight 1.

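Example 2.6. From Table 2.2, one can see that there is no differential transition from a Hamming weight 1 input difference to a Hamming weight 1 output difference. The sum of the Hamming weights of any non-zero input-output difference pair through the PRESENT S-box is therefore at least 3, and this value is attained, for example, by the transition from input difference 1 to output difference 3. Therefore, $B_{S_P} = 3$.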
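The branch number of Definition 2.5 can be found by exhausting all possible differential transitions. The following Python sketch does this for the PRESENT S-box; `sbox_branch_number` is an illustrative helper name.

```python
# PRESENT S-box, as listed in Table 2.1.
SP = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
      0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def hamming_weight(v):
    return bin(v).count("1")


def sbox_branch_number(sbox):
    """Minimum of wt(d_in) + wt(d_out) over all possible transitions
    d_in -> d_out with d_in non-zero."""
    n = len(sbox)
    best = None
    for d_in in range(1, n):
        # Output differences reachable from d_in with non-zero probability.
        reachable = {sbox[x] ^ sbox[x ^ d_in] for x in range(n)}
        for d_out in reachable:
            w = hamming_weight(d_in) + hamming_weight(d_out)
            if best is None or w < best:
                best = w
    return best


print(sbox_branch_number(SP))               # 3
print(sbox_branch_number(list(range(16))))  # 2 for the identity map
```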

Definition 2.7 (Linear bias). Let $\Gamma_I, \Gamma_O \in \mathbb{F}_2^s$ be the input and output linear masks; we define the absolute linear bias approximation of S masked with $\Gamma_I$ and $\Gamma_O$ as

$$L_S(\Gamma_I, \Gamma_O) = \left| \frac{\#\{x \in \mathbb{F}_2^s \mid \Gamma_I \bullet x + \Gamma_O \bullet S(x) = 0\}}{2^s} - \frac{1}{2} \right| ,$$

and the maximal absolute linear bias approximation (m.l.a.) as

$$L_{\max}(S) = \max_{\Gamma_I \neq 0,\ \Gamma_O \neq 0} L_S(\Gamma_I, \Gamma_O) .$$

This property is the counterpart of the differential case: the lower the absolute linear bias is, the more resources (plaintext-ciphertext pairs) an adversary would need to mount a linear cryptanalysis. Note that the linearity of an S-box S can be computed from the m.l.a., namely $2^{s+1} \cdot L_{\max}(S)$, while the correlation and potential [92] of the linear approximation of S are $2 \cdot L_{\max}(S)$ and $4 \cdot L_{\max}(S)^2$ respectively.

The linear transition of an S-box is often represented in a linear approximation table (LAT), where each $(\Gamma_I, \Gamma_O)$-entry shows the frequency with which the equation holds minus half the cardinality of the domain of S, i.e. $e_= - 2^{s-1}$.

Example 2.8. The LAT of the PRESENT S-box is presented in Table 2.3, where an empty cell represents a linear expression that is unbiased, i.e. $e_= = e_{\neq}$.

Table 2.3: LAT of PRESENT S-box

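| ΓI \ ΓO | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 1 |  |  |  |  |  | -4 |  | -4 |  |  |  |  |  | -4 |  | 4 |
| 2 |  |  | 2 | 2 | -2 | -2 |  |  | 2 | -2 |  | 4 |  | 4 | -2 | 2 |
| 3 |  |  | 2 | 2 | 2 | -2 | -4 |  | -2 | 2 | -4 |  |  |  | -2 | -2 |
| 4 |  |  | -2 | 2 | -2 | -2 |  | 4 | -2 | -2 |  | -4 |  |  | -2 | 2 |
| 5 |  |  | -2 | 2 | -2 | 2 |  |  | 2 | 2 | -4 |  | 4 |  | 2 | 2 |
| 6 |  |  |  | -4 |  |  | -4 |  |  | -4 |  |  | 4 |  |  |  |
| 7 |  |  |  | 4 | 4 |  |  |  |  | -4 |  |  |  |  | 4 |  |
| 8 |  |  | 2 | -2 |  |  | -2 | 2 | -2 | 2 |  |  | -2 | 2 | 4 | 4 |
| 9 |  | 4 | -2 | -2 |  |  | 2 | -2 | -2 | -2 | -4 |  | -2 | 2 |  |  |
| a |  |  | 4 |  | 2 | 2 | 2 | -2 |  |  |  | -4 | 2 | 2 | -2 | 2 |
| b |  | -4 |  |  | -2 | -2 | 2 | -2 | -4 |  |  |  | 2 | 2 | 2 | -2 |
| c |  |  |  |  | -2 | -2 | -2 | -2 | 4 |  |  | -4 | -2 | 2 | 2 | -2 |
| d |  | 4 | 4 |  | -2 | -2 | 2 | 2 |  |  |  |  | 2 | -2 | 2 | -2 |
| e |  |  | 2 | 2 | -4 | 4 | -2 | -2 | -2 | -2 |  |  | -2 | -2 |  |  |
| f |  | 4 | -2 | 2 |  |  | -2 | -2 | -2 | 2 | 4 |  | 2 | 2 |  |  |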
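The entries of the LAT can be recomputed directly from the truth table of the PRESENT S-box (Table 2.1). The Python sketch below follows the convention stated above, namely that each entry is $e_= - 2^{s-1}$; the function names are illustrative.

```python
# PRESENT S-box, as listed in Table 2.1.
SP = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
      0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def parity(v):
    return bin(v).count("1") & 1


def lat(sbox):
    """table[g_in][g_out] = #{x : g_in . x + g_out . S(x) = 0} - 2^(s-1)."""
    n = len(sbox)
    return [[sum(1 for x in range(n)
                 if parity(g_in & x) ^ parity(g_out & sbox[x]) == 0) - n // 2
             for g_out in range(n)]
            for g_in in range(n)]


def max_linear_bias(sbox):
    """m.l.a.: largest absolute LAT entry over non-zero masks, divided by 2^s."""
    n = len(sbox)
    table = lat(sbox)
    best = max(abs(table[gi][go]) for gi in range(1, n) for go in range(1, n))
    return best / n


table = lat(SP)
print(table[1])             # row for input mask 1
print(max_linear_bias(SP))  # 0.25, i.e. 2^-2
```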

From Table 2.3, one can see that the largest absolute value (excluding the (0, 0)-entry) is 4, which means $L_{\max}(S_P) = 2^{-2}$.

Definition 2.9 (Algebraic degree). An s-bit S-box can be perceived as an s-tuple of Boolean functions with s input bits. The degree of a Boolean function is the degree of the largest monomial in its algebraic normal form (ANF), and the algebraic degree of an S-box is the maximal degree of its components.

Example 2.10. Expressing the input to the PRESENT S-box as a 4-tuple binary vector $(x_3, x_2, x_1, x_0)$ and similarly the output as $(y_3, y_2, y_1, y_0)$, where $x_0$ and $y_0$ are the least significant bits (LSB), we can express the output bits as Boolean functions of the input bits.

y0 = x0 + x2 + x3 + x1x2

y1 = x1 + x3 + x1x3 + x2x3 + x0x1x2 + x0x2x3 + x0x1x3

y2 = 1 + x2 + x3 + x0x1 + x0x3 + x1x3 + x0x1x3 + x0x2x3

y3 = 1 + x0 + x1 + x3 + x1x2 + x0x1x2 + x0x1x3 + x0x2x3

We can easily see that the largest monomial has degree 3 and thus SP has algebraic degree 3.
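The ANF of each output bit can be obtained from its truth table with the binary Möbius transform. The Python sketch below assumes the bit ordering of Example 2.10 (bit i of the monomial index corresponds to $x_i$, with $x_0$ the LSB); the helper names are illustrative.

```python
# PRESENT S-box, as listed in Table 2.1.
SP = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
      0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def anf(truth_table):
    """Binary Moebius transform.

    Returns a list a where a[u] = 1 iff the monomial made of the variables
    x_i with bit i of u set appears in the ANF."""
    a = list(truth_table)
    step = 1
    while step < len(a):
        for i in range(len(a)):
            if i & step:
                a[i] ^= a[i ^ step]
        step <<= 1
    return a


def algebraic_degree(sbox, s=4):
    """Maximal degree over the s output bits of the S-box."""
    degree = 0
    for bit in range(s):
        coeffs = anf([(y >> bit) & 1 for y in sbox])
        for u, c in enumerate(coeffs):
            if c:
                degree = max(degree, bin(u).count("1"))
    return degree


y0 = anf([y & 1 for y in SP])
# Monomial indices 1, 4, 6, 8 stand for x0, x2, x1x2 and x3.
print([u for u, c in enumerate(y0) if c])  # [1, 4, 6, 8]
print(algebraic_degree(SP))                # 3
```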

Classification of S-boxes

The properties of an S-box are essential for understanding its resistance against various cryptanalysis techniques. Thus, we are interested in how these properties are preserved under certain transformations.

Definition 2.11 (S-box affine equivalence [26]). Let $M_i$ and $M_o$ be two invertible binary matrices and $c_i, c_o \in \mathbb{F}_2^s$. The S-box $S'$ defined by $S'(x) = M_o S(M_i(x \oplus c_i)) \oplus c_o$ belongs to the affine equivalence (AE) set of S.

It is shown by Nyberg [91] that both Dmax and Lmax are preserved under the AE class. We omit the proof from this thesis, but intuitively, neither adding constants nor applying linear transformations changes the differential probability. Since the former has no impact on the difference and the latter maps each difference value to another with probability 1, the net effect is a rearrangement of the rows and columns of the DDT without changing the values. The linear case is similar.
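The invariance can be checked numerically on one instance. The Python sketch below builds an S-box that is affine equivalent to the PRESENT S-box and compares the largest DDT and LAT entries of both. The matrices `M_IN`, `M_OUT` and the constants 0x5, 0xA are arbitrary choices made for this illustration, not values from the thesis.

```python
# PRESENT S-box, as listed in Table 2.1.
SP = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
      0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def parity(v):
    return bin(v).count("1") & 1


def apply_matrix(rows, x):
    """Multiply a binary matrix by the bit vector x.
    rows[i] is a bitmask of the input bits XORed into output bit i."""
    return sum(parity(r & x) << i for i, r in enumerate(rows))


def affine_equivalent(sbox, m_in, c_in, m_out, c_out):
    """S'(x) = M_o S(M_i (x xor c_i)) xor c_o."""
    return [apply_matrix(m_out, sbox[apply_matrix(m_in, x ^ c_in)]) ^ c_out
            for x in range(len(sbox))]


def max_ddt(sbox):
    n = len(sbox)
    return max(sum(1 for x in range(n) if sbox[x] ^ sbox[x ^ di] == do)
               for di in range(1, n) for do in range(1, n))


def max_abs_lat(sbox):
    n = len(sbox)
    return max(abs(sum(1 for x in range(n)
                       if parity(gi & x) == parity(go & sbox[x])) - n // 2)
               for gi in range(1, n) for go in range(1, n))


# Arbitrary invertible 4x4 binary matrices (illustrative choices).
M_IN = [0b0011, 0b0010, 0b0100, 0b1100]
M_OUT = [0b1000, 0b0001, 0b0110, 0b0100]

S2 = affine_equivalent(SP, M_IN, 0x5, M_OUT, 0xA)
print(max_ddt(SP), max_ddt(S2))          # 4 4
print(max_abs_lat(SP), max_abs_lat(S2))  # 4 4
```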

Definition 2.12 (S-box permutation-xor equivalence [104]). Let $P_i$ and $P_o$ be two permutation matrices and $c_i, c_o \in \mathbb{F}_2^s$. The S-box $S'$ defined by $S'(x) = P_o S(P_i(x \oplus c_i)) \oplus c_o$ belongs to the permutation-xor equivalence (PE) set of S.

Besides preserving Dmax and Lmax (since permutation matrices are a subclass of invertible binary matrices), more properties such as the branch number and circuit complexity (from an implementation viewpoint) of the S-box are preserved under the PE class. The reason is that adding constants affects neither the branch number nor the algebraic degree, and the permutation is simply a renaming of the input and output bits.

To define an invertible s-bit S-box, we need to define a bijective mapping between the $2^s$ possible input and output values; this yields a total of $(2^s)!$ distinct invertible s-bit S-boxes. While it is infeasible to exhaust or analyse all of them when s is large, it is still feasible for 4-bit S-boxes.

4-bit S-boxes. In 2007, Leander et al. [77] proposed 16 AE classes of S-boxes that are optimal against differential and linear attacks, the so-called optimal S-boxes. These S-boxes have $D_{\max} = 2^{-2}$ and $L_{\max} = 2^{-2}$. There are approximately $2^{44.25}$ distinct 4-bit S-boxes, which is few enough to analyse all of them; De Cannière [41] classified all the 4-bit S-boxes into 302 AE classes. In 2011, Saarinen [104] refined the classification to PE classes (about $2^{27}$ PE classes) and introduced the class of golden S-boxes, which are not only optimal against differential and linear attacks, but also have optimal algebraic degree and other properties that may not be preserved under the AE classes. In recent years, many lightweight primitive proposals have adopted 4-bit S-boxes as their building block, since 4-bit S-boxes are well-studied and can have very low implementation cost.

8-bit S-boxes. Analysing 8-bit S-boxes is much harder than analysing 4-bit S-boxes because the search space is much larger (there are about $2^{1684}$ invertible 8-bit S-boxes), and there are still many open questions. For instance, the optimal (lowest possible) m.d.p. and m.l.a. of 8-bit S-boxes are still not known. To date, one of the best 8-bit S-boxes is the S-box used in AES (commonly known as the AES S-box), with $D_{\max} = 2^{-6}$ and $L_{\max} = 2^{-4}$. Although a larger S-box provides stronger resistance against differential and linear cryptanalysis, its implementation cost tends to be much higher. For instance, the optimal implementation of the AES S-box by Canright costs 180 GE [34], while an optimal 4-bit S-box, say the Piccolo S-box, can cost as little as 12 GE [112]. This GE difference widens further when multiple S-boxes are implemented in parallel. In 2015, Canteaut et al. [35] proposed a construction for 8-bit S-boxes using smaller 4-bit S-boxes; this yields an 8-bit S-box with $D_{\max} = 2^{-5}$ and $L_{\max} = 2^{-3}$ that costs about 45% of the AES S-box.

2.2.2 Permutation layer

We classify P-layers into two categories: for a P-layer that involves linear operations such as XOR, we classify it as a diffusion layer and the matrix representation for such a layer is called the diffusion matrix. On the other hand, for a P-layer that does shuffling and rearrangement of bits, we denote it as a bit-permutation layer.

Diffusion layer

The role of a diffusion matrix of order m is to mix the m components of a vector to create diffusion; its diffusion power is often quantified by the branch number of the matrix.

Definition 2.13 (Branch number of a matrix [39]). The branch number of an m × m matrix M over $GF(2^s)$ is the minimum number of non-zero entries in the input vector u and output vector v = M · u as we range over all non-zero $u \in [GF(2^s)]^m$.

Although the diffusion layer is linear and does not lower the differential probability (resp. linear bias) of a cipher, its primary task is to spread the internal dependencies. We denote an s-bit word (or cell) in the internal state that has non-zero difference (resp. some bits involved in a linear expression) as active.

Example 2.14. Consider the toy cipher shown in Figure 1.5. Without the P-layer, a single active cell in the internal state will only invoke one S-box in each round. On the other hand, if the matrix representation of the diffusion layer has branch number 4, a single active cell subjected to the P-layer will spread and generate at least three active cells, invoking at least three S-boxes in the next round. This yields a much lower differential probability or absolute linear bias of the entire cipher.

Definition 2.15 (Maximum distance separable). A maximum distance separable (MDS) matrix of order m is a matrix that attains the maximum branch number m + 1.

The term “maximum distance separable” originates from the field of coding theory, where “maximum distance separable codes” are linear codes that attain equality in the Singleton bound [84]. This MDS property offers perfect diffusion, meaning changing x many s-bit words of the inputs changes at least m − x + 1 of the outputs [124]. A matrix is almost-MDS if it has close-to-maximum branch number m.

Example 2.16. A typical example of an MDS matrix is the matrix representation of the MixColumns operation in AES (commonly known as the AES matrix), which can be written as

$$\begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} ,$$

where $01, 02, 03 \in GF(2^8)/\mathtt{0x11b}$. The AES matrix has the maximum branch number of 5.

Proposition 2.17 (MacWilliams et al. [84]). A matrix is MDS if and only if its square submatrices are all non-singular.

The following two corollaries are directly deduced from Proposition 2.17 when we consider submatrices of order 1 and 2 respectively.

Corollary 2.18. All entries of an MDS matrix are non-zero.

Corollary 2.19. Given an m × m matrix M, if there exist pairwise distinct i1, i2, j1, j2 ∈ {0, 1, ..., m − 1} such that M[i1, j1] = M[i1, j2] = M[i2, j1] = M[i2, j2], then M is not an MDS matrix.
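Proposition 2.17 gives a direct, if naive, way to test the MDS property of a small matrix: compute the determinant of every square submatrix. The Python sketch below does this over $GF(2^8)$ with the AES reduction polynomial 0x11b. The helper names and the matrix `NOT_MDS` are illustrative, and the Laplace expansion used here is only practical for small orders.

```python
from itertools import combinations


def gf_mul(a, b, poly=0x11B):
    """Multiplication in GF(2^8) modulo the given reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_det(m):
    """Determinant over GF(2^8) by Laplace expansion along the first row.
    In characteristic 2, addition and subtraction are both XOR."""
    if len(m) == 1:
        return m[0][0]
    det = 0
    for j, entry in enumerate(m[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            det ^= gf_mul(entry, gf_det(minor))
    return det


def is_mds(m):
    """True iff every square submatrix of m is non-singular."""
    n = len(m)
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                sub = [[m[i][j] for j in cols] for i in rows]
                if gf_det(sub) == 0:
                    return False
    return True


AES_MATRIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
# Contains a 2x2 submatrix of equal entries, so Corollary 2.19 applies.
NOT_MDS = [[1, 1, 2, 3], [1, 1, 3, 2], [2, 3, 1, 1], [3, 2, 1, 1]]

print(is_mds(AES_MATRIX))  # True
print(is_mds(NOT_MDS))     # False
```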

Similar to the S-boxes, we are interested to see how the properties of a diffusion matrix are preserved under certain transformations.

Proposition 2.20. For any permutation matrices P and Q, the branch numbers of M and PMQ are equal.

Proof. Since P (resp. Q) is a permutation matrix, it is like rearranging the rows (resp. columns) of M, which has no impact on the values of the output (resp. input) vector. Hence the minimum number of non-zero components in the input vector and corresponding output vector remains the same and PMQ has the same branch number as M.

Also, since multiplying by a non-zero scalar does not change the number of non-zero components in a vector, the branch number of a matrix is preserved when the matrix is multiplied by some non-zero scalar factor.

Definition 2.21 (Matrix permutation equivalence). Two matrices $M$ and $M'$ are called permutation equivalent, denoted by $M \sim_P M'$, if there exist two permutation matrices P and Q such that $M' = PMQ$.

From Proposition 2.20, we know that permutation equivalent matrices have the same branch number. We show in the following (Lemma 2.22) that $\sim_P$ is a well-defined equivalence relation.

Lemma 2.22. The permutation equivalence relation $\sim_P$ is an equivalence relation.

Proof. We prove that the relation is reflexive, symmetric and transitive, hence it is an equivalence relation.

Reflexive: For any matrix A, let P and Q be identity matrices; we have $A \sim_P A$.

Symmetric: For any two matrices A and B such that $A \sim_P B$, by Definition 2.21, there exist two permutation matrices P and Q such that $B = PAQ$. Since the inverse of a permutation matrix is its transpose, we have $A = P^{\top} B Q^{\top}$, thus $B \sim_P A$.

Transitive: Consider any three matrices A, B and C such that $A \sim_P B$ and $B \sim_P C$. By Definition 2.21, we have $B = PAQ$ and $C = P'BQ'$. Hence, we have $C = P'PAQQ'$, where $P'P$ and $QQ'$ are also permutation matrices. Therefore, $A \sim_P C$.

Besides the row and column manipulation of a diffusion matrix, we are also interested in reordering the entries of the matrix.

Definition 2.23 (Index permutation). An index permutation σ acting on an ordered set $\{a_0, a_1, \ldots, a_{m-1}\}$ is a permutation that permutes the index of the elements.

In most cases, the construction of the diffusion matrix follows a certain "rule" to construct it from its first row. For instance, each row of the AES matrix is a right rotation of the previous row, and its first row is defined as (02, 03, 01, 01). Thus changing the arrangement of the entries in the first row changes the entire matrix.

Definition 2.24. Given a matrix M that is defined by its first row under some rule, we denote by $M^{\sigma}$ the matrix generated under the same rule with the first row of M subjected to an index permutation σ.

Example 2.25. The index permutation σ(i) = 3 − i transforms the first row of the AES matrix M from $(a_0, a_1, a_2, a_3) = (02, 03, 01, 01)$ to $(a_3, a_2, a_1, a_0) = (01, 01, 03, 02)$. Thus, we have

$$M^{\sigma} = \begin{pmatrix} 01 & 01 & 03 & 02 \\ 02 & 01 & 01 & 03 \\ 03 & 02 & 01 & 01 \\ 01 & 03 & 02 & 01 \end{pmatrix} .$$
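Example 2.25 can be reproduced with a short Python sketch. It assumes the right-rotation rule of the AES matrix and reads the index permutation as mapping position i of the new first row to $a_{\sigma(i)}$ (for the involution σ(i) = 3 − i used here, the other reading gives the same result). The function names are illustrative.

```python
def right_circulant(first_row):
    """Matrix whose every row is the previous row rotated right by one."""
    m = len(first_row)
    return [[first_row[(j - i) % m] for j in range(m)] for i in range(m)]


def permute_first_row(first_row, sigma):
    """New first row (a_sigma(0), ..., a_sigma(m-1))."""
    return [first_row[sigma(i)] for i in range(len(first_row))]


aes_first_row = [0x02, 0x03, 0x01, 0x01]
M = right_circulant(aes_first_row)
M_sigma = right_circulant(permute_first_row(aes_first_row, lambda i: 3 - i))

for row in M_sigma:
    print(" ".join(f"{v:02x}" for v in row))
# 01 01 03 02
# 02 01 01 03
# 03 02 01 01
# 01 03 02 01
```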

Bit-permutation layer

Another design strategy is to use a bit-permutation as P-layer, and let the S-layer do the diffusion work. In hardware implementation, a bit-permutation simply consists in some rewiring of the circuit which does not incur any implementation cost. This is a key feature of the lightweight block cipher PRESENT. To compensate for its weak diffusion, the bit-permutation of PRESENT is designed to spread the internal dependencies as much as possible and the PRESENT S-box is chosen for its strong security properties, such as having optimal m.d.p., m.l.a. and branch number 3. Although the S-layer comes at a higher cost, the hardware implementation cost of the cipher as a whole will likely be much lower because of the absence of the diffusion layer.

2.3 Cryptanalysis Techniques

In this section, we recall some of the commonly used cryptanalysis techniques that have to be checked when proposing new ciphers; our analysis demonstrates that the new proposal is resistant against these attacks to a certain extent.

2.3.1 Differential cryptanalysis

One of the two most widely used cryptanalysis techniques on block ciphers, differential cryptanalysis (DC) was introduced by Biham et al. [24]. It is a chosen-plaintext attack that exploits the high probability with which a certain plaintext difference propagates through multiple round functions to a specific ciphertext difference; we call such a pair of differences a differential.

The first part of the technique is finding some differential with high probability. Let $\Delta_0 = IS_0 \oplus IS'_0$ be the difference between the two plaintexts that are initialised as the internal states $IS_0, IS'_0$. Our objective is to analyse how the difference propagates through r round functions. At round i, the difference $\Delta_{i-1}$ propagates to $\Delta_i$ with probability $p_i$ when taken over all plaintexts and keys. This uncertainty comes from the propagation of non-zero differences through the S-boxes (so-called active S-boxes), which can be computed from the DDT of the S-box. At the end, we obtain a differential trail through the entire cipher,

$$\Delta_0 \xrightarrow{p_1} \Delta_1 \xrightarrow{p_2} \Delta_2 \xrightarrow{p_3} \cdots \xrightarrow{p_{r-1}} \Delta_{r-1} \xrightarrow{p_r} \Delta_r ,$$

which we call the differential characteristic.

Lemma 2.26. Assume that each differential transition $\Delta_{i-1} \to \Delta_i$ occurs with probability $p_i$ and that these differential transitions form a Markov chain. Then the probability of the entire differential characteristic $\Delta_0 \to \Delta_1 \to \cdots \to \Delta_{r-1} \to \Delta_r$ occurring is given as

$$\Pr(\Delta_0 \to \Delta_1 \to \cdots \to \Delta_{r-1} \to \Delta_r) = \prod_{i=1}^{r} p_i .$$

Notice that Lemma 2.26 assumes independence between rounds, which is not the case in reality as the subkeys are related. Thus, the probability of a differential characteristic is only an estimate. However, in practice, this estimate has been found to be reasonably accurate for many ciphers.

Besides the target differential characteristic, there could be other differential characteristics with the same initial difference $\Delta_0$ leading to the same final difference $\Delta_r$. These contribute to the overall probability of the differential $(\Delta_0, \Delta_r)$.

Lemma 2.27. The differential probability of $(\Delta_0, \Delta_r)$ is the sum of the probabilities of all the differential characteristics with the same initial and final differences,

$$p_D(\Delta_0, \Delta_r) = \sum \Pr(\Delta_0 \to \cdots \to \Delta_r) .$$

Once a differential characteristic with high probability has been identified, the corresponding differential can be used to mount a distinguishing attack and even a key-recovery attack. In a distinguishing attack, one queries an oracle for the encryptions of $N_D$ random plaintext pairs with the same input difference $\Delta_0$, tallies the number of ciphertext differences equal to $\Delta_r$ and checks whether the observed frequency is close to the differential probability. If so, the oracle is very likely the target cipher; otherwise it is a random permutation. For the key-recovery attack, an (r − 1)-round differential $(\Delta_0, \Delta_{r-1})$ is used. After obtaining the ciphertext pairs, we guess the last round subkey, work back to the previous internal states $IS_{r-1}$ and $IS'_{r-1}$ and check if they have the target difference $\Delta_{r-1}$. The correct subkey candidate should result in significantly more matches than the wrong candidates. The number of plaintext pairs needed to launch a differential attack is estimated to be

$$N_D = c \cdot p_D^{-1} ,$$

where c is a small constant.

For an n-bit block cipher, there is a maximum of $2^{n-1}$ possible plaintext pairs. Therefore, to mount a DC on an n-bit block cipher, there must be some differential probability significantly higher than $2^{1-n}$. Conversely, showing that any differential of a reduced-round block cipher has probability lower than $2^{1-n}$ is an early indication that the full-round cipher is not susceptible to DC.

A common way to analyse an SPN block cipher's resistance against DC is to find a lower bound for the number of active S-boxes, $N_S$, for any r-round differential characteristic. This means the probability of any r-round differential characteristic is upper bounded by $(D_{\max})^{N_S}$, where $D_{\max}$ is the m.d.p. of the S-box in the cipher. Thus, if a substantially reduced-round version has $(D_{\max})^{N_S} < 2^{1-n}$, the cipher is believed to be immune to DC.
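The bound $(D_{\max})^{N_S} < 2^{1-n}$ translates into a minimum number of active S-boxes that a designer needs to guarantee. The Python sketch below computes this threshold, assuming $D_{\max}$ is given as a power of two; the function name and the two parameter sets are illustrative.

```python
import math


def min_active_sboxes(n, log2_dmax):
    """Smallest N_S with (Dmax)^N_S < 2^(1-n), where Dmax = 2^log2_dmax.

    The condition N_S * log2_dmax < 1 - n is equivalent to
    N_S > (n - 1) / (-log2_dmax)."""
    if log2_dmax >= 0:
        raise ValueError("Dmax must be smaller than 1")
    return math.floor((n - 1) / (-log2_dmax)) + 1


# 64-bit block with a 4-bit S-box of m.d.p. 2^-2.
print(min_active_sboxes(64, -2))   # 32
# 128-bit block with an 8-bit S-box of m.d.p. 2^-6.
print(min_active_sboxes(128, -6))  # 22
```

For the first parameter set, 32 active S-boxes give a bound of $2^{-64} < 2^{-63}$, whereas 31 would only give $2^{-62}$.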

2.3.2 Linear cryptanalysis

Developed by Matsui in the 1990s [85], linear cryptanalysis (LC) is the counterpart of DC and is widely used against block ciphers. It is a known-plaintext attack that exploits the linear bias within the non-linear transformation of ciphers. There are several similarities between DC and LC. First we find a linear approximation with high absolute linear bias, which is done by analysing the propagation of the linear expression through the round functions. Later, we use it to launch a distinguishing attack and even a key-recovery attack.

Let $\Gamma_0$ be the linear mask for the plaintext $IS_0$. At round i, the linear expression $V_i : \Gamma_{i-1} \bullet IS_{i-1} + \Gamma_i \bullet IS_i$ has some linear bias $\varepsilon_i = p_i - 1/2$. Similar to DC, this uncertainty comes from the linear propagation through the active S-boxes, where the bias can be computed from the LAT of the S-box. At the end, we obtain a series of linear expressions through r rounds of a block cipher,

$$\begin{aligned} V_1 &: \Gamma_0 \bullet IS_0 + \Gamma_1 \bullet IS_1 \\ V_2 &: \Gamma_1 \bullet IS_1 + \Gamma_2 \bullet IS_2 \\ &\ \ \vdots \\ V_r &: \Gamma_{r-1} \bullet IS_{r-1} + \Gamma_r \bullet IS_r , \end{aligned}$$

which we call the linear characteristic. Combining these expressions cancels the intermediate internal state bits and yields an expression in terms of the plaintext and ciphertext bits, $\sum_{i=1}^{r} V_i = \Gamma_0 \bullet IS_0 + \Gamma_r \bullet IS_r$. The probability of this expression can be estimated using Matsui's famous piling-up lemma.

Lemma 2.28 (Piling-up lemma [85]). Let $\{V_i\}_{i=1}^{r}$ be independent random variables whose values are 0 with probability $p_i$ or 1 with probability $1 - p_i$. Then the probability that $\sum_{i=1}^{r} V_i = 0$ is

$$\frac{1}{2} + 2^{r-1} \prod_{i=1}^{r} \left( p_i - \frac{1}{2} \right) .$$

From Lemma 2.28, we see that the linear bias of an r-round linear characteristic is

$$\varepsilon_t = 2^{r-1} \prod_{i=1}^{r} \varepsilon_i .$$

Similar to Lemma 2.26, this lemma assumes that the rounds are independent, but again, past experimental results suggest that this estimate is reasonably accurate.

As with DC, the linear hull effect is the combined effect of all the linear characteristics with the same input and output linear masks; the difference is that the effects can cancel out in LC. According to [85], the number of plaintext-ciphertext pairs needed to launch a linear attack is estimated to be

$$N_L = c \cdot \varepsilon_t^{-2} ,$$

where c is a small constant.

For an n-bit block cipher, there is a maximum of $2^n$ possible plaintext-ciphertext pairs. Conventionally, showing that any pair of input and output linear masks of a reduced-round block cipher has absolute linear bias significantly lower than $2^{-n/2}$ is an early indication that the full-round cipher is not susceptible to LC.

For SPN block ciphers, one can find the minimum number of active S-boxes in any r-round linear characteristic and estimate the absolute linear bias using Lemma 2.28 and the m.l.a. of the S-box. If a substantially reduced-round version has $\varepsilon_t < 2^{-n/2}$, the cipher is believed to be immune to LC.
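The piling-up lemma is easy to check numerically for a handful of independent variables. The Python sketch below compares the closed formula of Lemma 2.28 with the exact probability obtained by enumerating all outcomes; the probabilities used are arbitrary illustrative values, and exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product


def piling_up(probs):
    """Probability that the XOR of independent bits V_i equals 0,
    where V_i = 0 with probability probs[i], as given by Lemma 2.28."""
    half = Fraction(1, 2)
    prod = Fraction(1)
    for p in probs:
        prod *= p - half
    return half + 2 ** (len(probs) - 1) * prod


def exact_xor_zero_probability(probs):
    """Same probability, by summing over all 2^r outcomes of the bits."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=len(probs)):
        if sum(bits) % 2 == 0:
            pr = Fraction(1)
            for b, p in zip(bits, probs):
                pr *= p if b == 0 else 1 - p
            total += pr
    return total


probs = [Fraction(3, 4), Fraction(1, 4), Fraction(5, 8)]
print(piling_up(probs))                   # 15/32
print(exact_xor_zero_probability(probs))  # 15/32

# Combined bias eps_t = 2^(r-1) * prod(eps_i) and the data estimate eps_t^-2
# (taking the constant c = 1 for illustration).
eps_t = piling_up(probs) - Fraction(1, 2)
print(eps_t, 1 / eps_t ** 2)              # -1/32 1024
```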

2.3.3 Other cryptanalysis techniques

Besides the classical differential and linear cryptanalysis, there are many other cryptanalysis techniques such as algebraic [109], boomerang [127], impossible differential [23,69], integral [38,71] and improved integral (using division property) [120], invariant subspace [75], meet-in-the-middle [29,36], non-linear invariant [121], [27], etc. In this thesis, we focus on three other cryptanalysis techniques besides DC and LC, namely the impossible differential, invariant subspace, and algebraic attack.

Impossible differential cryptanalysis

Impossible differential attack is a variant of DC, introduced independently by Biham et al. [23] and Knudsen [69]. In contrast with the classical DC, which searches for a differential characteristic with high differential probability, the impossible differential attack exploits differentials that are impossible (occur with probability 0).

Starting with some input difference $\Delta_0$, we analyse how this difference propagates through f rounds in the forward direction to difference $\Delta_f$, where the cells in the internal state can be active (non-zero difference), inactive (zero difference) or unknown (either active or inactive). On the other hand, we choose some output difference $\Delta_{f+b}$ and propagate it in the backward direction through b rounds to difference $\Delta'_f$. If there exists some cell that does not match between $\Delta_f$ and $\Delta'_f$, i.e. it is active on one side but inactive on the other, then this transition is impossible and we have found an (f + b)-round impossible differential characteristic, denoted as

$$\Delta_0 \xrightarrow{\;f\;} \Delta_f \ \rightarrow\!\leftarrow\ \Delta'_f \xleftarrow{\;b\;} \Delta_{f+b} ,$$

or $\Delta_0 \overset{f+b}{\nrightarrow} \Delta_{f+b}$ for short.

By adding rounds before and/or after the impossible differential charac- teristic, one can collect pairs with certain plaintext and ciphertext differences. If there exists a pair that meets the input and output differential values of the impossible differential under some subkey, then this subkey candidate must be wrong. In this way, we filter as many wrong keys as possible and exhaustively search for the rest of the keys.

Invariant subspace attack

As a method of cryptanalysis, the invariant subspace attack was introduced by Leander et al. at CRYPTO 2011 [75]. In this method, the adversary aims to find so-called invariant subspaces, i.e. subsets of the set of all possible state and key values that are invariant under the round transformations used in the analysed cipher. When such a subset exists, the adversary encrypts plaintexts that belong to the subset, assumes that the secret key belongs to the subset as well (thus it is in essence a weak-key attack) and expects to obtain corresponding ciphertexts that also belong to the subset. This immediately yields a distinguisher for the cipher, while more advanced approaches can be used for key recovery.

The invariant subspaces for an n-bit iterated cipher with a round function $F_{K_i}(IS) = RF(IS \oplus K_i)$, where IS is the internal state and $K_i$ is the subkey of round i, can be formally introduced as follows. Assume there exist two constants $u, v \in \mathbb{F}_2^n$ and a linear subspace $A \subseteq \mathbb{F}_2^n$ such that $RF(u \oplus A) = v \oplus A$. Note that the function RF is applied here on a set of values; the condition can also be interpreted as $\forall a \in A,\ \exists a' \in A$ s.t. $RF(u \oplus a) = v \oplus a'$. If $K_i \in u \oplus v \oplus A$, then it follows that

$$F_{K_i}(v \oplus A) = RF((v \oplus A) \oplus (u \oplus v \oplus A)) = RF(u \oplus A) = v \oplus A .$$

Therefore, if the plaintext $P \in v \oplus A$, it follows that the ciphertext $C \in v \oplus A$ regardless of the number of rounds.

Algebraic attack

The main idea of the algebraic attack is to express an encryption algorithm as a system of algebraic equations of ciphertext bits in terms of the plaintext and key bits. An adversary then substitutes the known variables (plaintext/ciphertext bits) and attempts to solve this system to recover the key bits. While the algebraic attack has been successfully applied to some public-key primitives and stream ciphers, it has mostly been less effective against block ciphers [125].

Gaussian elimination is the best-known and standard way to solve a system of linear equations (equations whose monomials all have degree 1). However, for block ciphers, these equations can be highly non-linear due to the non-linear component of a cipher. In the case of SPN ciphers, the non-linearity is attributed to the algebraic degree of the underlying S-boxes. Although there are techniques such as Gröbner bases algorithms [32] and the cube attack [44] that tackle systems of non-linear equations, an iterative block cipher remains secure against algebraic attack if there are sufficient iterations to generate equations with a very high degree.

Cryptanalysis of Symmetric-key Cryptography


Chapter 3

Practical Differential Attack on JAMBU

JAMBU is a nonce-based authenticated encryption operating mode proposed by Wu et al. [130] that can be instantiated with any block cipher. The submission AES-JAMBU to the CAESAR competition uses AES-128 [39] as the internal block cipher. The main advantage of the JAMBU mode is its low memory requirement, which places it in the group of lightweight authenticated encryption modes. Indeed, when instantiated with a 2n-bit block cipher and without counting the memory needed to store the secret key, JAMBU only needs to maintain a 3n-bit internal state, whereas classical authenticated encryption modes such as OCB [99, 101] would require a 6n-bit internal state or even more. In terms of speed performance, AES-JAMBU is reasonably fast, running at about half the speed of AES-CBC [103] (but much slower than OCB, since the calls to the internal cipher cannot be parallelised).

Contribution of this chapter. In this chapter, we first describe a very practical attack on JAMBU that breaks its confidentiality claim in the nonce-misuse scenario. More precisely, with only $2^{n/2}$ encryption queries and computing time (which amounts to $2^{32}$ for AES-JAMBU), we are able to predict the value of a ciphertext block corresponding to a chosen plaintext whose prefix has never been queried to the encryption oracle before, which invalidates the designers' 2n-bit security claim.

We first explain our nonce-misuse scenario attack in Section 3.2. Our attack works by trying to force a zero-difference on the input of one of the internal block cipher calls of JAMBU. Normally, forcing such a collision on a 2n-bit value should require $2^n$ computations, but thanks to a divide-and-conquer technique, we are able to divide this event into two subparts, for a total cost of $2^{n/2}$ computations. Having a collision on one of the internal block cipher calls renders this particular JAMBU round totally linear with regard to differences, and eventually allows us to predict a ciphertext block for the next round.

Then, because of the rather small tag size of JAMBU, we are able to extend our technique to the more interesting case of a nonce-respecting scenario (Section 3.3). More precisely, with $2^{3n/2}$ computations (which amounts to $2^{96}$ for AES-JAMBU), one can break JAMBU's confidentiality in the adaptive chosen-ciphertext model, with message prefixes not previously queried.

In order to confirm our claims, we have implemented the nonce-misuse attack on AES-JAMBU as detailed in Section 3.4. In 2016, Wu and Huang [131] proposed using SIMON as the building block for their mode to make it more suitable for constrained devices. We remark that our techniques work independently of the cipher instantiating the JAMBU mode, yet we will focus on AES-JAMBU for ease of description.

3.1 The JAMBU Authenticated Encryption Scheme

3.1.1 Description of JAMBU

JAMBU uses a $k$-bit secret key $K$ and an $n$-bit public nonce value $IV$ to authenticate a variable-length associated data string $AD$ and to encrypt and authenticate a variable-length plaintext $P$. It produces a ciphertext $C$, which has the same bit length as the plaintext, and an $n$-bit tag $T$. The encryption of JAMBU can be seen as a variant of the OFB mode [130]. The encryption process of JAMBU consists of 5 phases as described below: padding, initialisation, processing of the associated data, processing of the plaintext, and finalisation/tag generation. The computation structure is illustrated in Figures 3.1, 3.2 and 3.3, where each line represents an $n$-bit value. We represent the $3n$-bit internal state of JAMBU by the variables $(S_i, R_i)$ with $S_i = (U_i, V_i)$, where $R_i$, $U_i$ and $V_i$ are $n$-bit values. We denote by $E_K$ the $2n$-bit internal cipher using the secret key $K$.

Padding. First, the associated data $AD$ is padded with $10^*$ padding: a '1' bit is appended to the data, followed by the least number of '0' bits (possibly none) to make the length of the padded associated data a multiple of $n$ bits. Then, the same padding method is applied to the plaintext.

Initialisation. As depicted in Figure 3.1, JAMBU uses an $n$-bit public nonce value $IV$ to initialise the internal state: $S_0 = E_K(0^n \| IV) \oplus (0^{2n-3} \| 101)$, where $\|$ denotes concatenation, and $R_0 = U_0$.

Processing of the associated data. The padded associated data is divided into $n$-bit blocks, and then processed block by block as described in Figure 3.1. Note that a single padded block $1 \| 0^{n-1}$ is processed in the case of an empty associated data string. We omit the details of this phase since it is irrelevant to our attack. Moreover, in the rest of the chapter we only use empty $AD$ strings, so we get $S_1 = (U_1, V_1) = E_K(S_0) \oplus (1 \| 0^{2n-2} \| 1)$ and $R_1 = R_0 \oplus U_1$.

Processing of the plaintext. We denote by $p$ the number of plaintext blocks after padding and $P = (P_1, P_2, \ldots, P_p)$.
The plaintext is processed block by block as depicted in Figure 3.2. At round $i$, the internal state is updated with the plaintext block $P_i$ by $S_{i+1} = (U_{i+1}, V_{i+1}) = E_K(S_i) \oplus (P_i \| R_i)$ and $R_{i+1} = R_i \oplus U_{i+1}$. The ciphertext block $C_i$ is then computed as $C_i = P_i \oplus V_{i+1}$.

Figure 3.1: Initialisation and processing of the associated data

Figure 3.2: Processing of the plaintext

Finalisation and tag generation. When all the plaintext blocks are processed, the final state is $(S_{p+1}, R_{p+1})$. The authentication tag $T$ is generated with two internal block cipher calls, as depicted in Figure 3.3.
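As a rough illustration of the update equations of this section, the following Python sketch implements a toy JAMBU with $n = 16$. It is not the reference implementation: `toy_cipher` (a 4-round Feistel network built from SHA-256) is a hypothetical stand-in for the $2n$-bit cipher $E_K$, the associated data is assumed to be empty, and padding and tag generation are left out. All function names are ours.

```python
import hashlib

N = 16                  # toy value of n (AES-JAMBU uses n = 64)
MASK = (1 << N) - 1     # mask selecting one n-bit half of the 2n-bit state

def toy_cipher(key: bytes, block: int) -> int:
    """Hypothetical stand-in for E_K: a 4-round Feistel permutation on 2n bits."""
    left, right = block >> N, block & MASK
    for rnd in range(4):
        digest = hashlib.sha256(key + bytes([rnd]) + right.to_bytes(N // 8, "big")).digest()
        left, right = right, left ^ int.from_bytes(digest[:N // 8], "big")
    return (left << N) | right

def init_state(key: bytes, iv: int):
    """Initialisation, then the single padded block of an empty AD string."""
    s = toy_cipher(key, iv) ^ 0b101                    # S0 = E_K(0^n || IV) xor (0^{2n-3} || 101)
    r = s >> N                                         # R0 = U0
    s = toy_cipher(key, s) ^ (1 << (2 * N - 1)) ^ 1    # S1 = E_K(S0) xor (1 || 0^{2n-2} || 1)
    r ^= s >> N                                        # R1 = R0 xor U1
    return s, r

def encrypt(key: bytes, iv: int, blocks):
    s, r = init_state(key, iv)
    out = []
    for p in blocks:
        s = toy_cipher(key, s) ^ (p << N) ^ r   # S_{i+1} = E_K(S_i) xor (P_i || R_i)
        r ^= s >> N                             # R_{i+1} = R_i xor U_{i+1}
        out.append(p ^ (s & MASK))              # C_i = P_i xor V_{i+1}
    return out

def decrypt(key: bytes, iv: int, blocks):
    s, r = init_state(key, iv)
    out = []
    for c in blocks:
        xy = toy_cipher(key, s)
        p = c ^ (xy & MASK) ^ r                 # P_i = C_i xor V_{i+1}, V_{i+1} = Y_i xor R_i
        s = xy ^ (p << N) ^ r
        r ^= s >> N
        out.append(p)
    return out

key = b"toy key"
msg = [0x1234, 0xBEEF, 0x0042]
ct = encrypt(key, 7, msg)
```

Note that only the $2n$-bit part $S_i$ passes through the cipher in each round, while $R_i$ is updated by a plain XOR; this is the structural property exploited in Section 3.2.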


Figure 3.3: Finalisation and tag generation

3.1.2 Security claims

The security claims of JAMBU are given in the CAESAR competition submission document [130]. When instantiated with a $2n$-bit block cipher, JAMBU processes $n$-bit plaintext blocks and eventually outputs an $n$-bit tag $T$. When the nonce is not reused, JAMBU is claimed to provide $2n$-bit security for confidentiality and $n$-bit security for authentication. We note that the type of confidentiality security (i.e. IND-CPA, IND-CCA or IND-CCA2) is not mentioned by the designers. When the nonce is misused, JAMBU is claimed to remain reasonably strong. Namely, in that scenario, the confidentiality of JAMBU is supposed to be only partially compromised as the authors claim that "it only leaks the information of the first block or the common prefix of the message". Regarding authentication in the nonce-misuse scenario, the authors remain vague, only mentioning that "the integrity of JAMBU will be less secure but not completely compromised".

We summarise in Table 3.1 the security claims of the CAESAR competition candidate AES-JAMBU where $n = 64$. We remark that, as with many authenticated encryption schemes, if verification fails during decryption the new tag and the decrypted plaintext should not be given as output. Moreover, it is also important to note that the total amount of message material (plaintext and associated data) that can be protected by a single key is limited to $2^{64}$ bits for AES-JAMBU.

Table 3.1: Security claims for AES-JAMBU

                    Confidentiality (bits)                        Integrity (bits)
nonce-respecting    128                                           64
nonce-misuse        128 (except first block or common prefix)     not specified

3.2 Attack on JAMBU in Nonce-Misuse Scenario

In this section, we analyse JAMBU in the nonce-misuse attack model, where a nonce can be used to encrypt multiple plaintexts. In such a model, JAMBU is an online authenticated encryption scheme, namely the i-th ciphertext block is produced before the (i + 1)-th plaintext block is read. An inherent property of online authenticated encryption is that common prefix plaintext blocks always produce the same corresponding ciphertext blocks. According to the security claims of JAMBU [130], the only loss of confidentiality when moving from the nonce-respecting model to the nonce-misuse model is this inherent property of online authenticated encryption. However, we present here a practical attack that distinguishes JAMBU from a random online authenticated encryption, which invalidates the designers' confidentiality security claims of JAMBU in the nonce-misuse model.

Confidentiality of online authenticated encryption

For an online encryption scheme $(\mathcal{E}_K, \mathcal{D}_K)$ with a key space $\mathcal{K}$, its confidentiality security is usually defined via upper bounding the advantage of all chosen-plaintext distinguishers¹. We give a brief description as follows, and refer interested readers to [3, 45] for the full formal definitions. Let OAE denote the set of all online authenticated encryption algorithms that have the same block and tag size as $(\mathcal{E}_K, \mathcal{D}_K)$. Let $(\mathrm{OEnc}, \mathrm{ODec}) \stackrel{\$}{\leftarrow} \mathrm{OAE}$ denote an algorithm randomly selected from OAE. Let $D$ be a distinguisher that interacts with $\mathcal{E}_K$ or $\mathrm{OEnc}$, and outputs one bit. Its advantage is defined as:

$$\mathbf{Adv}^{\mathrm{cpa}}_{\mathcal{E}}(D) := \Pr\left[K \stackrel{\$}{\leftarrow} \mathcal{K},\ D^{\mathcal{E}_K} \Rightarrow 1\right] - \Pr\left[(\mathrm{OEnc}, \mathrm{ODec}) \stackrel{\$}{\leftarrow} \mathrm{OAE},\ D^{\mathrm{OEnc}} \Rightarrow 1\right].$$

Then we define $\mathbf{Adv}^{\mathrm{cpa}}_{\mathcal{E}}(t, q, \sigma, \ell) := \max_D \mathbf{Adv}^{\mathrm{cpa}}_{\mathcal{E}}(D)$, where the maximum is taken over all distinguishers that run in time $t$ and make $q$ queries, each of length at most $\ell$ blocks and of total length at most $\sigma$ blocks.

¹ It has been proven that an authenticated encryption satisfying both IND-CPA and INT-CTXT security notions is also IND-CCA secure [18, 19, 64].

3.2.1 Attack overview

Our attack is based on an observation that we explain below. JAMBU maintains a $3n$-bit internal state, but uses only one invocation of a $2n$-bit block cipher $E_K$ per plaintext block to update it. Thus, there are always $n$ state bits per round which are not updated through the strong primitive (i.e. the underlying block cipher). More precisely, in every round $S_i = (U_i, V_i)$ is input to the block cipher, $(X_i, Y_i) = E_K(S_i)$, while $R_i$ is only linearly updated with $U_{i+1}$ as $R_{i+1} = R_i \oplus U_{i+1}$. Furthermore, if a pair of plaintexts satisfying $\Delta S_{i+1} = 0$ is found, then part of the state differences in two consecutive rounds are linearly related, i.e. $\Delta V_{i+2} = \Delta R_i$, which will be exploited by our cryptanalysis.

Overall, our attack can be divided into three parts. First, the adversary tries to build a special difference structure in the internal state by querying the encryption of a fixed message² with several different nonces. Then, using this special differential structure, he tries to recover the values of these internal differences and at the same time force a zero-difference on the input of one of the internal cipher calls. Finally, based on this differential structure that is now fully known and controlled, he is able to distinguish JAMBU from a random online authenticated encryption, and further forge some ciphertext blocks for a message that has never been queried before.
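The linear relation between consecutive rounds mentioned in this overview can be checked directly from the update equations of Section 3.1.1. The short derivation below uses only those equations; no new assumption is involved.

$$\begin{aligned}
\Delta S_{i+1} = 0 \ &\Rightarrow\ \Delta U_{i+1} = 0 \ \text{ and } \ (\Delta X_{i+1}, \Delta Y_{i+1}) = (0, 0), \\
\Delta R_{i+1} &= \Delta R_i \oplus \Delta U_{i+1} = \Delta R_i, \\
\Delta V_{i+2} &= \Delta Y_{i+1} \oplus \Delta R_{i+1} = \Delta R_i .
\end{aligned}$$

In other words, once the cipher input carries no difference, the difference stored in the register $R$ passes unchanged into the next $V$ value, and hence into the next ciphertext block.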

3.2.2 Distinguishing attack

First step

For the first step, the adversary picks a random $n$-bit message block $P_1$, and asks for the encryption of this message for $2^{n/2}$ distinct nonce values. Since the corresponding ciphertext blocks are also $n$ bits long, the adversary has a good chance to observe a collision on the block $C_1$. We denote by $IV$ and $IV'$ the two nonces leading to that collision. One can easily see from Figure 3.4 that since no difference is inserted in the block $P_1$, a collision on $C_1$ necessarily means that we have a collision on the difference value of the upper branch and lower branch of the internal state (i.e. the difference values in $R_1$ and in $Y_1$ are equal). We denote that difference by $\Delta R$, and we denote the random difference in $X_1$ by $\Delta S$. We remark that for this first attack step, we do not need to reuse any nonce value.

² We note that it is not necessary that the message is fixed. Just for the simplicity of description, we use a fixed message here and in subsequent sections.

Figure 3.4: The first step of the attack.
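The collision condition of the first step can be written out from the equations of Section 3.1.1. The derivation below is a sketch; the collision-count estimate additionally assumes that the first ciphertext blocks behave like uniform $n$-bit values across nonces.

$$\begin{aligned}
C_1 &= P_1 \oplus V_2 = P_1 \oplus Y_1 \oplus R_1, \\
\Delta C_1 = 0 \ \text{and} \ \Delta P_1 = 0 \ &\Rightarrow\ \Delta Y_1 = \Delta R_1 =: \Delta R, \\
\binom{2^{n/2}}{2} \cdot 2^{-n} &\approx 2^{n-1} \cdot 2^{-n} = \tfrac{1}{2} \quad \text{expected colliding nonce pairs.}
\end{aligned}$$

Under that assumption, a small constant multiple of $2^{n/2}$ nonces is enough to find a collision with high probability.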

Second step

In the second step, the goal will be to deduce the values of ∆S and ∆R which remain unknown at this moment. In order to achieve this, the adversary will now try to insert a difference in P1 in the hope that it will be equal to ∆S. If his choice is right, one can see from Figure 3.5 that he will cancel the difference in U2 and that the difference appearing in the next ciphertext block C2 (for a plaintext block P2 without difference) will necessarily be ∆R. A key observation is that since no difference will be present any more on the input of the incoming block cipher call, the difference in C2 will remain ∆R whatever the choice of the value of P1. To summarise, if the adversary adds the difference ∆S to P1, then the difference in C2 will remain the same (i.e. ∆R) whatever the value of P1 is. This behaviour is what the adversary will use to detect when he makes the right choice for the difference insertion in P1.

Figure 3.5: The second step of the attack.

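The detailed procedure to find $\Delta S$ is as follows. Firstly, the adversary constructs two tables as depicted in Figure 3.6, each having $2^{n/2}$ three-tuples of one-block plaintexts, such that all pairs created by taking one element from each of these two tables correspond to all the $2^n$ possible differences on an $n$-bit value. More precisely, let $\langle i \rangle$ denote the integer $i$ in an $n/2$-bit binary representation³. One table $T_1$ is $\{ (\langle 0 \rangle \| \langle i \rangle,\ \langle 1 \rangle \| \langle i \rangle,\ \langle 2 \rangle \| \langle i \rangle) \}$, where $i$ ranges over all $n/2$-bit values⁴. We denote by $(\langle 0 \rangle \| \langle i \rangle,\ \langle 1 \rangle \| \langle i \rangle,\ \langle 2 \rangle \| \langle i \rangle)$ the $i$-th element of $T_1$. The other table $T_2$ is $\{ (\langle j \rangle \| \langle 0 \rangle,\ \langle 1 \oplus j \rangle \| \langle 0 \rangle,\ \langle 2 \oplus j \rangle \| \langle 0 \rangle) \}$, where $j$ ranges over all $n/2$-bit values. Similarly we denote the $j$-th element of $T_2$ by $(\langle j \rangle \| \langle 0 \rangle,\ \langle 1 \oplus j \rangle \| \langle 0 \rangle,\ \langle 2 \oplus j \rangle \| \langle 0 \rangle)$. The pairwise differences between $T_1$ and $T_2$ are $\{ (\langle 0 \rangle \| \langle i \rangle) \oplus (\langle j \rangle \| \langle 0 \rangle) = \langle j \rangle \| \langle i \rangle \}$, where $i$ and $j$ independently range over all $n/2$-bit values. Thus, one can see that they cover all the possible differences of a one-block $n$-bit plaintext. In particular, although each element is a 3-tuple of plaintexts and hence each pair consists of three $n$-bit differences obtained by XORing the corresponding plaintexts, these differences are all equal, i.e.

$$(\langle 0 \rangle \| \langle i \rangle) \oplus (\langle j \rangle \| \langle 0 \rangle) = (\langle 1 \rangle \| \langle i \rangle) \oplus (\langle 1 \oplus j \rangle \| \langle 0 \rangle) = (\langle 2 \rangle \| \langle i \rangle) \oplus (\langle 2 \oplus j \rangle \| \langle 0 \rangle).$$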
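The table construction can be checked mechanically for a small block size. The following Python sketch builds both tables for a toy $n = 8$ and collects every pairwise difference; the helper names `build_tables` and `pair_differences` are ours and purely illustrative.

```python
def build_tables(n: int):
    """Return (T1, T2) as lists of 3-tuples of n-bit integers."""
    half = n // 2
    consts = (0, 1, 2)
    # i-th element of T1: (<0>||<i>, <1>||<i>, <2>||<i>)
    t1 = [tuple((c << half) | i for c in consts) for i in range(1 << half)]
    # j-th element of T2: (<j>||<0>, <1 xor j>||<0>, <2 xor j>||<0>)
    t2 = [tuple((c ^ j) << half for c in consts) for j in range(1 << half)]
    return t1, t2

def pair_differences(a, b):
    """Component-wise XOR differences between one element of T1 and one of T2."""
    return {x ^ y for x, y in zip(a, b)}

t1, t2 = build_tables(8)          # n = 8 keeps the example tiny
covered = set()
for a in t1:
    for b in t2:
        covered |= pair_differences(a, b)
```

With $n = 8$ each table holds $2^{4} = 16$ tuples, and the $16 \times 16$ pairs cover all 256 one-block differences, each pair contributing a single difference shared by its three components.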

Figure 3.6: The tables $T_1$ and $T_2$. An example of element pair for difference $\langle j \rangle \| \langle i \rangle$.

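Secondly, the adversary selects a random one-block plaintext $P_2$. For each element $(\langle 0 \rangle \| \langle i \rangle,\ \langle 1 \rangle \| \langle i \rangle,\ \langle 2 \rangle \| \langle i \rangle)$ in table $T_1$, he uses each of the three plaintext block values separately as the first block $P_1$, concatenates it with $P_2$ as the second block, and makes three encryption queries with the nonce $IV$ to receive the three corresponding ciphertexts. Then, the adversary computes the pairwise differences in the second block of these ciphertexts. In detail, let $C[\langle 0 \rangle \| \langle i \rangle]_2$, $C[\langle 1 \rangle \| \langle i \rangle]_2$ and $C[\langle 2 \rangle \| \langle i \rangle]_2$ denote the second ciphertext blocks corresponding to $\langle 0 \rangle \| \langle i \rangle \| P_2$, $\langle 1 \rangle \| \langle i \rangle \| P_2$ and $\langle 2 \rangle \| \langle i \rangle \| P_2$ respectively. The adversary computes the following two $n$-bit differences and stores them:

$$\Delta C[\langle i \rangle]_1 = C[\langle 1 \rangle \| \langle i \rangle]_2 \oplus C[\langle 0 \rangle \| \langle i \rangle]_2, \qquad \Delta C[\langle i \rangle]_2 = C[\langle 2 \rangle \| \langle i \rangle]_2 \oplus C[\langle 0 \rangle \| \langle i \rangle]_2 .$$

Similarly, for each element of the second table $T_2$, the adversary makes encryption queries with the above $P_2$ as the second block and $IV'$ as the nonce to receive the ciphertexts, and then computes the pairwise differences of the second ciphertext blocks, denoted as $(\Delta C'[\langle j \rangle]_1, \Delta C'[\langle j \rangle]_2)$. Then he matches these differences against the previously stored $\{ (\Delta C[\langle i \rangle]_1, \Delta C[\langle i \rangle]_2) \}$. Once a matching pair is found, the adversary computes $\Delta R$ and $\Delta S$ from the corresponding plaintexts and ciphertexts as follows:

$$\Delta R = C[\langle 0 \rangle \| \langle i \rangle]_2 \oplus C'[\langle j \rangle \| \langle 0 \rangle]_2, \qquad \Delta S = (\langle 0 \rangle \| \langle i \rangle) \oplus (\langle j \rangle \| \langle 0 \rangle) = \langle j \rangle \| \langle i \rangle .$$

If the adversary does not find a match after running through all elements in $T_2$, he outputs 0.

Now we evaluate the success probability of this step if the adversary interacts with JAMBU. For a pair $(\langle 0 \rangle \| \langle i \rangle) \| P_2$ with nonce $IV$ and $(\langle j \rangle \| \langle 0 \rangle) \| P_2$ with nonce $IV'$, if $\Delta S = \langle j \rangle \| \langle i \rangle$, we have that $C[\langle 0 \rangle \| \langle i \rangle]_2 \oplus C'[\langle j \rangle \| \langle 0 \rangle]_2 = \Delta R$ as explained in Section 3.2.1. Similarly, we have that $C[\langle 1 \rangle \| \langle i \rangle]_2 \oplus C'[\langle 1 \oplus j \rangle \| \langle 0 \rangle]_2 = \Delta R$ and $C[\langle 2 \rangle \| \langle i \rangle]_2 \oplus C'[\langle 2 \oplus j \rangle \| \langle 0 \rangle]_2 = \Delta R$. Then, we further deduce that

$$\begin{aligned}
& C[\langle 0 \rangle \| \langle i \rangle]_2 \oplus C'[\langle j \rangle \| \langle 0 \rangle]_2 = C[\langle 1 \rangle \| \langle i \rangle]_2 \oplus C'[\langle 1 \oplus j \rangle \| \langle 0 \rangle]_2 \\
\Rightarrow\ & C[\langle 0 \rangle \| \langle i \rangle]_2 \oplus C[\langle 1 \rangle \| \langle i \rangle]_2 = C'[\langle j \rangle \| \langle 0 \rangle]_2 \oplus C'[\langle 1 \oplus j \rangle \| \langle 0 \rangle]_2 \\
\Rightarrow\ & \Delta C[\langle i \rangle]_1 = \Delta C'[\langle j \rangle]_1 .
\end{aligned}$$

With an identical reasoning, we deduce that $\Delta C[\langle i \rangle]_2 = \Delta C'[\langle j \rangle]_2$. On the other hand, if $\Delta S \neq \langle j \rangle \| \langle i \rangle$, the pair formed by the $i$-th element of $T_1$ and the $j$-th element of $T_2$ satisfies the two $n$-bit equality conditions with probability $2^{-2n}$. Since there are in total $2^n$ such pairs, the expected number of false positives is about $2^{-n}$, which is negligible. Hence, the adversary gets the correct values of $\Delta S$ and $\Delta R$ with a probability very close to 1.

³ Typically $n$ is 32 or 64, i.e. an even integer.

⁴ Note that the adversary can use any three distinct constants other than $\langle 0 \rangle$, $\langle 1 \rangle$ and $\langle 2 \rangle$ to construct the tables. Here we use these particular constants only to simplify the notation.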

Third step

Finally, in the third and last step, the adversary chooses a random one-block value $P_1$ such that $P_1$ and $P_1 \oplus \Delta S$ have never been queried before as first plaintext block (he can simply keep track of the previously queried $P_1$ values). Then, he picks a random value for the second plaintext block $P_2$ and asks for the encryption of the message $(P_1 \| P_2)$ with the nonce $IV$. He receives ciphertext blocks $C_1$ and $C_2$ from the encryption oracle. Then the adversary asks for the encryption of another message $(P_1 \oplus \Delta S \| P_2)$ with the nonce $IV'$, and receives ciphertext blocks $C_1'$ and $C_2'$. Then he computes $C_2 \oplus C_2'$, and compares it to $\Delta R$. If $C_2 \oplus C_2' = \Delta R$ holds, the adversary outputs 1. Otherwise, the adversary outputs 0.

One can easily evaluate the advantage of the adversary. For JAMBU, he outputs 1 with a probability equal to 1. On the other hand, for a random authenticated encryption, he outputs 1 with a probability of $2^{-n}$. Therefore, the advantage of the adversary is almost 1.

Attack and complexity summary

To summarise, our attack requires in total about $O(2^{n/2})$ encryption queries and computations, and can be divided into three parts:

• first step ($2^{n/2}$ encryption queries and computations): the adversary picks a plaintext block $P_1$ and queries the encryption of this block for $2^{n/2}$ distinct nonces. He keeps the nonce pair $(IV, IV')$ that leads to a collision on the ciphertext block $C_1$.

• second step ($O(2^{n/2})$ encryption queries and computations): the adversary picks a random second plaintext block $P_2$ and, for every $n/2$-bit value $I$, queries the encryption of the $2^{n/2}$ plaintext blocks $P_1 = (0^{n/2} \| I)$ concatenated with $P_2$ with nonce $IV$ and the encryption of the $2^{n/2}$ plaintext blocks $P_1 = (I \| 0^{n/2})$ concatenated with $P_2$ with nonce $IV'$. He repeats the process with a few other constant values instead of $0^{n/2}$ in order to improve the filtering, and he eventually deduces the value of $\Delta S$ by checking which difference applied in $P_1$ leads to the same difference in $C_2$ whatever the choice of $I$. He directly deduces that this difference in $C_2$ is actually $\Delta R$.

• third step (2 encryption queries and computations): the adversary picks a random value $P_1$ such that $P_1$ and $P_1 \oplus \Delta S$ have never been queried before, and asks the encryption oracle for the ciphertext corresponding to the message $(P_1 \| P_2)$ with nonce $IV$. He receives $(C_1 \| C_2)$. Then he queries the encryption of $(P_1 \oplus \Delta S \| P_2)$ with nonce $IV'$ and receives $(C_1' \| C_2')$. Finally he checks if $C_2 \oplus C_2' = \Delta R$ holds.

We remark that for AES-JAMBU [130], we have $n = 64$ and thus the confidentiality security is only around 32 bits in the nonce-misuse attack model. Thus, our cryptanalysis invalidates the confidentiality claims of the JAMBU designers.
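The three steps summarised above can be exercised end to end on a scaled-down instance. The Python sketch below is only an illustration under stated assumptions: $n = 16$ instead of 64, `toy_cipher` is a hypothetical 4-round Feistel stand-in for $E_K$ (not AES), the associated data is empty, and padding and tags are ignored. It is not the implementation reported in Section 3.4, and all names are ours.

```python
import hashlib

N, HALF = 16, 8            # toy n = 16 (AES-JAMBU has n = 64)
MASK = (1 << N) - 1
KEY = b"secret toy key"    # used only inside the oracle

def toy_cipher(block):
    """Hypothetical stand-in for E_K: 4-round Feistel permutation on 2n bits."""
    left, right = block >> N, block & MASK
    for rnd in range(4):
        digest = hashlib.sha256(KEY + bytes([rnd]) + right.to_bytes(2, "big")).digest()
        left, right = right, left ^ int.from_bytes(digest[:2], "big")
    return (left << N) | right

def oracle(iv, blocks):
    """Toy JAMBU encryption oracle (empty AD, no padding, no tag)."""
    s = toy_cipher(iv) ^ 0b101
    r = s >> N
    s = toy_cipher(s) ^ (1 << (2 * N - 1)) ^ 1
    r ^= s >> N
    out = []
    for p in blocks:
        s = toy_cipher(s) ^ (p << N) ^ r
        r ^= s >> N
        out.append(p ^ (s & MASK))
    return out

def step1(p1):
    """Find two nonces whose first ciphertext blocks collide."""
    seen = {}
    for iv in range(1 << N):
        c1 = oracle(iv, [p1])[0]
        if c1 in seen:
            return seen[c1], iv
        seen[c1] = iv
    return None

def step2(iv_a, iv_b, p2):
    """Recover (delta_s, delta_r) by matching second-block differences."""
    def probe(iv, triple):
        c = [oracle(iv, [p1, p2])[1] for p1 in triple]
        return (c[1] ^ c[0], c[2] ^ c[0]), c[0]
    stored = {}
    for i in range(1 << HALF):                       # table T1 under nonce IV
        sig, c0 = probe(iv_a, [(c << HALF) | i for c in (0, 1, 2)])
        stored[sig] = (i, c0)
    for j in range(1 << HALF):                       # table T2 under nonce IV'
        sig, c0 = probe(iv_b, [(c ^ j) << HALF for c in (0, 1, 2)])
        if sig in stored:
            i, c0_a = stored[sig]
            return (j << HALF) | i, c0_a ^ c0
    return None

def fresh_block(delta_s, used_p1):
    """Pick P1 so that neither P1 (under IV) nor P1 ^ delta_s (under IV') was queried."""
    for p1 in range(3 << HALF, 1 << N):
        q = p1 ^ delta_s
        if p1 != used_p1 and q != used_p1 and q & ((1 << HALF) - 1):
            return p1

def step3(iv_a, delta_s, delta_r, p1, p2):
    """Predict the ciphertext of (p1 ^ delta_s, p2) under IV' from one query under IV."""
    c1, c2 = oracle(iv_a, [p1, p2])
    return [c1 ^ delta_s, c2 ^ delta_r]

first_block = 0x7F95
iv_a, iv_b = step1(first_block)
delta_s, delta_r = step2(iv_a, iv_b, 0)
p1 = fresh_block(delta_s, first_block)
predicted = step3(iv_a, delta_s, delta_r, p1, 0x3C3C)
actual = oracle(iv_b, [p1 ^ delta_s, 0x3C3C])
```

Comparing `predicted` with `actual` checks the prediction of the third step. With such a small $n$, a false positive in the matching phase has probability of roughly $2^{-16}$, so the sketch can in principle fail for an unlucky key; for $n = 64$ this probability is negligible, as argued in Section 3.2.2.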

3.2.3 Extension to a plaintext-recovery attack

Our distinguishing attack can be extended to a more powerful plaintext-recovery attack in a straightforward way. The setting is as follows. Note that our attack is in the chosen-plaintext model, and hence the adversary requires only the encryption algorithm of JAMBU. In other words, he is given access to an encryption oracle of JAMBU instantiated with a randomly selected key that is secret to the adversary. He is allowed to query any plaintext of his own choice and gets the corresponding ciphertext. In the end, the adversary is required to choose a new (nonce, ciphertext) pair⁵ and to produce a corresponding plaintext for it. If the plaintext is indeed valid and if the prefix to its last block has never been queried before, then the plaintext-recovery attack is said to succeed (the reason for these restrictions is detailed in the discussion on trivial attacks in Section 3.2.4).

The procedure is as follows and it also has three steps. The first two steps are exactly the same as the first two steps of the distinguishing attack detailed in Section 3.2.2, and we adopt the same notation. In the third and last step, the adversary chooses a random value $P_1$ such that $P_1$ and $P_1 \oplus \Delta S$ have never been queried before as first plaintext block under the nonces $IV$ and $IV'$. Then, he picks a random value for the second plaintext block $P_2$ and asks for the encryption of the message $(P_1 \| P_2)$ with the nonce $IV$. He receives ciphertext blocks $C_1$ and $C_2$ from the encryption oracle. Since he knows the values of $\Delta R$ and $\Delta S$, he is sure that if he applies the difference $\Delta S$ on $P_1$ with nonce $IV'$, he will get difference $\Delta S$ on $C_1$ and difference $\Delta R$ on $C_2$. Therefore, he can predict the plaintext $(P_1 \oplus \Delta S, P_2 \oplus \Delta R)$ corresponding to ciphertext $(C_1 \oplus \Delta S \| C_2)$ with nonce $IV'$. Moreover, it is easy to see that $(C_1 \oplus \Delta S \| C_2)$ is not a prefix of any previously returned ciphertext of the encryption of JAMBU, since the first ciphertext block is a permutation of the first plaintext block under the same nonce, and since $P_1 \oplus \Delta S$ has not been queried before as the first plaintext block under $IV'$ to the encryption oracle.

One might argue that $P_1$ is the first plaintext block and this is part of the security exclusions in the JAMBU security claims. However, we have used $P_1$ for simplicity of description, and the attack remains the same with any amount of random message blocks prepended to $P_1$.

The complexity of the above plaintext-recovery attack is also $O(2^{n/2})$ encryption queries and computations. The success probability is almost 1 (we omit the detailed evaluation since it is similar to the distinguishing attack).

⁵ We note that here the attack only analyses the processing of plaintext/ciphertext of JAMBU, that is the confidentiality of encryption, and hence relaxes the setting such that the adversary is not required to provide the tag for his choice of (nonce, ciphertext) pair.

3.2.4 Discussion on trivial attacks

In 2014, Rogaway claimed a generic plaintext-recovery attack on online authenticated encryption in the nonce-misuse setting [100]. His attack adopts a divide-and-conquer strategy and recovers the plaintext block by block. In detail, the adversary uses the recovered first $i - 1$ plaintext blocks as prefix, guesses the $i$-th plaintext block, and verifies the correctness by sending it to the encryption oracle and comparing the received $i$-th ciphertext block with the $i$-th target ciphertext block. However, this attack essentially just reveals again the inherent weakness of online authenticated encryption that has been known before and has also been explicitly pointed out by the designers of JAMBU: common prefix plaintext blocks produce the same corresponding ciphertext blocks. In particular, the adversary has to query a plaintext to the encryption oracle, then receive a ciphertext that is exactly the same as the target ciphertext, and then output this plaintext as the correct plaintext. As a comparison, in our plaintext-recovery setting, we explicitly exclude such rather trivial attacks by requiring that the last block of the target ciphertext (or plaintext) must not share its prefix with any previously returned ciphertext from the encryption oracle.

One can also think of the following trivial distinguishing attack on JAMBU and several other CAESAR candidates. For an ideal online authenticated encryption as defined in [3, 45], the $i$-th plaintext block should be input to a random permutation to produce the $i$-th ciphertext block, where the index of the random permutation is determined by the nonce, the associated data and the first $i - 1$ plaintext blocks. On the other hand, for JAMBU the $i$-th plaintext block is simply XORed to an internal state: $C_i = P_i \oplus V_{i+1}$, where the value of $V_{i+1}$ is determined by the nonce, the associated data and the first $i - 1$ plaintext blocks. Hence, $\Delta C_i = \Delta P_i$ always holds under the same nonce, the same associated data and the same first $i - 1$ plaintext blocks. In detail, an adversary queries a nonce $IV$ and a one-block plaintext $P_1$ to the encryption oracle, and receives a ciphertext $C_1$. He then queries the same nonce $IV$ and another one-block plaintext $P_1'$ to the encryption oracle, and receives a ciphertext $C_1'$. If $C_1 \oplus C_1' = P_1 \oplus P_1'$ holds, the adversary outputs 1. Otherwise, he outputs 0. This distinguishing attack can trivially be extended to a plaintext-recovery attack on single-block ciphertexts.

As a comparison, our attacks reveal a specific weakness of JAMBU: when processing plaintext blocks, it uses only one invocation of a small block cipher ($2n$ bits) to update a larger state ($3n$ bits). Such a design choice obviously favours efficiency, but our attacks imply that the price to pay in security is greater than originally expected by the JAMBU designers.
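For completeness, the trivial distinguisher discussed in this section rests on a one-line identity. The probability quoted for the ideal case is our own rough estimate, assuming the $i$-th block of an ideal online scheme is a random permutation of the plaintext block.

$$C_i \oplus C_i' = (P_i \oplus V_{i+1}) \oplus (P_i' \oplus V_{i+1}) = P_i \oplus P_i' \quad \text{(JAMBU, same nonce, AD and prefix)},$$

$$\Pr\left[\pi(P_i) \oplus \pi(P_i') = P_i \oplus P_i'\right] = \frac{1}{2^n - 1} \approx 2^{-n} \quad \text{(random permutation } \pi,\ P_i \neq P_i').$$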

3.3 Attack on JAMBU in Nonce-Respecting Scenario

In this section, we analyse the confidentiality security of JAMBU in the nonce-respecting scenario. JAMBU claims a $2n$-bit confidentiality security (or 128-bit security for AES-JAMBU) in this setting. However, the claim statement does not contain any specification of the attack model considered (IND-CPA, IND-CCA or IND-CCA2). Hence, one may wonder if JAMBU can achieve such a confidentiality security level under all (previously known) attack models⁶. We note that the adaptive chosen-ciphertext security (IND-CCA2) of JAMBU can be trivially broken with $2^n$ queries by reusing messages with common prefixes (see Section 3.3.3). However, our distinguishing attack works with prefixes not previously queried. Furthermore, our method can be extended to a more powerful plaintext-recovery attack.

Confidentiality under an adaptive chosen-ciphertext attack

For an authenticated encryption scheme $(\mathcal{E}_K, \mathcal{D}_K)$ with a key space $\mathcal{K}$, its confidentiality security under adaptive chosen-ciphertext attacks has been defined in [16], usually referred to as IND-CCA2. Here we provide a brief description, and refer interested readers to [16] for the full formal definition. Let RO denote a random oracle that has the same output bit length as $\mathcal{E}_K$ on every input plaintext. Let $D$ be a distinguisher that interacts with $(\mathcal{E}_K, \mathcal{D}_K)$ or $(\mathrm{RO}, \mathcal{D}_K)$, and outputs one bit. Its advantage is defined as:

$$\mathbf{Adv}^{\mathrm{cca2}}_{\mathcal{E}}(D) := \Pr\left[K \stackrel{\$}{\leftarrow} \mathcal{K},\ D^{\mathcal{E}_K, \mathcal{D}_K} \Rightarrow 1\right] - \Pr\left[K \stackrel{\$}{\leftarrow} \mathcal{K},\ D^{\mathrm{RO}, \mathcal{D}_K} \Rightarrow 1\right].$$

Then we define $\mathbf{Adv}^{\mathrm{cca2}}_{\mathcal{E}}(t, q, \sigma, \ell) := \max_D \mathbf{Adv}^{\mathrm{cca2}}_{\mathcal{E}}(D)$, where the maximum is taken over all distinguishers that run in time $t$ and make $q$ queries, each of length at most $\ell$ blocks and of total length at most $\sigma$ blocks. The distinguisher must not make two queries with the same nonce to the encryption oracle, that is $\mathcal{E}_K$ or RO. Moreover, we assume the distinguisher does not query the outputs from one oracle to the other oracle. Namely, he does not query the received ciphertext from $\mathcal{E}_K$ or RO to $\mathcal{D}_K$, and does not query the received plaintext from $\mathcal{D}_K$ to $\mathcal{E}_K$ or RO. These assumptions aim at preventing trivial distinguishing attacks.

⁶ Yet we trivially observe that JAMBU can only achieve $2^n$ confidentiality security in the IND-CCA3 model [102] (and not the expected $2^{2n}$), due to its $n$-bit tag size.

3.3.1 Distinguishing attack

We notice that JAMBU uses an $n$-bit tag. Therefore, one can always obtain the corresponding plaintext for a ciphertext of his own choice from the decryption oracle by making at most $2^n$ queries, i.e. by exhaustively guessing the tag value. Based on this observation, we can transform the distinguishing attack in the nonce-misuse setting detailed in Section 3.2 into a distinguishing attack in the nonce-respecting setting, with a complexity increase by a factor $2^n$ and hence with a total complexity of $2^{3n/2}$, which is lower than the $2n$-bit security one might expect. In detail, the attack in the nonce-misuse setting consists of three steps, and the repeated-nonce requirement occurs in steps 2 and 3. Thus, we mainly modify these two steps. We adopt the same notation as in Section 3.2.
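The complexity figure follows from a simple product: each of the $O(2^{n/2})$ chosen ciphertexts of the nonce-misuse attack is now obtained through at most $2^n$ tag guesses. Written out for the AES-JAMBU parameter $n = 64$:

$$O(2^{n/2}) \cdot 2^{n} = O(2^{3n/2}), \qquad n = 64:\quad 2^{32} \cdot 2^{64} = 2^{96} < 2^{128} = 2^{2n}.$$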

First step

The procedure is exactly the same as before. For the plaintext $P_1$, its ciphertext is denoted as $C_1$ under the nonce $IV$ and as $C_1'$ under the nonce $IV'$. Then we denote $V[IV]_2 = P_1 \oplus C_1$ and $V[IV']_2 = P_1 \oplus C_1'$.

Second step

Firstly, the adversary constructs tables $T_1$ and $T_2$ as before. Secondly, he selects a random one-block ciphertext block $C_2$. For each element $(\langle 0 \rangle \| \langle i \rangle,\ \langle 1 \rangle \| \langle i \rangle,\ \langle 2 \rangle \| \langle i \rangle)$ in table $T_1$, he executes a similar procedure to interact with the decryption oracle for each of $\langle 0 \rangle \| \langle i \rangle$, $\langle 1 \rangle \| \langle i \rangle$ and $\langle 2 \rangle \| \langle i \rangle$. Here we use $\langle 0 \rangle \| \langle i \rangle$ as an example to describe this procedure. The adversary computes $V[IV]_2 \oplus (\langle 0 \rangle \| \langle i \rangle)$ as the first ciphertext block, concatenates it with $C_2$ as the second block, and queries the constructed two-block ciphertext to the decryption oracle with the nonce $IV$ and with a randomly selected tag value. If the decryption oracle returns a failure symbol $\perp$, the adversary changes the tag to a new value, and makes a decryption query with the same nonce and the same ciphertext. He repeats such decryption queries by exhaustively trying new tag values until the decryption oracle returns a plaintext instead of $\perp$. In the returned plaintext, it is easy to see that the first block is $\langle 0 \rangle \| \langle i \rangle$, and we denote its second block as $P[\langle 0 \rangle \| \langle i \rangle]_2$. Similarly, we define the notation $P[\langle 1 \rangle \| \langle i \rangle]_2$ and $P[\langle 2 \rangle \| \langle i \rangle]_2$ for the second plaintext block corresponding to $\langle 1 \rangle \| \langle i \rangle$ and $\langle 2 \rangle \| \langle i \rangle$ respectively. Once the plaintexts are obtained, the adversary computes the pairwise differences of the second plaintext blocks as follows:

$$\Delta P[\langle i \rangle]_1 = P[\langle 1 \rangle \| \langle i \rangle]_2 \oplus P[\langle 0 \rangle \| \langle i \rangle]_2, \qquad \Delta P[\langle i \rangle]_2 = P[\langle 2 \rangle \| \langle i \rangle]_2 \oplus P[\langle 0 \rangle \| \langle i \rangle]_2 .$$

For each element of the other table $T_2$, the adversary makes similar decryption queries, but using $IV'$ as nonce and $V[IV']_2$ to compute the first ciphertext blocks. We denote the computed pairwise differences of the second plaintext blocks as $(\Delta P'[\langle j \rangle]_1, \Delta P'[\langle j \rangle]_2)$. The adversary matches these differences against the previously stored $\{ (\Delta P[\langle i \rangle]_1, \Delta P[\langle i \rangle]_2) \}$. Once a matching pair is found, the adversary computes $\Delta R$ and $\Delta S$ from the corresponding plaintexts and ciphertexts as follows:

$$\Delta R = P[\langle 0 \rangle \| \langle i \rangle]_2 \oplus P'[\langle j \rangle \| \langle 0 \rangle]_2, \qquad \Delta S = (\langle 0 \rangle \| \langle i \rangle) \oplus (\langle j \rangle \| \langle 0 \rangle) = \langle j \rangle \| \langle i \rangle .$$

If no match is found after trying all elements in $T_2$, the adversary outputs 0.

Third step

The adversary selects a random one-block $C_1$ such that $C_1$ and $C_1 \oplus \Delta S$ have not been queried before as a first block of ciphertext under the nonces $IV$ and $IV'$. Then, he selects another random block $C_2$. Firstly, the adversary queries $C_1 \| C_2$ to the decryption oracle with the nonce $IV$ by exhaustively guessing the tag until he receives the plaintext, where the second plaintext block is denoted as $P_2$. Secondly, the adversary queries $C_1 \oplus \Delta S \| C_2$ to the decryption oracle with the nonce $IV'$ by exhaustively guessing the tag until he receives the plaintext, where the second plaintext block is denoted as $P_2'$. Finally, he computes $\Delta P_2 = P_2 \oplus P_2'$, and compares it to $\Delta R$. If $\Delta P_2 = \Delta R$, the adversary outputs 1. Otherwise, he outputs 0.

The overall complexity is dominated by step 2, which is upper bounded by $O(2^{3n/2})$ (or $2^{96}$ for AES-JAMBU). The advantage of the distinguisher is almost 1 (we omit the detailed evaluation since it is similar to that of the attacks in previous sections).

3.3.2 Extension to a plaintext-recovery attack

The plaintext-recovery attack setting is as follows. The adversary is given access to both encryption and decryption oracles of JAMBU instantiated with a randomly selected key that is secret to the adversary. He is allowed to make encryption and decryption queries of his own choice. Note that he must not make two encryption queries with the same nonce. In the end, the adversary is required to choose a nonce and a ciphertext (where the last block of the ciphertext must not have the same prefix as the last blocks of any previously outputted or queried ciphertext under the same nonce) and to produce a corresponding plaintext for it. If the plaintext is indeed valid, the plaintext-recovery attack is said to succeed.

The attack procedure is similar to that of the distinguishing attack from Section 3.3.1. The first two steps are exactly the same, and we adopt the same notation. In the third and last step, the adversary chooses a random one-block value $C_1$ such that $C_1$ and $C_1 \oplus \Delta S$ have never been outputted as the first ciphertext block from the encryption oracle and have never been queried to the decryption oracle as first ciphertext block under the nonces $IV$ and $IV'$. Then, he picks a random value for the second ciphertext block $C_2$ and interacts with the decryption oracle to receive the plaintext $P_1 \| P_2$ of the ciphertext $(C_1 \| C_2)$ under the nonce $IV$. Since he knows the values of $\Delta R$ and $\Delta S$, he is sure that if he applies the difference $\Delta S$ on $P_1$ with nonce $IV'$, he will get difference $\Delta S$ on $C_1$ and difference $\Delta R$ on $C_2$. Therefore, he can predict the plaintext $(P_1 \oplus \Delta S, P_2 \oplus \Delta R)$ corresponding to ciphertext $(C_1 \oplus \Delta S \| C_2)$ with nonce $IV'$.

The complexity of the above plaintext-recovery attack is $O(2^{3n/2})$ queries and computations (or $2^{96}$ for AES-JAMBU), and its success probability is almost 1 (we omit the detailed evaluation, since it is similar to the distinguishing attack).

3.3.3 Discussion on trivial attacks

In the nonce-respecting scenario, although the adversary cannot make two encryption queries with the same nonce, he is allowed to repeat nonces during the interaction with the decryption oracle. Hence, if he makes more than 2^n decryption queries, he will obtain more than one pair of plaintext and ciphertext under the same nonce. As a result, this leads to several trivial attacks (similar to the trivial attacks on JAMBU in the adaptive chosen-ciphertext attack model described in Section 3.2.4). For example, the adversary can interact with the decryption oracle to receive a plaintext P1 for nonce IV and a one-block ciphertext C1, and then interact with the encryption oracle to receive a ciphertext C1′ for a random one-block plaintext P1′ with the same nonce IV. Finally, he checks if P1 ⊕ C1 = P1′ ⊕ C1′ holds. We refer to Section 3.2.4 for more discussions on trivial attacks on JAMBU.

3.4 Implementation of the Attack

Using a regular computer, we have implemented the attack on AES-JAMBU for the nonce-misuse scenario as described in Section 3.2 and we have verified the special differential structure from Figure 3.5. For simplicity, the associated data was set to be empty, and the 128-bit key was set to

K = 0x100f0e0d0c0b0a090807060504030201 .

3.4.1 Results of the attack

In the first step of the attack, we chose a random 64-bit plaintext P1 and asked for encryption under different nonce values. With 2^{32} encryption queries, we found a collision on a pair of ciphertexts C1, C1′ with a pair of nonce values IV, IV′ (see Table 3.2).

Table 3.2: First step of the attack

K   : 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10
IV  : b1 ef 89 a0 4e 21 30 bd
IV′ : 10 5a 1f 5b 34 49 1e 5c
P1  : 7f 95 77 ca 09 77 a8 a5
C1  : 2d 2b 58 18 fa f5 af f1
C1′ : 2d 2b 58 18 fa f5 af f1

With this pair of nonce values, we proceeded to the second step of the attack, P2 being set to zero for simplicity. We constructed the tables T1 and T2 and by matching the differences in the second block of ciphertexts, we obtained the values of ∆S and ∆R. Table 3.3 shows the first tuple of the pair of plaintexts and ciphertexts tables with the matching difference.

Table 3.3: Second step of the attack

⟨j⟩‖⟨0⟩‖P2     : 60 28 6d 74 00 00 00 00 00 00 00 00 00 00 00 00
C′[⟨j⟩‖⟨0⟩]_2  : af 45 56 9e 26 c6 7e d0
⟨0⟩‖⟨i⟩‖P2     : 00 00 00 00 93 47 1e 92 00 00 00 00 00 00 00 00
C[⟨0⟩‖⟨i⟩]_2   : 73 79 44 54 a7 b4 5b 4c
∆S             : 60 28 6d 74 93 47 1e 92
∆R             : dc 3c 12 ca 81 72 25 9c

In the third step, we chose a random 128-bit plaintext (P1‖P2) and asked for its encryption with nonce IV. Upon receiving the ciphertext (C1‖C2), we deduced the ciphertext (C1^D‖C2^D) = (C1 ⊕ ∆S‖C2 ⊕ ∆R) for the plaintext (P1 ⊕ ∆S‖P2) with nonce IV′ without querying it to the encryption oracle. Finally, we checked that by asking for the encryption of the plaintext (P1 ⊕ ∆S‖P2) with nonce IV′, the ciphertext (C1′‖C2′) obtained is indeed what we had deduced (as can be seen from Table 3.4).

Table 3.4: Third step of the attack

IV          : b1 ef 89 a0 4e 21 30 bd
P1‖P2       : 95 d9 43 9e 0b 4d 6d 27 6a ba db 0a 12 f8 13 45
C1‖C2       : c7 67 6c 4c f8 cf 6a 73 6b 05 9b c6 fc e6 7a ee
∆S          : 60 28 6d 74 93 47 1e 92
∆R          : dc 3c 12 ca 81 72 25 9c
C1^D‖C2^D   : a7 4f 01 38 6b 88 74 e1 b7 39 89 0c 7d 94 5f 72
IV′         : 10 5a 1f 5b 34 49 1e 5c
P1 ⊕ ∆S‖P2  : f5 f1 2e ea 98 0a 73 b5 6a ba db 0a 12 f8 13 45
C1′‖C2′     : a7 4f 01 38 6b 88 74 e1 b7 39 89 0c 7d 94 5f 72
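The XOR relations of the third step can be re-checked mechanically from the values printed in Table 3.4. The following is only a sanity check on the tabulated bytes, not an implementation of JAMBU; the variable names are illustrative.

```python
# Hex values copied from Table 3.4; the variable names are illustrative only.
delta_s = bytes.fromhex("60286d7493471e92")
delta_r = bytes.fromhex("dc3c12ca8172259c")
p1p2 = bytes.fromhex("95d9439e0b4d6d27" "6abadb0a12f81345")
c1c2 = bytes.fromhex("c7676c4cf8cf6a73" "6b059bc6fce67aee")
observed = bytes.fromhex("a74f01386b8874e1" "b739890c7d945f72")


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


# Ciphertext deduced for the second nonce without querying the oracle:
# (C1 xor dS) || (C2 xor dR).
predicted = xor_bytes(c1c2, delta_s + delta_r)

# Plaintext whose encryption is being predicted: (P1 xor dS) || P2.
shifted_plaintext = xor_bytes(p1p2, delta_s + bytes(8))

print(predicted == observed)
print(shifted_plaintext.hex())
```

The comparison confirms that the deduced ciphertext row and the queried ciphertext row of the table agree byte for byte.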

3.4.2 Running time of the attack

The first step of the attack took about 3.7 hours and 36 GB of memory to find a collision, while the second step took about 8.8 hours and 320 GB of memory to find ∆S and ∆R. For the second step, one can trade computation time against memory. For instance, instead of constructing tables of 2^{32} elements, one can construct tables of 2^{30} (or 2^{28} respectively) elements, in which case one run takes about 2.2 hours (or 0.5 hours respectively) and 80 GB (or 20 GB respectively) of memory. However, in this case, one has to guess the 2 (or 4 respectively) most significant bits of each of the difference values i and j. Hence, by repeating the attack procedure 16 times (or 256 times respectively), the values of ∆S and ∆R can be recovered by enumerating all the values for the most significant bits.
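The figures of this trade-off follow a simple pattern, which the sketch below reproduces. It assumes (this is an assumption, although it matches the reported 80 GB and 2.2 hours) that both time and memory of one run scale linearly with the table size; the function name and return layout are illustrative.

```python
FULL_LOG2_SIZE = 32      # tables of 2^32 elements in the full second step
FULL_HOURS = 8.8         # reported running time of the full second step
FULL_MEMORY_GB = 320.0   # reported memory of the full second step


def trade_off(log2_size: int):
    """Return (runs, hours per run, memory in GB, total hours)."""
    guessed_bits = FULL_LOG2_SIZE - log2_size   # MSBs guessed for each of i and j
    shrink = 2 ** guessed_bits                  # factor by which each table shrinks
    runs = 2 ** (2 * guessed_bits)              # guesses for i times guesses for j
    hours_per_run = FULL_HOURS / shrink         # assumption: linear scaling
    memory_gb = FULL_MEMORY_GB / shrink         # assumption: linear scaling
    return runs, hours_per_run, memory_gb, runs * hours_per_run


for size in (32, 30, 28):
    print(size, trade_off(size))
```

Under this assumption the memory drops by the same factor by which the total running time grows, so the trade-off is only attractive when memory is the binding constraint.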

3.5 Conclusion

In this chapter, we have proposed a cryptanalysis of the confidentiality of JAMBU in both the nonce-misuse and nonce-respecting scenarios. Namely, we have shown that one can break confidentiality in the nonce-misuse scenario with 2^{32} computations and queries, while having access to only the encryption oracle. For the nonce-respecting scenario, we have shown that our attack can be extended to break the confidentiality protection of JAMBU with 2^{96} computations and queries in the adaptive chosen-ciphertext attack model, with message prefixes not previously queried. As far as we know, our analysis is the first and only attack on JAMBU to date.

Post-cryptanalysis. The designers of JAMBU acknowledged our analysis. In their updated document [131], they incorporated our analysis and gave a more detailed security analysis for their mode. They also restated their claim:

    The new claim is that if nonce is reused, and if the first i blocks are the same, then the security of the (i + 1)-th and (i + 2)-th plaintext blocks are insecure. (Previously,) we claimed that if nonce is reused, and if the first i blocks are the same, then the security of the (i + 1)-th block is insecure.

Chapter 4

Invariant Subspace Attack on Midori64

Midori is a family of lightweight block ciphers published by Banik et al. at Asiacrypt 2015 [5], which have been advertised as one of the first lightweight ciphers optimised with respect to energy consumed by the circuit per bit in encryption or decryption operations. To achieve the desired low energy goal, several design decisions were made in Midori. It adopts an AES-like SPN structure, and the diffusion layer consists of almost-MDS 4 × 4 binary matrices. The 4-bit S-box has a small delay, i.e. 1.5-2 times faster than those of [31] and PRESENT [28]. The round constants are seemingly random binary values extracted from the constant π. The key schedule is trivial and efficient. Finally, the number of rounds is rather small in comparison to other AES-like lightweight ciphers: only 16-20 rounds are used.

Given the above choices of round function operations, the security of the design must be carefully discussed. The submission document of Midori contains a standard analysis of the proposed ciphers against various types of attacks: differential and linear, boomerangs, impossible differentials, etc. As a result, it has been concluded that the ciphers provide a sufficient security margin. Additional analysis of Midori has been provided in [79], which shows that 12 rounds (out of 16) of Midori64 can be attacked with the meet-in-the-middle technique, with a rather high complexity: the key-recovery requires around 2^{55.5} chosen plaintexts, 2^{109} memory, and 2^{125.5} computations.

Contribution of this chapter. In this chapter, we show that Midori64 has a class of 2^{32} weak keys that can be distinguished with a complexity of a single query. Furthermore, within this class of keys, a key-recovery can be efficiently achieved given two plaintext-ciphertext pairs, including the pair used in the distinguisher. We describe our attack in Section 4.2; it is based on invariant subspace attacks [75].
It uses the unfortunate combination in Midori64 of round constants, S-box, and multiplication by a binary matrix in the diffusion layer. When each cell of the secret key has the value 0 or 1 (in total 2^{32} such keys), and each cell of the state (including the plaintext) has the value 8 or 9, then the transformations in Midori64 keep the state in the same class (of cell values 8 and 9). Hence, regardless of the number of rounds, the class is maintained, and as a result, the ciphertext belongs to this class as well. This fact allows us to launch an efficient distinguisher.

The key-recovery attack uses an additional fact: the values 8 and 9 are fixed points for the S-box used in Midori64. As a result, the whole cipher under the weak-key class becomes a linear transformation (the only non-linear component, the S-box, turns into the identity mapping). Therefore, recovering the key is equivalent to solving a system of linear equations and it can be achieved given only two pairs of plaintext-ciphertext verifying the distinguisher. We have confirmed the correctness of the whole analysis by implementing independently the distinguisher and the key-recovery. In Section 4.3, we extend our analysis by considering other round constant candidates for Midori64. At the current stage, our attacks do not apply to Midori128.

4.1 Description of Midori

Midori consists of two algorithms, Midori64 and Midori128. The block size, n, is 64 bits for Midori64 and 128 bits for Midori128, and the key size is 128 bits for both. The ciphers adopt a standard SPN structure, and the internal state is represented as 4 × 4 cells, where the size of each cell is 4 bits for Midori64 and 8 bits for Midori128. The internal state IS has sixteen cells s0, s1, ..., s15 arranged as:

\[
IS = \begin{pmatrix}
s_0 & s_4 & s_8 & s_{12} \\
s_1 & s_5 & s_9 & s_{13} \\
s_2 & s_6 & s_{10} & s_{14} \\
s_3 & s_7 & s_{11} & s_{15}
\end{pmatrix}.
\]

From the 128-bit secret key, Midori64 generates a 64-bit whitening key WK and r − 1 64-bit round keys RK_0, RK_1, ..., RK_{r−2}. On the other hand, Midori128 generates a 128-bit whitening key and round keys. Here, r is the number of rounds, which is 16 for Midori64 and 20 for Midori128. The plaintext is first loaded into the state and the whitening key WK is XORed to the state. Then, the round function RF, which takes as input the current state and the round key RK_i and outputs the updated state, is iterated r − 1 times. Finally, the last round function RF_l is applied and the resulting state is output as the ciphertext.

Key generation

In Midori64, the 128-bit key K is separated into two 64-bit states K0 and K1. Then, the whitening key WK is computed as K0 ⊕ K1, and the round keys RK_i for i = 0, 1, ..., 14 are computed as K_{i mod 2} ⊕ α_i, where α_i is a round constant described below.

The round constants α_i, where i = 0, 1, ..., 14, consist of 16 binary cells. The constants have been derived from the hexadecimal encoding of the fractional part of π. For example, α_0 and α_1 are defined as follows:

\[
\alpha_0 = \begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1
\end{pmatrix}
\quad\text{and}\quad
\alpha_1 = \begin{pmatrix}
0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0
\end{pmatrix}.
\]

The remaining αi are defined similarly. Refer to [5] for more details. We later exploit the fact that all the αi’s are binary matrices, i.e. all the cells in any αi are either 0 or 1.

Round function and the last round function

The round function RF consists of the four operations SubCell, ShuffleCell, MixColumn and KeyAdd that update the n-bit state IS.

SubCell. In Midori64, this operation applies a 4-bit S-box Sb0 to each cell, while in Midori128 it applies four 8-bit S-boxes SSb0, SSb1, SSb2 and SSb3 to each of the four cells in row 0, row 1, row 2 and row 3, respectively. Each SSbi is generated from the 4-bit S-box Sb1. Refer to [5] for the specification of Sb1 and details of how the SSbi are generated from Sb1. The specification of Sb0 is shown in Table 4.1.

Table 4.1: Specifications of Sb0

x      : 0 1 2 3 4 5 6 7 8 9 a b c d e f
Sb0(x) : c a d 3 e b f 7 8 9 1 5 0 2 4 6
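As a small illustration of Table 4.1, the S-box can be held as a lookup table and its fixed points listed directly. The names below are illustrative; the table values are those given above.

```python
# Sb0 from Table 4.1, indexed by the input nibble.
SB0 = [0xc, 0xa, 0xd, 0x3, 0xe, 0xb, 0xf, 0x7,
       0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]

# Inputs that the S-box maps to themselves.
fixed_points = [x for x in range(16) if SB0[x] == x]

# Applying the table twice returns every input, i.e. it is an involution.
is_involution = all(SB0[SB0[x]] == x for x in range(16))

print([format(x, "x") for x in fixed_points])
print(is_involution)
```

The fixed points 8 and 9 are the ones exploited later in the chapter.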

ShuffleCell. This transformation is a cell-wise permutation. Each cell is permuted as follows.

\[
\begin{pmatrix}
s_0 & s_4 & s_8 & s_{12} \\
s_1 & s_5 & s_9 & s_{13} \\
s_2 & s_6 & s_{10} & s_{14} \\
s_3 & s_7 & s_{11} & s_{15}
\end{pmatrix}
\longrightarrow
\begin{pmatrix}
s_0 & s_{14} & s_9 & s_7 \\
s_{10} & s_4 & s_3 & s_{13} \\
s_5 & s_{11} & s_{12} & s_2 \\
s_{15} & s_1 & s_6 & s_8
\end{pmatrix}.
\]

MixColumn. This transformation applies a 4 × 4 binary involutory matrix to each column of the state as follows.

\[
\begin{pmatrix} s_i \\ s_{i+1} \\ s_{i+2} \\ s_{i+3} \end{pmatrix}
=
\begin{pmatrix}
0 & 1 & 1 & 1 \\
1 & 0 & 1 & 1 \\
1 & 1 & 0 & 1 \\
1 & 1 & 1 & 0
\end{pmatrix}
\begin{pmatrix} s_i \\ s_{i+1} \\ s_{i+2} \\ s_{i+3} \end{pmatrix},
\quad \text{for } i \in \{0, 4, 8, 12\}.
\]

KeyAdd. KeyAdd(IS, RK_i) cell-wise XORs RK_i to the state IS.

Last round function. The last round function RF_l applies only two operations, namely SubCell(S) and KeyAdd(S, WK).

Summary

The encryption of Midori can be summarised as shown in Algorithm 1, and the Midori64 encryption function is depicted in Figure 4.1. Note that the decryption can be described similarly. However, since our attack only uses the encryption, we omit the description of the decryption.

Algorithm 1 — Midori encryption algorithm

function Midori-Encryption(P, K = K0‖K1)
    IS ← P
    WK ← K0 ⊕ K1
    IS ← KeyAdd(IS, WK)
    for i = 0, ..., r − 2 do
        IS ← SubCell(IS)
        IS ← ShuffleCell(IS)
        IS ← MixColumn(IS)
        RK_i ← K_{i mod 2} ⊕ α_i
        IS ← KeyAdd(IS, RK_i)
    IS ← SubCell(IS)
    IS ← KeyAdd(IS, WK)
    return IS
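Algorithm 1 translates almost line by line into code. The sketch below is a minimal illustration for Midori64 with the state held as a list of sixteen nibbles s0, ..., s15. It assumes that the ShuffleCell diagram is read as "position i of the new state receives the cell shown at that position on the right-hand side", and it takes the round constants as a parameter: the values used in the usage lines are random binary placeholders, not the constants of [5], so the output is not a real Midori64 ciphertext.

```python
import random

# Sb0 from Table 4.1.
SB0 = [0xc, 0xa, 0xd, 0x3, 0xe, 0xb, 0xf, 0x7,
       0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
# ShuffleCell: position i of the new state receives the old cell SHUFFLE[i].
SHUFFLE = [0, 10, 5, 15, 14, 4, 11, 1, 9, 3, 12, 6, 7, 13, 2, 8]


def sub_cell(state):
    return [SB0[x] for x in state]


def shuffle_cell(state):
    return [state[src] for src in SHUFFLE]


def mix_column(state):
    out = []
    for c in range(0, 16, 4):
        column = state[c:c + 4]
        total = column[0] ^ column[1] ^ column[2] ^ column[3]
        # Each output cell is the XOR of the other three cells of the column.
        out.extend(total ^ x for x in column)
    return out


def key_add(state, key):
    return [s ^ k for s, k in zip(state, key)]


def midori64_encrypt(plaintext, k0, k1, alphas):
    """Structure of Algorithm 1; alphas holds the r - 1 = 15 round constants."""
    wk = key_add(k0, k1)
    state = key_add(plaintext, wk)
    for i, alpha in enumerate(alphas):
        state = mix_column(shuffle_cell(sub_cell(state)))
        state = key_add(state, key_add((k0, k1)[i % 2], alpha))
    return key_add(sub_cell(state), wk)


# Usage with placeholder data (hypothetical constants, random key and plaintext).
rng = random.Random(2018)
placeholder_alphas = [[rng.randrange(2) for _ in range(16)] for _ in range(15)]
k0 = [rng.randrange(16) for _ in range(16)]
k1 = [rng.randrange(16) for _ in range(16)]
plaintext = [rng.randrange(16) for _ in range(16)]
ciphertext = midori64_encrypt(plaintext, k0, k1, placeholder_alphas)
print("".join(format(c, "x") for c in ciphertext))
```

Passing the constants as an argument keeps the sketch independent of the exact table of α_i, which the text only gives for α_0 and α_1.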

[Figure 4.1 diagram: the plaintext P is XORed with WK = K0 ⊕ K1, passes through 15 rounds of SubCell, ShuffleCell, MixColumn and the addition of RK_i = K_{i mod 2} ⊕ α_i (i = 0, ..., 14), and then through a final SubCell and the addition of WK to give the ciphertext C.]

Figure 4.1: Midori64 encryption algorithm

4.2 Invariant Subspace Attack on Midori64

In this section, we present the invariant subspace attack on Midori64. Our analysis reveals a class of 2^{32} weak keys. Within this class, Midori64 can be distinguished from a random permutation with a single chosen plaintext query, a negligible computational cost, and a negligible memory. Moreover, the key can be recovered from the 2^{32} potential candidates in 2^{16} operations. We first introduce the notation used in this attack.

K (cell level) : a subspace of cell values consisting of the two elements 0 and 1, i.e. K ≜ {0, 1}.

K (state level) : a subspace of state values in which each of the sixteen cells belongs to the cell-level K, i.e. the 16-fold product K^{16}.

S (cell level) : an affine subspace of cell values consisting of the two elements 8 and 9, i.e. S ≜ {8, 9} = 8 ⊕ K.

S (state level) : an affine subspace of state values in which each of the sixteen cells belongs to the cell-level S, i.e. the 16-fold product S^{16}.

The same letter is used at both levels; whether a single cell or a whole state is meant is determined by the operand.

4.2.1 Distinguisher with invariant subspace attack

Proposition 4.1 (Invariant Subspace). If the 128-bit secret key K0‖K1 satisfies K0, K1 ∈ K, then any plaintext P ∈ S is mapped by Midori64 to a ciphertext C ∈ S with probability 1.

Throughout this section, we prove Proposition 4.1. To achieve this, we focus independently on each transformation used in Midori64.

Round key generation

Let x, y ∈ K. Then, x ⊕ y ∈ K. Therefore, for any X,Y ∈ K, X ⊕ Y ∈ K. The whitening key WK is computed by K0⊕K1. By assuming K0,K1 ∈ K, we have WK ∈ K. The round key for the i-th round, RK i, is computed by K(i mod 2) ⊕ αi. Here, an important observation for our attack is that all the round constants αi only consist of 0 and 1, i.e. αi ∈ K for i = 0, 1,..., 14. By assuming K0,K1 ∈ K, we have RK i ∈ K for all i = 0, 1,..., 14.

Data processing part

Let x ∈ S and y ∈ K. Then, x ⊕ y ∈ S. Therefore, for any X ∈ S and Y ∈ K, X ⊕ Y ∈ S. As long as the plaintext P ∈ S, the state after adding the whitening key, WK ∈ K, belongs to S. Then, the state is processed by the SubCell operation. Here, we exploit two particular data transitions through the S-box for Midori64: Sb0(8) = 8 and Sb0(9) = 9. Namely, as long as the input state belongs to S, SubCell is equivalent to the identity mapping, and we obtain S = SubCell(S). The subsequent ShuffleCell is a cell-wise permutation, and since every cell of a state in S takes the value 8 or 9, S = ShuffleCell(S). The MixColumn operation is slightly more complex. Because the diffusion matrix is a binary matrix with weight three in each row, each output cell from MixColumn can be represented as the XOR of three input cells. As long as the input state belongs to S, each of the three cells is either 8 or 9. Thus, the possibilities for each output cell are the following eight cases:

8 ⊕ 8 ⊕ 8 = 8, 8 ⊕ 8 ⊕ 9 = 9, 8 ⊕ 9 ⊕ 8 = 9, 8 ⊕ 9 ⊕ 9 = 8, 9 ⊕ 8 ⊕ 8 = 9, 9 ⊕ 8 ⊕ 9 = 8, 9 ⊕ 9 ⊕ 8 = 8, 9 ⊕ 9 ⊕ 9 = 9.

In any case, each output cell belongs to S, thus S = MixColumn(S).

The KeyAdd operation is the same as the whitening key addition, i.e. :

S = KeyAdd(S, RK i ∈ K) .

Summary

As αi ∈ K, any weak key K0,K1 ∈ K leads to WK ∈ K and RK i ∈ K. Let P ∈ S. Then, the state after the whitening key addition becomes S = KeyAdd(P ∈ S, WK ∈ K). Then, the following round function is iterated by incrementing the round number i.

S = SubCell(S) , S = ShuffleCell(S) , S = MixColumn(S) , S = KeyAdd(S,RKi ∈ K) .

As a result, regardless of the number of rounds applied, the state belongs to S with probability 1. The last round consists of only SubCell and the whitening key addition, which does not break the property. This completes the proof of Proposition 4.1. By following the notation in [76], the affine subspace 8⊕{0, 1} is mapped to itself with SubCell, ShuffleCell, MixColumn and KeyAdd when RKi ∈ K.
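The cell-level facts used in the proof are small enough to check exhaustively. The sketch below does so for the key schedule, SubCell, MixColumn and KeyAdd; ShuffleCell only moves cells and needs no check. The set names are illustrative, and Sb0 is taken from Table 4.1.

```python
from itertools import product

SB0 = [0xc, 0xa, 0xd, 0x3, 0xe, 0xb, 0xf, 0x7,
       0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]
K_CELL = {0, 1}   # cell values allowed in weak keys and round constants
S_CELL = {8, 9}   # cell values of the invariant states

# Key schedule: XORing binary cells (K0, K1, alpha_i) gives binary cells.
keys_closed = all((x ^ y) in K_CELL for x, y in product(K_CELL, repeat=2))

# SubCell acts as the identity on the cell values 8 and 9.
subcell_identity = all(SB0[x] == x for x in S_CELL)

# MixColumn: every output cell is the XOR of three input cells.
mixcolumn_closed = all((a ^ b ^ c) in S_CELL
                       for a, b, c in product(S_CELL, repeat=3))

# KeyAdd: a cell in {8, 9} XORed with a binary key cell stays in {8, 9}.
keyadd_closed = all((s ^ k) in S_CELL for s in S_CELL for k in K_CELL)

print(keys_closed, subcell_identity, mixcolumn_closed, keyadd_closed)
```

All four conditions hold, which is exactly what the round-by-round argument relies on.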

Experiments and success probability

We implemented our invariant subspace distinguisher and verified its correctness. Some examples are shown in Table 4.2.

Table 4.2: Experimental data

     Example 1          Example 2          Example 3
K0   0000000000000000   1100110011001100   0000101001001110
K1   0000000000000000   0011001100110011   1101010100010001
P    8888888888888888   9999999999999999   9889898898898989
C    9998899889888899   8999999988988989   9999988988898889

One can easily evaluate the probability of a false positive: for a random permutation, each of the sixteen output cells lies in {8, 9} with probability 2/16 = 2^{−3}, so the output C∗ belongs to S with a probability of (2^{−3})^{16} = 2^{−48}. Therefore, from Proposition 4.1 we see that with probability of almost 1, we can distinguish Midori64 from a random permutation.
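The false-positive probability can be reproduced with exact rational arithmetic. The sketch treats the sixteen output cells of a random permutation as independent and uniform, which is the approximation underlying the figure quoted above.

```python
from fractions import Fraction

# A uniformly random nibble equals 8 or 9 with probability 2/16.
p_cell = Fraction(2, 16)

# All sixteen cells must fall in {8, 9} for a false positive.
p_false_positive = p_cell ** 16

print(p_false_positive == Fraction(1, 2 ** 48))
```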

Computer search of invariant subspaces

We performed a computer search to detect the largest weak-key class. We exhausted all possible subsets of cell values in the plaintext (each cell belongs to the same subset) and all possible subsets of values of secret key cells (similarly, they all belong to another subset). As a cell can take 16 values, there are 2^{16} subsets in each of the two cases, and the exhaustive search required around 2^{16} · 2^{16} = 2^{32} time. We found five subspaces, all subsets of the original subspace. Thus, we can conclude that no larger weak-key classes of the analysed type exist in Midori64.

We emphasise that while the generic search algorithm presented in [76] could detect Midori64's invariant subspaces, about 2^{2(64−16)} = 2^{96} operations are required for exhausting all the possibilities. Even with their advanced probabilistic search, about 50 × 2^{64−16} ≈ 2^{53.6} operations are required. Indeed, the time complexity of this generic algorithm decreases exponentially with the dimension of the subspace, making it harder to detect small subspaces like the one in Midori64 (although, leaving the generic setting, the search space can be reduced to something feasible by using the specific structure of Midori64's round function). In contrast, the analysis we present here was obtained by carefully examining the components of the cipher, without using the generic invariant subspace detection algorithm.

4.2.2 Key-recovery with invariant subspace attack

In this section, we describe how a chosen plaintext P and its corresponding ciphertext C satisfying the subspace distinguisher can be used to efficiently recover the 128-bit weak key. Because the size of the weak-key class is 2^{32}, the exhaustive search on the entire weak-key space requires 2^{32} computations. Hence, our goal is to recover the key in time less than 2^{32}. The main observation pertains to the behaviour of the S-box on the subset S. Indeed, the S-box Sb0 used in Midori64 has four fixed points, {3, 7, 8, 9}, and S = {8, 9} ⊂ {3, 7, 8, 9}. Consequently, under the assumption that IS ∈ S, the S-box behaves like the identity mapping, which in turn makes the full Midori64 cipher linear.

Therefore, recovering the 128-bit key K = K0‖K1 can be done by writing the system of linear equations between P ∈ S and C ∈ S. To describe the system, we denote by k0, ..., k15 the 16 variables from K0, and by k16, ..., k31 the 16 variables from K1. We emphasise that ki ∈ K, since we assume that K belongs to the weak-key class K.

\[
K_0 = \begin{pmatrix}
k_0 & k_4 & k_8 & k_{12} \\
k_1 & k_5 & k_9 & k_{13} \\
k_2 & k_6 & k_{10} & k_{14} \\
k_3 & k_7 & k_{11} & k_{15}
\end{pmatrix} \in K ,
\qquad
K_1 = \begin{pmatrix}
k_{16} & k_{20} & k_{24} & k_{28} \\
k_{17} & k_{21} & k_{25} & k_{29} \\
k_{18} & k_{22} & k_{26} & k_{30} \\
k_{19} & k_{23} & k_{27} & k_{31}
\end{pmatrix} \in K .
\]

Similarly, we denote the 16 known variables of the plaintext P by p0, ..., p15 and the 16 known variables of the ciphertext C by c0, ..., c15, that is:

\[
P = \begin{pmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{pmatrix} \in S ,
\qquad
C = \begin{pmatrix}
c_0 & c_4 & c_8 & c_{12} \\
c_1 & c_5 & c_9 & c_{13} \\
c_2 & c_6 & c_{10} & c_{14} \\
c_3 & c_7 & c_{11} & c_{15}
\end{pmatrix} \in S .
\]

With this notation, we have a linear system of 16 equations (see Table 4.3), where there are 32 unknowns. The notation k[a,b,c], similarly for p and c, defines the XOR-sum of the variables, i.e. k[a,b,c] = ka ⊕ kb ⊕ kc.

Table 4.3: Linear system of equations for Midori64 key-recovery

k[0,11,14,15,21,22,23,26,28,29,30,31] = p[0,5,6,7,10,11,12,13] ⊕ c[5,6,7,10,12,13,14,15]
k[1,11,19,24,26,29,31] = p[1,3,8,10,11,13,15] ⊕ c[3,8,10,13,15] ⊕ 1
k[2,14,19,21,22,23,24,28,30,31] = p[2,3,5,6,7,8,12,15] ⊕ c[3,5,6,7,8,12,14,15]
k[3,15,19,24,25,29] = p[8,9,13,15] ⊕ c[3,8,9,13] ⊕ 1
k[4,11,13,15,22,25,27,28,29,30] = p[4,6,9,12,14,15] ⊕ c[6,9,11,12,13,14] ⊕ 1
k[5,14,22,23,25,28,29,30] = p[5,6,7,9,12,13] ⊕ c[6,7,9,12,13,14] ⊕ 1
k[6,13,14,15,22,25,28,29] = p[9,12,14,15] ⊕ c[6,9,12,13]
k[7,13,14,15,23] = p[13,14,15] ⊕ c7
k[8,15,24,29] = p[13,15] ⊕ c[8,13]
k[9,11,13,14,24,28] = p[8,9,11,12,13,14] ⊕ c[8,12]
k[10,11,25] = p[9,10,11] ⊕ c9 ⊕ 1
k[12,13,14,15,29] = p[12,14,15] ⊕ c13
k[16,19,24,25,29,31] = p[0,3,8,9,13,15] ⊕ c[0,3,8,9,13,15] ⊕ 1
k[17,19,22,23,24,25,26,27,28,31] = p[1,3,6,7,8,9,10,11,12,15] ⊕ c[1,3,6,7,8,9,10,11,12,15]
k[18,19,21,22,23,24,28,29,30,31] = p[2,3,5,6,7,8,12,13,14,15] ⊕ c[2,3,5,6,7,8,12,13,14,15] ⊕ 1
k[20,22,23,25,28,29] = p[4,6,7,9,12,13] ⊕ c[4,6,7,9,12,13] .

The system is underdetermined: the set of solutions contains 2^{16} elements, which provides 2^{16} key candidates for the 128-bit secret key K. Using an additional known plaintext-ciphertext pair, we uniquely determine the key in 2^{16} operations. More precisely, the above system of equations describes a Gröbner basis, so that one can simply enumerate all the 2^{16} values for k0, k1, k2, k3, k4, k5, k6, k7, k8, k9, k10, k12, k16, k17, k18, k20 ∈ K and uniquely and efficiently determine the remaining 16 key variables.
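The number of key candidates left by one plaintext-ciphertext pair can be cross-checked from the left-hand sides of Table 4.3 alone. The sketch below encodes each equation as a bitmask over k0, ..., k31 and computes the rank over GF(2) by plain elimination; the helper name gf2_rank is illustrative, and only the key-variable indices of the table are used.

```python
# Key-variable indices on the left-hand side of each equation in Table 4.3.
EQUATIONS = [
    [0, 11, 14, 15, 21, 22, 23, 26, 28, 29, 30, 31],
    [1, 11, 19, 24, 26, 29, 31],
    [2, 14, 19, 21, 22, 23, 24, 28, 30, 31],
    [3, 15, 19, 24, 25, 29],
    [4, 11, 13, 15, 22, 25, 27, 28, 29, 30],
    [5, 14, 22, 23, 25, 28, 29, 30],
    [6, 13, 14, 15, 22, 25, 28, 29],
    [7, 13, 14, 15, 23],
    [8, 15, 24, 29],
    [9, 11, 13, 14, 24, 28],
    [10, 11, 25],
    [12, 13, 14, 15, 29],
    [16, 19, 24, 25, 29, 31],
    [17, 19, 22, 23, 24, 25, 26, 27, 28, 31],
    [18, 19, 21, 22, 23, 24, 28, 29, 30, 31],
    [20, 22, 23, 25, 28, 29],
]


def gf2_rank(rows):
    """Rank over GF(2) of rows given as integer bitmasks."""
    pivots = {}  # highest set bit -> row that owns this pivot
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)


masks = [sum(1 << i for i in eq) for eq in EQUATIONS]
rank = gf2_rank(masks)
candidates = 2 ** (32 - rank)   # solutions of a consistent system in 32 unknowns

# The first variable of every equation occurs in no other equation.
leading = [eq[0] for eq in EQUATIONS]
isolated = all(sum(v in eq for eq in EQUATIONS) == 1 for v in leading)

print(rank, candidates, isolated)
```

The rank equals the number of equations, so one pair leaves 2^{32−16} candidates, in line with the count given in the text.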

4.3 Extended Analysis: Weaker Constant

The selection of the round constants (which currently have cells that are either 0 or 1) certainly has contributed towards the existence of the invariant subspace for the whole cipher. There are, however, round constants that allow even larger invariant subspaces. We further describe such constants. Similarly to the original selection, we assume that all cells of the round constants belong to a particular set RC which is a proper subset of {0, 1}^4.

An analysis of the S-box Sb0 reveals possible values for RC. More precisely, we first find all possible affine invariant subspaces for the S-box¹, that is, Sb0(u ⊕ A) = v ⊕ A. Subsequently, if RC ⊆ A, then the space is stable by the addition of the round constants. For instance, in the original Midori64, u = 8, v = 8, A = {0, 1} and RC = {0, 1} ⊆ A. A computer search shows that there are several affine subspaces for Sb0, some even of size 4 (refer to Table 4.4). For example, u = 2, v = d, A = {0, 5, a, f} is an affine invariant subspace for Sb0. For this subspace, if RC ⊆ A, then the weak-key class would be larger: each subkey cell can take any of the values from u ⊕ v ⊕ A, thus the size of the weak-key class would become 2^{64}.

Table 4.4: Affine invariant subspaces for Sb0

u  v  A               u  v  A
0  c  {0, c}          3  3  {0, 4}
1  a  {0, b}          4  e  {0, a}
1  a  {0, 2, 9, b}    5  b  {0, e}
2  d  {0, f}          5  b  {0, 2, c, e}
2  d  {0, 5, a, f}    6  f  {0, 9}
3  3  {0, b}          7  7  {0, f}
3  3  {0, a}          7  7  {0, e}
3  3  {0, 7, a, d}    8  8  {0, 1}
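A search of this kind is small enough to sketch directly. The code below enumerates the linear subspaces A of {0, 1}^4 of a given dimension and, for one representative u per coset, tests whether Sb0(u ⊕ A) = Sb0(u) ⊕ A. It is an illustrative brute force written for this purpose, not the search program used for the table, and the function names are assumptions.

```python
from itertools import combinations

SB0 = [0xc, 0xa, 0xd, 0x3, 0xe, 0xb, 0xf, 0x7,
       0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]


def span(generators):
    """Linear span over GF(2) of 4-bit values."""
    space = {0}
    for g in generators:
        space |= {x ^ g for x in space}
    return frozenset(space)


def linear_subspaces(dim):
    """All linear subspaces of {0,1}^4 of the given dimension."""
    return {span(g) for g in combinations(range(1, 16), dim)
            if len(span(g)) == 2 ** dim}


def invariant_cosets(dim):
    """Triples (u, v, A) with Sb0(u + A) = v + A, where v = Sb0(u)."""
    found = []
    for space in sorted(linear_subspaces(dim), key=sorted):
        for u in range(16):
            if u != min(u ^ a for a in space):
                continue  # keep one representative per coset
            v = SB0[u]
            if {SB0[u ^ a] for a in space} == {v ^ a for a in space}:
                found.append((u, v, sorted(space)))
    return found


size_two = invariant_cosets(1)
size_four = invariant_cosets(2)
print((8, 8, [0, 1]) in size_two)
print((2, 0xd, [0x0, 0x5, 0xa, 0xf]) in size_four)
```

Both examples quoted in the text (u = 8 with A = {0, 1}, and u = 2 with A = {0, 5, a, f}) appear among the results.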

The modified constants not only permit distinguishers for larger weak-key classes, but also lead to a key-recovery for these classes. Note that, in our key-recovery attack on Midori64 with the original constants, we have used the fact that Sb0(8) = 8 and Sb0(9) = 9; thus it was possible to model the S-box as a simple identity function, which in turn made the whole encryption behave as a linear function. In general, as long as the S-box behaves as an affine function on the invariant subspace (for the S-box), the key-recovery can be reduced to solving a system of linear equations, e.g. any 2-bit permutation is an affine mapping. Let us focus on the above example u = 2, v = d, A = {0, 5, a, f}, that is

Sb0(2 ⊕ 0) = d ⊕ 0, Sb0(2 ⊕ 5) = d ⊕ a,

Sb0(2 ⊕ a) = d ⊕ 5, Sb0(2 ⊕ f) = d ⊕ f.

We need to find a linear function l(x) such that l(0) = 0, l(5) = a, l(a) = 5 and l(f) = f, so that Sb0(x) = l(2 ⊕ x) ⊕ d for every x ∈ 2 ⊕ A. By solving the system of linear equations l(5) = a, l(a) = 5, where l is represented as a 4 × 4 binary matrix of unknowns, we obtain, for instance, l(x) = l(x3x2x1x0) = x2x3x2x3, where x0 is the LSB. Similarly to the previous discussion from Section 4.2.2, we note that the remaining operations in the cipher are all linear, thus the whole encryption becomes a linear function. Hence, again the key can be recovered by solving a system of linear equations.

The search space can be enlarged to Sb0(u ⊕ A) = v ⊕ A′. If A ∩ A′ contains values other than 0 and RC ⊆ A ∩ A′, the space is stable by addition of the round constants. There is a large number of such subspaces when |A| = |A′| = 2. In addition, we also found two cases where |A| = |A′| = 4. Those two cases are shown in Table 4.5.

¹ This is in line with the discussion presented in [76], see Lemma 6.

Table 4.5: Affine invariant subspaces for Sb0 with A ≠ A′, |A| = |A′| = 4

u  A               v  A′
c  {0, 1, 2, 3}    0  {0, 2, 4, 6}
0  {0, 5, a, f}    1  {0, 7, a, d}
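Both the linear map l from the example u = 2, v = d, A = {0, 5, a, f} and the two transitions of Table 4.5 can be checked against Sb0 in a few lines. The sketch below writes l bit by bit as described in the text (output bits x2, x3, x2, x3 from MSB to LSB); the helper names are illustrative.

```python
SB0 = [0xc, 0xa, 0xd, 0x3, 0xe, 0xb, 0xf, 0x7,
       0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]


def l(x):
    """l(x3 x2 x1 x0) = x2 x3 x2 x3, with x0 the least significant bit."""
    x3 = (x >> 3) & 1
    x2 = (x >> 2) & 1
    return (x2 << 3) | (x3 << 2) | (x2 << 1) | x3


A = [0x0, 0x5, 0xa, 0xf]

# On the coset 2 + A the S-box coincides with the affine map a -> d + l(a).
affine_on_coset = all(SB0[2 ^ a] == 0xd ^ l(a) for a in A)

# l is linear over GF(2).
is_linear = all(l(x ^ y) == l(x) ^ l(y) for x in range(16) for y in range(16))


def maps_coset(u, space, v, image_space):
    """Check Sb0(u + space) == v + image_space as sets."""
    return {SB0[u ^ a] for a in space} == {v ^ a for a in image_space}


# The two rows of Table 4.5 as (u, A, v, A').
TABLE_4_5 = [
    (0xc, [0x0, 0x1, 0x2, 0x3], 0x0, [0x0, 0x2, 0x4, 0x6]),
    (0x0, [0x0, 0x5, 0xa, 0xf], 0x1, [0x0, 0x7, 0xa, 0xd]),
]
table_ok = all(maps_coset(*row) for row in TABLE_4_5)

print(affine_on_coset, is_linear, table_ok)
```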

4.4 Concluding Remarks

We have presented an invariant subspace attack against full Midori64. We have shown that Midori64 has a class of 2^{32} weak keys, and with such keys along with a properly chosen plaintext, the cipher becomes a linear transformation and thus can be distinguished with a single chosen-plaintext query. Furthermore, the key-recovery can be performed simply by solving a system of linear equations.

Our attack cannot be applied to Midori128. The difficulty comes from the usage of four different S-boxes, SSb0, SSb1, SSb2 and SSb3. To apply the invariant subspace attack, all the S-boxes must have an identical affine subspace transition, and this is unlikely to occur.

Our attacks can be prevented with a change of the round constants of Midori64. On the other hand, there exist constants that allow even larger weak-key classes. To eliminate the dependency on the round constants, we investigate (in Chapter 6) alternative S-boxes for Midori64 that provide a certain level of security against the found invariant subspace attacks, regardless of the choice of the round constants.

Post-cryptanalysis. The designers of Midori acknowledged our analysis. In 2016, [121] introduced a new cryptanalysis technique called non-linear invariant attack (which can be regarded as an extension of linear cryptanalysis and is similar to the invariant subspace attack) and presented a weak-key attack on Midori64 with a larger key size.

Diffusion and Substitution Layers

Chapter 5

Lightweight MDS Diffusion Matrices

Diffusion matrices are often used in the P-layer of an SPN cipher. While most of the SPN ciphers adopt diffusion matrices of order 4, there are block ciphers like KHAZAD [9] that use 8 × 8 diffusion matrices. Other symmetric-key cryptographic primitives may also use diffusion matrices as building blocks. For instance, the PHOTON family of lightweight hash functions [50] uses diffusion matrices of various orders. Several designs use a weak yet fast diffusion layer based on simple XOR, addition and rotation/shifting operations, but another trend is to rely on strong linear diffusion matrices, such as maximum distance separable (MDS) matrices, which have optimal diffusion property. This is the case of AES for example. However, very often the price for having strong diffusion property is the heavy implementation cost, in either software or hardware implementations. Therefore, it is of interest to search for lightweight¹ MDS diffusion matrices.

Contribution of this chapter. In this chapter, we look at two types of matrix structures, namely Hadamard and circulant matrices, which are commonly used as the diffusion layer. In Section 5.2, we look at the various properties of those matrices, and propose a new type of matrices known as cyclic matrices, a generalisation of the circulant structure inheriting the benefits of circulant matrices while opening up new possibilities. In Section 5.3, we propose equivalence classes of matrices that preserve the branch number, which help us to significantly reduce the search space. As a result, we managed to complete the search for the most lightweight MDS matrices of higher orders, which was previously infeasible. Finally, we apply the results from previous sections to search for the lightest MDS matrices and present our findings in Section 5.4. For the convenience of our discussion, we consider the matrix entries to be finite field elements. However, we emphasise that our equivalence classes are independent of the entry type used in the matrices. Hence, our equivalence classes are generic and can be applied to any entry types.

¹ One may argue that MDS diffusion matrices of higher order, say 8, are not lightweight. However, we refer to diffusion matrices as lightweight in the sense that they have low implementation cost for a given metric and parameters.

5.1 Introduction

5.1.1 Motivation

One approach to construct lightweight MDS matrices is to use some special matrix construction to build diffusion matrices that are naturally MDS [15,52] and check what the lightest matrix in that set is. Although the construction is efficient, it only covers a subclass of a specific matrix type. An opposite approach is to pick the lightest matrix from some matrix type and check for the MDS property, and extend the search to the next lightest matrix if that candidate is not MDS. Such a method basically consists of an exhaustive search.

Exhaustive search for lightest MDS matrices. In more detail, we first select a set of elements with the lowest possible implementation cost to be the entries of a particular matrix type. Next, we exhaust all possible orderings of the elements to see if any of them leads to an MDS matrix. If none of them leads to an MDS matrix, we pick the next set of elements with the lowest implementation cost and repeat the process. When there is a positive result, by the nature of our search procedure, we are sure to have found the lightest MDS diffusion matrix for that matrix type. The drawback of this method is the potentially very large search space. Take a matrix of order m that can be defined by its first row as an example. For a (multi-)set of m elements, there are up to m! ways to permute them. Thus for every candidate, we have to run the check for the MDS property up to m! times. The complexity grows factorially as the matrix size increases and quickly becomes intractable.

Our strategy. We observed that certain arrangements of the elements always have the same branch number because they are permutation equivalents (Definition 2.21). Using this equivalence relation, we can partition the matrices into equivalence classes where matrices within the same equivalence class have the same branch number. Thus, it is sufficient to check one representative from each equivalence class to exhaust all possible permutations. Therefore, our main goal now is to identify the permutations that preserve the branch number of the diffusion matrices.
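The naive procedure (try every ordering of a candidate multiset and test the MDS property) can be sketched as follows. The sketch makes several assumptions that are not fixed by the text: entries live in GF(2^8) with the AES reduction polynomial 0x11b, the matrix type is circulant (Definition 5.3), and the MDS property is tested by checking that every square submatrix is non-singular. It does not implement the equivalence classes of this chapter; all function names are illustrative.

```python
from itertools import combinations, permutations


def gf_mul(a, b, poly=0x11b):
    """Multiplication in GF(2^8) modulo the (assumed) AES polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def det(matrix):
    """Determinant by Laplace expansion; signs vanish in characteristic 2."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for j, entry in enumerate(matrix[0]):
        if entry:
            minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
            total ^= gf_mul(entry, det(minor))
    return total


def is_mds(matrix):
    """A square matrix is MDS iff all its square submatrices are non-singular."""
    n = len(matrix)
    for size in range(1, n + 1):
        for rows in combinations(range(n), size):
            for cols in combinations(range(n), size):
                if det([[matrix[r][c] for c in cols] for r in rows]) == 0:
                    return False
    return True


def circulant(first_row):
    m = len(first_row)
    return [[first_row[(j - i) % m] for j in range(m)] for i in range(m)]


def mds_orderings(elements):
    """Naive search: every distinct ordering of the multiset is tested."""
    return sorted({p for p in permutations(elements)
                   if is_mds(circulant(list(p)))})


print(is_mds(circulant([2, 3, 1, 1])))   # the AES MixColumns matrix
print(len(mds_orderings([1, 1, 2, 3])))
```

For order 4 the loop over orderings is instantaneous, but the number of orderings and the number of submatrices both grow quickly with m, which is the cost the equivalence classes are meant to reduce.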

5.1.2 Matrix types

We consider the two most commonly used matrix types for building diffusion matrices: Hadamard and circulant matrices. Perhaps one reason that Hadamard and circulant matrices are popular is that every row is a rearrangement of the first row, which substantially simplifies the implementation and analysis.

Hadamard matrices

One advantage of Hadamard matrices is that they can easily be transformed into involutory matrices, which allows designers to reuse the same component for both the encryption and decryption process (thus saving area in hardware implementations).

Definition 5.1. A Hadamard matrix H of order $m = 2^h$ is a matrix that can be represented by two other submatrices $H_1$ and $H_2$ which are also Hadamard matrices:

$$H = \begin{pmatrix} H_1 & H_2 \\ H_2 & H_1 \end{pmatrix}.$$

We denote the matrix as $\mathrm{Had}(h_0, h_1, \ldots, h_{m-1})$, where the $h_i$ are the entries of the first row of the matrix. The (i, j)-entry of H can be expressed as $H[i, j] = h_{i \oplus j}$.

Example 5.2. A Hadamard matrix Had(a, b, c, d) is expressed as

$$\begin{pmatrix} a & b & c & d \\ b & a & d & c \\ c & d & a & b \\ d & c & b & a \end{pmatrix}.$$
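The indexing rule $H[i, j] = h_{i \oplus j}$ can be sketched in a few lines of Python. This is only an illustration: the function name `hadamard` and the use of Python lists are assumptions made here, not notation from the thesis.

```python
def hadamard(first_row):
    """Build Had(h_0, ..., h_{m-1}) from its first row using H[i][j] = h_{i XOR j}."""
    m = len(first_row)
    # The order must be a power of two so that i ^ j stays inside 0..m-1.
    if m == 0 or (m & (m - 1)) != 0:
        raise ValueError("the order of a Hadamard matrix must be a power of two")
    return [[first_row[i ^ j] for j in range(m)] for i in range(m)]


# Reproduce the layout of Example 5.2 with symbolic entries.
H = hadamard(["a", "b", "c", "d"])
for row in H:
    print(" ".join(row))
```

Because only the first row is stored, any search over Hadamard matrices reduces to a search over orderings of that row.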

Circulant and cyclic matrices

Another common way to build an MDS matrix is to start from a circulant matrix, as a random circulant matrix has a much higher probability of being MDS than a random square matrix [38]. In addition, MDS circulant matrices can contain identical low implementation cost entries in a row, as seen in the AES diffusion matrix, making them potentially lighter than matrices with distinct entries in a row.

Definition 5.3. A circulant matrix C of order m is a matrix where each subsequent row is a right rotation of the previous row. We denote the matrix as $\mathrm{cir}(c_0, c_1, \ldots, c_{m-1})$, where the $c_i$ are the entries of the first row of the matrix. The (i, j)-entry of C can be expressed as $C[i, j] = c_{(j-i) \bmod m}$.

One can also define a left-circulant matrix L, where each subsequent row is a left rotation of the previous row; we denote it as $\mathrm{lcr}(c_0, c_1, \ldots, c_{m-1})$. The (i, j)-entry of L can be expressed as $L[i, j] = c_{(i+j) \bmod m}$. Note that since $L[j, i] = c_{(j+i) \bmod m} = c_{(i+j) \bmod m} = L[i, j]$, L is symmetric, i.e. $L^\top = L$.

Example 5.4. A circulant matrix cir(a, b, c, d) and a left-circulant matrix lcr(a, b, c, d) can be expressed as follows:

$$\mathrm{cir}(a, b, c, d) = \begin{pmatrix} a & b & c & d \\ d & a & b & c \\ c & d & a & b \\ b & c & d & a \end{pmatrix}, \qquad \mathrm{lcr}(a, b, c, d) = \begin{pmatrix} a & b & c & d \\ b & c & d & a \\ c & d & a & b \\ d & a & b & c \end{pmatrix}.$$
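As a small illustration of Definition 5.3, the two index formulas translate directly into code. The helper names `cir`, `lcr` and `transpose` mirror the notation above but are otherwise assumptions of this sketch.

```python
def cir(first_row):
    """Circulant matrix: C[i][j] = c_{(j - i) mod m} (each row is a right rotation)."""
    m = len(first_row)
    return [[first_row[(j - i) % m] for j in range(m)] for i in range(m)]


def lcr(first_row):
    """Left-circulant matrix: L[i][j] = c_{(i + j) mod m} (each row is a left rotation)."""
    m = len(first_row)
    return [[first_row[(i + j) % m] for j in range(m)] for i in range(m)]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


C = cir(list("abcd"))
L = lcr(list("abcd"))
print(C)
print(L)
print(transpose(L) == L)  # a left-circulant matrix is symmetric
```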

One observation on circulant and left-circulant matrices is that each subsequent row is obtained from the previous row through the same permutation rule. Hence, we generalise the circulant structure by considering other permutations.

Definition 5.5. A cyclic matrix $C_\rho$ of order m is a matrix where each subsequent row is some permutation ρ of the previous row, where ρ is a cycle of length m. We denote the matrix as $\mathrm{cyc}_\rho(c_0, c_1, \ldots, c_{m-1})$, where the $c_i$ are the entries of the first row of the matrix. The (i, j)-entry of $C_\rho$ can be expressed as $C_\rho[i, j] = c_{\rho^i(j)}$. We require the permutation to be a cycle of length m to avoid identical rows, which is a necessary condition for a matrix to be MDS.

Example 5.6. Considering the cycle permutation ρ = (0 2 1 3), we can express $\mathrm{cyc}_\rho(a, b, c, d)$ as follows

$$\begin{pmatrix} (a, b, c, d) \\ \rho(a, b, c, d) \\ \rho^2(a, b, c, d) \\ \rho^3(a, b, c, d) \end{pmatrix} = \begin{pmatrix} a & b & c & d \\ d & c & a & b \\ b & a & d & c \\ c & d & b & a \end{pmatrix},$$

where the collection of the permutations of each row forms a cyclic group of order 4, $\langle (0\ 2\ 1\ 3) \rangle = \{(), (0\ 2\ 1\ 3), (0\ 1)(2\ 3), (0\ 3\ 1\ 2)\}$.

Observe that the permutation ρ is an element of the symmetric group Sm, and the collection of the permutations of the m rows of the matrix forms a cyclic group, hence the name cyclic matrices.
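The row-by-row construction of a cyclic matrix can be sketched as follows. The sketch assumes the convention visible in Example 5.6, namely that applying ρ moves the entry at position j of the previous row to position ρ(j); the names `cycle_to_map` and `cyc` are illustrative.

```python
def cycle_to_map(cycle):
    """Turn cycle notation, e.g. (0, 2, 1, 3), into a mapping 0->2, 2->1, 1->3, 3->0."""
    m = len(cycle)
    return {cycle[k]: cycle[(k + 1) % m] for k in range(m)}


def cyc(cycle, first_row):
    """Cyclic matrix whose next row moves the entry at position j to position rho(j)."""
    rho = cycle_to_map(cycle)
    rows = [list(first_row)]
    for _ in range(len(first_row) - 1):
        previous = rows[-1]
        following = [None] * len(previous)
        for position, entry in enumerate(previous):
            following[rho[position]] = entry
        rows.append(following)
    return rows


# Example 5.6: rho = (0 2 1 3).
for row in cyc((0, 2, 1, 3), "abcd"):
    print(" ".join(row))
```

Under the same convention, the cycle (0 1 2 3) gives back the circulant matrix cir(a, b, c, d).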

5.2 Matrix Properties

We first recall some known properties of Hadamard matrices. Next, we discuss the properties of cyclic matrices and their relation with circulant matrices. For conciseness, we denote matrices that are involutive and MDS as IMDS matrices, and orthogonal and MDS matrices as OMDS matrices.

5.2.1 Properties of Hadamard matrices

Involutory Hadamard matrices

Proposition 5.7. A Hadamard matrix over $\mathrm{GF}(2^s)$ is involutive if and only if the XOR-sum of its first row is 1.

Proof. By doing a matrix multiplication, we obtain $H \times H = c \cdot I_m$, where $c = h_0^2 + h_1^2 + h_2^2 + \cdots + h_{m-1}^2$ and $I_m$ is the identity matrix of order m. In other words, the product of a Hadamard matrix with itself is a multiple of the identity matrix, where the multiple c can be rewritten as $c = (h_0 + h_1 + \cdots + h_{m-1})^2$ (since the characteristic of $\mathrm{GF}(2^s)$ is 2). Thus, a necessary and sufficient condition for a Hadamard matrix to be involutive is that the XOR-sum of its first row is 1. □

By Proposition 5.7, we can make a Hadamard matrix an involution simply by multiplying the matrix by the inverse of the XOR-sum of its first row (assuming it is non-zero, otherwise the matrix is singular and non-invertible).
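Proposition 5.7 and the scaling remark can be checked numerically. The following sketch works over GF(2^4) defined by 0x13, a field that also appears later in the chapter; the helper names (`gf_mul`, `gf_inv`, `mat_mul`, `xor_sum`) and the sample first row are assumptions chosen for illustration.

```python
def gf_mul(a, b, poly=0x13, s=4):
    """Multiply a and b in GF(2^s); `poly` includes the leading term (0x13 = x^4 + x + 1)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> s) & 1:
            a ^= poly
    return result


def gf_inv(a, poly=0x13, s=4):
    """Brute-force inverse, adequate for the small fields used here."""
    return next(y for y in range(1, 1 << s) if gf_mul(a, y, poly, s) == 1)


def hadamard(first_row):
    m = len(first_row)
    return [[first_row[i ^ j] for j in range(m)] for i in range(m)]


def mat_mul(A, B):
    m = len(A)
    product = [[0] * m for _ in range(m)]
    for i in range(m):
        for k in range(m):
            for j in range(m):
                product[i][k] ^= gf_mul(A[i][j], B[j][k])
    return product


def xor_sum(row):
    total = 0
    for entry in row:
        total ^= entry
    return total


row = [0x1, 0x2, 0x3, 0x4]                 # arbitrary example row, XOR-sum is 0x4
c = gf_mul(xor_sum(row), xor_sum(row))     # c = (h_0 + ... + h_{m-1})^2
H = hadamard(row)
square = mat_mul(H, H)                     # expected to equal c * I

scale = gf_inv(xor_sum(row))               # inverse of the XOR-sum
scaled_row = [gf_mul(scale, h) for h in row]
H_inv = hadamard(scaled_row)
identity = mat_mul(H_inv, H_inv)           # expected to be the identity matrix
print(square, identity)
```

The scaled row has XOR-sum 1 by distributivity, which is exactly the condition of the proposition.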

Necessary condition for MDS Hadamard matrices

Proposition 5.8. A necessary condition for a Hadamard matrix to be MDS is to have its first row containing $2^h$ distinct entries.

Proof. Suppose $h_i = h_j$, where $i \neq j$. It is easy to see that $H[0, i] = H[0, j] = H[i \oplus j, i] = H[i \oplus j, j]$. By Corollary 2.19, the Hadamard matrix is not MDS. □

5.2.2 Properties of cyclic matrices

Since there are (m − 1)! cycles of length m, there are as many as (m − 1)! different types of cyclic matrices of order m; hence it is infeasible to analyse every single cyclic structure. However, using results from elementary group theory, we can relate cyclic matrices to circulant matrices in terms of branch number and elegantly reduce the problem to simply analysing the circulant matrices.

Relation to circulant matrices

In a nutshell, we show that any cyclic matrix is permutation equivalent to some circulant matrix. More precisely, there is a bijection between the cyclic and circulant matrices satisfying $\sim_P$ (see Definition 2.21). To prove this, we use the following proposition from elementary group theory.

Proposition 5.9. [97, Ch. 5.3] Any two permutations ρ, τ which have the same cycle type are conjugate in Sm.

That is to say, there exists a permutation σ ∈ Sm such that σρ = τσ. To compute σ, we place one permutation above the other and view it as a Cauchy’s 2-line notation for permutation.

Example 5.10. Let ρ = (0 2 1 3) (from Example 5.6) and τ = (0 1 2 3) (from Example 5.4). Viewing the pair as Cauchy's 2-line notation, we have

$$\begin{pmatrix} 0 & 2 & 1 & 3 \\ 0 & 1 & 2 & 3 \end{pmatrix},$$

from which we see that 0 and 3 are fixed while 1 and 2 are swapped. Therefore, we obtain σ = (1 2) and we can verify that σρ = τσ.
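The two-line construction of σ is easy to mechanise. In this sketch, permutations are stored as dictionaries and the product σρ is read as "apply ρ first, then σ"; both choices, and the function names, are assumptions made for the illustration.

```python
def cycle_to_map(cycle):
    """Cycle notation (x0 x1 ... x_{m-1}) as a mapping x_k -> x_{k+1}."""
    m = len(cycle)
    return {cycle[k]: cycle[(k + 1) % m] for k in range(m)}


def conjugator(rho_cycle, tau_cycle):
    """Read sigma off the two-line notation: top row rho's cycle, bottom row tau's cycle."""
    return dict(zip(rho_cycle, tau_cycle))


rho_cycle, tau_cycle = (0, 2, 1, 3), (0, 1, 2, 3)
rho, tau = cycle_to_map(rho_cycle), cycle_to_map(tau_cycle)
sigma = conjugator(rho_cycle, tau_cycle)

sigma_rho = {x: sigma[rho[x]] for x in range(4)}   # sigma after rho
tau_sigma = {x: tau[sigma[x]] for x in range(4)}   # tau after sigma
print(sigma, sigma_rho == tau_sigma)
```

The same construction works for any two cycles of equal length, since σ maps the k-th element of one cycle to the k-th element of the other.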

Theorem 5.11. Given an ordered set S with m elements and some cyclic matrix structure, there exists a bijection between the cyclic matrices and the circulant matrices satisfying the relation $\sim_P$, where both sets of matrices are generated by some index permutation on S.

Proof. Let the permutation of the cyclic matrix be ρ and the permutation of the circulant matrix be τ = (0 1 2 ... m − 1). By Proposition 5.9, there exists some permutation σ such that σρ = τσ. Hence for any row i ∈ {0, 1, ..., m − 1}, we have $\sigma\rho^i = \tau^i\sigma$. In the form of a matrix, the permutation for each row of the matrices can be expressed as

$$\begin{pmatrix} \sigma(S) \\ \sigma \circ \rho(S) \\ \sigma \circ \rho^2(S) \\ \vdots \\ \sigma \circ \rho^{m-1}(S) \end{pmatrix} = \begin{pmatrix} \sigma(S) \\ \tau \circ \sigma(S) \\ \tau^2 \circ \sigma(S) \\ \vdots \\ \tau^{m-1} \circ \sigma(S) \end{pmatrix},$$

where σ in the cyclic matrix can be viewed as a column permutation, while in the circulant matrix it is an index permutation on S. Therefore, by Proposition 2.20, the cyclic matrix has the same branch number as a circulant matrix that undergoes index permutation σ. Lastly, one can easily infer that any index permutation σ′ on the cyclic matrix corresponds to a circulant matrix that undergoes index permutation σ ∘ σ′. □

Example 5.12. Consider a cyclic matrix of order 4 with permutation ρ = (0 2 1 3), while the circulant matrix has τ = (0 1 2 3). From Example 5.10, we have σ = (1 2) that satisfies σρ = τσ. Applying column permutation σ on the cyclic matrix and index permutation σ on the circulant matrix, we obtain the same matrix as follows

$$\begin{pmatrix} a & b & c & d \\ d & c & a & b \\ b & a & d & c \\ c & d & b & a \end{pmatrix} \xrightarrow{\ \text{col perm } \sigma\ } \begin{pmatrix} a & c & b & d \\ d & a & c & b \\ b & d & a & c \\ c & b & d & a \end{pmatrix} \xleftarrow{\ \text{index perm } \sigma\ } \begin{pmatrix} a & b & c & d \\ d & a & b & c \\ c & d & a & b \\ b & c & d & a \end{pmatrix}.$$
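Example 5.12 can be replayed in code. The sketch below assumes that both the column permutation and the index permutation move the item at position j to position σ(j), and that each row of a cyclic matrix is obtained by moving the entry at position j to position ρ(j); all function names are illustrative.

```python
def cycle_to_map(cycle):
    m = len(cycle)
    return {cycle[k]: cycle[(k + 1) % m] for k in range(m)}


def move(row, perm):
    """Move the entry at position j to position perm[j]."""
    out = [None] * len(row)
    for j, entry in enumerate(row):
        out[perm[j]] = entry
    return out


def cyc(cycle, first_row):
    """Cyclic matrix: every row is the previous row with entries moved by rho."""
    rho = cycle_to_map(cycle)
    rows = [list(first_row)]
    for _ in range(len(first_row) - 1):
        rows.append(move(rows[-1], rho))
    return rows


sigma = {0: 0, 1: 2, 2: 1, 3: 3}                     # sigma = (1 2) from Example 5.10
cyclic = cyc((0, 2, 1, 3), "abcd")
after_column_perm = [move(row, sigma) for row in cyclic]
after_index_perm = cyc((0, 1, 2, 3), move(list("abcd"), sigma))  # circulant of permuted row
print(after_column_perm == after_index_perm)
```

Both routes give cir(a, c, b, d), which is the middle matrix of the example.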

This theorem shows that for any cyclic matrix, we have some column permutation σ that transforms it into a circulant matrix (or any other cyclic matrix) while preserving the branch number.

Corollary 5.13. Any cyclic matrix corresponds to some circulant matrix preserving the entries and the branch number.

This is immediate from Theorem 5.11 and the fact that their entries are the same up to some permutation. Thus, we can analyse the circulant matrix structure and extend the results to cyclic matrices naturally.

Necessary condition for MDS cyclic matrices

Unlike Hadamard matrices, MDS circulant matrices can contain the same elements in a row. For instance, the AES diffusion matrix has two ‘1’s in each row. However, one can easily verify that a circulant matrix of order 4 with three ‘1’s in each row is not MDS, regardless of the choice of the last element. Therefore, it is essential for us to understand the necessary condition for the matrix to be potentially MDS, to avoid wasting resources searching for MDS matrices when they do not exist.

Given an ordered multi-set of entries $\{c_0, c_1, \ldots, c_{m-1}\}$, suppose that two of the entries are the same. We can denote this by $c_i = c_{(i+d) \bmod m}$, where we assume that the distance satisfies $d \le \lfloor m/2 \rfloor$. If $d > \lfloor m/2 \rfloor$, we can relabel $j = (i + d) \bmod m$ and see that $c_{(j+m-d) \bmod m} = c_j$, where the distance $m - d \le \lfloor m/2 \rfloor$.

Lemma 5.14. An MDS circulant matrix of even order m does not have $c_i = c_{i+m/2}$.

Proof. Suppose that there exists $c_i = c_{i+m/2}$. Considering the submatrix of order 2 obtained by taking rows 0 and m/2, and columns i and i + m/2, we have

$$\begin{pmatrix} c_i & c_{i+m/2} \\ c_{(i-m/2) \bmod m} & c_i \end{pmatrix}.$$

Since $i - m/2 \equiv i + m/2 \pmod m$, we have a singular submatrix and by Proposition 2.17, there is a contradiction. □

Lemma 5.15. An MDS circulant matrix does not have $c_i = c_{i+d}$ and $c_j = c_{j+d}$, where $i \neq j$.

Proof. Suppose that there exist $c_i = c_{i+d}$ and $c_j = c_{j+d}$, where i < j. Considering the submatrix of order 2 obtained by taking rows 0 and (i − j) mod m, and columns i and i + d, we have

$$\begin{pmatrix} c_i & c_{i+d} \\ c_j & c_{j+d} \end{pmatrix}.$$

Since these two columns are identical, we have a singular submatrix and by Proposition 2.17, there is a contradiction. □

Theorem 5.16. For order m ≤ 8, a necessary condition for a cyclic matrix to be MDS is to have its first row containing:

Type 0: m distinct entries;
Type 1: 1 pair of repeated entries;
Type 2: 2 pairs of repeated entries;
Type 3: 3 pairs of repeated entries; or
Type 4: 3 repeated entries.

Proof. From Lemma 5.14 and Lemma 5.15, we can infer that an MDS circulant matrix of order m allows at most $\lfloor (m-1)/2 \rfloor$ possible distinct distances d. If there are 2 distinct distances, we can have at most 2 pairs of identical entries but not 3 identical entries. This is because if some element has multiplicity 3, say $c_i = c_{i+d_1} = c_{i+d_2}$, then the three distances $d_1$, $d_2$, $d_2 - d_1$ must be pairwise distinct. Orders m = 7 and m = 8 allow 3 possible distinct distances and thus there are at most 3 pairs of repeated entries or 3 repeated entries. It also implies that any higher multiplicity is impossible for an MDS circulant matrix of order 8, as the number of pairwise equalities would be more than 3 (a similar property, that an MDS matrix of order 8 has at most 24 ‘1’s, was proved in [63]). By Corollary 5.13, we can see that this necessary condition also applies to cyclic matrices. □

In Table 5.1, we list all the possible types of MDS cyclic matrices for order m ≤ 8. These results can also be extended to higher order cyclic matrices.

Table 5.1: Possible types of MDS cyclic matrices of order m ≤ 8

| Order | Possible d | m distinct | 1 pair | 2 pairs | 3 pairs | 3 repeated |
|-------|------------|------------|--------|---------|---------|------------|
| 3     | {1}        | ✓          | ✓      |         |         |            |
| 4     | {1}        | ✓          | ✓      |         |         |            |
| 5     | {1, 2}     | ✓          | ✓      | ✓       |         |            |
| 6     | {1, 2}     | ✓          | ✓      | ✓       |         |            |
| 7     | {1, 2, 3}  | ✓          | ✓      | ✓       | ✓       | ✓          |
| 8     | {1, 2, 3}  | ✓          | ✓      | ✓       | ✓       | ✓          |

Existence of IMDS cyclic matrices

In [61], the authors proved that there does not exist an IMDS circulant matrix of order 4, and this result was later extended to circulant matrices of any size by [53]. Despite the similarity between cyclic and circulant matrices in terms of branch number, the involution property of circulant matrices may not carry over to cyclic matrices, which suggests that IMDS cyclic matrices might exist. In our search for IMDS cyclic matrices, we chose left-circulant matrices for their simplicity, their close resemblance to circulant matrices and, most importantly, because a left-rotation permutation exists for any matrix size. However, there are some dimensions for which left-circulant matrices can never be IMDS, as we prove in the following.

One can easily observe that reversing the order of the rows or the columns of a left-circulant matrix makes it a circulant matrix and vice versa. From this, we obtain the following results.

Lemma 5.17. Given some left-circulant matrix L, the two circulant matrices $L_r$ and $L_c$, obtained from L by reversing the order of the rows and the columns respectively, are the transpose of each other, i.e. $L_r = L_c^\top$.

Proof. For any element at the (i, j)-entry of L, reversing the order of the rows will reposition it to the (m − 1 − i, j)-entry of $L_r$. On the other hand, reversing the order of the columns will reposition the (j, i)-entry of L (which has the same element since L is symmetric) to the (j, m − 1 − i)-entry of $L_c$. Thus, we have $L_r[m - 1 - i, j] = L_c[j, m - 1 - i]$. □
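Lemma 5.17 is straightforward to confirm on a concrete matrix. The sketch below uses integer placeholders for the entries and hypothetical helper names; it checks that both reversed matrices are circulant and that they are transposes of each other.

```python
def cir(first_row):
    m = len(first_row)
    return [[first_row[(j - i) % m] for j in range(m)] for i in range(m)]


def lcr(first_row):
    m = len(first_row)
    return [[first_row[(i + j) % m] for j in range(m)] for i in range(m)]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


L = lcr([10, 11, 12, 13, 14])          # placeholder entries, order 5
L_r = L[::-1]                          # reverse the order of the rows
L_c = [row[::-1] for row in L]         # reverse the order of the columns

print(L_r == cir(L_r[0]), L_c == cir(L_c[0]), L_r == transpose(L_c))
```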

From Lemma 5.17, we can derive the following proposition.

Proposition 5.18. A left-circulant matrix L is IMDS if and only if the circulant matrix, obtained by reversing the order of the rows or columns of L, is OMDS.

Proof. Let L be an IMDS left-circulant matrix and $L = L_c \times P = P \times L_r$, where P is the permutation matrix that reverses the order of the rows or columns. We have

$$I = L \times L = (L_c \times P) \times (P \times L_r) = L_c \times L_r .$$

From Lemma 5.17, we know that these circulant matrices, $L_c$ and $L_r$, are orthogonal. By Proposition 2.20, both matrices have the same branch number as L. Therefore, they are OMDS, and the converse holds trivially. □

In [54], the authors proved that $2^h \times 2^h$ OMDS circulant matrices over $\mathrm{GF}(2^s)$ do not exist. Thus from Proposition 5.18, we can immediately arrive at the following corollary.

Corollary 5.19. There are no $2^h \times 2^h$ IMDS left-circulant matrices over $\mathrm{GF}(2^s)$.

Nevertheless, we found IMDS left-circulant matrices for other matrix sizes, detailed in Section 5.4.

5.3 Compact Equivalence Classes of Matrices

For Hadamard or cyclic matrices, given a (multi-)set of m entries, there are up to m! ways to define the first row of the matrix, which can be expressed by the index permutation σ. The number of possible candidates grows factorially with m and it is not feasible to check the branch number of every single one of them. To significantly reduce the search space, we introduce the concept of compact equivalence classes (CEC) of Hadamard/cyclic matrices. Matrices in an equivalence class have the same branch number, thus it is sufficient to check one representative from each equivalence class to exhaust the search. By compact, we mean that it is the largest possible equivalence class for generic Hadamard/cyclic matrices.

5.3.1 CEC of Hadamard matrices

Theorem 5.20. Given two Hadamard matrices $H$ and $H^\sigma$ of order $m = 2^h$, $H \sim_P H^\sigma$ if and only if σ is some index permutation satisfying $\sigma(i \oplus j) = \sigma(i) \oplus \sigma(j) \oplus \sigma(0)$, ∀i, j ∈ {1, ..., m − 1}.

Proof. The backward direction follows directly from the proof for the forward direction. It is trivial to see that for Hadamard matrices of order 2, there are only 2 possible Hadamard matrices and they are permutation equivalent, where the index permutation can be expressed as σ(i) = i ⊕ 1. Thus, we consider Hadamard matrices of higher order.

Given that $H \sim_P H^\sigma$, by Definition 2.21, there exist permutation matrices P and Q such that $H^\sigma = PHQ$, where P (resp. Q) is a row (resp. column) permutation on H. Suppose σ(0) = a, where a ≠ 0. We can introduce an index permutation $\pi_a(i) = i \oplus a$ to obtain $\pi_a(\sigma(0)) = 0$. Note that by the nature of the construction of Hadamard matrices, the index permutation $\pi_a$ corresponds to a row permutation. Therefore, for any σ(0) = a, we can always apply some index permutation $\pi_a$ to fix the first element $h_0$ by left-multiplying $H^\sigma$ by a corresponding row permutation $P_a$, which gives $H^{\pi_a \circ \sigma} = P_a P H Q$, where $(\pi_a \circ \sigma)(0) = 0$.

Next, we consider an index permutation φ that fixes 0 and satisfies $H \sim_P H^\phi$. For any 4 × 4 submatrix M of $H^\phi$ obtained by taking the same rows and columns 0, i, j, i ⊕ j, where i ≠ j and i, j ∈ {1, ..., m − 1}, one can see that M consists of 4 entries, namely $h_0$, $h_{\phi(i)}$, $h_{\phi(j)}$ and $h_{\phi(i \oplus j)}$. Since $H^\phi = PHQ$, M of $H^\phi$ corresponds to some submatrix N of H.

Suppose row $r_0$ of H is one of the rows of N, then the 4 columns of H that make up N are columns $r_0$, $r_0 \oplus \phi(i)$, $r_0 \oplus \phi(j)$ and $r_0 \oplus \phi(i \oplus j)$. Similarly, for another row $r_1$ of H that is also one of the rows of N, columns $r_1$, $r_1 \oplus \phi(i)$, $r_1 \oplus \phi(j)$ and $r_1 \oplus \phi(i \oplus j)$ of H make up N. Since both sets of columns describe the same submatrix N of H, we want to find a matching between the two sets of columns

$$\{r_0,\ r_0 \oplus \phi(i),\ r_0 \oplus \phi(j),\ r_0 \oplus \phi(i \oplus j)\} \quad \text{and} \quad \{r_1,\ r_1 \oplus \phi(i),\ r_1 \oplus \phi(j),\ r_1 \oplus \phi(i \oplus j)\},$$

where $r_0 \neq r_1$. This implies

$$\{x,\ x \oplus \phi(i),\ x \oplus \phi(j),\ x \oplus \phi(i \oplus j)\} = \{0,\ \phi(i),\ \phi(j),\ \phi(i \oplus j)\},$$

for some x ≠ 0. Thus x ∈ {φ(i), φ(j), φ(i ⊕ j)}. If x = φ(i), then {x ⊕ φ(j), x ⊕ φ(i ⊕ j)} = {φ(j), φ(i ⊕ j)} and thus x ⊕ φ(j) = φ(i ⊕ j), that is, φ(i) ⊕ φ(j) = φ(i ⊕ j). Similarly, we can obtain φ(i ⊕ j) = φ(i) ⊕ φ(j) for the other cases. Therefore, if $H \sim_P H^\phi$ with φ(0) = 0, the index permutation φ satisfies the linear property φ(i ⊕ j) = φ(i) ⊕ φ(j).

Finally, we can see that if $H \sim_P H^\sigma$ and we consider $\sigma = \pi_{\sigma(0)} \circ \phi$, then we have σ(i ⊕ j) = σ(i) ⊕ σ(j) ⊕ σ(0). □

For simplicity, we call a permutation acting on $2^h$ objects an H-permutation if it satisfies σ(i ⊕ j) = σ(i) ⊕ σ(j) ⊕ σ(0), ∀i, j ∈ {1, ..., $2^h$ − 1} (as in Theorem 5.20). Next, we are interested in the number of CEC of Hadamard matrices for a given matrix size.

Theorem 5.21. Given a set of $2^h$ non-zero elements, there are

$$\frac{(2^h - 1)!}{\prod_{i=0}^{h-1} (2^h - 2^i)}$$

compact equivalence classes of Hadamard matrices of order $2^h$.

Proof. The cardinality of each CEC is the number of possible H-permutations σ. First, there are $2^h$ choices for σ(0). Next, for σ(1) and σ(2), there are $2^h - 1$ and $2^h - 2$ choices respectively. For σ(3), to satisfy the H-permutation property, it is pre-determined by σ(0), σ(1) and σ(2). Therefore, we only choose the mapping for $\sigma(2^i)$ and the others are simply linear combinations of $\{\sigma(2^i)\}_{i=0,\ldots,h-1}$. This means that there are $\prod_{i=0}^{h-1} (2^h - 2^i)$ possible choices. Since there are $2^h!$ Hadamard matrices in the entire space and the cardinality of each CEC is $2^h \cdot \prod_{i=0}^{h-1} (2^h - 2^i)$, we have $\#\{\text{CEC of Had}\} \cdot 2^h \cdot \prod_{i=0}^{h-1} (2^h - 2^i) = 2^h!$, thus the formula. □
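The counting argument can be cross-checked by brute force for small h. The sketch below enumerates all permutations of $2^h$ indices, keeps those satisfying the H-permutation condition, and compares the count with $2^h \cdot \prod_{i=0}^{h-1}(2^h - 2^i)$; the function names are assumptions of this illustration, and the enumeration is only practical up to h = 3.

```python
from itertools import permutations
from math import factorial


def is_h_permutation(sigma):
    """Check sigma(i ^ j) == sigma(i) ^ sigma(j) ^ sigma(0) for all index pairs."""
    m = len(sigma)
    return all(sigma[i ^ j] == sigma[i] ^ sigma[j] ^ sigma[0]
               for i in range(m) for j in range(m))


def class_size_formula(h):
    """Cardinality of one CEC: 2^h * prod_{i<h} (2^h - 2^i)."""
    m = 2 ** h
    size = m
    for i in range(h):
        size *= m - 2 ** i
    return size


def number_of_cec(h):
    """Number of compact equivalence classes from Theorem 5.21."""
    return factorial(2 ** h) // class_size_formula(h)


for h in (1, 2, 3):
    brute = sum(1 for s in permutations(range(2 ** h)) if is_h_permutation(s))
    print(h, brute, class_size_formula(h), number_of_cec(h))
```

For h = 3 the class size is 8 · 7 · 6 · 4 = 1344, which gives 8!/1344 = 30 classes.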

Last but not least, we need to know how to generate the representatives for checking the MDS property.

Generating representatives for CEC of Hadamard matrices

From Theorem 5.21, one can see that there is only a single CEC of Hadamard matrices for matrix sizes 2 and 4. This means that for a given set of entries, all index permutations have the same branch number. For matrices of higher order, we need an algorithm to generate a representative for each CEC. The idea is to exhaust all non-H-permutations. Recall that for an H-permutation, the mapping of the indexes is defined in ascending order, and we are free to select a mapping for index 0 and $\{2^i\}_{i=0,\ldots,h-1}$ from the remaining set of entries. Taking the complement, we fix the mapping of index 0 and $\{2^i\}_{i=0,\ldots,h-1}$ to the smallest element in the remaining set and exhaust the combinations for the mapping of the other indexes.

Example 5.22. Given a set of 8 elements for Hadamard matrices of order 8, the first three entries, $h_0$, $h_1$ and $h_2$, are fixed to be the three smallest elements in ascending order. Next, we pick an element from the remaining set in ascending order as $h_3$; there are 5 choices. After that, $h_4$ is selected to be the smallest element among the remaining 4 elements and we permute the remaining 3 elements to be $h_5$, $h_6$ and $h_7$ respectively; there are 6 possible combinations. In total, there are 30 representatives generated, which coincides with Theorem 5.21.
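The procedure of Example 5.22 can be sketched for order 8 as follows. The code also checks, by brute force over all H-permutations, that the generated first rows fall into pairwise different equivalence classes. The function names and the labelling of a class by its smallest member are assumptions of this illustration, and the entries are required to be distinct (as Proposition 5.8 demands for MDS candidates).

```python
from itertools import permutations


def h_permutations(m):
    """All index permutations with sigma(i^j) = sigma(i)^sigma(j)^sigma(0) (brute force)."""
    return [s for s in permutations(range(m))
            if all(s[i ^ j] == s[i] ^ s[j] ^ s[0] for i in range(m) for j in range(m))]


def hadamard_representatives_order8(elements):
    """First rows following Example 5.22: fix h0, h1, h2, choose h3, fix h4, permute the rest."""
    e = sorted(elements)
    if len(set(e)) != 8:
        raise ValueError("expected 8 distinct elements")
    fixed, rest = e[:3], e[3:]
    representatives = []
    for h3 in rest:                                   # 5 choices
        remaining = [x for x in rest if x != h3]      # still sorted
        h4, tail = remaining[0], remaining[1:]        # h4 is the smallest remaining element
        for perm in permutations(tail):               # 3! = 6 orderings of h5, h6, h7
            representatives.append(tuple(fixed) + (h3, h4) + perm)
    return representatives


def class_label(first_row, hperms):
    """Smallest first row reachable by an H-permutation, used as a label for the class."""
    return min(tuple(first_row[s[i]] for i in range(len(first_row))) for s in hperms)


reps = hadamard_representatives_order8(range(8))
hperms = h_permutations(8)
labels = {class_label(r, hperms) for r in reps}
print(len(reps), len(hperms), len(labels))
```

In an actual search the placeholder integers 0 to 7 would be replaced by field elements sorted by XOR count.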

5.3.2 CEC of cyclic matrices

Theorem 5.23. Given two cyclic matrices $C_\rho$ and $C_\rho^\sigma$ of order m, $C_\rho \sim_P C_\rho^\sigma$ if and only if σ is some index permutation satisfying σ(i) = (bi + a) mod m, ∀i ∈ {0, 1, ..., m − 1}, where $a, b \in \mathbb{Z}_m$ and gcd(b, m) = 1.

Proof. The backward direction follows directly from the proof for the forward direction. As shown in Theorem 5.11, there is a bijection between cyclic matrices and the circulant matrices that preserves the branch number. Thus it is sufficient to present the proof with circulant matrices.

Assume that $C \sim_P C^\sigma$. By Definition 2.21, there exist permutation matrices P and Q such that $C^\sigma = PCQ$, where P (resp. Q) is a row (resp. column) permutation on C. Since C is circulant, one can observe that if $C^{\sigma'} = PC$, then row 0 of $C^{\sigma'}$ is some row of C; this implies that the index permutation σ′ can be expressed as $\pi_a(i) = (i + a) \bmod m$. That is, $\pi_a$ corresponds to a row permutation $P_a$. Therefore, for any σ(0) = a, we can always apply some index permutation $\pi_{-a}$ to fix the first element $c_0$ by left-multiplying $C^\sigma$ by a corresponding row permutation $P_{-a}$, which gives $C^{\pi_{-a} \circ \sigma} = P_{-a} P C Q$, where $(\pi_{-a} \circ \sigma)(0) = 0$.

Next, we consider some index permutation that fixes 0. Suppose $C^{\phi_b} = PCQ$, where $\phi_b(0) = 0$ and $\phi_b(1) = b$. Then the column permutation Q maps column b of C to column 1 of $C^{\phi_b}$, and $c_b$ is on the right of $c_0$ in row 0 of $C^{\phi_b}$. Similarly, the row permutation P maps row m − b of C to row m − 1 of $C^{\phi_b}$, so that column 0 of $C^{\phi_b}$ has $c_b$ in the bottom row. By the definition of index permutation, we know that $C^{\phi_b}[0, 2] = c_{\phi_b(2)}$. On the other hand, by the definition of circulant matrices we know that $C^{\phi_b}[0, 2] = C^{\phi_b}[m - 1, 1]$. Since the pre-image of row m − 1 and column 1 of $C^{\phi_b}$ is row m − b and column b of C, we can express that entry of $C^{\phi_b}$ as an entry of C, that is, $C^{\phi_b}[m - 1, 1] = C[m - b, b]$. And again by the definition of circulant matrices, that entry is $c_{(b - (m - b)) \bmod m} = c_{2b \bmod m}$.

That is to say, by defining $\phi_b(1) = b$, we have restricted the permutation of the next index to be $\phi_b(2) = 2b \bmod m$. Following the same argument, we can conclude that $\phi_b(i) = bi \bmod m$. In addition, we must have gcd(b, m) = 1 so that $\phi_b$ is a permutation on {0, 1, ..., m − 1}.

Finally, we can see that if $C \sim_P C^\sigma$ then $\sigma = \pi_a \circ \phi_b$, that is, σ(i) = (bi + a) mod m. □

For simplicity, we call a permutation acting on m objects a C-permutation if it satisfies σ(i) = (bi + a) mod m, ∀i ∈ {0, 1, ..., m − 1}, where $a, b \in \mathbb{Z}_m$ and gcd(b, m) = 1 (as in Theorem 5.23). Again, we are interested in the number of CEC of cyclic matrices for a given matrix size.

Theorem 5.24. Given a set of m non-zero elements, there are

$$\frac{(m - 1)!}{\varphi(m)}$$

compact equivalence classes of cyclic matrices of order m, where ϕ(m) is Euler's totient function.

Proof. The cardinality of each CEC is the number of possible C-permutations σ. There are m possible values for a, and b has to be coprime with m, which means there are ϕ(m) possible values for b. Since there are m! cyclic matrices in the entire space and the cardinality of each CEC is m · ϕ(m), we have $\#\{\text{CEC of cyc}\} \cdot m \cdot \varphi(m) = m!$, thus the formula. □

Again, we want to know how to generate the representatives.
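The formula of Theorem 5.24 can be compared against a brute-force orbit count for small orders. In this sketch a class is labelled by the smallest first row reachable through a C-permutation; that labelling, the function names, and the restriction to m ≤ 7 (to keep the enumeration short) are assumptions of the illustration.

```python
from itertools import permutations
from math import factorial, gcd


def euler_phi(m):
    return sum(1 for b in range(1, m + 1) if gcd(b, m) == 1)


def c_permutations(m):
    """All index permutations sigma(i) = (b*i + a) mod m with gcd(b, m) = 1."""
    return [tuple((b * i + a) % m for i in range(m))
            for a in range(m) for b in range(1, m) if gcd(b, m) == 1]


def cec_count_formula(m):
    return factorial(m - 1) // euler_phi(m)


def cec_count_bruteforce(m):
    """Count classes of first rows (with m distinct entries) under C-permutations."""
    cperms = c_permutations(m)
    labels = set()
    for row in permutations(range(m)):
        labels.add(min(tuple(row[s[i]] for i in range(m)) for s in cperms))
    return len(labels)


for m in range(3, 8):
    print(m, cec_count_formula(m), cec_count_bruteforce(m))
```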

Generating representatives for CEC of cyclic matrices

We present the algorithm with circulant matrices as an example; from the representatives of circulant matrices one can easily obtain representatives of any cyclic matrices. The idea is to exhaust all non-C-permutations. But before that, we introduce some notation and observations.

The position i, where 0 ≤ i ≤ m − 1, is the i-th entry of the first row of the circulant matrix of order m. For example, if the first row of a circulant matrix of order 4 is given as $(c_1, c_3, c_2, c_0)$, we can describe it as index 1 is in position 0, index 3 is in position 1, etc. A coprime position is a position i where i is coprime with m. In this case, indices 0 and 3 are in coprime positions, while the others are in non-coprime positions.

Recall that σ is a C-permutation if it is of the form σ(i) = (bi + a) mod m. We can always have a representative with index 0 at position 0 by applying a C-permutation with b = 1 and a ≡ −z (mod m), where index 0 is in position z. Therefore, without loss of generality, we fix index 0 at position 0 and hence a = 0.

One can observe that by applying some C-permutation σ(i) = bi mod m, the indexes in coprime positions remain in coprime positions, and similarly for the indexes in non-coprime positions. This is a direct result of modular arithmetic and we omit the proof here. Using this observation, we can move any index which is in a coprime position x to position 1 by applying a C-permutation with $b \equiv x^{-1} \pmod m$. Without loss of generality, we choose the smallest index among those in coprime positions to be in position 1 for the representatives.

With that, we can generate one representative for each equivalence class by first fixing index 0 in position 0. Next, we partition the remaining indexes into the coprime and non-coprime positions, and choose the smallest index among those in coprime positions to be in position 1. Finally, we permute the remaining indexes within the coprime and non-coprime positions independently. The total number of representatives generated is

$$\binom{m - 1}{\varphi(m)} \times (\varphi(m) - 1)! \times (m - 1 - \varphi(m))! = \frac{(m - 1)!}{\varphi(m)},$$

which matches Theorem 5.24.

Example 5.25. For a circulant matrix of order 4, we let $c_0$ be in position 0. We partition the remaining indexes into coprime and non-coprime positions; there are 3 ways to partition them, namely (1, 2 | 3), (1, 3 | 2) and (2, 3 | 1). Next, we pick the smallest index among those in coprime positions to be in position 1. Finally, we permute the remaining elements; in this case, however, both the coprime and non-coprime positions only have 1 element left. Thus we have a total of 3 representatives, namely $(c_0, c_1, c_3, c_2)$, $(c_0, c_1, c_2, c_3)$ and $(c_0, c_2, c_1, c_3)$.
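The generation procedure described above can be sketched as follows. Each representative is returned as a tuple of indexes, so (0, 1, 3, 2) stands for the first row $(c_0, c_1, c_3, c_2)$; the function name and this encoding are assumptions of the illustration.

```python
from itertools import combinations, permutations
from math import factorial, gcd


def circulant_representatives(m):
    """One first row (as a tuple of indexes) per compact equivalence class of order m."""
    coprime_pos = [p for p in range(1, m) if gcd(p, m) == 1]   # always starts with position 1
    other_pos = [p for p in range(1, m) if gcd(p, m) != 1]
    representatives = []
    # Choose which indexes occupy the coprime positions; index 0 stays in position 0.
    for chosen in combinations(range(1, m), len(coprime_pos)):
        others = [x for x in range(1, m) if x not in chosen]
        smallest, rest = chosen[0], chosen[1:]                 # combinations come out sorted
        for coprime_order in permutations(rest):
            for other_order in permutations(others):
                row = [None] * m
                row[0] = 0
                row[1] = smallest                              # smallest coprime index at position 1
                for pos, idx in zip(coprime_pos[1:], coprime_order):
                    row[pos] = idx
                for pos, idx in zip(other_pos, other_order):
                    row[pos] = idx
                representatives.append(tuple(row))
    return representatives


print(circulant_representatives(4))
```

For m = 4 this yields the three representatives of Example 5.25, in the same order.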

5.4 Search and Results

In this section, we discuss the search methodology that we used and present the new lightweight involutory and non-involutory MDS diffusion matrices with various parameters.

5.4.1 Search methodology

We first describe the parameters that we consider, followed by the search strategies that we adopt.

Architecture, metric and parameters

Round-based implementation. In hardware, round-based implementations are probably the most important use case. They consist of implementing an entire round of a cipher in a single clock cycle. Another popular architecture is the serial implementation which, in a nutshell, reduces the area required at the cost of more clock cycles. Such an implementation uses matrix types like serial matrices, which have the recursive MDS property, i.e. the matrix is MDS when raised to a certain power (footnote 2). However, in general, the amount of area saved is not proportional to the increase in the number of clock cycles. As a result, serial implementations usually have higher latency and consume more energy than round-based implementations. For this reason, we are interested in the implementation cost of diffusion matrices in round-based implementations.

XOR count of matrix entry. The way to perceive and estimate the implementation cost of the diffusion layer has evolved over time. It was a common belief that finite field elements with low Hamming weight have lower hardware implementation costs. In 2014, Khoo et al. [68] proposed estimating the implementation cost by counting the number of XOR gates needed to implement field multiplication from the multiplication matrix of the field elements. They also showed that, contrary to the common belief, higher Hamming weight elements may also have low implementation cost. In this thesis, like other works [78,107], we adopt this metric to estimate the implementation cost of a diffusion matrix (footnote 3).

Definition 5.26 (XOR count [68]). The XOR count of a finite field element $\alpha \in \mathrm{GF}(2^s)/p(X)$ is a metric to estimate the number of XOR operations needed to implement the field multiplication by α: $x \mapsto \alpha x$. It is counted as the Hamming weight of the multiplication matrix $M_\alpha$ minus the number of rows, and denoted by $\mathrm{XOR}(M_\alpha) = wt(M_\alpha) - n$, where $wt(M_\alpha)$ is the number of ‘1’s in $M_\alpha$.

Footnote 2: We also studied serial-type matrices in [122], but we omit the discussion from this thesis.

XOR count of diffusion matrix. To quantify the implementation cost of a diffusion matrix, we count the cost for implementing the multiplication of individual entries and the connecting XORs. From Corollary 2.18, we know that the number of connecting XOR is fixed for given parameters.

Example 5.27. The AES diffusion matrix can be implemented as follows,

$$\begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} = \begin{pmatrix} 02 \cdot a \oplus 03 \cdot b \oplus c \oplus d \\ a \oplus 02 \cdot b \oplus 03 \cdot c \oplus d \\ a \oplus b \oplus 02 \cdot c \oplus 03 \cdot d \\ 03 \cdot a \oplus b \oplus c \oplus 02 \cdot d \end{pmatrix},$$

where the elements are from $\mathrm{GF}(2^8)/\texttt{0x11b}$, $(a, b, c, d)^\top$ denotes an arbitrary input vector and ⊕ denotes the connecting 8-bit XOR. The total cost to implement this diffusion matrix is 4 · (C(02) + C(03) + 3 · 8) XOR gates, where C(α) is the implementation cost of the field multiplication by α, i.e. $\mathrm{XOR}(M_\alpha)$. By the metric from [68], the XOR counts for 02 and 03 are 3 and 11 respectively. Hence, the total implementation cost is 4 · 14 + 4 · 24 = 152.

Parameters of diffusion matrix. In block ciphers, the diffusion layer is usually a 4 × 4 or 8 × 8 matrix over $\mathrm{GF}(2^4)$ or $\mathrm{GF}(2^8)$. Diffusion matrices of other orders are also used in other primitives like hash functions. Besides the dimension of the matrix, designers may prefer involutory matrices as the same component can be reused for the decryption process. Therefore in this thesis, we conduct our search for (I)MDS Hadamard and left-circulant matrices of order 3 to 8. For the matrix entry type, we consider all possible finite fields defined by irreducible polynomials of degree 4 and 8.

Search for lightest MDS matrices

Outline. The method for searching for the lightest MDS diffusion matrices is rather straightforward. For every irreducible polynomial of degree 4 and 8, we generate and sort the non-zero field elements according to their XOR count. Next, we pick a (multi-)set of elements with the least total XOR count. From it, we generate the representatives for each equivalence class and check if any of them is MDS. If there exists some representative matrix candidate that is MDS, then we have found the lightest possible MDS matrix for that matrix type. Otherwise, we select the next (multi-)set with the least XOR count and repeat the process.

Footnote 3: Although we have also intensively studied the matrix entries from [60, 106, 115], we omit the discussion from this thesis.

Irreducible polynomials. Our studies show that there is a bijective mapping between diffusion matrices over finite fields defined by irreducible polynomials that are reciprocal of each other (the coefficients of the irreducible polynomials are in the reverse order), where the branch number and XOR count of the diffusion matrices are preserved (footnote 4). In other words, searching for lightweight MDS matrices over two finite fields, say $\mathrm{GF}(2^8)/\texttt{0x11b}$ and $\mathrm{GF}(2^8)/\texttt{0x1b1}$, whose irreducible polynomials are reciprocal of each other yields the same results. Thus, the number of irreducible polynomials to consider is almost halved.

IMDS matrices. To search for IMDS matrices, we add an additional condition when selecting the (multi-)set of elements. For Hadamard matrices, we know from Proposition 5.7 that the XOR-sum of the elements has to be 1. For left-circulant matrices, we generate the system of equations necessary and sufficient for the matrix to be involutive, and check if the multi-set satisfies the equations.

Matrix candidates. For each (multi-)set of elements, we generate the representatives for all the compact equivalence classes and check if any of them has the MDS property. The reduction factor of the search space is simply the cardinality of the equivalence classes presented in Theorems 5.21 and 5.24.

Checking MDS. By Proposition 2.17, one of the ways to check if a matrix candidate has the MDS property is by verifying that all its square submatrices are non-singular.
In the case of Hadamard and left-circulant matrices, they have several repetitive submatrices due to their regular matrix structures. Thus, we can expedite our MDS check by verifying only the distinct submatrices.
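The MDS check described above can be sketched in a few lines of Python. This is a minimal illustration, not the search code used for the thesis: the helper names (`gf_mul`, `gf_inv`, `is_nonsingular`, `is_mds`, `hadamard`, `left_circulant`) are hypothetical, the check is the naive one over all square submatrices (without the shortcut of testing only distinct submatrices), and it assumes the usual conventions that Had(h_0, ..., h_{m-1}) has entry h_{i XOR j} at position (i, j) and that each row of lcr(a_0, ..., a_{m-1}) is the previous row rotated left by one position.

```python
from itertools import combinations

def gf_mul(a, b, poly, s):
    """Multiply a and b in GF(2^s) defined by the irreducible polynomial poly."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> s:            # degree reached s: reduce modulo poly
            a ^= poly
    return r

def gf_inv(a, poly, s):
    """Inverse of a non-zero element, computed as a^(2^s - 2)."""
    result, base, e = 1, a, (1 << s) - 2
    while e:
        if e & 1:
            result = gf_mul(result, base, poly, s)
        base = gf_mul(base, base, poly, s)
        e >>= 1
    return result

def is_nonsingular(m, poly, s):
    """Gaussian elimination over GF(2^s); True iff the square matrix has full rank."""
    m = [row[:] for row in m]
    n = len(m)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col]), None)
        if pivot is None:
            return False
        m[col], m[pivot] = m[pivot], m[col]
        inv = gf_inv(m[col][col], poly, s)
        for r in range(col + 1, n):
            f = gf_mul(m[r][col], inv, poly, s)
            if f:
                m[r] = [x ^ gf_mul(f, y, poly, s) for x, y in zip(m[r], m[col])]
    return True

def is_mds(m, poly, s):
    """Naive MDS test: every square submatrix must be non-singular."""
    n = len(m)
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                sub = [[m[r][c] for c in cols] for r in rows]
                if not is_nonsingular(sub, poly, s):
                    return False
    return True

def hadamard(first_row):
    n = len(first_row)
    return [[first_row[i ^ j] for j in range(n)] for i in range(n)]

def left_circulant(first_row):
    n = len(first_row)
    return [[first_row[(i + j) % n] for j in range(n)] for i in range(n)]

print(is_mds(hadamard([0x1, 0x2, 0x8, 0x9]), 0x13, 4))
print(is_mds(left_circulant([0x1, 0x1, 0x2]), 0x13, 4))
```

The two calls test Had(1,2,8,9) and lcr(1,1,2) over GF(2^4)/0x13. For order 8 the number of submatrices grows quickly, which is why restricting the check to distinct submatrices matters in practice.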

5.4.2 Survey of lightweight (I)MDS matrices

We denote our m × m Hadamard and left-circulant matrices over GF(2^s) as $H_{m,x,s}$ and $L_{m,x,s}$ respectively, where x ∈ {n, i} stands for (n)on-involutory and (i)nvolutory matrices. The descriptions of all newly found lightweight (I)MDS diffusion matrices are listed in Table 5.8.

(I)MDS matrices of order 4

The MDS and IMDS matrices of order 4 are listed in Tables 5.2 and 5.3 respectively.

⁴ Details can be found in [115], Section 2.2.

Table 5.2: XOR count of MDS matrices of order 4

| Entry Type | Matrix Type | XOR count | Reference |
|---|---|---|---|
| GF(2^4)/0x19 | Toeplitz | 58 | [107] |
| GF(2^4)/0x13 | cyclic | 60 | $L_{4,n,4}$, [68] |
| GF(2^4)/0x13 | serial | 4 · 16 = 64 | LED [51] |
| GF(2^4)/0x13 | Hadamard | 68 | $H_{4,n,4}$ (new) |
| GF(2^4)/0x13 | circulant | 72 | Piccolo [112] |
| GF(2^8)/0x1c3 | Toeplitz | 123 | [107] |
| GF(2^8)/0x1c3 | cyclic | 128 | $L_{4,n,8}$ (new) |
| GF(2^8)/0x11d | circulant | 132 | [68] |
| GF(2^8)/0x11d | serial | 4 · 33 = 132 | [68] |
| GF(2^8)/0x1c3 | Hadamard | 148 | $H_{4,n,8}$ (new) |
| GF(2^8)/0x11b | circulant | 152 | AES [39] |
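The XOR counts of the order-4 matrices over GF(2^4)/0x13 can be reproduced with a short sketch. It assumes the XOR count convention used earlier in this chapter: the XOR count of an element is the number of XOR gates needed to multiply an arbitrary field element by it (one gate per extra term in each output bit, with no gate sharing), and a row of m entries over GF(2^s) costs the sum of its entries' XOR counts plus (m − 1) · s XORs for the additions. The function names are hypothetical.

```python
def gf_mul(a, b, poly, s):
    """Multiply a and b in GF(2^s) defined by the irreducible polynomial poly."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> s:
            a ^= poly
    return r

def xor_count(alpha, poly, s):
    """XOR gates needed to multiply an arbitrary element by the non-zero alpha."""
    # Column j of the binary multiplication matrix is alpha * x^j.
    cols = [gf_mul(alpha, 1 << j, poly, s) for j in range(s)]
    total = 0
    for bit in range(s):
        weight = sum((c >> bit) & 1 for c in cols)
        total += max(weight - 1, 0)   # a sum of w terms needs w - 1 XORs
    return total

def matrix_xor_count(first_row, poly, s):
    """Cost of an m x m matrix whose rows all share the same multi-set of entries."""
    m = len(first_row)
    row_cost = sum(xor_count(e, poly, s) for e in first_row) + (m - 1) * s
    return m * row_cost

print(matrix_xor_count([0x1, 0x1, 0x9, 0x4], 0x13, 4))  # lcr(1,1,9,4): 60
print(matrix_xor_count([0x1, 0x2, 0x8, 0x9], 0x13, 4))  # Had(1,2,8,9): 68
```

Both values agree with the entries for $L_{4,n,4}$ and $H_{4,n,4}$ in Table 5.2. The formula applies to matrix types where every row uses the same multi-set of entries (Hadamard and cyclic); Toeplitz and Hadamard-like matrices would need a per-row sum instead.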

Based on our research published in [115], the authors of [107] extended the search to Toeplitz matrices that have a different multi-set of entries in each row. They found lighter matrices, which is not surprising because they considered a matrix type with a larger degree of freedom in the choice of entries. However, the drawback is that it is difficult to extend their search to matrices of order 8, which we can achieve using our compact equivalence classes. Another drawback is that their matrices will be larger in serial implementations, because all rows might have different multi-sets of entries.

By Corollary 5.13, our lightweight left-circulant matrix $L_{4,n,4}$ is permutation equivalent to the circulant matrix presented in [68]. But for GF(2^8), the authors of [68] only conducted their search on GF(2^8)/0x11d. On the other hand, we extended the search to other irreducible polynomials and found a lighter MDS matrix $L_{4,n,8}$.

Serial matrices of order m are recursive MDS matrices that need to be recursively implemented to achieve the MDS property; such an implementation requires several clock cycles (usually m clock cycles). To simulate a round-based implementation, one could implement m copies of the matrix⁵. Hence, for the serial matrices from [51] and [68], we take 4 times the implementation cost of their serial matrices. Note that serial matrices were not meant to be efficient in round-based implementations: implementing a series of serial matrices may result in longer delay and higher energy consumption.

A Hadamard-like matrix (similar to a Toeplitz matrix) has different entries in each row, hence the search is only feasible for small matrix orders; moreover, its serial implementations have a larger area, which is undesirable.

⁵ Another way to implement a serial matrix S in a single clock cycle is to implement its resultant MDS matrix when raised to the power of m, i.e. $S^m$. But the implementation cost would be significantly higher than implementing m copies of S.

Table 5.3: XOR count of IMDS matrices of order 4

| Entry Type | Matrix Type | XOR count | Reference |
|---|---|---|---|
| GF(2^4)/0x19 | Hadamard-like | 64 | [107] |
| GF(2^4)/0x13 | Hadamard | 72 | $H_{4,i,4}$, Joltik [58] |
| GF(2^4)/0x19 | Hadamard | 72 | Prøst [65] |
| GF(2^8)/0x165 | Hadamard | 160 | $H_{4,i,8}$ (new) |
| GF(2^8)/0x165 | Hadamard-like | 160 | [107] |
| GF(2^8)/0x11d | Hadamard | 184 | ANUBIS [8] |
| GF(2^8)/0x11d | Hadamard | 184 | CLEFIA $M_0$ [114] |
| GF(2^8)/0x11d | Hadamard | 208 | CLEFIA $M_1$ [114] |

The lightest 4 × 4 IMDS Hadamard matrix over GF(2^4), denoted as $H_{4,i,4}$, that we have found was adopted in Joltik, one of the CAESAR competition submissions. The diffusion matrix used in Prøst (elements from GF(2^4)/0x19) is, in fact, a counterpart of our matrix $H_{4,i,4}$ (elements from GF(2^4)/0x13), as the underlying irreducible polynomials are reciprocal of each other. CLEFIA uses two different IMDS Hadamard matrices, $M_0$ and $M_1$, which have XOR counts of 184 and 208 respectively; both are outperformed by our lightest IMDS Hadamard matrix $H_{4,i,8}$. As one can see, we found (I)MDS diffusion matrices lighter than those used in primitives such as Piccolo, AES, CLEFIA and ANUBIS.
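Two of the facts used above are easy to check mechanically: that 0x13 and 0x19 are reciprocal polynomials, and that the involutory Hadamard matrices satisfy the XOR-sum condition quoted from Proposition 5.7. The following sketch uses hypothetical helper names and takes the matrix entries from Table 5.8.

```python
from functools import reduce
from operator import xor

def reciprocal(poly, degree):
    """Reverse the coefficient order of a binary polynomial of the given degree."""
    return int(format(poly, f"0{degree + 1}b")[::-1], 2)

def hadamard_xor_sum(first_row):
    """XOR-sum of the first-row entries of a Hadamard matrix."""
    return reduce(xor, first_row)

print(hex(reciprocal(0x13, 4)))    # 0x19 (Joltik field vs. Prøst field)
print(hex(reciprocal(0x11b, 8)))   # 0x1b1

# The XOR-sum condition for involutory Hadamard matrices
print(hadamard_xor_sum([0x1, 0x4, 0x9, 0xd]))        # H_{4,i,4}: 1
print(hadamard_xor_sum([0x01, 0x02, 0xb0, 0xb2]))    # H_{4,i,8}: 1
```

The non-involutory $H_{4,n,4}$ = Had(1,2,8,9) has XOR-sum 2, so it fails the condition, as expected.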

(I)MDS matrices of order 8

The MDS and IMDS matrices of order 8 are listed in Tables 5.4 and 5.5 respectively.

Table 5.4: XOR count of MDS matrices of order 8

| Entry Type | Matrix Type | XOR count | Reference |
|---|---|---|---|
| GF(2^4)/0x13 | serial | 8 · 53 = 424 | PHOTON $A_{256}$ [50] |
| GF(2^4)/0x13 | Hadamard | 432 | $H_{8,n,4}$ (new) |
| GF(2^4)/0x13 | Hadamard | 448 | WHIRLWIND [10] |
| GF(2^4)/0x13 | Hadamard | 512 | WHIRLWIND [10] |
| GF(2^8)/0x1c3 | cyclic | 688 | $L_{8,n,8}$ (new) |
| GF(2^8)/0x1c3 | Hadamard | 768 | $H_{8,n,8}$ (new) |
| GF(2^8)/0x11d | circulant | 840 | WHIRLPOOL [11] |
| GF(2^8)/0x11d | circulant | 840 ∼ 936 | [113] |
| GF(2^8)/0x11b | circulant | 1112 | Grøstl [46] |

The authors of [113] proposed several diffusion matrices suitable to be employed in WHIRLPOOL, and these diffusion matrices have implementation costs ranging from 840 to 936, which are outperformed by our matrices $L_{8,n,8}$ and $H_{8,n,8}$.

Our search shows that there is no MDS cyclic matrix over GF(2^4) for order 7 and 8. Nevertheless, we found an MDS cyclic matrix over GF(2^8) that is significantly lighter than other candidates. The main reason is that cyclic matrices can contain repeated low implementation cost entries in the rows. It was pointed out by Khoo et al. [68] that they did not find any new 8 × 8 circulant MDS matrix over GF(2^8)/0x11d due to the large search space. But we overcame this limitation and completed the search over all irreducible polynomials. In addition, with the compact equivalence classes of Hadamard matrices, we completed the search for 8 × 8 (I)MDS Hadamard matrices.

Although the serial matrix $A_{256}$ from PHOTON has a slightly lower implementation cost than ours, we reiterate that implementing serial matrices in round-based implementation may result in higher latency and energy consumption.

Table 5.5: XOR count of IMDS matrices of order 8

| Entry Type | Matrix Type | XOR count | Reference |
|---|---|---|---|
| GF(2^4)/0x13 | Hadamard | 512 | $H_{8,i,4}$ (new) |
| GF(2^8)/0x1c3 | Hadamard | 816 | $H_{8,i,8}$ (new) |
| GF(2^8)/0x11d | Hadamard | 1232 | KHAZAD [9] |
| GF(2^8)/0x11b | Hadamard | 1424 | [52] |

Again, our matrices have lower XOR count than matrices used in several primitives such as Grøstl, WHIRLWIND, WHIRLPOOL and KHAZAD.

(I)MDS matrices of other orders

The MDS and IMDS matrices of order 3, 5, 6 and 7 are listed in Tables 5.6 and 5.7 respectively. For the MDS matrices, we compare our matrices with the serial matrices proposed in PHOTON. As mentioned before, m copies of a serial matrix have to be implemented to achieve the MDS property in a single clock cycle. Taking m times the implementation cost of their serial matrices, we can see that our matrices either outperform or match the implementation cost of those serial matrices, not to mention that our matrices tend to have a lower latency.

For the IMDS matrices in Table 5.7, we state the matrix type as left-circulant rather than cyclic matrices because the involution property may not hold for other cyclic matrices. For matrix order 6 and 7, we did not find any IMDS left-circulant matrices over GF(2^4). Although IMDS left-circulant matrices do not exist for order 4 and 8 (Corollary 5.19), there are IMDS Hadamard matrices of order 4 and 8. Thus the left-circulant and Hadamard matrices complement each other and together we have a complete collection of IMDS matrices of order 3 to 8.

Table 5.6: XOR count of MDS matrices of other orders

| Order | Entry Type | Matrix Type | XOR count | Reference |
|---|---|---|---|---|
| 3 | GF(2^4)/0x13 | cyclic | 27 | $L_{3,n,4}$ (new) |
| 3 | GF(2^8)/0x1c3 | cyclic | 57 | $L_{3,n,8}$ (new) |
| 5 | GF(2^4)/0x13 | cyclic | 100 | $L_{5,n,4}$ (new) |
| 5 | GF(2^4)/0x13 | serial | 5 · 20 = 100 | PHOTON $A_{100}$ [50] |
| 5 | GF(2^8)/0x1c3 | cyclic | 215 | $L_{5,n,8}$ (new) |
| 6 | GF(2^4)/0x13 | cyclic | 192 | $L_{6,n,4}$ (new) |
| 6 | GF(2^4)/0x13 | serial | 6 · 34 = 204 | PHOTON $A_{144}$ [50] |
| 6 | GF(2^8)/0x1c3 | cyclic | 348 | $L_{6,n,8}$ (new) |
| 6 | GF(2^8)/0x11d | serial | 6 · 63 = 378 | PHOTON $A_{288}$ [50] |
| 7 | GF(2^8)/0x1c3 | cyclic | 483 | $L_{7,n,8}$ (new) |

Table 5.7: XOR count of IMDS matrices of other orders

| Order | Entry Type | Matrix Type | XOR count | Reference |
|---|---|---|---|---|
| 3 | GF(2^4)/0x1f | left-circulant | 60 | $L_{3,i,4}$ (new) |
| 3 | GF(2^8)/0x169 | left-circulant | 138 | $L_{3,i,8}$ (new) |
| 5 | GF(2^4)/0x13 | left-circulant | 150 | $L_{5,i,4}$ (new) |
| 5 | GF(2^8)/0x165 | left-circulant | 390 | $L_{5,i,8}$ (new) |
| 6 | GF(2^8)/0x165 | left-circulant | 516 | $L_{6,i,8}$ (new) |
| 7 | GF(2^8)/0x165 | left-circulant | 812 | $L_{7,i,8}$ (new) |

List of (I)MDS diffusion matrices

We compile the complete list of lightest (I)MDS Hadamard and left-circulant matrices over GF(2^4) and GF(2^8) that we have found in Table 5.8. These matrix candidates could be used for future cryptographic primitive designs.

5.5 Summary

We generalised the circulant matrix structure and introduced cyclic matrices. We drew the relation between cyclic matrices and circulant matrices in terms of the branch number, showing that several properties of circulant matrices can be extended to cyclic matrices. To reduce the search space for MDS diffusion matrices, we proposed the compact equivalence classes of Hadamard and cyclic matrices. This way, we were able to complete the search for MDS diffusion matrices of order 8 that was previously regarded as infeasible. We conducted the search on a wide range of parameters and presented the complete list of lightest non-involutory and involutory MDS Hadamard and cyclic (left-circulant) matrices of order 3 to 8. We hope that this will serve as a building block for future cryptographic primitive designs.

Table 5.8: List of lightweight (I)MDS matrices

| Matrix | Entry Type | Entries (in hexadecimal) |
|---|---|---|
| $L_{3,n,4}$ | GF(2^4)/0x13 | lcr(1,1,2) |
| $L_{3,i,4}$ | GF(2^4)/0x1f | lcr(2,f,c) |
| $L_{4,n,4}$ | GF(2^4)/0x13 | lcr(1,1,9,4) |
| $H_{4,n,4}$ | GF(2^4)/0x13 | Had(1,2,8,9) |
| $H_{4,i,4}$ | GF(2^4)/0x13 | Had(1,4,9,d) |
| $L_{5,n,4}$ | GF(2^4)/0x13 | lcr(2,2,9,1,9) |
| $L_{5,i,4}$ | GF(2^4)/0x13 | lcr(1,2,5,4,3) |
| $L_{6,n,4}$ | GF(2^4)/0x13 | lcr(1,1,9,c,9,3) |
| $H_{8,n,4}$ | GF(2^4)/0x13 | Had(1,2,6,8,9,c,d,a) |
| $H_{8,i,4}$ | GF(2^4)/0x13 | Had(2,3,4,c,5,a,8,f) |
| $L_{3,n,8}$ | GF(2^8)/0x1c3 | lcr(01,01,02) |
| $L_{3,i,8}$ | GF(2^8)/0x169 | lcr(5a,0a,51) |
| $L_{4,n,8}$ | GF(2^8)/0x1c3 | lcr(01,01,02,91) |
| $H_{4,n,8}$ | GF(2^8)/0x1c3 | Had(01,02,04,91) |
| $H_{4,i,8}$ | GF(2^8)/0x165 | Had(01,02,b0,b2) |
| $L_{5,n,8}$ | GF(2^8)/0x1c3 | lcr(01,01,02,91,02) |
| $L_{5,i,8}$ | GF(2^8)/0x165 | lcr(01,02,b3,bb,0a) |
| $L_{6,n,8}$ | GF(2^8)/0x1c3 | lcr(01,02,e1,91,01,08) |
| $L_{6,i,8}$ | GF(2^8)/0x165 | lcr(01,01,b3,2c,04,9a) |
| $L_{7,n,8}$ | GF(2^8)/0x1c3 | lcr(01,01,91,02,04,02,91) |
| $L_{7,i,8}$ | GF(2^8)/0x165 | lcr(01,02,10,b2,58,a4,5c) |
| $L_{8,n,8}$ | GF(2^8)/0x1c3 | lcr(01,01,02,e1,08,e0,01,a9) |
| $H_{8,n,8}$ | GF(2^8)/0x1c3 | Had(01,02,03,08,04,91,e1,a9) |
| $H_{8,i,8}$ | GF(2^8)/0x1c3 | Had(01,02,03,91,04,70,05,e1) |

Chapter 6

Security of S-boxes

In Chapter 4, we have seen that a poor choice of round constants could lead to unexpected flaws in the encryption algorithm. In the case of Midori64, some tweaks of round constants make it more resistant to the invariant subspace attack, but some lead to even larger weak-key classes. Therefore, we are interested in investigating the resistance of S-boxes against invariant subspace attacks and in eliminating the dependency of the attacks on the round constants. In this chapter, we use S-box $Sb_0$ from Midori64 as the main case study subject. We investigate S-box resistance against invariant subspace attacks in a Midori-like structure, which will be detailed later. Our goal is to simultaneously resist invariant subspace attacks and satisfy other general S-box design criteria, in particular, those considered in Midori.

Contribution of this chapter. In Section 6.2, we draw the relationship between the differential distribution table (DDT) and the affine subspaces of an S-box. We demonstrate how the values in the DDT relate to the existence of affine subspaces and vice versa. We also look at various case studies in Section 6.3, showing that involutory S-boxes can provide a limited level of security (without analysing the round constants) when the key schedule is very simple. On the other hand, with non-involutory S-boxes we can extend the security to arbitrary key schedules.

6.1 Preliminaries

S-box design criteria

Our goal is to find 4-bit S-boxes that satisfy the following criteria¹:

• the maximal differential probability (m.d.p.) is $2^{-2}$.
• the maximal absolute bias of a linear approximation (m.l.a.) is $2^{-2}$.

¹ We would like to point out that these (plus the involution property) are the criteria chosen by the designers of Midori. More generally, the discussion and criteria mentioned in this chapter should be an additional consideration on top of the other criteria (for instance the algebraic degree, especially for lightweight ciphers such as Midori).


Recall that 4-bit S-boxes with the above properties are also known as optimal S-boxes, and there are 16 AE classes of optimal S-boxes. When there are several candidates, we pick an S-box that has:

• the smallest number of fixed points,
• the smallest depth.

Let us define the notion of depth of an S-box. The designers of Midori introduce the metric depth to estimate the path delay of S-boxes as:

Definition 6.1 (Depth [5]). The depth is defined as the sum of sequential path delays of basic operations AND, OR, XOR, NAND, NOR, XNOR and NOT².

To maintain consistency, we follow the same assumptions of depth for each basic operation as in [5]. The depths as well as the required number of gates of XOR/XNOR, AND/OR, NAND/NOR and NOT are weighted as 2, 1.5, 1 and 0.5 respectively. For example, the depth of the following function is 3.5.

((c NAND d) NAND (b NAND (NOT a))) NOR (b NOR (a NAND d)).
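The depth of the example function can be recomputed with a small recursive sketch. The expression encoding (nested tuples) and the names `DEPTH` and `depth` are illustrative choices, not notation from [5]; the gate weights are the ones stated above, and inputs are assumed to have depth 0.

```python
# Gate weights as stated in the text (XOR/XNOR 2, AND/OR 1.5, NAND/NOR 1, NOT 0.5).
DEPTH = {"XOR": 2.0, "XNOR": 2.0, "AND": 1.5, "OR": 1.5,
         "NAND": 1.0, "NOR": 1.0, "NOT": 0.5}

def depth(expr):
    """Depth of an expression given as a variable name or (gate, operand, ...)."""
    if isinstance(expr, str):      # an input variable contributes no delay
        return 0.0
    gate, *operands = expr
    # The critical path goes through the slowest operand, then through this gate.
    return DEPTH[gate] + max(depth(op) for op in operands)

example = ("NOR",
           ("NAND", ("NAND", "c", "d"), ("NAND", "b", ("NOT", "a"))),
           ("NOR", "b", ("NAND", "a", "d")))

print(depth(example))  # 3.5
```

The critical path is NOT a, then two NAND gates, then the final NOR: 0.5 + 1 + 1 + 1 = 3.5.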

Affine subspace transition through S-boxes

Let S be some invertible s-bit S-box. If there exist linear subspaces $A, A' \subseteq \mathbb{F}_2^s$ and offset values $u_1, u_2 \in \mathbb{F}_2^s$ such that

$$u_2 \oplus A' = \{S(x) \mid x \in u_1 \oplus A\},$$

we call it an affine subspace transition through the S-box S. We denote it as

$$u_1 \oplus A \xrightarrow{S} u_2 \oplus A'.$$

Note that it is possible that $u \in A$, in which case $u \oplus A = A$ is simply a linear subspace.

6.2 The Relation between the DDT and Affine Subspace of an S-box

Affine subspace transitions of an S-box are closely related to its differential distribution table (DDT). The existence of affine subspace transitions $u_1 \oplus A \xrightarrow{S} u_2 \oplus A'$ with low dimension of $A$ and $A'$ immediately provides information about the DDT and vice versa. In particular, for a 4-bit S-box with m.d.p. of $2^{-2}$, all subspace transitions can be recovered solely from its DDT. We show in this section the relation between affine subspace transitions and the DDT of an S-box, and explain how to use the DDT to search for S-boxes that can be used to resist invariant subspace attacks for the whole cipher.

² The original definition of depth in [5] does not include XOR and XNOR, but XOR appears in its example and XNOR is mentioned in gate estimations. We consider both XOR and XNOR here and assume the depth of XNOR is 2.

6.2.1 Deriving the DDT from low dimension affine subspace

If $A$ is a subspace of dimension 2 or less, the number of elements in $A$ will appear in the DDT. Suppose that there exists an affine subspace transition $u_1 \oplus A \xrightarrow{S} u_2 \oplus A'$. It means that for any input $x \in A$, the S-box can be seen as $S(u_1 \oplus x) = l(u_1 \oplus x) \oplus u_2 = l(x) \oplus u$, where $l(x)$ is a linear function that transforms $A$ into $A'$ and $u$ is a constant calculated as $u = l(u_1) \oplus u_2$. As $l(x)$ is a linear function, for any input difference it has a differential probability of one.

For instance, let us consider the subspace with dimension 1, i.e. $u_1 \oplus A \xrightarrow{S} u_2 \oplus A'$, where $A = \{0, v\}$ and $A' = \{0, v'\}$. This is simply converted into $\{u_1, u_1 \oplus v\} \xrightarrow{S} \{u_2, u_2 \oplus v'\}$. When the difference of two values in the subspace is considered, it suggests the differential transition $\Delta_I \xrightarrow{S} \Delta_O$ where $\Delta_I = v$ and $\Delta_O = v'$. In the end, we have at least 2 for the entry $(\Delta_I, \Delta_O) = (v, v')$ in the DDT.

The same applies to a subspace with dimension 2. Suppose $A = \{0, v_1, v_2, v_1 \oplus v_2\}$ and $A' = \{0, v'_1, v'_2, v'_1 \oplus v'_2\}$. Differently from dimension 1, one can create three input differences, namely $\Delta_1 = v_1$, $\Delta_2 = v_2$, and $\Delta_3 = v_1 \oplus v_2$ (all result in different output differences). Thus we will have 3 different entries in the DDT with element at least 4. By setting $\Delta'_1 = v'_1$, $\Delta'_2 = v'_2$, and $\Delta'_3 = v'_1 \oplus v'_2$, the entries for $(\Delta_1, \Delta'_1)$, $(\Delta_2, \Delta'_2)$ and $(\Delta_3, \Delta'_3)$ are at least 4 (and exactly 4 when no DDT entry exceeds 4).
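The relation between low-dimensional transitions and DDT entries can be checked on a concrete S-box. The sketch below uses the lookup table of $Sb_0$ as given in the Midori specification (an assumption, since the table is not reproduced in this section) and the two transitions $8 \oplus \{0,1\} \to 8 \oplus \{0,1\}$ and $c \oplus \{0,1,2,3\} \to \{0,2,4,6\}$. The function names are hypothetical.

```python
# Sb0 of Midori64, taken from the Midori specification (assumption: not listed here).
SB0 = [0xc, 0xa, 0xd, 0x3, 0xe, 0xb, 0xf, 0x7,
       0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]

def ddt(sbox):
    """table[d_in][d_out] = number of x with S(x) ^ S(x ^ d_in) == d_out."""
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for d in range(n):
            table[d][sbox[x] ^ sbox[x ^ d]] += 1
    return table

def image(sbox, offset, subspace):
    """Image of the affine subspace offset + subspace under the S-box."""
    return {sbox[offset ^ a] for a in subspace}

T = ddt(SB0)

# Dimension 1: 8 + {0,1} -> 8 + {0,1} forces entry (1,1) to be at least 2.
print(sorted(image(SB0, 0x8, {0, 1})), T[1][1])

# Dimension 2: c + {0,1,2,3} -> {0,2,4,6} forces three entries of at least 4.
print(sorted(image(SB0, 0xc, {0, 1, 2, 3})), T[1][2], T[2][4], T[3][6])
```

For the dimension-2 case the three differences are $v_1 = 1 \to v'_1 = 2$, $v_2 = 2 \to v'_2 = 4$ and $v_1 \oplus v_2 = 3 \to v'_1 \oplus v'_2 = 6$.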

6.2.2 Deriving affine subspace from the DDT

Affine subspaces can be derived from the DDT up to subspace level. Having 2 for the entry of $(\Delta_I, \Delta_O) = (v, v')$ in the DDT indicates that there are exactly 2 input values such that $S(x) \oplus S(x \oplus v) = v'$. By setting $u_1 = x$ and $u_2 = S(x)$, it can be described as $u_1 \oplus \{0, v\} \xrightarrow{S} u_2 \oplus \{0, v'\}$. Without the exact specification of the S-box, the DDT does not provide the value of $x$ and $S(x)$ satisfying the difference transition. Therefore, the offset values $u_1$ and $u_2$ cannot be recovered only from the DDT.

The case of the value of 4 in the DDT is basically the same. Having 4 for the entry of $(\Delta_I, \Delta_O) = (v, v')$ in the DDT indicates that there are exactly 4 input values such that $S(x) \oplus S(x \oplus v) = v'$, i.e. $x \in \{x_1, x_1 \oplus v, x_2, x_2 \oplus v\}$. Let $u_1 = x_1$ and $w = x_1 \oplus x_2$. Then, the 4 input values can be described as $u_1 \oplus \{0, v, w, w \oplus v\}$. Similarly, let $u_2 = S(x_1)$ and $w' = S(x_1) \oplus S(x_2)$. Then, the 4 output values can be described as $u_2 \oplus \{0, v', w', w' \oplus v'\}$. Consequently, the affine subspace transition

$$u_1 \oplus \{0, v, w, w \oplus v\} \xrightarrow{S} u_2 \oplus \{0, v', w', w' \oplus v'\}$$

holds. Note that three different entries with value 4 lead to an identical affine subspace of dimension 2. Namely, not only $(\Delta_I, \Delta_O) = (v, v')$ but also $(\Delta_I, \Delta_O) = (w, w')$ and $(v \oplus w, v' \oplus w')$ lead to $u_1 \oplus \{0, v, w, w \oplus v\} \xrightarrow{S} u_2 \oplus \{0, v', w', w' \oplus v'\}$. In short, we have proven the following proposition.

Proposition 6.2. For an s-bit S-box with maximal differential probability $2^{-s+2}$, every affine subspace transition with dimension 2 corresponds to three entries of 4 in the DDT.

6.2.3 Recovering all affine subspace transitions from the DDT

In this section, we demonstrate how to recover all the affine subspace transitions of $Sb_0$ (up to subspace level) only from the DDT. First of all, the DDT of $Sb_0$ is given in Table 6.1. Note that $Sb_0$ is an involution, thus its DDT is symmetric. Namely, for any entry of the DDT, the entry in the transposed position always has the same value. Also note that $Sb_0$ was generated to satisfy the m.d.p. criterion, thus the numbers in all entries (except the (0, 0)-entry) are less than or equal to 4.

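Table 6.1: The DDT of $Sb_0$ used in Midori64. Superscript letters show groups of entries that correspond to an identical subspace transition with dimension 2.

| ∆I \ ∆O | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 2 | 4^E | 0 | 2 | 2 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| 2 | 0 | 4^E | 0 | 0 | 4^E | 0 | 0 | 0 | 0 | 4^C | 0 | 0 | 4^B | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 2 | 0 | 4^E | 2 | 2 | 2 | 0 | 0 | 0 | 2 | 0 | 2 |
| 4 | 0 | 2 | 4^E | 2 | 2 | 2 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| 5 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 4^F | 0 | 2 | 4^A | 0 | 2 | 0 | 0 | 0 |
| 6 | 0 | 2 | 0 | 4^E | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 2 | 2 | 0 | 2 |
| 7 | 0 | 0 | 0 | 2 | 0 | 4^F | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 4^D | 2 | 0 |
| 8 | 0 | 2 | 0 | 2 | 2 | 0 | 2 | 0 | 0 | 2 | 0 | 2 | 2 | 0 | 2 | 0 |
| 9 | 0 | 0 | 4^C | 2 | 0 | 2 | 0 | 0 | 2 | 2 | 0 | 2 | 2 | 0 | 0 | 0 |
| a | 0 | 0 | 0 | 0 | 0 | 4^A | 0 | 0 | 0 | 0 | 4^D | 0 | 0 | 4^F | 0 | 4^F |
| b | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 2 | 2 | 0 | 4^C | 0 | 2 | 0 | 2 |
| c | 0 | 0 | 4^B | 0 | 0 | 2 | 2 | 0 | 2 | 2 | 0 | 0 | 2 | 0 | 2 | 0 |
| d | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 4^D | 0 | 0 | 4^F | 2 | 0 | 0 | 2 | 0 |
| e | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 2 | 2 | 4^B | 2 |
| f | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 4^F | 2 | 0 | 0 | 2 | 4^A |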

As discussed in Section 6.2.2, any $(v, v')$-entry with value 2 corresponds to a dimension 1 affine subspace transition $u_1 \oplus \{0, v\} \xrightarrow{Sb_0} u_2 \oplus \{0, v'\}$. Hence, it is very easy to recover all affine subspace transitions with dimension 1. For example, the affine subspace transition that we used in the attack, $8 \oplus \{0, 1\} \xrightarrow{Sb_0} 8 \oplus \{0, 1\}$, corresponds to element 2 for $(\Delta_I, \Delta_O) = (1, 1)$. In other words, only by looking at the entry of $(\Delta_I, \Delta_O) = (1, 1)$ in the DDT, we can conclude that no other invariant subspace which is consistent with the round constants of Midori64 exists.

Recovering transitions with dimension 2 is more difficult because one transition with dimension 2 corresponds to 3 different entries of the DDT with element 4. Hence, we need to detect which of 3 different entries imply an identical affine subspace transition. We start from finding triplets related to the diagonal of the DDT.

Firstly, we focus on the entry (∆I , ∆O) = (f, f). Considering that Sb0 is an involution, this indicates that there exists a transition of the following form.

$$u_1 \oplus \{0, \Delta_1, \Delta_2, f\} \xrightarrow{Sb_0} u_1 \oplus \{0, \Delta_1, \Delta_2, f\}, \quad \Delta_1 \oplus \Delta_2 = f.$$

Therefore, the DDT must have elements 4 in the entry of (∆I , ∆O) = (∆1, ∆2) in which ∆1 ⊕ ∆2 = f. Then, there is only one case (∆1, ∆2) = (5, a). In Table 6.1, those triplets are denoted by superscript A. This affine subspace indeed corresponds to A = {0, 5, a, f} in Table 4.4.

With the same analysis, $(\Delta_I, \Delta_O) = (e, e)$ leads to $A = \{0, 2, c, e\}$, $(\Delta_I, \Delta_O) = (b, b)$ leads to $A = \{0, 2, 9, b\}$, and $(\Delta_I, \Delta_O) = (a, a)$ leads to $A = \{0, 7, a, d\}$. The remaining transitions are no longer invariant (iterative with cycle 1) because they are not on the diagonal. As long as we have $u_1 \oplus \{0, a, b, a \oplus b\} \xrightarrow{Sb_0} u_2 \oplus \{0, x, y, x \oplus y\}$, we immediately obtain another subspace transition $u_2 \oplus \{0, x, y, x \oplus y\} \xrightarrow{Sb_0} u_1 \oplus \{0, a, b, a \oplus b\}$ due to the involution property. Thus, it is natural to consider 6 entries with element 4 in one group.

We then focus on $(\Delta_I, \Delta_O) = (1, 2)$ and its inverse $(2, 1)$. Because there is no clue which differential transitions belong to the same group, we do an exhaustive test. When we pick $(x, y)$ (and thus $(y, x)$ for the inverse), both $(1 \oplus x, 2 \oplus y)$ and $(2 \oplus y, 1 \oplus x)$ must have 4 in the DDT. For example, we pick $(5, 7)$ (and thus $(7, 5)$ for the inverse). Then, $(1 \oplus 5, 2 \oplus 7) = (4, 5)$ does not have element 4 in the DDT, showing that $(5, 7)$ is not in the same group as $(1, 2)$. By applying the exhaustive test, we found that $(1, 2)$, $(2, 4)$, $(3, 6)$ and their inverses form a group, which leads to $A = \{0, 1, 2, 3\} \xleftrightarrow{Sb_0} A' = \{0, 2, 4, 6\}$ in Table 4.5. Similarly, $(5, 7)$, $(a, d)$, $(f, a)$ and their inverses form a group, which leads to $A = \{0, 5, a, f\} \xleftrightarrow{Sb_0} A' = \{0, 7, a, d\}$. In the end, all affine subspace transitions are recovered up to the subspace level.
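The grouping of the entries of 4 can be cross-checked by a short sketch. Unlike the procedure above, which works from the DDT alone, this sketch uses the S-box itself to locate the four inputs behind each entry of 4 and then groups the entries by the pair of linear subspaces they describe; it is meant as a verification aid, not as the method of this section. The $Sb_0$ table is again taken from the Midori specification (an assumption), and the function names are hypothetical.

```python
# Sb0 of Midori64, taken from the Midori specification (assumption: not listed here).
SB0 = [0xc, 0xa, 0xd, 0x3, 0xe, 0xb, 0xf, 0x7,
       0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]

def ddt(sbox):
    n = len(sbox)
    table = [[0] * n for _ in range(n)]
    for x in range(n):
        for d in range(n):
            table[d][sbox[x] ^ sbox[x ^ d]] += 1
    return table

def dim2_transitions(sbox):
    """Map (A, A') to the list of DDT entries of 4 describing that transition."""
    n = len(sbox)
    table = ddt(sbox)
    groups = {}
    for v in range(1, n):
        for w in range(1, n):
            if table[v][w] != 4:
                continue
            # The four inputs x with S(x) ^ S(x ^ v) == w form u1 + A.
            xs = [x for x in range(n) if sbox[x] ^ sbox[x ^ v] == w]
            a_in = frozenset(x ^ xs[0] for x in xs)
            a_out = frozenset(sbox[x] ^ sbox[xs[0]] for x in xs)
            groups.setdefault((a_in, a_out), []).append((v, w))
    return groups

groups = dim2_transitions(SB0)
for (a_in, a_out), entries in sorted(groups.items(), key=lambda g: g[1]):
    kind = "invariant" if a_in == a_out else "non-invariant"
    print(sorted(a_in), "->", sorted(a_out), entries, kind)
```

Each group should contain exactly three entries, in line with Proposition 6.2, and the invariant groups are the ones whose entries touch the diagonal of the DDT.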

6.2.4 Remarks on affine subspace with higher dimension

The above discussion does not apply to affine subspace transitions with dimension of 3. This is because the 8 input values that correspond to 4 pairs that share the same output difference through the S-box do not necessarily form an affine space.

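We illustrate using an example S-box $S_E$ (Table 6.2). We see from the DDT of $S_E$ (Table 6.3) that it has value 8 for the entry of $(\Delta_I, \Delta_O) = (1, 1)$, which corresponds to

$S_E(0) \oplus S_E(0 \oplus 1) = 1$,
$S_E(2) \oplus S_E(2 \oplus 1) = 1$,
$S_E(4) \oplus S_E(4 \oplus 1) = 1$,
$S_E(e) \oplus S_E(e \oplus 1) = 1$.

Table 6.2: Specifications of an example S-box

| x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $S_E(x)$ | 1 | 0 | 3 | 2 | 5 | 4 | 6 | 9 | a | 8 | 7 | c | b | d | f | e |

Table 6.3: DDT of an example S-box

| ∆I \ ∆O | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 8 | 2 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 2 |
| 2 | 0 | 0 | 4 | 4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 |
| 3 | 0 | 0 | 4 | 4 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 |
| 4 | 0 | 2 | 2 | 0 | 4 | 4 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| 5 | 0 | 0 | 0 | 4 | 2 | 4 | 0 | 2 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0 | 2 | 0 | 0 | 0 | 2 | 6 | 2 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 0 |
| 7 | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 8 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 2 | 2 | 4 | 0 | 2 | 0 | 0 | 4 | 0 |
| 9 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 4 | 2 | 2 | 0 | 0 | 0 | 0 | 4 |
| a | 0 | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 2 | 6 | 0 | 2 | 2 | 0 | 0 |
| b | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 8 | 0 | 2 | 0 | 0 |
| c | 0 | 2 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 2 | 0 | 6 | 2 | 0 | 2 |
| d | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 2 | 6 | 4 | 0 |
| e | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 2 | 0 | 4 | 2 |
| f | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | 4 | 4 |

However, the 8 input values, $\{0, 1, 2, 3, 4, 5, e, f\}$, do not form a subspace of dimension 3.

The absence of affine subspace transitions with dimension 3 for 4-bit S-boxes does not imply that the m.d.p. is $2^{-2}$, i.e. it does not ensure that all elements in the DDT are at most 4. This is because an affine subspace transition with dimension 1 and one with dimension 2 can impact an identical entry in the DDT, thus the number in this entry becomes 6. For the example S-box $S_E$, there is no affine subspace with dimension 3, yet the (6, 6)-entry has value 6, which corresponds to

$S_E(2) \oplus S_E(2 \oplus 6) = 6$,
$S_E(3) \oplus S_E(3 \oplus 6) = 6$,
$S_E(9) \oplus S_E(9 \oplus 6) = 6$.

This forms two affine subspace transitions of dimension 2 and 1 respectively, i.e. $2 \oplus \{0, 1, 6, 7\} \xrightarrow{S_E} 2 \oplus \{0, 1, 6, 7\}$ and $9 \oplus \{0, 6\} \xrightarrow{S_E} 8 \oplus \{0, 6\}$.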
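The two observations about $S_E$ can be verified directly from its lookup table in Table 6.2. The sketch below is illustrative and its function names are hypothetical; it lists the inputs behind a DDT entry and tests whether a set of points is an affine subspace.

```python
# S_E from Table 6.2
SE = [0x1, 0x0, 0x3, 0x2, 0x5, 0x4, 0x6, 0x9,
      0xa, 0x8, 0x7, 0xc, 0xb, 0xd, 0xf, 0xe]

def ddt_entry_inputs(sbox, d_in, d_out):
    """All x with S(x) ^ S(x ^ d_in) == d_out; the DDT entry is the set size."""
    return {x for x in range(len(sbox)) if sbox[x] ^ sbox[x ^ d_in] == d_out}

def is_affine_subspace(points):
    """A non-empty set is an affine subspace iff, after shifting it by one of
    its own points, the result is closed under XOR."""
    pts = list(points)
    shifted = {p ^ pts[0] for p in pts}
    return all((a ^ b) in shifted for a in shifted for b in shifted)

xs11 = ddt_entry_inputs(SE, 1, 1)
print(sorted(xs11), is_affine_subspace(xs11))   # 8 inputs, not a subspace

xs66 = ddt_entry_inputs(SE, 6, 6)
print(sorted(xs66))                             # 6 inputs: {2,3,4,5} and {9,f}
print(is_affine_subspace({2, 3, 4, 5}), sorted(SE[x] for x in (2, 3, 4, 5)))
print(is_affine_subspace({0x9, 0xf}), sorted(SE[x] for x in (0x9, 0xf)))
```

The (6, 6)-entry splits into the dimension-2 set $\{2, 3, 4, 5\}$, which $S_E$ maps onto itself, and the dimension-1 set $\{9, f\}$, which is mapped to $\{8, e\}$.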

Similarly, ensuring an m.d.p. of $2^{-2}$ for a 4-bit S-box, in other words, having 4 or less in all entries of the DDT, does not imply that there is no affine subspace transition with dimension 3 (or higher for larger S-boxes). For example, a transition with dimension 3 could be composed of two transitions with dimension 2.

Early detection of higher dimension affine subspaces from the DDT

There is no clear relation between the DDT and the affine subspace transitions with dimension higher than 2. Nonetheless, by observing the DDT of an S-box we can still have some form of early detection of higher dimension affine subspaces. Suppose that the highest affine subspace transition is of dimension 2; then by Proposition 6.2, the number of entries of 4 in the DDT will be a multiple of 3.

Corollary 6.3. For an s-bit S-box with maximal differential probability $2^{-s+2}$, if the number of entries of 4 in the DDT is not a multiple of 3, then there exists an affine subspace transition with dimension higher than 2.

6.3 Case Study and Search for Strong S-boxes

In the following discussion, we mainly discuss SPN ciphers whose round function consists of the following operations.

1. A subkey is XORed to the whole internal state. We consider three types of key schedule that will be explained later.
2. A 4-bit S-box is applied to the entire internal state in parallel. We consider two types of S-boxes, involutive and non-involutive. Some of the discussions can be extended to an S-box of any size. We will mention it explicitly when this is the case.
3. A P-layer may be applied. To discuss the S-box criteria that stand independently of the choice of the P-layer, we assume the worst case, i.e. the affine subspace does not change through the P-layer.

6.3.1 Classification for case analysis

Invariant subspace attacks on Midori64 exploit the property that if all the cells of the state are in the same affine subspace, then the linear transformation MixColumn ◦ ShuffleCell preserves this subspace. Thus, the resistance against invariant subspace attacks should be evaluated by considering the S-box, the key schedule function and the round constants.

In [76], Leander et al. pointed out that the choice of proper round constants prevents invariant subspace attacks (or makes them probabilistic, thus by increasing the number of rounds, they can be made harmless). Hence, altering round constants in Midori64 is the first choice, and perhaps the easiest, to stop our attacks. That is, instead of the current values of 0 and 1 for the cells of the round constants, one can assign random values for the cells (or even random values of Hamming weight not exceeding one), and expect resistance against invariant subspace attacks after a certain number of rounds.

It is possible to turn the problem upside down, and examine the case when the constants are worse (with respect to invariant subspace attacks), but we still expect some level of protection against this type of attack. We have seen from Section 4.3 that altering Midori64 round constants may lead to larger weak-key classes. In addition, altered key schedules may lead to even larger classes. Can we make sure that regardless of the round constants and of the key schedule (to a certain extent), a proper choice of an S-box may stop invariant subspace attacks or may lead to such an attack but on only a small subset of keys? This line of research brings us a step closer to a provable security against invariant subspace attacks: it suffices to examine only the S-box (or to choose a strong S-box), in order to stop the attacks or to limit their applicability to a small set of weak keys.
We will examine the security with respect to invariant subspace attacks such as the one presented on Midori64. In the remainder of the section, we examine the classes of S-boxes that ensure resistance against invariant subspace attacks. We split the analysis according to the three criteria presented below.

Choice of involutory/non-involutory S-box. Depending on the cipher design, involutory S-boxes may lead to a lower implementation cost for the whole cipher compared to non-involutory S-boxes. There is a common belief that non-involutory S-boxes have higher security, and we will show that this is indeed true with respect to resistance against invariant subspace attacks.

Classes of key schedule function. In general, invariant subspace attacks can be prevented by using a strong key schedule function. On the other hand, practical designs of lightweight cryptography use a simple key schedule function as in Midori. We consider the following three classes of key schedule functions:

KSF1: A single key K is used in every round, e.g. Midori128 and LED-64.

KSF2: Two keys K1 and K2 are alternately used every two rounds, e.g. Midori64 and LED-128.

KSF3: No assumption on the key schedule function.

We will see that KSF1 is relatively easy to protect, i.e. finding suitable S-boxes is easy, because of the limited degrees of freedom that the adversary is given (the adversary can select weak keys only from K). In KSF2 the adversary can select weak keys from K1 and K2 independently, thus protecting KSF2 is harder than KSF1. Finally, even though KSF3 is extremely hard to protect, we can still find S-boxes resisting strong invariant subspace attacks.

Degree of resistance. S-box design criteria also depend on how “strongly” designers want to avoid the attacks. As described in previous sections, invariant subspace attacks usually work only for a fraction of the entire key space, or weak keys. Ensuring that there is only a single weak key is harder than ensuring that the number of weak keys is limited to a small size.

Goal1: The goal is to ensure that the number of weak keys is limited to $c \cdot 2^b$, where c is a small constant, and b is the number of cells in the key (e.g. for Midori64 this is $2^{32}$, as it has a total of 32 cells from K0 and K1).

Goal2: The goal is to ensure only a single weak key.

If some S-box achieves a certain degree of resistance under KSF2, it can also achieve that degree of resistance, or even better, under KSF1. On the other hand, if an S-box could not achieve a particular degree of resistance under KSF2, neither will it under KSF3. Similarly, we can see that Goal2 offers a higher resistance against invariant subspace attacks than Goal1.

6.3.2 Searching for strong involutory S-boxes

Further, we examine the cases of involutory S-boxes that provide a certain degree of resistance (Goal1 and Goal2) against invariant subspace attacks under the aforementioned classes of key schedule function.

Impossibility for KSF3 and KSF2

An involutory S-box, irrespective of its size, cannot achieve Goal1 (and thus Goal2) under KSF2 (and thus under KSF3). That is, for any choice of an involutory S-box, it is impossible to prove that the number of weak keys in an invariant subspace attack is limited to only $c \cdot 2^b$ (which is Goal1) in a cipher that has alternating subkeys (which is KSF2), without considering the round constants.

The main reason is that any affine subspace transition is valid for both the S-box and its inverse due to the involution property of the S-box, i.e. $u_1 \oplus A_1 \xleftrightarrow{S} u_2 \oplus A_2$. The DDT of the S-box must contain entries of 2, thus there exist $A_1$ and $A_2$ with dimension of 1 (for 4-bit S-boxes, as mentioned earlier in Section 6.2, the DDT must contain entries of 4, thus there exist $A_1$ and $A_2$ with dimension of 2). Let $K_1 \in A_2$ and $K_2 \in A_1$; there are $2^b$ such keys. Then an adversary can launch an invariant subspace attack if the plaintext belongs to the affine subspace $u_1 \oplus A_1$. More specifically, after the first substitution layer, the affine subspace $u_2 \oplus A_2$ is XORed with $K_1$, which does not change the affine subspace. The affine subspace is then transformed back to $u_1 \oplus A_1$ under the second substitution layer, which again remains unchanged after adding $K_2$, and this cycle repeats. Hence, the ciphertext will belong to $u_1 \oplus A_1$ or to $u_2 \oplus A_2$, depending on the parity of the number of rounds, and thus the attack is possible under $2^b$ weak keys ($2^{2b}$ weak keys for 4-bit S-boxes).
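The alternation between $u_1 \oplus A_1$ and $u_2 \oplus A_2$ can be observed on a toy cipher. The sketch below is a simplified model, not Midori64: each round applies the S-box to every cell and then XORs a subkey, with the two subkeys alternating (KSF2), and the P-layer is omitted under the worst-case assumption that it preserves the affine subspace. The $Sb_0$ table is taken from the Midori specification (an assumption), the dimension-2 transition $c \oplus \{0,1,2,3\} \leftrightarrow \{0,2,4,6\}$ is the one from group E, and all names are hypothetical.

```python
import random

# Sb0 of Midori64, taken from the Midori specification (assumption: not listed here).
SB0 = [0xc, 0xa, 0xd, 0x3, 0xe, 0xb, 0xf, 0x7,
       0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]

U1, A1 = 0xc, [0, 1, 2, 3]     # u1 + A1 = {c, d, e, f}
U2, A2 = 0x0, [0, 2, 4, 6]     # u2 + A2 = {0, 2, 4, 6}

def encrypt(cells, k1, k2, rounds):
    """Toy SPN: S-box layer, then subkey addition; K1 and K2 alternate."""
    for r in range(rounds):
        key = k1 if r % 2 == 0 else k2
        cells = [SB0[c] ^ k for c, k in zip(cells, key)]
    return cells

rng = random.Random(0)
n_cells = 16
k1 = [rng.choice(A2) for _ in range(n_cells)]   # weak key: every cell of K1 in A2
k2 = [rng.choice(A1) for _ in range(n_cells)]   # weak key: every cell of K2 in A1
plaintext = [U1 ^ rng.choice(A1) for _ in range(n_cells)]

for rounds in range(1, 5):
    state = encrypt(plaintext, k1, k2, rounds)
    print(rounds, sorted(set(state)))
```

After an odd number of rounds every cell lies in $\{0, 2, 4, 6\}$, and after an even number of rounds every cell lies in $\{c, d, e, f\}$, regardless of which weak key cells were drawn.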

Impossibility for KSF1 with Goal2

Involutory S-boxes always allow weak-key classes of size $2^b$ under KSF1, hence no involutory S-box can achieve Goal2 under KSF1. The property holds irrespective of the size of the S-box. Consider an element $x$ such that $S(x) \neq x$. Such an element always exists unless the S-box is simply the identity mapping. Since $S(S(x)) = x$, an invariant subspace transition $u \oplus A \overset{S}{\longleftrightarrow} u \oplus A$ of dimension 1 always exists, i.e. $u \oplus \{0, x \oplus S(x)\} \overset{S}{\longleftrightarrow} u \oplus \{0, x \oplus S(x)\}$, where $u \in \{x, S(x)\}$. Thus a subkey $K$ with cells from $A$ is weak with respect to the invariant subspace attack. The number of such keys is at least $2^b$, which contradicts Goal2.

KSF1, Goal1

With a proper choice of an S-box, we can achieve Goal1 with KSF1, i.e. we can show that the number of weak keys for the invariant subspace attack is limited to only $2^b$ if the cipher has identical subkeys, without considering the round constants (that is, for any round constants). As discussed in Section 6.2, if the DDT of an S-box has a non-zero entry for a pair of input and output differences $(v, v')$, then there exists an affine subspace transition $u_1 \oplus A_1 \overset{S}{\longrightarrow} u_2 \oplus A_2$, where $A_1 = \{0, v\}$ and $A_2 = \{0, v'\}$. However, under KSF1, an invariant subspace holds if and only if $A_1 = A_2$ and $K \in u_1 \oplus u_2 \oplus A_1$. Therefore, we can achieve Goal1 under KSF1 if we can avoid non-zero entries on the diagonal of the DDT of an S-box. From the earlier section we have seen that an involutory S-box always permits some invariant subspace of dimension 1; however, we can avoid invariant subspaces of dimension 2 by searching for involutory S-boxes with no entries of 4 on the diagonal of their DDT. According to the criteria from Section 6.1, we search for candidate S-boxes with a minimal number of fixed points and minimal depth. We found a few S-boxes with only 2 fixed points and a depth of 4 (in comparison, Sb0 in Midori64 has 4 fixed points and a depth of 3.5):

$S_1^{new}$: 1046283957dbeacf (fixed points: b, f)
$S_2^{new}$: 1046283957afedcb (fixed points: a, d)
$S_3^{new}$: 1043286957dfeacb (fixed points: 3, 6)
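The listed properties can be re-derived from the lookup tables. The sketch below is an illustrative check, not the search program used for the thesis: it computes the DDT, tests the involution property, lists the fixed points and reports the largest entry on the DDT diagonal (excluding the trivial zero difference). The function names are chosen here for illustration.

```python
def parse_sbox(hex_string):
    """Turn a 16-character hex string into a lookup table."""
    return [int(ch, 16) for ch in hex_string]


def ddt(sbox):
    """Difference distribution table: ddt[dx][dy] = #{x : S(x) ^ S(x ^ dx) = dy}."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for x in range(size):
        for dx in range(size):
            table[dx][sbox[x] ^ sbox[x ^ dx]] += 1
    return table


def summary(hex_string):
    sbox = parse_sbox(hex_string)
    table = ddt(sbox)
    return {
        "involutory": all(sbox[sbox[x]] == x for x in range(16)),
        "fixed_points": [x for x in range(16) if sbox[x] == x],
        # An entry of 4 here would give a dimension-2 invariant subspace.
        "max_diagonal": max(table[v][v] for v in range(1, 16)),
    }


CANDIDATES = {
    "S1": "1046283957dbeacf",
    "S2": "1046283957afedcb",
    "S3": "1043286957dfeacb",
}
for name, hex_string in CANDIDATES.items():
    print(name, summary(hex_string))
```

A diagonal maximum of 2 means that only dimension-1 invariant subspaces remain, which is the unavoidable minimum for an involutory S-box.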

As a proof of concept, we list the Boolean function representation of the first S-box $S_1^{new}$ ($abcd$ is the input and $a'b'c'd'$ is the output), which clearly shows that the depth is 4:

a' = (a NAND (b OR c)) NAND (b NAND d),
b' = (a NOR (b NOR (NOT c))) NOR ((a NAND d) NOR (b XNOR c)),
c' = (b NAND (a XNOR d)) NAND ((b NAND c) NAND ((a NOR c) NOR (b NOR d))),
d' = ((c NOR d) NOR (a OR b)) NOR ((a NOR (NOT c)) NOR (b NAND (c NAND d))).

The depths of the outputs a', b', c', d' are 3.5, 4, 4, 4, respectively.
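As a sanity check, the Boolean representation can be evaluated on all 16 inputs and compared with the lookup table 1046283957dbeacf. The sketch below assumes that $a$ is the most significant input bit and $d$ the least significant (and likewise for the outputs); this bit ordering is an assumption made here, not something stated in the text.

```python
def NOT(x):
    return x ^ 1


def NAND(x, y):
    return NOT(x & y)


def NOR(x, y):
    return NOT(x | y)


def OR(x, y):
    return x | y


def XNOR(x, y):
    return NOT(x ^ y)


def s1_circuit(a, b, c, d):
    """Gate-level description of the first new S-box, transcribed from the text."""
    a_out = NAND(NAND(a, OR(b, c)), NAND(b, d))
    b_out = NOR(NOR(a, NOR(b, NOT(c))), NOR(NAND(a, d), XNOR(b, c)))
    c_out = NAND(NAND(b, XNOR(a, d)),
                 NAND(NAND(b, c), NOR(NOR(a, c), NOR(b, d))))
    d_out = NOR(NOR(NOR(c, d), OR(a, b)),
                NOR(NOR(a, NOT(c)), NAND(b, NAND(c, d))))
    return (a_out << 3) | (b_out << 2) | (c_out << 1) | d_out


# Evaluate the circuit on every input, with a as the most significant bit.
circuit_table = [
    s1_circuit((x >> 3) & 1, (x >> 2) & 1, (x >> 1) & 1, x & 1)
    for x in range(16)
]
lookup_table = [int(ch, 16) for ch in "1046283957dbeacf"]
print(circuit_table == lookup_table)
```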

6.3.3 Searching for strong non-involutory S-boxes

As we have seen, non-zero entries on the diagonal of the DDT of involutory S-boxes correspond to iterative invariant subspaces of a certain dimension, and it is impossible to avoid them. Hence, it is meaningful to consider non-involutory S-boxes. Below, we show that for this type of S-box achieving Goal2 is indeed possible under some key schedules. Furthermore, we give the best achievable goals under each key schedule function.

Impossibility for KSF3 and KSF2 with Goal2

To achieve Goal2 under KSF2 and KSF3, two types of transitions $\Delta \overset{S}{\longleftrightarrow} \Delta'$ should be avoided: the first type with differences $\Delta \neq \Delta'$ and the second type with differences $\Delta = \Delta'$. The first type of transition corresponds to symmetric non-zero entries with respect to the diagonal of the DDT, and could form a 2-transition invariant subspace of the form $\Delta \overset{S}{\longrightarrow} \Delta' \overset{S}{\longrightarrow} \Delta$, while the second type corresponds to non-zero entries on the diagonal and a 1-transition invariant subspace $\Delta \overset{S}{\longrightarrow} \Delta$ with $\Delta \neq 0$. Unfortunately, we find that none of the optimal S-boxes avoids both types of transitions in the DDT simultaneously.

KSF3 and KSF2, Goal1

To achieve Goal1, two conditions are imposed on the S-box:

1. There are no affine subspace transitions of dimension larger than 2.
2. There are no affine subspace transitions of dimension 2 that can be connected (output subspace of one coincides with input subspace of another).

S-boxes satisfying the above 2 conditions will allow only invariant subspaces of dimension at most 1, i.e. they achieve Goal1. Condition 2 is particularly important as it ensures that one cannot build iterative affine subspace characteristics. For 4-bit S-boxes, the only proper affine subspaces of $\{0, 1\}^4$ with dimension greater than 2 are those of dimension 3, and there are 15 such subspaces in total, hence an exhaustive verification can be performed instantaneously. To fulfil Condition 2, we list all $l$ affine subspace transitions $u_1^i \oplus A_1^i \overset{S}{\longrightarrow} u_2^i \oplus A_2^i$ with $A_1^i$ and $A_2^i$ of dimension 2, for $i = 1, 2, \ldots, l$. If $A_1^i \neq A_2^j$ for all $i, j = 1, 2, \ldots, l$, i.e. there are no common input and output affine subspaces, then $A_2^i$ cannot be mapped to any dimension 2 affine subspace in the next round. Computer search shows that such S-boxes do exist, and there are many of them. We start with the 16 optimal S-boxes introduced in [77]. We find that five of them fulfil the above two conditions:

012d47f68eb5a93c 012d47f68ec95ba3 012d47f68e95ab3c 012d47f68c53aeb9 012d47f68be3ac59
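One way to read the two conditions as a procedure is sketched below. This is an illustrative reimplementation under assumptions, not the search code behind the reported results: subspaces are represented by their direction spaces (the linear parts $A$), a transition is recorded whenever the image of some coset $u_1 \oplus A_1$ is again a coset, and all function names are hypothetical.

```python
from itertools import combinations


def parse_sbox(hex_string):
    """Turn a 16-character hex string into a lookup table."""
    return [int(ch, 16) for ch in hex_string]


def linear_subspaces(dim, n_bits=4):
    """All linear subspaces of F_2^n_bits of the given dimension."""
    found = set()
    for basis in combinations(range(1, 1 << n_bits), dim):
        span = {0}
        for vec in basis:
            span |= {s ^ vec for s in span}
        if len(span) == 1 << dim:  # the chosen vectors were independent
            found.add(frozenset(span))
    return found


def transitions(sbox, dim):
    """Pairs (A1, A2) with S(u1 + A1) = u2 + A2 for some offsets u1, u2."""
    spaces = linear_subspaces(dim)
    pairs = set()
    for a1 in spaces:
        for u1 in range(len(sbox)):
            image = [sbox[u1 ^ a] for a in a1]
            a2 = frozenset(y ^ image[0] for y in image)
            if a2 in spaces:  # the image is an affine subspace
                pairs.add((a1, a2))
    return pairs


def passes_goal1_conditions(sbox):
    if transitions(sbox, 3):  # Condition 1: nothing of dimension 3
        return False
    dim2 = transitions(sbox, 2)
    inputs = {a1 for a1, _ in dim2}
    outputs = {a2 for _, a2 in dim2}
    return inputs.isdisjoint(outputs)  # Condition 2: no connectable pair


for hex_string in ["012d47f68eb5a93c", "1046283957dbeacf"]:
    print(hex_string, passes_goal1_conditions(parse_sbox(hex_string)))
```

The second table in the loop is the involutory S-box from Section 6.3.2; since every transition of an involutory S-box also holds in the reverse direction, it cannot satisfy Condition 2.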

As mentioned before, Saarinen found golden S-boxes out of the 16 optimal S-boxes [104]. Interestingly, none of the above 5 S-boxes is among Saarinen’s golden S-boxes. For the convenience of future research, we call the above 5 S-boxes silver S-boxes. Notice that XORing some value to the input/output of an S-box only changes the offset values of the affine subspaces while the subspaces themselves remain unchanged, thus the resulting S-box still satisfies the above conditions. Considering only the XORing of some value to the input/output of the S-boxes, there are a total of 528 S-boxes with no fixed point. Here, we did not enlarge the search space to $P \circ S \circ Q$ for linear transformations $P$ and $Q$. We checked the depth of these 528 S-boxes and found that none of them achieves depth 3.5. Hence, the minimum depth is 4, and the following S-box is one that achieves it.

$S_4^{new}$: 547812a3dbe0fc69

We list the Boolean functions to compute each output bit of $S_4^{new}$ in the following:

a' = ((a XOR c) NAND (b XOR d)) NAND (a NAND (b XNOR d))
b' = ((c NAND d) NAND (b NAND (NOT a))) NOR (b NOR (a NAND d))
c' = ((NOT d) NOR (a XOR b)) NOR ((c XOR d) NOR ((NOT b) NOR (a NOR c)))
d' = ((a NOR d) NOR (c NAND (NOT b))) NOR ((c XNOR d) NOR (b NOR (a NOR c)))

The depths of the outputs a', b', c', d' are 4, 3.5, 4, 4, respectively.

KSF1 with Goal2

Note that Goal1 under KSF1 can be achieved with the same approach as Goal1 with KSF2 and KSF3. To achieve Goal2 under KSF1, there exists a simple method: we need to ensure that no dimension 1 invariant subspace of the S-box can be used in the invariant subspace attack on the cipher. Since only 1-iteration transitions are useful here, i.e. those entries on the diagonal of the DDT, for which the input and output differences of the S-box are the same, we search for S-boxes with all-zero entries on the diagonal of the DDT. The search space is of the form $P \circ S \circ Q$, where $S$ is one of the 16 optimal S-boxes. Note that there are roughly $2^{19}$ candidate S-boxes that achieve Goal2.

Furthermore, we add constants to the input and output of the S-boxes to reduce the number of fixed points. We present below an example S-box that has a single fixed point and a depth of 4, generated from the first optimal S-box.

$S_5^{new}$: 0d657bea283c9f14
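Two facts used in this construction can be checked directly: the DDT diagonal of the example S-box is all zero, and XORing constants to the input and output leaves the DDT unchanged while moving the fixed points. The sketch below is an illustrative check with helper names chosen here; it does not reproduce the $P \circ S \circ Q$ search.

```python
def ddt(sbox):
    """Difference distribution table: ddt[dx][dy] = #{x : S(x) ^ S(x ^ dx) = dy}."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for x in range(size):
        for dx in range(size):
            table[dx][sbox[x] ^ sbox[x ^ dx]] += 1
    return table


def add_constants(sbox, c_in, c_out):
    """S'(x) = S(x ^ c_in) ^ c_out."""
    return [sbox[x ^ c_in] ^ c_out for x in range(len(sbox))]


def fixed_points(sbox):
    return [x for x in range(len(sbox)) if sbox[x] == x]


S5_NEW = [int(ch, 16) for ch in "0d657bea283c9f14"]
BASE_DDT = ddt(S5_NEW)

# Diagonal entries for the 15 non-zero differences.
diagonal = [BASE_DDT[v][v] for v in range(1, 16)]

# The DDT is the same for all 256 choices of input/output constants.
ddt_unchanged = all(
    ddt(add_constants(S5_NEW, c_in, c_out)) == BASE_DDT
    for c_in in range(16)
    for c_out in range(16)
)
print(diagonal, fixed_points(S5_NEW), ddt_unchanged)
```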

We list the Boolean functions to compute each output bit of $S_5^{new}$ in the following:

a' = ((NOT c) NOR (a XOR b)) NOR (d NOR ((NOT b) NOR (a NOR c)))
b' = ((b OR c) NAND (a XNOR d)) NAND (d NAND (a NOR b))
c' = (b NOR (d NOR (a NOR c))) NOR ((a NAND b) NOR (c NOR (NOT d)))
d' = (d NAND (a NOR b)) NAND ((c NOR (NOT b)) NOR (d NOR (a NAND c)))

The depths of the outputs a', b', c', d' are 4, 4, 4, 4, respectively.

6.4 Summary

The results of the search for S-boxes are summarised in Table 6.4.

Table 6.4: Existence of S-boxes that prevent invariant subspace attacks

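| Key schedule | Goal | Involution | Non-involution |
|---|---|---|---|
| KSF1 | Goal1 | ✓ | ✓ |
| KSF1 | Goal2 | ✗ | ✓ |
| KSF2 | Goal1 | ✗ | ✓ |
| KSF2 | Goal2 | ✗ | ✗ |
| KSF3 | Goal1 | ✗ | ✓ |
| KSF3 | Goal2 | ✗ | ✗ |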

Involutory S-boxes and evaluation of Sb0

Table 6.4 clearly illustrates that resisting invariant subspace attacks only by choosing a proper involutory S-box is very hard due to the involution property. Secure constructions based on S-box analysis exist only for Goal1 with KSF1. In Table 6.5 we compare one such S-box, $S_1^{new}$, and the original S-box Sb0 used in Midori64.

Table 6.5: Comparison of involutory S-boxes

Depth Goal1 Optimal #fixed security points Midori64 3.5   4 Ours 4   2

The new $S_1^{new}$ ensures that the size of the weak-key set against invariant subspace attacks is upper bounded by around $2^b$, independently of the choice of round constants. In comparison, Sb0 allows several weak-key classes of size $2^{2b}$ under a bad choice of constants. Our S-box also has fewer fixed points, and a slightly larger depth (our exhaustive search reveals that no depth 3.5 involutory S-box exists that satisfies all of our requirements). Note that the “Optimal security” column indicates that the S-box achieves the best possible security among all of the S-boxes with the same depth.

Non-involutory S-boxes

In contrast to involutory S-boxes, there exist non-involutory S-boxes that provide resistance against invariant subspace attacks under KSF1, e.g. $S_5^{new}$. Under KSF2 and even KSF3 (an arbitrary key schedule), they still provide resistance up to a small set of weak keys; refer to the S-box $S_4^{new}$. The minimum depth of the S-boxes in both cases is 4, and the number of fixed points is as low as one.

Conclusion

We have discussed the topic of provable security against invariant subspace attacks for a certain type of SPN structure. We examined the possibility of showing that a cipher is resistant against invariant subspace attacks (or has only a small set of weak keys) by focusing only on the S-boxes in combination with the key schedule. In certain scenarios, the resistance can be achieved by focusing only on the S-boxes, regardless of the choice of the round constants. With respect to 4-bit S-boxes, there are non-involutory S-boxes that help to achieve a sufficiently high level of security against such attacks, independently of the round constants. We stress that we are not encouraging the removal of round constants (this would probably allow other types of attacks, such as slide attacks), but rather the search for strong S-boxes that resist the attack and leave the designer free to choose any round constants without worries.

Encryption Algorithm Design


Chapter 7

Beyond Ultra-Lightweight Block Ciphers

This chapter serves as a prelude to our lightweight block cipher designs. In Section 7.1, we point out the common design choices and challenges faced when designing lightweight block ciphers. Next, we talk about our design strategies and approaches for our new lightweight block cipher designs in Section 7.2. Lastly in Section 7.3, we propose a new framework to consider block ciphers.

7.1 On Lightweight Block Ciphers

7.1.1 Design choices of lightweight block ciphers

In recent years, a number of lightweight block ciphers have been proposed with performance advantages over conventional block ciphers like AES. Here, we summarise the common lightweight block cipher design choices highlighted in the NIST lightweight cryptography report [87]:

Smaller block size. Rather than having a 128-bit block size (like in AES), smaller block sizes like 64-bit or 80-bit are chosen to save storage space. However, note that this reduces the maximum number of plaintext blocks that should be encrypted under the same key.

Smaller key size. This is another strategy to save memory and improve efficiency, but it also inherently reduces the security of the cipher. Although there are lightweight block ciphers like PRESENT that have a variant with an 80-bit key, NIST recommends that a minimum of 112-bit security is acceptable through 2030, and that at least 128 bits of security be used from 2031 onwards [90].

Smaller components. With smaller block size and key size, it is natural to choose smaller components for lightweight block cipher designs. The implementation cost of smaller components tends to be significantly lower too. For instance, 4-bit S-boxes are generally preferred over 8-bit S-boxes.


Simpler rounds. In the case of SPN-based block ciphers, a simple and clean description often leads to reduced control logic in hardware. For example, avoiding a last round that differs from the other rounds, as in AES, could save some control logic.

Simpler key schedule. Following a rationale similar to that for simpler rounds, a linear key schedule (like in Piccolo [112]) or even no key schedule at all (like in Midori128 [5]) allows the round keys to be generated on the fly, unlike AES, which has S-boxes in its key schedule. However, one has to pay more attention to related-key attacks. One way to prevent some of these attacks is to use a secure key derivation function (KDF)¹.

7.1.2 Challenges in lightweight primitive designs

Designing new lightweight primitives is not an easy task. First and foremost, a primitive has to be secure and has to solve a problem for a real-life application. Second, the design should have something innovative: there should be some distinctive feature that separates it from the rest. Third, it should improve over existing proposals in certain aspects. Sometimes designers choose to focus on one particular scenario, making their ciphers excellent for that use-case, but not performant for other use-cases. This can arguably be useful in very specific scenarios where other factors are secondary. Most of the time, designers wish to design something that is good in multiple aspects, but very often various factors contradict each other. We summarise some of these contradicting factors in Figure 7.1.

Figure 7.1: Designer’s dilemma

Compactness and security

A simple way to reduce the implementation cost is to have a more compact block cipher with smaller block size and key size, as the storage of the internal state and key material usually makes up a significant portion of the hardware implementation area cost of the cipher. In addition, processing a smaller amount of data usually requires fewer operations and gates, thus further reducing the overall implementation cost. However, with increasing storage capabilities and computational power, having a small block size could lead to a code book attack, where an adversary simply stores the entire permutation and decodes any ciphertext without knowledge of the secret key. On the other hand, a small key size could be vulnerable to a brute force search. Therefore, there is a limit to how small the block size and key size can be without compromising the security.

Strong properties for the building blocks are desirable for designing secure block ciphers, but very often these properties come with high implementation costs. For example, in Table 5.2, one can see that the lightest 4 × 4 MDS diffusion matrix over $GF(2^4)$ costs 58 XOR, but an almost-MDS diffusion matrix can cost as low as 32 XOR (nearly 45% cheaper). This is why some lightweight block ciphers such as PRESENT and Midori chose not to use MDS diffusion matrices.

Another challenge is to prove the security of a design. Coming up with a new creative design structure is one thing, but being able to analyse it and show that it is immune to classical attacks is a completely different story. The absence of known attacks does not imply that a primitive is secure, as there could be unintentional flaws or hidden backdoors. Being able to conduct and present a detailed security analysis is crucial to ease one’s mind about the uncertainty and to have confidence in the primitive.

¹ In a nutshell, its goal is to take a source of initial key material and derive from it one or more cryptographically strong secret keys [74].

Hardware and software implementations

An efficient implementation in hardware does not necessarily translate into an efficient implementation in software, and vice versa. For ease of discussion, we focus on comparing round-based implementations on ASIC platforms and bit-slice implementations on high-end processors, because they are the most useful implementations: round-based ASIC is usually the most energy efficient and bit-slice software is usually the fastest.

For a start, the metric for evaluating the implementation cost is different. In hardware, the area is usually quantified in GE, where each logic gate has some GE cost. Note that these costs can vary across different libraries (for instance, an AND gate costs 1.33 GE on UMC 180 nm and 1.50 GE on TSMC 65 nm). In software, the cost is usually quantified by the number of operations needed and not by the specific type of operations used (in general, all simple logical operations have the same throughput and latency in software). Thus the estimates of the implementation cost in hardware and software can be very different.

The preferred choice of operations is also quite different. In hardware implementations, NAND and NOR gates generally have lower cost than AND and OR gates, making them more favourable. In software, however, AND and OR operations are more commonly used, and in some cases an additional NOT operation is needed to realise the NAND and NOR computation, which means that NAND and NOR might cost twice the number of operations compared to AND and OR.

The implementation of the primitive components can be quite different too. In hardware implementations, a bit-permutation is usually considered free as it can be embedded in the wiring of the circuit, and the actual choice of bit-permutation has no impact.
On the other hand, in software, the bits are packed in registers and rearranging bits across registers can be very costly (it typically requires three operations: an AND for masking the bits, a shift to move them to the desired position and an XOR to put them into another register). In general, bit shifting or keeping the bits within the same register is preferable to an arbitrary bit-permutation.

Another dilemma is the allocation of building blocks in a primitive. In round-based hardware implementations, adding a component, say an S-box, has a direct impact on the GE cost. Thus it is more hardware-friendly to include components only when necessary. In bit-slice software implementations, the bits from different words are packed together in the same register, so applying a set of instructions (for implementing one component) updates all the words in parallel. Additional operations are needed for masking if one wishes to apply the instructions to specific words only. Therefore, it is more software-friendly to have a uniform allocation of components.
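The operation counts mentioned above can be made concrete with a small sketch. It models 32-bit registers with Python integers (the `MASK32` truncation is an artefact of that modelling, not an extra cipher operation), and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit register with a Python integer


def nand(x, y):
    """NAND in software: an AND followed by a NOT, i.e. two operations."""
    return ~(x & y) & MASK32


def move_bit(src, dst, src_pos, dst_pos):
    """Move one bit from register src into register dst (dst bit assumed 0).

    Costs three operations: AND (mask), shift, XOR (merge).
    """
    bit = src & (1 << src_pos)           # AND: isolate the bit
    if dst_pos >= src_pos:
        bit <<= dst_pos - src_pos        # shift: align with the target
    else:
        bit >>= src_pos - dst_pos
    return dst ^ bit                     # XOR: insert into destination


def apply_to_selected_slices(reg, key, select_mask):
    """Bit-slice view: XOR key only into the slices chosen by select_mask.

    The extra AND is the masking overhead of a non-uniform component layout.
    """
    return reg ^ (key & select_mask)


print(hex(nand(0b1100, 0b1010)))
print(bin(move_bit(0b1000, 0b0000, 3, 0)))
print(bin(apply_to_selected_slices(0b0000, 0b1111, 0b0101)))
```

An arbitrary bit-permutation repeats the three-operation pattern of `move_bit` for every group of bits that moves by a different distance, which is why shifts or rotations of whole registers are much cheaper in software.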

Other trade-offs

Even though there are many trade-offs between various hardware architectures, we can distinguish between a serial implementation, a round-based implementation and a fully unrolled implementation. In a serial implementation, one reduces the datapath, and thus the area, to the minimum (usually a few bits, like the S-box bit size), but the throughput is greatly reduced. In a round-based implementation, an entire round of the cipher is performed in a single clock cycle, so that the entire ciphering process is done in r cycles with a moderate area cost (this trade-off is usually a good candidate for energy efficiency). In a fully unrolled implementation, the entire ciphering process is performed in a single clock cycle, but the area cost is then substantial as all rounds need to be directly implemented. The challenge is to design lightweight components that are compact and efficient for all these trade-offs.

Another set of factors to consider is the performance metrics such as area requirement, power usage, energy consumption, latency and throughput. Although they are closely related, they could contradict each other. For example, one could imagine a very simple cipher iterating thousands of rounds, each composed of only a single non-linear Boolean operation, an XOR and some bit wiring. This will have very low area cost and power requirement. However, such a cipher will have terrible performance regarding throughput, latency or energy consumption. Therefore, striking a balance between these trade-offs is a challenging task.

7.2 Designing BULB Ciphers

7.2.1 Going beyond ultra-lightweight block ciphers

Among the lightweight block cipher proposals, SPN-based PRESENT and Feistel Network-based Piccolo have a catchy description: An Ultra-Lightweight Block Cipher. As mentioned in Section 1.3.2, a primitive is considered to be “lightweight” if it is suitable to be implemented in constrained devices. So how does a primitive justify itself as “ultra-lightweight”? From our perspective, it is by pushing the boundary of lightweight cryptography through innovative ways to improve the performance while still being able to guarantee a high level of security.

PRESENT [28]. It uses a bit-permutation rather than a classical diffusion matrix for its P-layer, which makes the hardware implementation cost of the P-layer essentially zero (bit wiring is basically free in hardware, but not necessarily in software). To compensate for its weak diffusion, the designers use an optimal 4-bit S-box with branch number 3. Thanks to a careful design of the bit-permutation, they present a provable bound on the minimum number of active S-boxes in any 5-round differential characteristic².

Piccolo [112]. Its F-function is constructed using SPN design principles, and the S-layer within the F-function uses an optimal 4-bit S-box that has an extremely low implementation cost, requiring only 4 NOR gates, 3 XOR gates and 1 XNOR gate. In addition, its permutation-based key schedule significantly reduces the implementation cost of the entire cipher. By carefully choosing the key schedule permutation, one can show that it resists related-key differential attacks and meet-in-the-middle attacks.

SIMON/SPECK [12]. Both designs have impressive performance (even better than PRESENT and Piccolo) for a wide spectrum of lightweight applications, where SIMON is adapted for hardware and SPECK for software. However, the biggest downside is their limited security guarantees, which led to dozens of security analyses being conducted on SIMON and SPECK by external parties.
This reflects cryptographers’ lack of confidence in primitives that do not have adequate security analysis. Having said that, our two new block cipher designs push this lightweight boundary further and go beyond ultra-lightweight block ciphers. Therefore, we call our designs, SKINNY and GIFT, BULB ciphers.

Parameters. For block ciphers, a common choice of block size is either 64 bits or 128 bits. Thus both our BULB ciphers have 64-bit³ and 128-bit block size versions. In 2016, NIST recommended a minimum of 112-bit security for symmetric-key algorithms [90], thus we believe that a key size of at least 128 bits gives a sufficient margin.

² However, they did not provide related-key differential bounds.
³ A block size of 64 bits should be avoided as long as the application does not require it [22].

7.2.2 Design strategies

Here are the common strategies which we adopted for both our BULB cipher designs.

No whitening keys. We choose not to have any whitening keys and keep every round identical to simplify the control logic for the round function iterations. Any security concern related to the absence of the whitening key can be eased with an additional round iteration.

Non-optimal building blocks. Rather than relying on strong but costly building blocks, we choose to use non-optimal lightweight building blocks. Each individual component may be weak, but by carefully crafting the interactions between components we show that “Unity is Strength”⁴.

Simple key schedule. Like LED and Piccolo, we use a simple key schedule permutation without any S-boxes, which significantly reduces the implementation cost of the key schedule. This also makes the decryption process simpler as the decryption round keys can be generated on the fly. We can apply a single permutation (the composition of r key schedule permutations) to obtain the final key state and apply the inverse key schedule to generate the round keys in the reverse order. We believe that the confusion property (described by Shannon) can be achieved within the internal state with a good round function. We also conduct analysis on key schedule-related attacks to show resistance against such threats.

Simple round constants. While a common practice is to use a decimal round counter to generate the constant value i for the i-th round, we use an affine linear-feedback shift register (LFSR) that consists of only one XNOR gate to generate distinct round constants for each round. This reduces the implementation cost to a minimum, just enough to avoid any symmetry and to differentiate the rounds; otherwise our design might be susceptible to slide attacks.

Add constants directly to the internal state. Adding the round constants directly to the internal state serves the same purpose of breaking the symmetry in the cipher and also keeps the key schedule simple and clean.

Half-state key insertion. Updating only half of the internal state with key material effectively saves half of the XOR gates needed for the key insertion operation. Although only half of the internal state is masked with a subkey, we ensure that the entire internal state is unknown after one round function.

⁴ The moral of the story of “The Bundle of Sticks” [117], one of Aesop’s fables. It tells the story of an old farmer teaching his quarrelling sons the strength of unity by demonstrating to them that any one of them could easily break a stick, but when the sticks are tied into a bundle, nobody could break the bundle.
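The round-constant strategy can be illustrated with a short sketch. The text above only states that an affine LFSR with a single XNOR gate is used; the concrete update rule below (a 6-bit register, initialised to zero, shifted left with the XNOR of its two top bits as the new low bit) is taken from the published SKINNY specification and is an assumption with respect to this passage. The function name is hypothetical.

```python
def round_constants(num_rounds):
    """Generate 6-bit round constants with a one-XNOR affine LFSR.

    Assumed update rule (from the published SKINNY specification):
    (rc5, rc4, rc3, rc2, rc1, rc0) -> (rc4, rc3, rc2, rc1, rc0, rc5 XNOR rc4).
    The register starts at zero and is updated before each use.
    """
    rc = 0
    constants = []
    for _ in range(num_rounds):
        feedback = ((rc >> 5) ^ (rc >> 4) ^ 1) & 1  # XNOR of the two top bits
        rc = ((rc << 1) & 0x3F) | feedback
        constants.append(rc)
    return constants


constants = round_constants(56)
print([f"{c:02X}" for c in constants[:8]])
print(len(set(constants)) == len(constants))
```

Because the constants are pairwise distinct over the rounds considered, no two rounds are identical, which is the property needed to rule out slide-type symmetries.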

7.2.3 Design approaches

For our BULB cipher designs, we set our targets on a tweakable block cipher (TBC) and a standard block cipher to cover a wide range of applications.

High-security-reduce-area. We adopt the TWEAKEY framework proposed by Jean et al. [57] to construct an ad-hoc TBC. While this gives users more diverse usage of the encryption algorithm, it also allows adversaries more freedom in manipulating the tweaks for launching attacks. Therefore, we place the security of our cipher as top priority. Starting with strong security properties, our strategy is to preserve this advantage while “peeling off” and removing any unnecessary operations. We end up with a skinny cipher consisting only of the core operations necessary for its functionality, thus the name SKINNY cipher.

Small-area-increase-security. For the block cipher design, we set our eyes upon lightweightness. What could be a better starting point than the ultra-lightweight block cipher PRESENT? Starting with an area goal even smaller than PRESENT, we try to improve its security through detailed analysis of PRESENT. We end up with a nice little present for the symmetric-key cryptography community, a GIFT cipher.

7.3 New Framework for Block Ciphers

As mentioned in Section 1.2.1, there are two main design principles for block ciphers, Substitution-Permutation Network and Feistel Network, each with its own design advantages. However, besides the parameters of a block cipher, for instance the block size and key size, there is no common metric shared between these two design philosophies. In this section, we propose a simple framework, the ideal rate of influence (IRI) of block ciphers, to put them on the same page. It estimates the maximum number of bits that can be under the influence of a single bit after one iteration of the round function. For that, we define a classification for Substitution-Permutation Network and Feistel Network block ciphers.

7.3.1 Classification of block ciphers

o/n-SbPN block cipher. For SPN ciphers, some designs, like AES, use a diffusion layer while some, like PRESENT, use a bit-permutation layer. To place them on the same level, we can shift the XOR components from the P-layer (if any) to the S-layer to form Super-S-boxes⁵, leaving what remains of the P-layer as a bit-permutation layer. For instance, in the case of AES, we can combine the 8-bit S-boxes and the 4 × 4 diffusion matrix together to form a 32-bit Super-S-box, and the ShiftRows can be regarded as a form of bit-permutation layer.

⁵ Similar to the “AES super box” described in [40].

Having said that, we define a Substitution-bitPermutation Network (SbPN) as a subclassification of Substitution-Permutation Network, where the P-layer is a bit-permutation layer. An o/n-SbPN cipher is an n-bit SPN cipher in which the S-layer consists of o-bit (Super-)S-boxes. Under this classification, AES is a 32/128-SbPN cipher.

o/n-GFN block cipher. Since the design of DES, several variations of the Feistel Network have appeared, such as the unbalanced Feistel Network [110], the alternating Feistel Network [2, 83] and the type-1/2/3 Feistel Network [133]. In [55], the authors unified all of them under one umbrella as the Generalised Feistel Network (GFN). Since the output of an F-function is XORed to another branch in the internal state, the number of bits that can be updated in one round depends on the output size of the F-function. An o/n-GFN cipher is an n-bit GFN cipher that has a maximum of o-bit output from some F-function.

7.3.2 Ideal rate of influence of block ciphers

Under the assumption that the building blocks of the ciphers are ideal, the value o indicates the maximum number of bits that could be affected by a single bit in one round of the n-bit block cipher. This gives us a ratio metric, o : n, to indicate how fast the diffusion can be in the best case scenario. In Table 7.1, we list some of the prominent lightweight block ciphers including our BULB ciphers SKINNY and GIFT.

Table 7.1: Classification of various lightweight block ciphers (∗SKINNY adopts the TWEAKEY framework)

| IRI | Cipher | Block Size | (Tweak)Key Size | Number of Rounds | Classification | Ref. |
|---|---|---|---|---|---|---|
| 1:32 | GIFT | 128 | 128 | 40 | 4/128-SbPN | new |
| 1:16 | GIFT | 64 | 128 | 28 | 4/64-SbPN | new |
| 1:16 | PRESENT | 64 | 80/128 | 31 | 4/64-SbPN | [28] |
| 1:16 | RECTANGLE | 64 | 80/128 | 25 | 4/64-SbPN | [132] |
| 1:4 | SKINNY | 64 | 64/128/192∗ | 32/36/40 | 16/64-SbPN | new |
| 1:4 | LED | 64 | 64/128 | 32/48 | 16/64-SbPN | [51] |
| 1:4 | Midori | 64 | 128 | 16 | 16/64-SbPN | [5] |
| 1:4 | Piccolo | 64 | 80/128 | 25/31 | 16/64-GFN | [112] |
| 1:4 | SKINNY | 128 | 128/256/384∗ | 40/48/56 | 32/128-SbPN | new |
| 1:4 | Midori | 128 | 128 | 20 | 32/128-SbPN | [5] |
| 1:4 | CLEFIA | 128 | 128/192/256 | 18/22/26 | 32/128-GFN | [114] |
| 1:2 | SIMON | 64 | 96/128 | 42/44 | 32/64-GFN | [13] |
| 1:2 | SIMON | 128 | 128/192/256 | 68/69/72 | 64/128-GFN | [13] |
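The IRI column follows from the classification by reducing the ratio o : n. The sketch below reproduces that arithmetic for a few rows; reading the IRI as the reduced fraction o/n is inferred from the table, and the cipher labels and function name are chosen here for illustration.

```python
from fractions import Fraction


def ideal_rate_of_influence(o, n):
    """Reduce o : n for an o/n-SbPN or o/n-GFN cipher and format it as 'a:b'."""
    ratio = Fraction(o, n)
    return f"{ratio.numerator}:{ratio.denominator}"


# (label, o, n) taken from the Classification column of Table 7.1.
EXAMPLES = [
    ("GIFT-128", 4, 128),
    ("PRESENT", 4, 64),
    ("SKINNY-64", 16, 64),
    ("SKINNY-128", 32, 128),
    ("SIMON-64", 32, 64),
    ("SIMON-128", 64, 128),
]

for label, o, n in EXAMPLES:
    print(f"{label:11s} {o}/{n} -> IRI {ideal_rate_of_influence(o, n)}")
```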

Under this framework, we can easily identify our main competitors. One can also see that one of our GIFT variations is the first of its kind. Although SIMON is not in the same category as our BULB ciphers, we include it in our comparison as it has one of the best hardware performances to date.

Chapter 8

The SKINNY Family of Block Ciphers

We present a new tweakable block cipher family SKINNY, whose goal is to compete with NSA’s 2013 design SIMON in terms of hardware and software performance, while providing in addition much stronger security guarantees with regard to differential and linear cryptanalysis. In particular, unlike SIMON, we are able to provide strong bounds for all versions, and not only in the single-key model, but also in the related-key or related-tweak model. SKINNY is an SPN cipher that uses a compact S-box, a new very sparse diffusion layer, and a new very light key schedule. Yet, by carefully choosing our components and how they interact, our construction manages to retain very strong security guarantees. For all the SKINNY versions, we are able to prove, using mixed-integer linear programming (MILP), very strong bounds with respect to differential and linear cryptanalysis, not only in the single-key model, but also in the much more involved related-key model.

Contribution of this chapter. Compared to SIMON, in the single-key model SKINNY needs a much lower proportion of its total number of rounds to provide a sufficient bound on the best differential and linear characteristic. In the related-key model, the situation is even more to SKINNY’s advantage as no such bound is known for any version of SIMON as of today. We remark that the decryption process of SKINNY has almost exactly the same description as its encryption counterpart, thus minimising the decryption overhead. Similar to SIMON, SKINNY very naturally encompasses 64-bit and 128-bit block versions and a wide range of key sizes. However, in addition, SKINNY provides a tweakable capability, which can be very useful not only for leakage resilient implementations, but also for being directly plugged into higher-level operating modes, such as SCT [94]. In order to provide this tweak feature, we have generalised the STK construction [57] to enable more compact implementations while maintaining a high provable security level.
The SKINNY specifications are given in Section 8.1. The rationale of our design as well as various theoretical security comparisons are provided in Section 8.2. We conduct an elaborate security analysis in Section 8.3 and we exhibit our implementation results in Section 8.4.

8.1 Specifications of SKINNY

Notation and SKINNY versions

The BULB ciphers of the SKINNY family have 64-bit and 128-bit block versions, and we let $n$ denote the block size. In both the $n = 64$ and $n = 128$ versions, the internal state is viewed as a 4 × 4 square array of cells, where each cell is a nibble (in the $n = 64$ case) or a byte (in the $n = 128$ case). We denote by $IS_{i,j}$ the cell of the internal state located at row $i$ and column $j$ (counting from 0). One can also view this 4 × 4 square array of cells as a vector of cells by concatenating the rows. Thus, we denote with a single subscript $IS_i$ the cell of the internal state located at position $i$ in this vector (counting from 0), and we have $IS_{i,j} = IS_{4 \cdot i + j}$.

SKINNY follows the TWEAKEY framework from [57] and thus takes a tweakey input instead of a key or a pair of tweak and key. The user can then choose what part of this tweakey input will be key material and/or tweak material (the classical block cipher view is to use the entire tweakey input as key material only). The family of BULB ciphers SKINNY has three main tweakey size versions: for a block size $n$, we propose versions with tweakey size $t = n$, $t = 2n$ and $t = 3n$ (versions with other tweakey sizes between $n$ and $3n$ are naturally obtained from these main versions), and we denote by $z = t/n$ the tweakey size to block size ratio. The tweakey state is also viewed as a collection of $z$ 4 × 4 square arrays of cells of $s$ bits each. We denote these arrays TK1 when $z = 1$, TK1 and TK2 when $z = 2$, and finally TK1, TK2 and TK3 when $z = 3$. Moreover, we denote by $TKz_{i,j}$ the cell of the tweakey state located at row $i$ and column $j$ of the $z$-th cell array. As for the internal state, we extend this notation to a vector view with a single subscript: $TK1_i$, $TK2_i$ and $TK3_i$. Moreover, we define the adversarial model SK (resp. TK1, TK2 or TK3) where the adversary cannot (resp. can) introduce differences in the tweakey state.

The number $r$ of rounds to perform during encryption depends on the block and tweakey sizes. The actual values are summarised in Table 8.1.

Table 8.1: Number of rounds for SKINNY-n-t, with n-bit internal state and t-bit tweakey state.

                     Tweakey size t
Block size n    n            2n           3n
64              32 rounds    36 rounds    40 rounds
128             40 rounds    48 rounds    56 rounds

Initialisation

The cipher receives a plaintext $p = p_0 \| p_1 \| \cdots \| p_{14} \| p_{15}$, where the $p_i$ are $s$-bit cells, with $s = n/16$ (we have $s = 4$ for the 64-bit block SKINNY versions and $s = 8$ for the 128-bit block SKINNY versions). The initialisation of the cipher’s internal state is performed by simply setting $IS_i = p_i$ for $0 \le i \le 15$:

\[
IS = \begin{pmatrix}
p_0 & p_1 & p_2 & p_3 \\
p_4 & p_5 & p_6 & p_7 \\
p_8 & p_9 & p_{10} & p_{11} \\
p_{12} & p_{13} & p_{14} & p_{15}
\end{pmatrix}.
\]

This is the initial value of the cipher internal state. Note that the state is loaded row-wise rather than in the column-wise fashion we have come to expect from the AES; this is a more hardware-friendly choice, as pointed out in [88].

The cipher receives a tweakey input $tk = tk_0 \| tk_1 \| \cdots \| tk_{16z-1}$, where the $tk_i$ are $s$-bit cells. The initialisation of the cipher’s tweakey state is performed by simply setting, for $0 \le i \le 15$: $TK1_i = tk_i$ when $z = 1$; $TK1_i = tk_i$ and $TK2_i = tk_{16+i}$ when $z = 2$; and finally $TK1_i = tk_i$, $TK2_i = tk_{16+i}$ and $TK3_i = tk_{32+i}$ when $z = 3$. We note that the tweakey states are loaded row-wise.
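The initialisation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names (`to_cells`, `init_state`, `ROUNDS`) are hypothetical, and the choice of taking the most significant nibble of each byte first when $s = 4$ is an assumption of this example, not something fixed by the text above.

```python
# Number of rounds from Table 8.1, indexed by (block size n, tweakey size t).
ROUNDS = {(64, 64): 32, (64, 128): 36, (64, 192): 40,
          (128, 128): 40, (128, 256): 48, (128, 384): 56}

def to_cells(data, s):
    """Split a byte string into s-bit cells (s = 4 or 8).

    Assumption of this sketch: for s = 4 the high nibble of each byte
    comes first.
    """
    if s == 8:
        return list(data)
    return [nib for b in data for nib in (b >> 4, b & 0xF)]

def init_state(plaintext, tweakey, n):
    """Return (IS, [TK1, ..., TKz], number of rounds) for SKINNY-n-t."""
    s = n // 16                      # cell size in bits
    t = 8 * len(tweakey)             # tweakey size in bits
    z = t // n                       # number of tweakey arrays
    state = to_cells(plaintext, s)   # IS_i = p_i (row-wise loading)
    tk = to_cells(tweakey, s)
    tks = [tk[16 * k:16 * (k + 1)] for k in range(z)]  # TK1_i = tk_i, TK2_i = tk_{16+i}, ...
    return state, tks, ROUNDS[(n, t)]
```

For instance, `init_state(bytes(range(8)), bytes(16), 64)` yields a 16-cell state, two tweakey arrays and the 36 rounds of SKINNY-64-128.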

The round function

One encryption round of SKINNY is composed of five operations in the following order: SubCells, AddConstants, AddRoundTweakey, ShiftRows and MixColumns (see illustration in Figure 8.1).

Figure 8.1: The SKINNY round function applies five different transformations: SubCells (SC), AddConstants (AC), AddRoundTweakey (ART), ShiftRows (SR) and MixColumns (MC).

Note that no whitening key is used in SKINNY. Thus, a part of the first and last rounds does not add any security. We motivate this choice in Section 8.2.

SubCells. An $s$-bit S-box is applied to every cell of the cipher internal state. For $s = 4$, SKINNY-64 uses an S-box S4 very close to the Piccolo S-box [112]. The action of this S-box in hexadecimal notation is given in Table 8.2.

Table 8.2: 4-bit S-box S4 used in SKINNY-64

x           0 1 2 3 4 5 6 7 8 9 a b c d e f
S4[x]       c 6 9 0 1 a 2 b 3 8 5 d 4 e 7 f
S4^{-1}[x]  3 4 6 8 c a 1 e 9 2 5 7 0 b d f

Note that S4 can also be described with four NOR and four XOR operations, as depicted in Figure 8.2. If $x_0$, $x_1$, $x_2$ and $x_3$ represent the four input bits of the S-box ($x_0$ being the LSB), one simply applies the following transformation:

\[
(x_3, x_2, x_1, x_0) \to (x_3, x_2, x_1, x_0 \oplus \overline{(x_3 \lor x_2)}),
\]

followed by a left shift bit rotation. This process is repeated four times, except for the last iteration where the bit rotation is omitted.
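As a sanity check of the description above, the following Python sketch evaluates S4 through four NOR/XOR steps with a left rotation after each of the first three, and compares the result with the lookup table of Table 8.2. The function name `s4` is hypothetical, and the NOR is taken from the statement that S4 uses four NOR and four XOR operations.

```python
def s4(x):
    """SKINNY-64 S-box computed from its NOR/XOR description."""
    for i in range(4):
        x3, x2 = (x >> 3) & 1, (x >> 2) & 1
        x ^= 1 ^ (x3 | x2)                   # x0 <- x0 XOR NOR(x3, x2)
        if i < 3:                            # rotation omitted in the last iteration
            x = ((x << 1) | (x >> 3)) & 0xF  # left rotation of the 4-bit word
    return x

# Lookup table taken from Table 8.2.
S4_TABLE = [int(c, 16) for c in "c6901a2b385d4e7f"]
```

Evaluating `[s4(x) for x in range(16)]` reproduces the row S4[x] of Table 8.2.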

Figure 8.2: S-box S4 circuit

Figure 8.3: S-box S8 circuit

For the case $s = 8$, SKINNY-128 uses an 8-bit S-box S8 that is built in a similar manner as the 4-bit S-box S4 described above. The construction is simple and is depicted in Figure 8.3. If $x_0, \ldots, x_7$ represent the eight input bits of the S-box ($x_0$ being the LSB), it applies the following transformation on the 8-bit state:

\[
(x_7, x_6, x_5, x_4, x_3, x_2, x_1, x_0) \to (x_7, x_6, x_5, x_4 \oplus \overline{(x_7 \lor x_6)}, x_3, x_2, x_1, x_0 \oplus \overline{(x_3 \lor x_2)}),
\]

followed by the bit-permutation:

\[
(x_7, x_6, x_5, x_4, x_3, x_2, x_1, x_0) \to (x_2, x_1, x_7, x_6, x_4, x_0, x_3, x_5),
\]

repeating this process four times, except for the last iteration where there is just a bit swap between $x_1$ and $x_2$. We provide in Tables 8.3 and 8.4 the S-box S8 and its inverse in hexadecimal notation.
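The same kind of check can be sketched for S8. The code below follows the description above literally (two NOR/XOR updates, then the bit-permutation, with a swap of $x_1$ and $x_2$ in place of the permutation in the last iteration); the function name `s8` is hypothetical and the NOR is assumed as for S4.

```python
def s8(x):
    """SKINNY-128 S-box computed from its NOR/XOR description."""
    for i in range(4):
        b = [(x >> k) & 1 for k in range(8)]   # b[k] holds bit x_k
        b[4] ^= 1 ^ (b[7] | b[6])              # x4 <- x4 XOR NOR(x7, x6)
        b[0] ^= 1 ^ (b[3] | b[2])              # x0 <- x0 XOR NOR(x3, x2)
        if i < 3:
            # new (x7,...,x0) = old (x2, x1, x7, x6, x4, x0, x3, x5)
            b = [b[5], b[3], b[0], b[4], b[6], b[7], b[1], b[2]]
        else:
            b[1], b[2] = b[2], b[1]            # last iteration: swap x1 and x2
        x = sum(bit << k for k, bit in enumerate(b))
    return x
```

With this sketch, `s8(0xd2)` evaluates to `0x6e`, matching the example given in the caption of Table 8.3.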

Table 8.3: The 8-bit S-box S8 of SKINNY-128. The row (resp. column) represents the 4 MSB (resp. LSB) of the input byte. For example, S8(d2) = 6e.

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0   65 4c 6a 42 4b 63 43 6b 55 75 5a 7a 53 73 5b 7b
1   35 8c 3a 81 89 33 80 3b 95 25 98 2a 90 23 99 2b
2   e5 cc e8 c1 c9 e0 c0 e9 d5 f5 d8 f8 d0 f0 d9 f9
3   a5 1c a8 12 1b a0 13 a9 05 b5 0a b8 03 b0 0b b9
4   32 88 3c 85 8d 34 84 3d 91 22 9c 2c 94 24 9d 2d
5   62 4a 6c 45 4d 64 44 6d 52 72 5c 7c 54 74 5d 7d
6   a1 1a ac 15 1d a4 14 ad 02 b1 0c bc 04 b4 0d bd
7   e1 c8 ec c5 cd e4 c4 ed d1 f1 dc fc d4 f4 dd fd
8   36 8e 38 82 8b 30 83 39 96 26 9a 28 93 20 9b 29
9   66 4e 68 41 49 60 40 69 56 76 58 78 50 70 59 79
a   a6 1e aa 11 19 a3 10 ab 06 b6 08 ba 00 b3 09 bb
b   e6 ce ea c2 cb e3 c3 eb d6 f6 da fa d3 f3 db fb
c   31 8a 3e 86 8f 37 87 3f 92 21 9e 2e 97 27 9f 2f
d   61 48 6e 46 4f 67 47 6f 51 71 5e 7e 57 77 5f 7f
e   a2 18 ae 16 1f a7 17 af 01 b2 0e be 07 b7 0f bf
f   e2 ca ee c6 cf e7 c7 ef d2 f2 de fe d7 f7 df ff

Table 8.4: The inverse S-box S8^{-1} of SKINNY-128. The row (resp. column) represents the 4 MSB (resp. LSB) of the input byte. For example, S8^{-1}(d2) = f8.

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0   ac e8 68 3c 6c 38 a8 ec aa ae 3a 3e 6a 6e ea ee
1   a6 a3 33 36 66 63 e3 e6 e1 a4 61 34 31 64 a1 e4
2   8d c9 49 1d 4d 19 89 cd 8b 8f 1b 1f 4b 4f cb cf
3   85 c0 40 15 45 10 80 c5 82 87 12 17 42 47 c2 c7
4   96 93 03 06 56 53 d3 d6 d1 94 51 04 01 54 91 d4
5   9c d8 58 0c 5c 08 98 dc 9a 9e 0a 0e 5a 5e da de
6   95 d0 50 05 55 00 90 d5 92 97 02 07 52 57 d2 d7
7   9d d9 59 0d 5d 09 99 dd 9b 9f 0b 0f 5b 5f db df
8   16 13 83 86 46 43 c3 c6 41 14 c1 84 11 44 81 c4
9   1c 48 c8 8c 4c 18 88 cc 1a 1e 8a 8e 4a 4e ca ce
a   35 60 e0 a5 65 30 a0 e5 32 37 a2 a7 62 67 e2 e7
b   3d 69 e9 ad 6d 39 a9 ed 3b 3f ab af 6b 6f eb ef
c   26 23 b3 b6 76 73 f3 f6 71 24 f1 b4 21 74 b1 f4
d   2c 78 f8 bc 7c 28 b8 fc 2a 2e ba be 7a 7e fa fe
e   25 70 f0 b5 75 20 b0 f5 22 27 b2 b7 72 77 f2 f7
f   2d 79 f9 bd 7d 29 b9 fd 2b 2f bb bf 7b 7f fb ff

AddConstants. A 6-bit affine LFSR, whose state is denoted $(rc_5, rc_4, rc_3, rc_2, rc_1, rc_0)$ (with $rc_0$ being the LSB), is used to generate round constants. Its update function uses a single XNOR gate and is defined as:

\[
(rc_5 \| rc_4 \| rc_3 \| rc_2 \| rc_1 \| rc_0) \to (rc_4 \| rc_3 \| rc_2 \| rc_1 \| rc_0 \| rc_5 \oplus rc_4 \oplus 1).
\]

The six bits are initialised to zero, and updated before use in a given round. The bits from the LFSR are arranged into a 4 × 4 array (only the first column of the state is affected by the LFSR bits), depending on the size of the internal state:

\[
\begin{pmatrix}
c_0 & 0 & 0 & 0 \\
c_1 & 0 & 0 & 0 \\
c_2 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix},
\]

with $c_2 = \texttt{0x2}$ and

\[
(c_0, c_1) = (rc_3 \| rc_2 \| rc_1 \| rc_0,\; 0 \| 0 \| rc_5 \| rc_4) \quad \text{when } s = 4,
\]

\[
(c_0, c_1) = (0 \| 0 \| 0 \| 0 \| rc_3 \| rc_2 \| rc_1 \| rc_0,\; 0 \| 0 \| 0 \| 0 \| 0 \| 0 \| rc_5 \| rc_4) \quad \text{when } s = 8.
\]

The round constants are combined with the state, respecting array positioning, using bitwise XOR. The values of the $(rc_5, rc_4, rc_3, rc_2, rc_1, rc_0)$ constants for each round are given in the table below, encoded to byte values for each round.

Rounds     Constants
1 - 16     01,03,07,0f,1f,3e,3d,3b,37,2f,1e,3c,39,33,27,0e
17 - 32    1d,3a,35,2b,16,2c,18,30,21,02,05,0b,17,2e,1c,38
33 - 48    31,23,06,0d,1b,36,2d,1a,34,29,12,24,08,11,22,04
49 - 62    09,13,26,0c,19,32,25,0a,15,2a,14,28,10,20
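The round constants can be regenerated with a short sketch. It assumes the XNOR feedback (that is, $rc_5 \oplus rc_4 \oplus 1$), which is what the single XNOR gate mentioned in the text and the tabulated constants both indicate; the function names are hypothetical.

```python
def round_constants(rounds):
    """Successive values of (rc5,...,rc0), encoded as integers, one per round."""
    rc, out = 0, []
    for _ in range(rounds):
        feedback = ((rc >> 5) ^ (rc >> 4) ^ 1) & 1   # XNOR of rc5 and rc4
        rc = ((rc << 1) & 0x3F) | feedback           # shift left, insert feedback as new rc0
        out.append(rc)                               # updated before use in the round
    return out

def constant_cells(rc):
    """Split a 6-bit constant into the cells (c0, c1, c2) of the first column."""
    return rc & 0xF, rc >> 4, 0x2
```

For example, the sixth round constant `0x3e` gives `(c0, c1, c2) = (0xe, 0x3, 0x2)`.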

AddRoundTweakey. The first and second rows of all tweakey arrays are extracted and bitwise XORed to the cipher internal state, respecting the array positioning. More formally, for i = {0, 1} and j = {0, 1, 2, 3}, we have:

• ISi,j = ISi,j ⊕ TK1i,j when z = 1,

• ISi,j = ISi,j ⊕ TK1i,j ⊕ TK2i,j when z = 2,

• ISi,j = ISi,j ⊕ TK1i,j ⊕ TK2i,j ⊕ TK3i,j when z = 3.

Then, the tweakey arrays are updated as follows (this tweakey schedule is illustrated in Figure 8.4). First, a permutation $P_T$ is applied on the cell positions of all tweakey arrays: for all $0 \le i \le 15$, we set $TK1_i \leftarrow TK1_{P_T[i]}$ with

\[
P_T = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7],
\]

and similarly for TK2 when $z = 2$, and for TK2 and TK3 when $z = 3$. This corresponds to the following reordering of the matrix cells, where indices are taken row-wise:

\[
(0, \ldots, 15) \overset{P_T}{\longmapsto} (9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7).
\]

Figure 8.4: The tweakey schedule in SKINNY. Each tweakey word TK1, TK2 and TK3 (if any) follows a similar transformation update, except that no LFSR is applied to TK1.

Finally, every cell of the first and second rows of TK2 and TK3 (for the SKINNY versions where TK2 and TK3 are used) is individually updated with an LFSR. The LFSRs used are given in Table 8.5 ($x_0$ stands for the LSB of the cell).

Table 8.5: The LFSRs used in SKINNY to update the tweakey cells. The TK parameter gives the number of tweakey words in the cipher, and the s parameter gives the size of cell in bits.

TK    s   LFSR
TK2   4   $(x_3 \| x_2 \| x_1 \| x_0) \to (x_2 \| x_1 \| x_0 \| x_3 \oplus x_2)$
TK2   8   $(x_7 \| x_6 \| x_5 \| x_4 \| x_3 \| x_2 \| x_1 \| x_0) \to (x_6 \| x_5 \| x_4 \| x_3 \| x_2 \| x_1 \| x_0 \| x_7 \oplus x_5)$
TK3   4   $(x_3 \| x_2 \| x_1 \| x_0) \to (x_0 \oplus x_3 \| x_3 \| x_2 \| x_1)$
TK3   8   $(x_7 \| x_6 \| x_5 \| x_4 \| x_3 \| x_2 \| x_1 \| x_0) \to (x_0 \oplus x_6 \| x_7 \| x_6 \| x_5 \| x_4 \| x_3 \| x_2 \| x_1)$
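The AddRoundTweakey step and the tweakey update can be sketched as follows. The function names are hypothetical and the tweakey arrays are represented as plain Python lists of 16 cells in row-wise order (TK1 first, then TK2, then TK3); this is an illustration of the rules above, not a reference implementation.

```python
PT = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7]

def lfsr_tk2(x, s):
    """TK2 cell update from Table 8.5 (shift left, feedback into the LSB)."""
    if s == 4:
        return ((x << 1) & 0xF) | (((x >> 3) ^ (x >> 2)) & 1)
    return ((x << 1) & 0xFF) | (((x >> 7) ^ (x >> 5)) & 1)

def lfsr_tk3(x, s):
    """TK3 cell update from Table 8.5 (shift right, feedback into the MSB)."""
    if s == 4:
        return (x >> 1) | (((x ^ (x >> 3)) & 1) << 3)
    return (x >> 1) | (((x ^ (x >> 6)) & 1) << 7)

def add_round_tweakey(state, tks):
    """XOR the first two rows (cells 0..7) of every tweakey array into the state."""
    out = list(state)
    for tk in tks:
        for i in range(8):
            out[i] ^= tk[i]
    return out

def update_tweakey(tks, s):
    """Apply PT to every array, then the LFSRs to rows 0 and 1 of TK2 and TK3."""
    updated = []
    for index, tk in enumerate(tks):
        tk = [tk[PT[i]] for i in range(16)]        # TK_i <- TK_{PT[i]}
        if index == 1:
            tk[:8] = [lfsr_tk2(c, s) for c in tk[:8]]
        elif index == 2:
            tk[:8] = [lfsr_tk3(c, s) for c in tk[:8]]
        updated.append(tk)
    return updated
```

One observation that follows directly from the two rules in Table 8.5 is that, for each cell size, the TK3 update undoes the TK2 update on every cell value, which is easy to confirm by looping over all inputs of `lfsr_tk3(lfsr_tk2(x, s), s)`.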

ShiftRows. As in AES, in this layer the rows of the internal state cell array are rotated, but right rotations rather than left rotations are used. More precisely, the second, third, and fourth cell rows are rotated by 1, 2 and 3 positions to the right, respectively. The order of the cells in the internal state array is permuted using P, for all 0 ≤ i ≤ 15, we set ISi ← ISP[i] with

P = [0, 1, 2, 3, 7, 4, 5, 6, 10, 11, 8, 9, 13, 14, 15, 12] .

MixColumns. Each column of the cipher internal state array is multiplied by the following binary matrix M:

\[
M = \begin{pmatrix}
1 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 \\
1 & 0 & 1 & 0
\end{pmatrix}.
\]

The final value of the internal state array provides the ciphertext with cells being unpacked in the same way as the packing during initialisation. Note that decryption is very similar to encryption as all cipher components have very simple inverses (SubCells and MixColumns are based on a generalised Feistel structure, so their respective inverse is straightforward to deduce and can be implemented with the exact same number of operations).
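To illustrate how light the linear layer is, here is a sketch of ShiftRows and MixColumns on a row-wise list of 16 cells. The function names are hypothetical. The inverse MixColumns shown here is not quoted from the text: it is obtained by solving the four equations defined by M for the input cells, so it should be read as a derivation under that assumption.

```python
P = [0, 1, 2, 3, 7, 4, 5, 6, 10, 11, 8, 9, 13, 14, 15, 12]

def shift_rows(state):
    """IS_i <- IS_{P[i]}: rows 1, 2, 3 rotated right by 1, 2, 3 cells."""
    return [state[P[i]] for i in range(16)]

def mix_columns(state):
    """Multiply each column (a, b, c, d) by the binary matrix M."""
    out = list(state)
    for j in range(4):
        a, b, c, d = (state[4 * i + j] for i in range(4))
        out[j], out[4 + j], out[8 + j], out[12 + j] = a ^ c ^ d, a, b ^ c, a ^ c
    return out

def inv_mix_columns(state):
    """Inverse of mix_columns, derived by solving y = M x for x."""
    out = list(state)
    for j in range(4):
        a, b, c, d = (state[4 * i + j] for i in range(4))
        out[j], out[4 + j], out[8 + j], out[12 + j] = b, b ^ c ^ d, b ^ d, a ^ d
    return out
```

MixColumns costs three XORs per column in this form (`a ^ c` can be shared between the first and fourth outputs), which is consistent with the three-XOR budget discussed in the design rationale.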

Extending to other tweakey sizes

The three main versions of SKINNY have tweakey sizes t = n, t = 2n and t = 3n, but one can easily extend this to any tweakey size¹ n ≤ t ≤ 3n:

• for any tweakey size n < t < 2n, one simply uses exactly the t = 2n version but the last 2n − t bits of the tweakey state are fixed to the zero value. Moreover, the corresponding cells in the tweakey state TK2 will not be updated throughout the rounds with the LFSR.

• for any tweakey size 2n < t < 3n, one simply uses exactly the t = 3n version but the last 3n − t bits of the tweakey state are fixed to the zero value. Moreover, the corresponding cells in the tweakey state TK3 will not be updated throughout the rounds with the LFSR.

We note that some of our SKINNY-64 versions allow small key sizes (down to 64 bits). We emphasise that we propose these versions mainly for simplicity in the description of the SKINNY family of ciphers. Yet, as advised by NIST [90], one should not use key sizes that are smaller than 112 bits.

Instantiating the tweakey state with key and tweak material

Following the TWEAKEY framework [57], SKINNY takes as inputs a plaintext or a ciphertext and a tweakey value, which can be used in a flexible way by filling it with key and tweak material. Whatever the situation, the user must ensure that the key size is always at least as big as the block size. In the classical setting where only key material is input, we use exactly the specifications of SKINNY described previously. However, when some tweak material is to be used in the tweakey state, we dedicate TK1 for this purpose and XOR a bit set to “1” every round to the second bit of the top cell of the third column (i.e. the second bit of IS0,2). In other words, when there is some tweak material, we add an extra “1” in the constant matrix from AddConstants. Besides, in situations where the user might use different tweak sizes, we recommend dedicating some cells of TK1 to encode the size of the tweak material in order to ensure proper separation. Note that these are only recommendations, thus not strictly part of the specifications of SKINNY.

¹ For simplicity we do not include here tweakey sizes that are not a multiple of s bits. However, such cases can be trivially handled by generalising the tweakey schedule description to the bit level.

8.2 Rationale of SKINNY

8.2.1 The designing of SKINNY

Several design choices of SKINNY have been borrowed from existing ciphers, but most of our components are new, optimised for our goal: a cipher well suited for most lightweight applications.

Design approach. SKINNY was designed with a high-security-reduce-area approach, that is, to maintain a strong security property while removing any unnecessary operation (hence the name of our proposal). We end up with the sound property that removing any component or using a weaker version of a component from SKINNY would lead to a much weaker (or actually insecure) cipher. Therefore, the construction of SKINNY has been done through several iterations, trying to reach the exact spot where good performance meets strong security arguments. We detail in this section how we tried to follow this direction for each layer of the cipher. We note that one could have chosen a slightly smaller S-box or a slightly sparser diffusion layer, but our preliminary implementations showed that these options represent a worse trade-off overall.

When designing a lightweight encryption algorithm, several use-cases must be taken into account. While area-optimised implementations are important for some very constrained applications, throughput or throughput-over-area optimised implementations are also very relevant. Actually, looking at the efficiency measurements introduced by Khoo et al. [68], one can see that our design choices are good for many types of implementations, which is exactly what makes a good general-purpose lightweight encryption algorithm.

Estimating area. In order to discuss the rationale of our design, we first quickly describe an estimation in gate equivalent (GE) of the ASIC area cost of several simple bit operations (for UMC 180 nm 1.8 V [126]): a NOR/NAND gate costs 1 GE, an OR/AND gate costs 1.33 GE, an XOR/XNOR gate costs 2.67 GE and a NOT gate costs 0.67 GE. Of course, these numbers depend on the library used, but they give us at least a rough and easy evaluation of the design choices we make.

Architectures and performance. The ultimate goal of a good lightweight encryption primitive is to use lightweight components, but also to ensure that these components are compact and efficient for several different hardware architectures, like a serial implementation, a round-based implementation and a fully unrolled implementation. This is what the SIMON designers have managed to produce, but at the cost of a few security guarantees. SKINNY offers similar (sometimes even better) performance than SIMON, while providing much stronger security arguments with regard to classical differential or linear cryptanalysis.

8.2.2 General design and components rationale

A first and important decision was to choose between an SPN and a Feistel Network. We started from an SPN construction as it is generally easier to provide stronger bounds on the number of active S-boxes. However, we note that there is a dual bit-slice view of SKINNY that resembles some Generalised Feistel Network. Somehow, one can view the cipher as a primitive in between an SPN and an “AND-rotation-XOR” function like SIMON. We try to get the best of both worlds by benefiting from the nice implementation trade-offs of the latter, while organising the state in an SPN view so that bounds on the number of active S-boxes can be easily obtained.

The absence of whitening key is justified by the reduction of the control logic: by always keeping the exact same round during the entire encryption process we avoid the control logic induced by having a last non-repeating layer at the end of the cipher. Besides, this simplifies the general description and implementation of the primitive. Obviously, having no whitening key means that a few operations of the cipher have no impact on the security. This is actually the case for both the beginning and the end of the ciphering process in SKINNY since the key addition is done in the middle of the round, with only half of the state being involved with this key addition every round.

A crucial feature of SKINNY is the easy generation of several block size or tweakey size versions, while keeping the general structure and most of the security analysis untouched. Going from the 64-bit block size versions to the 128-bit block size versions is simply done by using an 8-bit S-box instead of a 4-bit S-box, therefore keeping all the structural analysis identical. Using bigger tweakey material is done by following the STK construction [57], which allows automated analysis tools to still work even though the input space becomes very big (in short, the superposition trick makes the TK2 and TK3 analysis almost as time consuming as the normal and easy TK1 case). Besides, unlike previous lightweight block ciphers, this complete analysis of the TK2 and TK3 cases allows us to dedicate a part of this tweakey material to be potentially some tweak input, therefore making SKINNY a flexible tweakable block cipher. Also, we directly obtain related-key security proofs using this general structure.

SubCells

The choice of the S-box is obviously a crucial decision in an SPN cipher and we have spent a lot of effort on looking for the best possible candidate. For the 4-bit case, we have designed a tool that searches for the most compact candidate that provides some minimal security guarantees. Namely, with the bit operation cost estimations given previously, for all possible combinations of operations (NAND/NOR/XOR/XNOR) up to a certain cost limit, our tool checks whether certain security criteria of the tested S-box are fulfilled. More precisely, we have forced the m.d.p. of the S-box to be $2^{-2}$ and the m.l.a. to be $2^{-2}$. When both criteria are satisfied, we have filtered our search for S-boxes with high algebraic degree.

4-bit S-box. Our results show that the S-box used in the Piccolo block cipher [112] is close to being the best one: our 4-bit S-box candidate S4 is essentially the Piccolo S-box with the last NOT gate at the end being removed (see Figure 8.2). We believe this extra NOT gate was added by the Piccolo designers to avoid fixed points (actually, if fixed points were to be removed at the S-box level, the Piccolo candidate would be the best choice), but in SKINNY the fixed points are handled with the use of constants to save some extra GE. Yet, omitting the last bit rotation layer already removes a lot of fixed points (the efficiency cost of this omission being null).

The S-box S4 can therefore be implemented with only 4 NOR gates and 4 XOR gates, the rest being only bit wiring (basically free in hardware). According to our previously explained estimations, this should cost 14.68 GE, but as remarked in [112], some libraries provide special gates that further save area. Namely, in our library the 4-input AND-NOR and 4-input OR-NAND gates with two inputs inverted cost 2 GE and they can be used to directly compute an XOR or an XNOR. Thus, S4 can be implemented with only 12 GE. In comparison, the PRESENT S-box requires 4 NAND/NOR and 9 XOR/XNOR gates, which amounts to 22 GE (or 28.03 GE without the special 4-input gates).
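The area figures quoted above follow from simple arithmetic on the gate costs given under "Estimating area". The sketch below reproduces them; the dictionary and function names are hypothetical, and the 2 GE cost of an XOR/XNOR built from the special 4-input gates is taken from the paragraph above.

```python
# Gate costs in GE for UMC 180 nm, as listed under "Estimating area".
GE = {"NOR": 1.0, "NAND": 1.0, "OR": 1.33, "AND": 1.33,
      "XOR": 2.67, "XNOR": 2.67, "NOT": 0.67}

def area(gates, xor_cost=None):
    """Total GE of a gate count; xor_cost overrides the XOR/XNOR price."""
    cost = dict(GE)
    if xor_cost is not None:
        cost["XOR"] = cost["XNOR"] = xor_cost
    return sum(cost[name] * count for name, count in gates.items())

S4_GATES = {"NOR": 4, "XOR": 4}
PRESENT_GATES = {"NOR": 4, "XOR": 9}      # 4 NAND/NOR and 9 XOR/XNOR

print(round(area(S4_GATES), 2))           # standard gates only
print(area(S4_GATES, xor_cost=2.0))       # with the special 4-input gates
print(round(area(PRESENT_GATES), 2))
print(area(PRESENT_GATES, xor_cost=2.0))
```

The four printed values correspond, in order, to the 14.68 GE, 12 GE, 28.03 GE and 22 GE estimates given in the text.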

All in all, our 4-bit S-box S4 has the following security properties: $D_{max} = 2^{-2}$, $L_{max} = 2^{-2}$, branch number 2, algebraic degree 3 and one fixed point S4(f) = f.

8-bit S-box. The search space for 8-bit S-boxes was too wide for our automated tool. Therefore, we instead considered a subclass of the entire search space: by reusing the general structure of S4, we have tested all possible S-boxes built by iterating several times a NOR-XOR combination and a bit-permutation. Our search found that the m.d.p. and m.l.a. of the S-boxes are larger than $2^{-2}$ when we have fewer than 8 iterations of the NOR-XOR combination and bit-permutation. With 8 iterations of the NOR-XOR combination and bit-permutation, we found S-boxes with the desired $D_{max} = 2^{-2}$ and $L_{max} = 2^{-2}$ with algebraic degree 6. However, the algebraic degree of the inverse S-boxes of all these candidates is 5 rather than 6. In addition, having 8 iterations may result in higher latency when we consider a serial hardware implementation. Therefore, we considered having 2 NOR-XOR combinations in every iteration and reduced the number of iterations from 8 to 4. As a result, we found several S-boxes with the desired m.d.p. and m.l.a., while reaching algebraic degree 6 for both the S-box and its inverse (thus better than the 8 iterations case).

Although such S-box candidates have 3 fixed points when we omit the last bit-permutation layer like the 4-bit case, we can easily reduce the number of fixed points by introducing a different bit-permutation from the intermediate bit-permutations to the last layer without any additional cost. With 2 NOR-XOR combinations and a bit-permutation iterated 4 times, S8 can be implemented with only 8 NOR gates and 8 XOR gates (see Figure 8.3), the rest being only bit wiring (basically free in hardware). The total area cost should be 24 GE according to our previously explained estimations and using special 4-input AND-NOR and 4-input OR-NAND gates. In comparison, while ensuring an m.d.p. (resp. m.l.a.) of $2^{-6}$ (resp. $2^{-4}$), the AES S-box costs 180 GE [34]. Even the lightweight 8-bit S-box proposal from Canteaut et al. [35] requires 12 AND/OR gates and 26 XOR gates, which amounts to 64 GE, for an m.d.p. (resp. m.l.a.) of $2^{-5}$ (resp. $2^{-3}$), but their optimisation goal was different from ours.

All in all, we believe our 8-bit S-box candidate S8 provides a good trade-off between security and area cost. It has $D_{max} = 2^{-2}$, $L_{max} = 2^{-2}$, branch number 2, algebraic degree 6 and a single fixed point S8(ff) = ff (for the S-box we have chosen, swapping two bits in the last bit-permutation was probably the simplest method to achieve only a single fixed point).
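Two of the stated properties, the maximal differential probability and the fixed points, are easy to check from a lookup table. The sketch below does this for S4 using the values of Table 8.2; the function names are hypothetical, and the same functions would apply unchanged to a 256-entry list holding Table 8.3.

```python
S4 = [int(c, 16) for c in "c6901a2b385d4e7f"]   # Table 8.2

def max_ddt_entry(sbox):
    """Largest entry of the difference distribution table for nonzero input difference."""
    size = len(sbox)
    best = 0
    for a in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[sbox[x] ^ sbox[x ^ a]] += 1   # output difference for the pair (x, x ^ a)
        best = max(best, max(counts))
    return best

def fixed_points(sbox):
    """Inputs x with sbox[x] == x."""
    return [x for x, y in enumerate(sbox) if x == y]

# The maximal differential probability is the largest entry divided by the table size.
mdp = max_ddt_entry(S4) / len(S4)
```

A value of `mdp` equal to 0.25 corresponds to the $D_{max} = 2^{-2}$ stated for S4.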

Note that both our S-boxes S4 and S8 have the interesting feature that their inverse is computed almost identically to the forward direction (as they are based on a generalised Feistel structure) and with exactly the same number of operations. Thus, our design reasoning also holds when considering the decryption process.
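For S4, the remark above can be made concrete with a small sketch: the inverse undoes the four NOR/XOR steps in reverse order, using right rotations instead of left rotations, with the same number of operations. The function names are hypothetical, and the NOR in the forward direction is assumed from the "four NOR and four XOR" description of S4.

```python
def s4(x):
    """Forward S4: NOR/XOR step, then left rotation (omitted after the last step)."""
    for i in range(4):
        x ^= 1 ^ (((x >> 3) | (x >> 2)) & 1)     # x0 <- x0 XOR NOR(x3, x2)
        if i < 3:
            x = ((x << 1) | (x >> 3)) & 0xF
    return x

def s4_inv(x):
    """Inverse S4: right rotation (omitted before the first step), then NOR/XOR step."""
    for i in range(4):
        if i > 0:
            x = ((x >> 1) | (x << 3)) & 0xF
        x ^= 1 ^ (((x >> 3) | (x >> 2)) & 1)     # the NOR/XOR step is its own inverse
    return x

S4_INV_TABLE = [int(c, 16) for c in "3468ca1e92570bdf"]   # Table 8.2
```

The NOR/XOR step is an involution because it only modifies $x_0$ using bits that it leaves untouched, which is the generalised Feistel property mentioned above.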

AddConstants

The constants in SKINNY have several goals: differentiate the rounds, differentiate the columns and avoid symmetries, complicate invariant subspace cryptanalysis (see Section 8.3.3) and attacks exploiting fixed points from the S-box. In order to differentiate the rounds, we simply need a counter, and since the number of rounds of all SKINNY versions is smaller than 64, the most hardware-friendly solution is to use a very cheap 6-bit affine LFSR (like in LED [51]) that requires only a single XNOR gate per update. The 6 bits are then dispatched to the first two rows of the first column (this will maximise the constants spread after the ShiftRows and MixColumns), which will already break the column symmetry. In order to prevent symmetries, fixed points and more generally subspaces from spreading, we need to introduce different constants in several cells of the internal state. The round counter will already naturally have this goal, yet, in order to increase that effect, we have added a “1” bit to the third row, which is almost free in terms of implementation cost. This will ensure that symmetries and subspaces are broken even more quickly, and in particular independently of the round counter.

AddRoundTweakey

The tweakey schedule of SKINNY follows closely the STK construction from [57] (which makes it easy to get bounds on the number of active S-boxes in the related-tweakey model). Yet, we have changed a few parts. Firstly, instead of using multiplications by 2 and 3 in a finite field, we have replaced these tweakey cell updates by cheap 4-bit or 8-bit LFSRs (depending on the size of the cell) to minimise the hardware cost. All our LFSRs require only a single XOR for the update, and we have checked that the differential cancellation² behaviour of these interconnected LFSRs is as required by the STK construction: for a given position, a single cancellation can only happen every 15 rounds for TK2, and the same with two cancellations for TK3.

Another important generalisation of the STK construction is the fact that every round we XOR only half of the internal state with some subtweakey. The goal was clearly to optimise hardware performance of SKINNY, and it actually saves a significant number of XORs in a round-based implementation. The potential danger is that the bounds we obtain would dramatically drop because of this change. Yet, surprisingly, the bounds remained good and this was a good security/performance trade-off to make. Another advantage is that we can now update the tweakey cells only before they are incorporated into the cipher internal state. Thus, only half of the tweakey cells will be updated every round and the period of the cancellations naturally doubles: for a certain cell position, a single cancellation can only happen every 30 rounds for TK2 and two cancellations can only happen every 30 rounds for TK3.

The tweakey permutation PT has been chosen to maximise the bounds on the number of active S-boxes that we could obtain in the related-tweakey model (note that it has no impact in the single-key model). Besides, we have enforced for PT the special property that all cells located in the third and fourth rows are sent to the first and second rows, and vice-versa. Since only the first and second rows of the tweakey states are XORed to the internal state of the cipher, this ensures that both halves of the tweakey states will be equally mixed into the cipher internal state (otherwise, some tweakey bytes might be more involved in the ciphering process than others). Finally, the cells that will not be directly XORed to the cipher internal state can be left at the same relative position. On top of that, we only considered those variants of PT that consist of a single cycle. We note that since the cells of the first tweakey word TK1 are never updated, they can be directly hardwired to save some area if the situation allows.
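The structural properties claimed for PT (the two halves are exchanged, the cells moved to the last two rows keep their relative order, and the permutation is a single cycle) can be checked mechanically. The following sketch uses hypothetical helper names and the value of PT from Section 8.1.

```python
PT = [9, 15, 8, 13, 10, 14, 12, 11, 0, 1, 2, 3, 4, 5, 6, 7]

def cycle_lengths(perm):
    """Lengths of the cycles of the map i -> perm[i], sorted in increasing order."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        lengths.append(length)
    return sorted(lengths)

# New rows 0-1 (positions 0..7) are filled from old rows 2-3 (positions 8..15) ...
halves_swapped = all(PT[i] >= 8 for i in range(8)) and all(PT[i] < 8 for i in range(8, 16))
# ... and the cells moved to rows 2-3 keep their relative position.
relative_order_kept = PT[8:] == list(range(8))
single_cycle = cycle_lengths(PT) == [16]
```

All three flags evaluate to `True` for the permutation given in the specification.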

² While XORing the cells in a tweakey state to form the subkeys, a differential cancellation occurs when the cells have an XOR-sum of zero difference.

ShiftRows and MixColumns

Competing with SIMON’s impressive hardware performance required choosing an extremely sparse diffusion layer for SKINNY, which was in direct contradiction with our original goal of obtaining good security bounds for our primitive. After several design iterations, we came to the conclusion that binary matrices were the best choice. More surprisingly, while most block cipher designs use very strong diffusion layers (like an MDS matrix), and even though 4 × 4 binary matrices with branch number 4 exist, we preferred a much sparser candidate which we believe offers the best security/performance trade-off (this can be measured in terms of Figure Of Adversarial Merit [68]).

Due to its strong sparseness, the SKINNY binary diffusion matrix M has a differential and linear branch number of only 2. It may seem worrisome that diffusion might be slow; however, we designed M such that when propagation from one non-zero input difference to one non-zero output difference occurs, the next round will likely lead to a much higher branch number. Looking at M, the only way to meet branch number two is to have an input difference in either the second or the fourth input only. This leads to an input difference in the first or third element for the next round, which then diffuses to many output elements. The differential characteristic with a single active S-box per round is therefore impossible, and actually we will be able to prove at least 96 active S-boxes for 20 rounds. Thus, for the very low cost of a binary diffusion matrix with differential branch number two, we are getting a better security level than expected when looking at the iteration of several rounds. The effect is the same with linear branching (for which we only need to look at the transpose of the inverse of M, i.e. $(M^{-1})^{\top}$).

We have considered all possibilities for M that can be implemented with at most three XOR operations and eventually kept the MixColumns matrices that, in combination with ShiftRows, guaranteed high diffusion and led to strong bounds on the minimal number of active S-boxes in the single-key model.

Note that another important criterion came into play regarding the choice of the diffusion layer of SKINNY: it is important that the key material impacts the cipher internal state as fast as possible. This is in particular a crucial point for SKINNY as only half of the state is mixed with some key material every round, and there are no whitening keys. Besides, having a fast key diffusion will reduce the impact of meet-in-the-middle attacks. Once the first two rows of the state were arbitrarily chosen to receive the key material, given a certain subtweakey, we could check how many rounds were required (in both encryption and decryption directions) to ensure that the entire internal state depends on this subtweakey. Our final choice of MixColumns is optimal: only a single round is required in both forward and backward directions to ensure this diffusion.
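The branching argument for M can be illustrated numerically. The sketch below works on activity patterns only: each cell is replaced by a 0/1 value, which is a simplification assumed for this example. For a binary matrix this gives an upper bound on the cell-level branch number, and since an invertible matrix cannot have branch number below 2, finding the value 2 settles it. The helper names are hypothetical.

```python
M = [[1, 0, 1, 1],
     [1, 0, 0, 0],
     [0, 1, 1, 0],
     [1, 0, 1, 0]]

def apply(matrix, x):
    """Multiply a 0/1 column vector by a binary matrix over GF(2)."""
    return [sum(matrix[i][j] & x[j] for j in range(4)) % 2 for i in range(4)]

# Number of output cells touched by a single active input cell (column weights of M).
column_weights = [sum(M[i][j] for i in range(4)) for j in range(4)]

def branch_number(matrix):
    """Minimum of wt(x) + wt(Mx) over all nonzero 0/1 input patterns x."""
    best = None
    for v in range(1, 16):
        x = [(v >> k) & 1 for k in range(4)]
        weight = sum(x) + sum(apply(matrix, x))
        best = weight if best is None else min(best, weight)
    return best
```

The column weights come out as `[3, 1, 3, 1]`: only the second and fourth inputs map to a single output cell, and those outputs land on the third and first positions respectively, whose columns have weight 3. This matches the observation above that a one-to-one propagation is followed by a much wider one in the next round.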

8.2.3 Comparing differential bounds

Our entire design has been crafted to allow good provable bounds on the minimal number of differential or linear active S-boxes, not only for the single-key model, but also in the related-key model (or more precisely the related-tweakey model in our case). We provide in Table 8.6 a comparison of our bounds with the best known proven bounds for other lightweight block ciphers at the same security level (all the ciphers in the table use 4-bit S-boxes with $D_{max} = 2^{-2}$). We give in Section 8.3 more details on how the bounds of SKINNY were obtained.

Table 8.6: Proved bounds on the minimal number of differential active S-boxes for SKINNY-64-128 and various lightweight block ciphers with 64-bit block and 128-bit key. Model SK denotes the single-key scenario and model TK2 denotes the related-tweakey scenario where differences can be inserted in both states TK1 and TK2.

Cipher              Model    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
SKINNY (36 rounds)  SK       1   2   5   8  12  16  26  36  41  46  51  55  58  61  66  75
                    TK2      0   0   0   0   1   2   3   6   9  12  16  21  25  31  35  40
LED (48 rounds)     SK       1   5   9  25  26  30  34  50  51  55  59  75  76  80  84 100
                    TK2      0   0   0   0   0   0   0   0   1   5   9  25  26  30  34  50
Piccolo (31 rounds) SK       0   5   9  14  18  27  32  36  41  45  50  54  59  63  68  72
                    TK2      0   0   0   0   0   0   0   5   9  14  18  18  23  27  27  32
Midori (16 rounds)  SK       1   3   7  16  23  30  35  38  41  50  57  62  67  72  75  84
                    TK2      -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -
TWINE (36 rounds)   SK       0   1   2   3   4   6   8  11  14  18  22  24  27  30  32   -
                    TK2      -   -   -   -   -   -   -   -   -   -   -   -   -   -   -   -

First, we emphasise that most of the bounds we obtained for SKINNY are not tight, and we can hope for even higher minimal numbers of active S-boxes. This is not the case for LED, for which the bounds are tight. From the table, we can see that LED obtains better bounds for SK. Yet, the situation is inverted for TK2: due to a strong plateau effect in the TK2 bounds of LED, it stays at 50 active S-boxes until Round 24, while SKINNY already reaches 72 active S-boxes at Round 24 (Table 8.8). Besides, the performance of LED will be quite bad compared to SKINNY, due to its strong MDS diffusion layer and strong S-box. Regarding Piccolo, the bounds³ are really similar to SKINNY for SK but worse for TK2. Yet, our round function is lighter (no use of an MDS layer). No related-key bounds are known for Midori or TWINE. Regarding Midori in SK, while our bounds are slightly worse, we emphasise again that our round function is much lighter and thus will lead to much better performance. Lastly, regarding TWINE, we offer significantly better bounds in SK. Comparing differential bounds with SIMON is not as simple as with SPN ciphers.
Yet, bounds on the best differential/linear characteristics for SIMON have been provided by [72]⁴. Assuming (very) pessimistically for SKINNY that a maximum differential transition probability of $2^{-2}$ is always possible for each active S-box in the differential paths with the smallest number of active S-boxes, we can directly obtain easy bounds on the best differential/linear characteristics for SKINNY. We provide in Table 8.7 a comparison between SIMON and SKINNY versions for the proportion of total number of rounds needed to provide a sufficiently good differential characteristic probability bound according to the cipher block size. One can see that SKINNY needs a much smaller proportion of its total number of rounds compared to SIMON to ensure enough confidence with regard to simple differential/linear attacks. Actually, the related-key ratios of SKINNY are even smaller than the single-key ratios of SIMON (no related-key bounds are known as of today for SIMON). Finally, in terms of diffusion, all versions of SKINNY achieve full diffusion after only 6 rounds (forwards or backwards), while SIMON versions with 64-bit block size require 9 rounds, and even 13 rounds for SIMON versions with 128-bit block size [72] (AES-128 reaches full diffusion after 2 of its 10 rounds). Again, the diffusion comparison according to the total number of rounds is to SKINNY's advantage.

3 We estimate the number of active S-boxes for Piccolo to be $\lceil 4.5 \cdot N_f \rceil$, where $N_f$ is the number of active F-functions taken from [112].

Table 8.7: Comparison between AES-128 and SIMON/SKINNY versions for the proportion of total number of rounds needed to provide a sufficiently good differential characteristic probability bound according to the cipher block size (i.e. $< 2^{-64}$ for 64-bit block size and $< 2^{-128}$ for 128-bit block size). Results for SIMON are updated results taken from [72].

Cipher            Single-key         Related-key
SKINNY-64-128     8/36 = 0.22        15/36 = 0.42
SIMON-64-128      19/44 = 0.43       no bound known
SKINNY-128-128    14/40 = 0.35 (i)   19/40 = 0.47
SIMON-128-128     37/68 = 0.54       no bound known
AES-128           4/10 = 0.40        6/10 = 0.60

(i) An updated result from [1].

8.3 Security Analysis

In this section, we provide an in-depth analysis of the security of the SKINNY family of block ciphers. We emphasise that we do not claim any security in the chosen-key or known-key model, but we do claim security in the related-key model. Moreover, we chose not to use any constant to differentiate between versions of SKINNY with different block sizes or tweakey sizes, as we believe such a separation should be done at the protocol level, for example by deriving different keys (note that, if needed, this can easily be done by encoding these sizes and using them as fixed extra constant material every round).

4 Their article initially contained results only for the smallest versions of SIMON, but the authors provided us updated results for all versions of SIMON.

8.3.1 Differential and linear cryptanalysis

In order to argue for the resistance of SKINNY against differential and linear attacks, we computed lower bounds on the minimal number of active S-boxes, both in the single-key and related-tweakey model. We recall that, in a differential (resp. linear) characteristic, an S-box is called active if it contains a non-zero input difference (resp. input linear mask). In contrast to the single-key model, where the round tweakeys are constant and thus do not influence the activity pattern, an adversary is allowed to introduce differences (resp. linear masks) within the tweakey state in the related-tweakey model. For that, we considered the three cases of choosing input differences in TK1 only, both TK1 and TK2, and in all of the tweakey states TK1, TK2 and TK3, respectively. Table 8.8 presents lower bounds on the number of differential active S-boxes for 1 up to 30 rounds. For computing these bounds, we generated a MILP model following the approach explained in [89,119].

Table 8.8: Lower bounds on the number of active S-boxes in SKINNY. Note that the bounds on the number of linear active S-boxes in the single-key model are also valid in the related-tweakey model. In case the MILP optimisation was too long, we provide upper bounds between parentheses.

Model      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
SK         1   2   5   8  12  16  26  36  41  46  51  55  58  61  66
TK1        0   0   1   2   3   6  10  13  16  23  32  38  41  45  49
TK2        0   0   0   0   1   2   3   6   9  12  16  21  25  31  35
TK3        0   0   0   0   0   0   1   2   3   6  10  13  16  19  24
SK Lin     1   2   5   8  13  19  25  32  38  43  48  52  55  58  64

Model       16    17    18    19    20    21    22    23    24    25    26    27    28    29    30
SK          75    82    88    92    96   102   108 (114) (116) (124) (132) (138) (136) (148) (158)
TK1         54    59    62    66    70    75    79    83    85    88    95   102 (108) (112) (120)
TK2         40    43    47    52    57    59    64    67    72    75    82    85    88    92    96
TK3         27    31    35    43    45    48    51    55    58    60    65    72    77    81    85
SK Lin      70    76    80    85    90    96   102   107 (110) (118) (122) (128) (136) (141) (143)
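As a rough illustration of how the round counts quoted in Table 8.7 relate to these bounds, the sketch below finds the first round at which the pessimistic bound of $2^{-2}$ per active S-box drops below $2^{-n}$. The bounds are transcribed from Table 8.8; the function name is hypothetical, and pairing SKINNY-128-128 with the TK1 row for its related-key figure is an assumption. The single-key figure for SKINNY-128-128 in Table 8.7 is an updated result from [1] and is therefore not expected to follow from these values.

```python
# Lower bounds on active S-boxes for rounds 1..20, transcribed from Table 8.8.
BOUNDS = {
    "SK":  [1, 2, 5, 8, 12, 16, 26, 36, 41, 46, 51, 55, 58, 61, 66, 75, 82, 88, 92, 96],
    "TK1": [0, 0, 1, 2, 3, 6, 10, 13, 16, 23, 32, 38, 41, 45, 49, 54, 59, 62, 66, 70],
    "TK2": [0, 0, 0, 0, 1, 2, 3, 6, 9, 12, 16, 21, 25, 31, 35, 40, 43, 47, 52, 57],
}


def rounds_needed(model, block_size, log2_max_prob=-2):
    """First round whose characteristic probability bound is below 2^-block_size.

    Each active S-box is assumed to contribute a factor of 2^log2_max_prob.
    Returns None if the tabulated rounds are not sufficient.
    """
    for rounds, active in enumerate(BOUNDS[model], start=1):
        if active * log2_max_prob < -block_size:
            return rounds
    return None


print(rounds_needed("SK", 64), rounds_needed("TK2", 64), rounds_needed("TK1", 128))
```

Dividing the returned values by the total number of rounds (36 for SKINNY-64-128, 40 for SKINNY-128-128) gives the corresponding ratios.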

For lower bounding the number of linear active S-boxes, we used the same approach. For that, we considered the transpose of the inverse diffusion matrix, i.e. $(M^{-1})^{\top}$. However, for the linear case, we only considered the single-key model. As described in [73], there is no cancellation of active S-boxes in linear characteristics. Thus, the bounds for SK are also valid for the case where the adversary is allowed to control not only the message but also the tweakey input. The above bounds are for single characteristics, thus it will be interesting to take a look at differentials and linear hulls. As this is a rather complex task, we leave it as future work.

8.3.2 Impossible differential cryptanalysis

We searched for impossible differential characteristics with the miss-in-the-middle technique. In short, 16 input truncated differentials and 16 output truncated differentials with a single active cell are propagated through the encryption function and decryption function, respectively, until no cell can be inactive or active with probability one. Then, we pick up the pairs contradicting each other in the middle. Consequently, we found that the longest impossible differential characteristics reach 11 rounds and there are 16 such characteristics in total. An example of an 11-round impossible differential characteristic is as follows (also depicted in Figure 8.5):

$$(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, \Delta, 0, 0, 0) \overset{11R}{\nrightarrow} (0, 0, 0, 0, 0, 0, 0, 0, \Delta', 0, 0, 0, 0, 0, 0, 0).$$

Figure 8.5: 11-round impossible differential characteristic, contradicting cell at $IS_{3,3}$. $r_i$ stands for the round function at round $i$. Active cells are in black while unknown cells are in grey.

Several rounds can be appended before and after the 11-round impossible differential characteristic. The number of rounds appended depends on the key size. For example, when the block size and the key size are the same, two rounds and three rounds can be appended before and after the characteristic respectively, which yields a 16-round key-recovery attack. The plaintext difference becomes (0, 0, 0, ∗, ∗, ∗, ∗, 0, 0, ∗, 0, ∗, 0, 0, ∗, 0) and the ciphertext difference becomes (∗, ∗, ∗, ∗, ∗, ∗, ∗, 0, 0, 0, ∗, ∗, ∗, ∗, ∗, 0), where ∗ denotes a non-zero difference. The entire differential characteristic is illustrated in Figure 8.6. The analysis is slightly different from standard SPN ciphers due to the lack of whitening key and the unique order of the AddRoundTweakey (ARK) operation. For Round 1, ARK can be moved after the ShiftRows (SR) and MixColumns (MC) operations by applying the corresponding linear transformation to the tweakey value, which confirms that the first round acts as a keyless operation. Then, the analysis can start by regarding the input difference to Round 2 as the plaintext difference, which is masked by the equivalent tweakey for the first round. It also shows that the number of (equivalent) tweakey cells involved is 3 in the first two rounds and 5 in the last three rounds; hence, 8 cells in total.

Figure 8.6: 16-round key-recovery with impossible differential attack for SKINNY-64 with 64-bit tweakey and SKINNY-128 with 128-bit tweakey. X stands for the guessed key cells and $TK_i$ is the round key at round $i$.

The adversary constructs $2^{x}$ structures at the input to Round 2 and each structure consists of $2^{3s}$ values, where $s$ is the cell size (i.e. 4 bits for SKINNY-64 and 8 bits for SKINNY-128). In total, $2^{x+6s-1}$ pairs can be constructed from those $2^{x+3s}$ values. All the $2^{x+3s}$ values are inverted by one keyless round to obtain the corresponding original plaintexts, which are then queried to the encryption oracle to obtain their corresponding ciphertexts. The adversary only picks up the pairs which have 9 inactive cells after inverting the last MC operation. $2^{x-3s-1}$ pairs are expected to remain after the filtering. For each such pair, the adversary can generate all tweakey values for 8 cells leading to the impossible differential characteristic by guessing 5 internal state cells, which are 1-cell differences after MC in Round 3 and 4-cell differences before MC in Round 14 and Round 15. In the end, the adversary obtains $2^{x-3s-1+5s} = 2^{x+2s-1}$ wrong key suggestions for 8 tweakey cells, which makes the remaining tweakey space

$$2^{8s} \cdot \left(1 - 2^{-8s}\right)^{2^{x+2s-1}} = 2^{8s} \cdot e^{-2^{x-6s-1}} .$$

When $s = 8$, we choose $x = 54.5$, which makes the remaining key space $2^{64} \cdot 2^{-65.3} < 1$. When $s = 4$, we choose $x = 29.5$, which makes the remaining key space $2^{32} \cdot 2^{-32.6} < 1$. All in all, the data complexity amounts to $2^{x+3s}$ chosen plaintexts, and time and memory complexities are $\max\{2^{x+3s}, 2^{x+2s-1}\}$. Hence, data, time and memory complexities reach $2^{88.5}$ for SKINNY-128 with 128-bit key ($s = 8$) and $2^{41.5}$ for SKINNY-64 with 64-bit key ($s = 4$).
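The size of the remaining key space can be checked numerically. The following is a minimal sketch with a hypothetical function name; it relies on the usual approximation $(1 - 2^{-k})^{N} \approx e^{-N / 2^{k}}$, which is the step behind the right-hand side of the expression above.

```python
import math


def log2_remaining_keys(s, x):
    """log2 of the expected number of surviving candidates for the 8 tweakey cells.

    s is the cell size in bits and x is the log2 of the number of structures.
    Uses (1 - 2^-k)^N ~ exp(-N / 2^k) with k = 8s and N = 2^(x + 2s - 1).
    """
    key_bits = 8 * s
    log2_wrong_suggestions = x + 2 * s - 1
    # N / 2^k = 2^(x + 2s - 1 - 8s) = 2^(x - 6s - 1)
    exponent = 2.0 ** (log2_wrong_suggestions - key_bits)
    # Convert exp(-exponent) to a power of two.
    return key_bits - exponent * math.log2(math.e)


for s, x in ((8, 54.5), (4, 29.5)):
    print(f"s = {s}, x = {x}: log2(remaining) = {log2_remaining_keys(s, x):.2f}")
```

A negative result means that fewer than one wrong candidate is expected to survive, which is the condition used to choose $x$.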

Related-tweakey impossible differential cryptanalysis

Besides the single-key model, we also consider the related-key model for the impossible differential attack. Figure 8.7 presents an 11-round related-key impossible differential characteristic for SKINNY-64-128. Using this impossible differential characteristic, we can obtain a 21-round attack on SKINNY-64-128. We can further extend the attack by one or two rounds by choosing tweak values at specific positions. Table 8.9 summarises the complexity of the various attacks on SKINNY-64-128.

Table 8.9: Complexity of related-tweakey impossible differential cryptanalysis on SKINNY-64-128

Rounds   Tweak/Key (bits)   Time      Data      Memory
21       0/128              2^71.4    2^71.4    2^68.0
22       48/80              2^71.6    2^71.4    2^64.0
23       48/80              2^79      2^71.4    2^64.0

Given the similarity in the structure, the attacks can be trivially extended to SKINNY-128. Detailed discussion of the attacks can be found in [4]. Since SKINNY-64-128 and SKINNY-128-256 have 36 and 48 rounds respectively, we believe that SKINNY has sufficient security margin to resist both single-key and related-key impossible differential attacks.

8.3.3 Invariant subspace attacks

Invariant subspace cryptanalysis makes use of affine subspaces that are invariant under the round function. As the round key addition translates this invariant subspace, ciphers exhibit weak keys when all round keys are such that the affine subspace stays invariant including the key-addition.

Figure 8.7: 11-round related-key impossible differential characteristic, contradicting cell at $IS_{3,0}$. $K^i$ is the combined key of TK1 and TK2, and $r_i$ stands for the round function at round $i$. Active cells are in black while unknown cells are in grey.

Therefore, those attacks are mainly an issue for block ciphers that use identical round keys. For SKINNY, the non-trivial key-scheduling already provides good protection against such attacks for a larger number of rounds. The main remaining concern is large-dimensional subspaces that propagate invariant through the S-box. We checked that no such invariant subspaces exist. Moreover, for the 8-bit S-box, we computed all affine subspaces of dimension larger than two that get mapped to (different) affine subspaces and checked if those can be chained to what could be coined a subspace characteristic (refer to [47] for a similar approach).

It turns out that those subspaces can be chained only for a very small number of rounds. Figure 8.8 shows as an example the affine spaces of dimension five. Thus, to conclude, the non-trivial key-scheduling and the use of round-constants seem to sufficiently protect SKINNY against those attacks.

Figure 8.8: The graph showing all 5-dimensional affine spaces that get mapped to (different) 5-dimensional spaces by applying the 8-bit S-box of SKINNY-128. The nodes are the subspaces and the edges show which spaces are mapped to which spaces. The affine offset is ignored in this graph. The main point to make here is that the graph has no cycle.

Node labels of Figure 8.8 (hexadecimal):
{01, 08, 10, 20, 42}  {01, 02, 08, 10, 20}
{0a, 10, 20, 40, 80}  {0b, 10, 20, 40, 80}  {08, 10, 20, 40, 80}  {09, 10, 20, 40, 80}
{02, 05, 10, 40, 80}  {03, 04, 10, 40, 80}  {02, 04, 10, 40, 80}
{01, 02, 04, 08, 40}  {01, 02, 04, 08, 50}  {01, 02, 04, 08, d0}  {01, 02, 04, 08, c0}
{01, 06, 08, 10, 20}  {01, 04, 08, 10, 20}

8.3.4 Algebraic attacks

We show that algebraic attacks do not threaten SKINNY. The S-boxes $S_4$ and $S_8$ have algebraic degree $a = 3$ and $a = 6$ respectively. We can see from Table 8.8 that under the single-key scenario, for any consecutive 7-round differential characteristic of SKINNY, there are at least 26 active S-boxes. One can easily check that for all SKINNY variants, we have $a \cdot 26 \cdot \lfloor r/7 \rfloor > n$, where $r$ is the number of rounds and $n$ is the block size. Moreover, $S_4$ is described by $e = 21$ quadratic equations in the $v = 8$ input/output variables over $GF(2)$. The entire system for a fixed-key SKINNY permutation therefore consists of $16 \cdot r \cdot e$ quadratic equations in $16 \cdot r \cdot v$ variables. For example, in the case of SKINNY-64-64, there are 10752 quadratic equations in 4096 variables. In comparison, the entire system for a fixed-key AES permutation consists of 6400 equations in 2560 variables [50]. While the applicability of algebraic attacks on AES remains unclear, those numbers tend to indicate that SKINNY offers a high level of protection.

Other cryptanalysis. We also conducted security analysis on SKINNY with meet-in-the-middle, integral, improved integral (using division property) and slide attacks and showed that they do not threaten the security of SKINNY. Details of these attacks can be found in [14].
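The counts quoted in the algebraic-attack argument above are simple to reproduce. The sketch below uses hypothetical function names; the round numbers (32, 36, 40 for SKINNY-64 and 40, 48, 56 for SKINNY-128) are those stated in the caption of Table 8.10.

```python
def quadratic_system_size(rounds, equations_per_sbox=21, variables_per_sbox=8,
                          sboxes_per_round=16):
    """Return (equations, variables) of the system 16*r*e and 16*r*v."""
    equations = sboxes_per_round * rounds * equations_per_sbox
    variables = sboxes_per_round * rounds * variables_per_sbox
    return equations, variables


def degree_condition_holds(degree, rounds, block_size, active_per_7_rounds=26):
    """Check a * 26 * floor(r / 7) > n."""
    return degree * active_per_7_rounds * (rounds // 7) > block_size


# SKINNY-64-64 has 32 rounds.
print(quadratic_system_size(32))

# Degree condition for every SKINNY variant.
checks = [degree_condition_holds(3, r, 64) for r in (32, 36, 40)]
checks += [degree_condition_holds(6, r, 128) for r in (40, 48, 56)]
print(all(checks))
```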

8.3.5 Third-party security analysis of SKINNY

Since the publication of the SKINNY cipher [14], SKINNY has drawn a lot of attention from the symmetric-key community. Table 8.10 summarises the state-of-the-art cryptanalysis conducted by the research community. One can see from Table 8.10 that SKINNY continues to have a substantial security margin.

Table 8.10: Summary of third-party cryptanalysis results on SKINNY, where ID, RK, Trun., Rect. and ZC denote impossible differential, related-key (tweakey), truncated differential, rectangle and zero correlation cryptanalysis respectively. SKINNY-64 has 32, 36 and 40 rounds while SKINNY-128 has 40, 48 and 56 rounds for t = n, 2n and 3n respectively.

n     t     Rounds   Time        Data        Memory      Attack            Ref.
64    64    14       2^62        2^62.58     2^64        ZC                [105]
64    64    18       2^57.1      2^47.52     2^58.52     ID                [123]
64    64    19       2^63.03     2^61.47     2^56        RK-ID             [81]
64    64    19       2^62.83     2^61.30     2^48.30     RK-ID             [105]
64    128   18       2^128       2^62.68     2^64        ZC                [105]
64    128   20       2^121.08    2^47.69     2^74.69     ID                [123]
64    128   23       2^125.91    2^62.47     2^124       RK-ID             [81]
64    128   23       2^124.01    2^62.47     2^77.47     RK-ID             [105]
64    192   27       2^165.5     2^63.5      2^80        Trun. RK-Rect.    [81]
128   128   18       2^116.94    2^92.42     2^115.42    ID                [123]
128   128   19       2^124.60    2^122.47    2^112       RK-ID             [81]
128   128   19       2^124.43    2^122.47    2^97.47     RK-ID             [105]
128   256   20       2^245.72    2^92.1      2^147.1     ID                [123]
128   256   23       2^251.47    2^124.47    2^248       RK-ID             [81]
128   256   23       2^240.7     2^124.41    2^155.41    RK-ID             [105]
128   384   27       2^351       2^127       2^160       Trun. RK-Rect.    [81]
128   384   27       2^351       2^123       2^155       RK-Rect.          [81]

8.4 Performance and Comparison

8.4.1 Hardware implementations

Our SKINNY family has great performance on ASIC platform; we used the Synopsys DesignCompiler version A-2007.12-SP1 to synthesise the designs with the UMCL18G212T3 [126] standard cell library, which is based on the UMC L180 0.18 µm 1P6M logic process with a typical voltage of 1.8 V. For the synthesis, we advised the compiler to keep the hierarchy and use a clock frequency of 100 KHz, which allows a fair comparison with the benchmark of other block ciphers reported in the literature.

Round-based implementation

Table 8.11 compares our implementations with other round-based implementations of lightweight ciphers taken from the literature. In particular, SKINNY-64-128 offers the smallest area footprint compared to other lightweight ciphers. Note that even SIMON-64-128 implemented in a round-based fashion cannot compete with our design in terms of area, although it has a smaller critical path and hence can be operated at higher frequencies and provides better throughput. However, comparing the throughput at a frequency of 100 KHz, SKINNY provides better results since the number of rounds is substantially lower than for SIMON. Using block sizes of 128 bits, SKINNY-128-128 is only slightly larger than SIMON-128-128, while SKINNY-128-256 again has a better area footprint. Besides, the throughput behaves in a similar manner as for SKINNY-64, since SIMON-128 still has a smaller critical path (due to less complex logic functions in terms of hardware gates). Still, it can be stated that SKINNY outperforms most existing lightweight ciphers, including SIMON, in terms of area and throughput considering hardware architectures in a round-based style.

Table 8.11: Round-based implementations of SKINNY and other lightweight block ciphers.

                  Area       Delay   Clock    Throughput   Throughput   Ref.
                                     cycles   @100 KHz     @maximum
Cipher            (GE)       (ns)    (#)      (KBit/s)     (MBit/s)
SKINNY-64-64      1223       1.77    32       200.00       1130.00      new
LED-64-64         2695       -       32       198.90       -            [51]
SKINNY-64-128     1696       1.87    36       177.78       951.11       new
SIMON-64-128      1751       1.60    46       145.45       870.00       [13]
Piccolo-64-128    1773 (i)   -       33       193.94       -            [112]
LED-64-128        3036       -       48       133.00       -            [51]
SKINNY-64-192     2183       2.02    40       160.00       792.00       new
SKINNY-128-128    2391       2.89    40       320.00       1107.20      new
SIMON-128-128     2342       1.60    70       188.24       1145.00      [13]
SKINNY-128-256    3312       2.89    48       266.67       922.67       new
SIMON-128-256     3419       1.60    74       177.78       1081.00      [13]
SKINNY-128-384    4268       2.89    56       228.57       790.86       new

(i) This figure excludes the additional 576 GE for key storage that is not considered in the original work.
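The throughput figures at 100 KHz for the SKINNY rows of Table 8.11 appear to follow directly from the block size and the number of clock cycles per block. The sketch below, with a hypothetical function name, assumes one block is produced every `clock_cycles` cycles; figures quoted from other papers may rely on different conventions and are not checked here.

```python
def throughput_kbit_per_s(block_bits, clock_cycles, clock_khz=100.0):
    """Throughput in KBit/s when one block is processed every clock_cycles cycles."""
    return block_bits * clock_khz / clock_cycles


# (block size in bits, clock cycles) for the round-based SKINNY rows of Table 8.11.
skinny_rows = {
    "SKINNY-64-64": (64, 32),
    "SKINNY-64-128": (64, 36),
    "SKINNY-64-192": (64, 40),
    "SKINNY-128-128": (128, 40),
    "SKINNY-128-256": (128, 48),
    "SKINNY-128-384": (128, 56),
}

for name, (bits, cycles) in skinny_rows.items():
    print(f"{name}: {throughput_kbit_per_s(bits, cycles):.2f} KBit/s")
```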

Serial implementation

Since serialisation is a common implementation style for lightweight ciphers, we have also considered byte- and nibble-serial architectures to examine the performance of SKINNY. Serial implementations have the smallest area footprint for hardware implementations by updating only a small number of bits per clock cycle. However, the throughput and performance of such implementations are decreased significantly. Often, only a single instance of an S-box is implemented and re-used to update the internal state of the round function in a serial fashion. Depending on the size of the S-box, we call these implementations nibble-serial (4-bit S-box) or byte-serial (8-bit S-box) respectively. In Table 8.12, we list results for nibble-serial implementations of all SKINNY-64 variants as well as results for byte-serial implementations of all SKINNY-128 variants. Obviously, our implementations cannot compete with SIMON considering nibble-serial and byte-serial implementations, while area and performance results are still comparable to results for LED and Piccolo found in the literature.

Table 8.12: Serial implementations of SKINNY-64 (nibble) and SKINNY-128 (byte).

                  Area       Delay   Clock    Throughput   Throughput   Ref.
                                     cycles   @100 KHz     @maximum
Cipher            (GE)       (ns)    (#)      (KBit/s)     (MBit/s)
SKINNY-64-64      988        1.03    704      9.09         88.26        new
LED-64-64         966        -       1248     5.1          -            [51]
SKINNY-64-128     1399       0.95    788      8.12         85.49        new
SIMON-64-128      1000       -       384      16.7         -            [13]
Piccolo-64-128    758 (i)    -       528      12.12        -            [112]
LED-64-128        1265       -       1872     3.4          -            [51]
SKINNY-64-192     1806       0.95    872      7.34         77.26        new
SKINNY-128-128    1840       1.03    872      14.68        142.51       new
SIMON-128-128     1317       -       560      22.9         -            [13]
SKINNY-128-256    2655       0.95    1040     12.31        129.55       new
SIMON-128-256     1883       -       608      21.1         -            [13]
SKINNY-128-384    3474       0.95    1208     10.60        111.54       new

(i) This figure excludes the additional 576 GE for key storage that is not considered in the original work.

8.4.2 Software implementations

In this section, we consider two of the latest Intel processors using SIMD instruction sets to perform efficient parallel computations of several input blocks (exact setting given in Table 8.13). We give in particular the performance figures for bit-slice implementations of SKINNY. While we count the costs for packing and unpacking the data, we chose to benchmark encryption given pre-expanded subkeys. The motivation is twofold: first, many modes of operation make this assumption practical and second, the key schedules of our proposals are light and would not induce big differences in the results. There are scenarios in practice for which the costs of the key schedule play a non-negligible role, as pointed out in [20], and we expect the lower costs of the SKINNY key schedule to provide a good performance.

Table 8.13: Machine used to benchmark the software implementations (Turbo Boost disabled).

Name      Processor   Launch date   Linux kernel    gcc version
Haswell   i7-4770S    Q2 2013       4.4.0-22        5.3.1
Skylake   i7-6700     Q3 2015       4.2.3-040203    5.2.1

In Table 8.14, we give the detailed performance figures of our implementations in the case of SKINNY-64 and compare them with other ciphers. Note that these implementations take into account all data transformations which are required. The bit-slice implementations for SIMON processing 32 (resp. 64) blocks have been provided by the designers to allow us a fair comparison in the same setting.

Table 8.14: Bit-slice implementations of SKINNY and other lightweight block ciphers. Performance numbers are given in cycles per byte, with pre-expanded subkeys. For SKINNY-64 and SIMON we encrypted 2000 64-bit blocks to obtain the results. Cells with dashes (-) represent non-existing implementations to date.

                      Haswell                Skylake
Parallelisation ρ     16     32     64       16     32     64      Ref.
SKINNY-64-128         -      -      2.58     -      -      2.48    new
SIMON-64-128          -      -      1.58     -      -      1.51    [129]
LED-64-128            22.6   13.7   -        23.1   13.3   -       [20]
Piccolo-64-128        9.2    -      -        9.2    -      -       [20]
SKINNY-128-128        -      -      3.78     -      -      3.43    new
SIMON-128-128         -      -      2.38     -      -      2.21    [129]

8.5 Conclusion

Through several iterations and thorough analysis, we reach an interesting point where good performance meets a strong security level. On top of competitive performance against SIMON, SKINNY is an ad-hoc TBC family of block ciphers that provides more diverse usage and high security guarantees against classical and state-of-the-art cryptanalysis techniques. All in all, we believe that SKINNY is a good general-purpose lightweight encryption algorithm.

Chapter 9

GIFT: A Small Present

Ten years after the original publication of PRESENT [28], we revisit the design strategy of PRESENT, leveraging all the advances provided by the research community in construction and cryptanalysis since its publication, to push the design up to its limits. We obtain an improved version, named GIFT, a very simple and clean design that provides a much increased efficiency in all domains (smaller and faster), while correcting the well-known weakness of PRESENT with regard to linear cryptanalysis [30]. GIFT reaches a point where almost the entire implementation area is taken by the storage and the S-boxes, where any cheaper choice of S-box would lead to a very weak proposal. In essence, GIFT is composed of only S-boxes and bit-wiring, but its natural bit-slice data flow ensures excellent performance in all scenarios, from area-optimised hardware implementations to very fast software implementations on high-end platforms. We conducted a thorough analysis of our design with regard to state-of-the-art cryptanalysis, and we provide strong bounds with regard to differential and linear cryptanalysis.

Contribution of this chapter. In this chapter, we present a new BULB cipher GIFT, improving over PRESENT in both security and efficiency. Interestingly, our cipher GIFT offers extremely good performance for round-based implementations. This indicates that GIFT is probably the cipher most suited for the very important low-energy consumption use-cases. Due to its simplicity and natural bit-slice organisation of the inner data flow, our cipher is very versatile and also performs very well in software, reaching similar performance as SIMON, the current fastest lightweight candidate in software.

In more detail, we have revisited the PRESENT design strategy and pushed it to its limits, while providing special care to the known weak point of PRESENT: the linear hulls. The P-layer of PRESENT being composed of only a bit-permutation, most of the security of PRESENT relies on its S-box. This S-box presents excellent cryptographic properties, but is quite costly. Indeed, it is trivial to see that the PRESENT S-box needs to have a branch number 3, or very good differential characteristics (for the adversary) would exist otherwise (with only a single active S-box per round). We managed to remove this constraint by carefully crafting the bit-permutation layer in conjunction with the DDT and LAT of the S-box. Since any weaker choice of the S-box would lead to a very insecure design, we argue that GIFT is probably very close to reaching the area limit of lightweight encryption.

In terms of security, we are able to provide strong security bounds for simple differential and linear attacks. We can even show that GIFT is very resistant against linear hulls, and the clustering effect is greatly reduced when compared to PRESENT, thus correcting its main weak point. We have conducted a thorough security analysis of our candidate with state-of-the-art cryptanalysis techniques.

We first give the specification of GIFT in Section 9.1, and we provide the design rationale in Section 9.2. A thorough security analysis is performed in Section 9.3, while performance numbers are given in Section 9.4.

9.1 Specifications

We propose two versions of GIFT: GIFT-64-128 is a 28-round SPN cipher and GIFT-128-128 is a 40-round SPN cipher. Both versions have a key size of 128 bits. For short, we call them GIFT-64 and GIFT-128 respectively. GIFT can be perceived in different representations. The classical 1D representation describes the bits in a row like PRESENT (see Section 9.1.1). It can also be described in a bit-slice 2D representation, as a rectangular array like RECTANGLE [132] (see Section 9.1.2).

9.1.1 Specification

Initialisation

The cipher receives an n-bit plaintext $b_{n-1} b_{n-2} \dots b_0$ as the internal state IS, where $n = 64, 128$ and $b_0$ is the LSB. The internal state can also be expressed as $g$ 4-bit nibbles $IS = w_{g-1} \| w_{g-2} \| \dots \| w_0$, where $g = 16, 32$. The cipher also receives a 128-bit key $K = k_7 \| k_6 \| \dots \| k_0$ as the key state, where each $k_i$ is a 16-bit word.

Round function

Each round of GIFT consists of 3 steps: SubCells, PermBits, and AddRoundKey, which is conceptually similar to wrapping a gift:

1. Put the content into a box (SubCells);
2. Wrap the ribbon around the box (PermBits);
3. Tie a knot to secure the content (AddRoundKey).

Figure 9.1 illustrates 2 rounds of GIFT-64, with line $i$ representing $b_i$.

SubCells. Both versions of GIFT use the same invertible 4-bit S-box, $S_G$. The S-box is applied to every nibble of the internal state.

$$w_i \leftarrow S_G(w_i), \quad \forall i \in \{0, \dots, g-1\} .$$

Figure 9.1: 2 Rounds of GIFT-64

Table 9.1: Specifications of GIFT S-box $S_G$

x              0 1 2 3 4 5 6 7 8 9 a b c d e f
S_G(x)         1 a 4 c 6 f 3 9 2 d b 7 5 0 8 e
S_G^{-1}(x)    d 0 8 6 2 c 4 b e 7 1 a 3 9 f 5
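The S-box table can be transcribed and cross-checked in a few lines. This is a minimal sketch: the names `SG`, `SG_INV` and `sub_cells` are hypothetical, and representing the state as a Python integer with $b_0$ as the least significant bit is an assumption made for illustration.

```python
# S-box values transcribed from Table 9.1.
SG = [0x1, 0xA, 0x4, 0xC, 0x6, 0xF, 0x3, 0x9,
      0x2, 0xD, 0xB, 0x7, 0x5, 0x0, 0x8, 0xE]

# Derive the inverse instead of typing it, so it can be compared with the table.
SG_INV = [0] * 16
for x, y in enumerate(SG):
    SG_INV[y] = x


def sub_cells(state, nibbles):
    """Apply SG to each 4-bit nibble w_i of an integer state (w_0 is the lowest nibble)."""
    out = 0
    for i in range(nibbles):
        nibble = (state >> (4 * i)) & 0xF
        out |= SG[nibble] << (4 * i)
    return out


print("".join(format(v, "x") for v in SG_INV))
print(format(sub_cells(0x0123456789ABCDEF, 16), "016x"))
```

The derived inverse can be compared against the last row of Table 9.1, which gives a quick check that the table was transcribed consistently.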

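The action of this S-box in hexadecimal notation is given in Table 9.1.

PermBits. The bit-permutations used in GIFT-64 and GIFT-128 are given in Table 9.2 and Table 9.3 respectively. They map bits from bit position $i$ of the internal state to bit position $P(i)$.

$$b_{P(i)} \leftarrow b_i, \quad \forall i \in \{0, \dots, n-1\} .$$

The permutations can also be expressed as:

$$P_{64}(i) = 4 \left\lfloor \frac{i}{16} \right\rfloor + 16 \left( \left( 3 \left\lfloor \frac{i \bmod 16}{4} \right\rfloor + (i \bmod 4) \right) \bmod 4 \right) + (i \bmod 4),$$

$$P_{128}(i) = 4 \left\lfloor \frac{i}{16} \right\rfloor + 32 \left( \left( 3 \left\lfloor \frac{i \bmod 16}{4} \right\rfloor + (i \bmod 4) \right) \bmod 4 \right) + (i \bmod 4).$$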

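The closed-form expressions for the bit-permutations can be evaluated directly and compared with Tables 9.2 and 9.3. The following is a minimal sketch with hypothetical function names; as before, the state is assumed to be a Python integer with $b_0$ as the least significant bit.

```python
def gift_perm(i, n):
    """Closed-form bit permutation P_n(i) for n = 64 or n = 128."""
    stride = n // 4  # 16 for GIFT-64, 32 for GIFT-128
    row = (3 * ((i % 16) // 4) + (i % 4)) % 4
    return 4 * (i // 16) + stride * row + (i % 4)


def perm_bits(state, n):
    """Move bit i of the integer state to bit position P_n(i)."""
    out = 0
    for i in range(n):
        out |= ((state >> i) & 1) << gift_perm(i, n)
    return out


# First row of Table 9.2 and last row of Table 9.3.
print([gift_perm(i, 64) for i in range(16)])
print([gift_perm(i, 128) for i in range(112, 128)])
```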
AddRoundKey. This step consists in adding the round key and round constants. An n/2-bit round key RK is extracted from the key state; it is further partitioned into 2 g-bit words $RK = U \| V = u_{g-1} \dots u_0 \| v_{g-1} \dots v_0$, where $g = 16, 32$ for GIFT-64 and GIFT-128 respectively. For GIFT-64, U and V are XORed to $\{b_{4i+1}\}$ and $\{b_{4i}\}$ of the internal state respectively.

$$b_{4i+1} \leftarrow b_{4i+1} \oplus u_i, \quad b_{4i} \leftarrow b_{4i} \oplus v_i, \quad \forall i \in \{0, \dots, 15\} .$$

Table 9.2: Specifications of GIFT-64 bit-permutation

i           0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
P_64(i)     0  17  34  51  48   1  18  35  32  49   2  19  16  33  50   3

i          16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
P_64(i)     4  21  38  55  52   5  22  39  36  53   6  23  20  37  54   7

i          32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
P_64(i)     8  25  42  59  56   9  26  43  40  57  10  27  24  41  58  11

i          48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63
P_64(i)    12  29  46  63  60  13  30  47  44  61  14  31  28  45  62  15

Table 9.3: Specifications of GIFT-128 bit-permutation

i 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

P128(i) 0 33 66 99 96 1 34 67 64 97 2 35 32 65 98 3

i 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

P128(i) 4 37 70 103 100 5 38 71 68 101 6 39 36 69 102 7

i 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

P128(i) 8 41 74 107 104 9 42 75 72 105 10 43 40 73 106 11

i 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

P128(i) 12 45 78 111 108 13 46 79 76 109 14 47 44 77 110 15

i 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79

P128(i) 16 49 82 115 112 17 50 83 80 113 18 51 48 81 114 19

i 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95

P128(i) 20 53 86 119 116 21 54 87 84 117 22 55 52 85 118 23

i 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111

P128(i) 24 57 90 123 120 25 58 91 88 121 26 59 56 89 122 27

i 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127

P128(i) 28 61 94 127 124 29 62 95 92 125 30 63 60 93 126 31

For GIFT-128, U and V are XORed to {b4i+2} and {b4i+1} of the internal state respectively.

b4i+2 ← b4i+2 ⊕ ui, b4i+1 ← b4i+1 ⊕ vi, ∀i ∈ {0, ..., 31} .

For both versions of GIFT, a single bit 1 and a 6-bit round constant (rc5, rc4, rc3, rc2, rc1, rc0) are XORed into the internal state at bit positions n − 1, 23, 19, 15, 11, 7 and 3 respectively.

bn−1 ← bn−1 ⊕ 1 ,

b23 ← b23 ⊕ rc5 , b19 ← b19 ⊕ rc4 , b15 ← b15 ⊕ rc3 ,

b11 ← b11 ⊕ rc2 , b7 ← b7 ⊕ rc1 , b3 ← b3 ⊕ rc0 .
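To make the three steps concrete, here is a minimal Python sketch of one GIFT-64 round operating on a 64-bit integer whose bit i is bi. It assumes the steps are applied in the order in which they are presented above (SubCells, PermBits, AddRoundKey); all function names are hypothetical, and the code is meant as an illustration rather than a reference implementation.

```python
SBOX = [0x1, 0xa, 0x4, 0xc, 0x6, 0xf, 0x3, 0x9,
        0x2, 0xd, 0xb, 0x7, 0x5, 0x0, 0x8, 0xe]  # Table 9.1


def perm64(i):
    """GIFT-64 bit-permutation P64(i) (closed form of Table 9.2)."""
    return 4 * (i // 16) + 16 * ((3 * ((i % 16) // 4) + i % 4) % 4) + i % 4


def sub_cells(state):
    """Apply the S-box to each of the 16 nibbles."""
    out = 0
    for k in range(16):
        out |= SBOX[(state >> (4 * k)) & 0xF] << (4 * k)
    return out


def perm_bits(state):
    """Move bit i to position P64(i)."""
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << perm64(i)
    return out


def add_round_key(state, u, v, rc):
    """XOR U into bits 4i+1, V into bits 4i, a 1 into bit 63, rc_j into bit 4j+3."""
    for i in range(16):
        state ^= ((u >> i) & 1) << (4 * i + 1)
        state ^= ((v >> i) & 1) << (4 * i)
    state ^= 1 << 63
    for j in range(6):
        state ^= ((rc >> j) & 1) << (4 * j + 3)
    return state


def gift64_round(state, u, v, rc):
    return add_round_key(perm_bits(sub_cells(state)), u, v, rc)


# All-zero state and round key, round constant 0x01
print(hex(gift64_round(0, 0, 0, 0x01)))
```

With an all-zero state, SubCells sets bit 0 of every nibble, PermBits permutes these 16 bits among themselves, and only the constant bits b63 and b3 are flipped afterwards.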

Key schedule and round constants

The key schedule and round constants are the same for both versions of GIFT; the only difference is the round key extraction. A round key is first extracted from the key state before the key state update. For GIFT-64, two 16-bit words of the key state are extracted as the round key RK = U‖V:

U ← k1, V ← k0 .

For GIFT-128, four 16-bit words of the key state are extracted as the round key RK = U‖V:

U ← k5‖k4, V ← k1‖k0 .

The key state is then updated as follows,

k7‖k6‖...‖k1‖k0 ← (k1 ≫ 2)‖(k0 ≫ 12)‖...‖k3‖k2 ,

where ≫ i is an i-bit right rotation within a 16-bit word.

The round constants are generated using a 6-bit affine LFSR (the same as in SKINNY), whose state is denoted as (rc5, rc4, rc3, rc2, rc1, rc0). Its update function, which uses a single XNOR gate, is defined as:

(rc5‖rc4‖rc3‖rc2‖rc1‖rc0) ← (rc4‖rc3‖rc2‖rc1‖rc0‖rc5 ⊕ rc4 ⊕ 1) .

The six bits are initialised to zero and updated before being used in a given round. The values of the constants for each round are given in the table below, encoded as byte values, with rc0 being the LSB.

Rounds    Constants
1 - 16    01,03,07,0f,1f,3e,3d,3b,37,2f,1e,3c,39,33,27,0e
17 - 32   1d,3a,35,2b,16,2c,18,30,21,02,05,0b,17,2e,1c,38
33 - 48   31,23,06,0d,1b,36,2d,1a,34,29,12,24,08,11,22,04
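The key state update and the round constant generation can be sketched in a few lines of Python. The key state is modelled as a list [k0, ..., k7] of 16-bit words, and the LFSR feedback is taken to be the XNOR of rc5 and rc4 (that is, rc5 ⊕ rc4 ⊕ 1), which reproduces the constants listed above. The function names are hypothetical.

```python
def ror16(x, r):
    """Rotate a 16-bit word right by r bits."""
    return ((x >> r) | (x << (16 - r))) & 0xFFFF


def update_key_state(k):
    """k = [k0, ..., k7]; new k7 = k1 >>> 2, new k6 = k0 >>> 12, the rest shift down."""
    return k[2:] + [ror16(k[0], 12), ror16(k[1], 2)]


def next_rc(rc):
    """6-bit LFSR step: shift left and feed XNOR(rc5, rc4) into rc0."""
    feedback = ((rc >> 5) ^ (rc >> 4) ^ 1) & 1
    return ((rc << 1) & 0x3F) | feedback


def round_constants(rounds):
    rc, out = 0, []
    for _ in range(rounds):
        rc = next_rc(rc)  # the state is updated before being used
        out.append(rc)
    return out


print(",".join(f"{c:02x}" for c in round_constants(16)))
```

For GIFT-64, the round key of a round would be read as U = k[1], V = k[0] before calling `update_key_state`.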

9.1.2 GIFT in 2-dimensional array

This section provides an alternative description of GIFT that resembles the description of the block cipher RECTANGLE. This representation is similar to the principle of LS-designs introduced by Grosso et al. [48].

Initialisation

For GIFT-64 and GIFT-128, the plaintext is arranged into 4 rows of g bits in a top-down, right to left manner, where g = 16, 32 respectively. The internal state is described as a two-dimensional array, as illustrated below.

\[
\begin{bmatrix}
b_{n-4} & \cdots & b_8 & b_4 & b_0 \\
b_{n-3} & \cdots & b_9 & b_5 & b_1 \\
b_{n-2} & \cdots & b_{10} & b_6 & b_2 \\
b_{n-1} & \cdots & b_{11} & b_7 & b_3
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
s_{0,g-1} & \cdots & s_{0,2} & s_{0,1} & s_{0,0} \\
s_{1,g-1} & \cdots & s_{1,2} & s_{1,1} & s_{1,0} \\
s_{2,g-1} & \cdots & s_{2,2} & s_{2,1} & s_{2,0} \\
s_{3,g-1} & \cdots & s_{3,2} & s_{3,1} & s_{3,0}
\end{bmatrix}
\]

The key, on the other hand, is arranged into 4 rows of 32 bits in a right to left, top-down manner.

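\[
\begin{bmatrix}
k_{31} & \cdots & k_2 & k_1 & k_0 \\
k_{63} & \cdots & k_{34} & k_{33} & k_{32} \\
k_{95} & \cdots & k_{66} & k_{65} & k_{64} \\
k_{127} & \cdots & k_{98} & k_{97} & k_{96}
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
t_{0,31} & \cdots & t_{0,2} & t_{0,1} & t_{0,0} \\
t_{1,31} & \cdots & t_{1,2} & t_{1,1} & t_{1,0} \\
t_{2,31} & \cdots & t_{2,2} & t_{2,1} & t_{2,0} \\
t_{3,31} & \cdots & t_{3,2} & t_{3,1} & t_{3,0}
\end{bmatrix}
\]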

The round function

SubCells. Both versions of GIFT use the same invertible 4-bit S-box, SG. The S-box is applied in parallel to every column of the state.

(s3,j‖s2,j‖s1,j‖s0,j) ← SG(s3,j‖s2,j‖s1,j‖s0,j), ∀j ∈ {0, ..., g − 1} .

PermBits. Four different bit-permutations are applied to the rows of the internal state independently. The permutation for row i maps the bit at position (i, j) to position (i, Pi(j)). Refer to Tables 9.4 and 9.5 for the row permutations of GIFT-64 and GIFT-128 respectively.

si,Pi(j) ← si,j, ∀i ∈ {0, ..., 3}, ∀j ∈ {0, ..., g − 1} .

Table 9.4: Specifications of the GIFT-64 bit-permutation for si,j

j 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

P0(j) 0 12 8 4 1 13 9 5 2 14 10 6 3 15 11 7

P1(j) 4 0 12 8 5 1 13 9 6 2 14 10 7 3 15 11

P2(j) 8 4 0 12 9 5 1 13 10 6 2 14 11 7 3 15

P3(j) 12 8 4 0 13 9 5 1 14 10 6 2 15 11 7 3
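The row permutations are not independent data: since state bit b4j+i is s_{i,j} and P64 preserves i mod 4, each Pi can be derived from P64. The Python sketch below does this (the helper names are our own) and reproduces the rows of Table 9.4.

```python
def perm64(i):
    """GIFT-64 bit-permutation P64(i) on the one-dimensional state."""
    return 4 * (i // 16) + 16 * ((3 * ((i % 16) // 4) + i % 4) % 4) + i % 4


def row_permutation(row, g=16):
    """P_row(j): column to which s_{row,j} is moved, using b_{4j+row} = s_{row,j}."""
    return [perm64(4 * j + row) // 4 for j in range(g)]


ROW_PERMS = [row_permutation(r) for r in range(4)]
for r, p in enumerate(ROW_PERMS):
    print(f"P{r}(j)", *p)
```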

AddRoundKey. An n/2-bit round key is extracted from the key state and XORed to 2 rows of the internal state. For GIFT-64, the key state row 0 is extracted, the first 16 bits are XORed to the internal state row 0 while the next 16 bits are XORed to the internal state row 1.

s0,j ← s0,j ⊕ t0,j ,

s1,j ← s1,j ⊕ t0,j+16, ∀j ∈ {0, ..., 15} .

For GIFT-128, the key state rows 0 and 2 are extracted and XORed to the internal state rows 1 and 2 respectively.

s1,j ← s1,j ⊕ t0,j ,

s2,j ← s2,j ⊕ t2,j, ∀j ∈ {0, ..., 31} .

Table 9.5: Specifications of the GIFT-128 bit-permutation for si,j

j 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

P0(j) 0 24 16 8 1 25 17 9 2 26 18 10 3 27 19 11

P1(j) 8 0 24 16 9 1 25 17 10 2 26 18 11 3 27 19

P2(j) 16 8 0 24 17 9 1 25 18 10 2 26 19 11 3 27

P3(j) 24 16 8 0 25 17 9 1 26 18 10 2 27 19 11 3

j 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

P0(j) 4 28 20 12 5 29 21 13 6 30 22 14 7 31 23 15

P1(j) 12 4 28 20 13 5 29 21 14 6 30 22 15 7 31 23

P2(j) 20 12 4 28 21 13 5 29 22 14 6 30 23 15 7 31

P3(j) 28 20 12 4 29 21 13 5 30 22 14 6 31 23 15 7

In addition to the round key, for both versions of GIFT, a single bit 1 is XORed to the leftmost bit of the internal state row 3 and a 6-bit round constant (rc5, rc4, rc3, rc2, rc1, rc0) is XORed to the first 6 bits (columns 0 to 5) of the internal state row 3.

s3,g−1 ← s3,g−1 ⊕ 1 ,

s3,j ← s3,j ⊕ rcj, ∀j ∈ {0, ..., 5} .

Key schedule

The round key is first extracted from the key state before the key state update. The key state update first rotates the 2 blocks of 16 bits of the key state row 0 independently as follows,

\[ (t_{0,11} \,\|\, t_{0,10} \,\|\, \cdots \,\|\, t_{0,13} \,\|\, t_{0,12}) \xleftarrow{\;\ggg 12\;} (t_{0,15} \,\|\, t_{0,14} \,\|\, \cdots \,\|\, t_{0,1} \,\|\, t_{0,0}), \]

\[ (t_{0,17} \,\|\, t_{0,16} \,\|\, \cdots \,\|\, t_{0,19} \,\|\, t_{0,18}) \xleftarrow{\;\ggg 2\;} (t_{0,31} \,\|\, t_{0,30} \,\|\, \cdots \,\|\, t_{0,17} \,\|\, t_{0,16}). \]

Next, the key state rows are rotated upwards to form the final key state. The entire key state update is depicted in the following:

\[
\begin{bmatrix}
t_{0,31} & \cdots & t_{0,16} & t_{0,15} & \cdots & t_{0,0} \\
t_{1,31} & \cdots & t_{1,16} & t_{1,15} & \cdots & t_{1,0} \\
t_{2,31} & \cdots & t_{2,16} & t_{2,15} & \cdots & t_{2,0} \\
t_{3,31} & \cdots & t_{3,16} & t_{3,15} & \cdots & t_{3,0}
\end{bmatrix}
\leftarrow
\begin{bmatrix}
t_{1,31} & \cdots & t_{1,16} & t_{1,15} & \cdots & t_{1,0} \\
t_{2,31} & \cdots & t_{2,16} & t_{2,15} & \cdots & t_{2,0} \\
t_{3,31} & \cdots & t_{3,16} & t_{3,15} & \cdots & t_{3,0} \\
t_{0,17} & \cdots & t_{0,18} & t_{0,11} & \cdots & t_{0,12}
\end{bmatrix}
\]

The round constant update is exactly the same as in Section 9.1.1.

Remark: GIFT aims at single-key security, so we do not claim any related-key security (even though no attack is known in this model as of today). To protect against related-key attacks as well, we advise doubling the number of rounds.

9.2 Design Rationale

9.2.1 The Designing of GIFT

Before we discuss the design rationale of GIFT, we would like to share some background story about GIFT, its design approach, and its comparison with other PRESENT-like ciphers.

The origin of GIFT. It all started with a casual remark: “What if the S-boxes in PRESENT are replaced with some smaller S-boxes, say the Piccolo S-box? It will be extremely lightweight since the core cipher only has some S-boxes and nothing else...”. We quickly tested it, only to realise that the differential bounds became very low because the S-box does not have branch number 3. That is when we started analysing the differential characteristics and studying the interaction between the bit-permutation layer and the S-box. Surprisingly, we found that by carefully crafting the bit-permutation layer based on the properties of the S-box, we were able to achieve the same differential bound as PRESENT without the constraint of an S-box with branch number 3. In addition, this result can also be applied to improve the resistance against linear cryptanalysis, which was lacking in PRESENT. Eventually, a small present, GIFT, was created.

Design approach. GIFT adopts a small-area-increase-security design approach: starting from a small area goal, we try to improve the security as much as possible. Our small area goal implies that S-boxes with branch number 3 (usually heavy to implement) are out of the equation. Yet, one of our goals is to have a minimum security level that is on par with PRESENT, for instance having an average of 2 active S-boxes per round for the differential bound. By analysing the bit-permutation, we propose new S-box criteria which guarantee some level of security, and we search for our ideal S-box within our area budget.

Other PRESENT-like ciphers. Besides PRESENT, one may also compare GIFT-64 with RECTANGLE [132], since both are 4/64-SbPN ciphers and improvements on the design of PRESENT. RECTANGLE was designed to be software-friendly and to achieve a better resistance against linear cryptanalysis compared to PRESENT. However, although its bit-permutation (ShiftRow) was designed to be software-friendly, little analysis was done on how differential and linear characteristics propagate through the cipher. For GIFT, in contrast, we study the interplay of the bit-permutation layer and the S-box to achieve better differential and linear bounds. In addition, the ShiftRow of RECTANGLE achieves full diffusion in 4 rounds at best, whereas GIFT-64 achieves full diffusion in 3 rounds like PRESENT, which is optimal for 4/64-SbPN ciphers.

9.2.2 Design of the GIFT bit-permutation

To better understand the design rationale of the GIFT bit-permutation layer, we first look at the bit-permutation layer of PRESENT to analyse the issue when the S-box is replaced with another S-box that does not have branch number 3. Next, we show how we can solve this issue by carefully designing the bit-permutation.

Bit-permutation layer of PRESENT

The bit-permutation of PRESENT is given in Table 9.6.

Table 9.6: Bit-permutation of PRESENT

i 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

PP(i) 0 16 32 48 1 17 33 49 2 18 34 50 3 19 35 51

i 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

PP(i) 4 20 36 52 5 21 37 53 6 22 38 54 7 23 39 55

i 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

PP(i) 8 24 40 56 9 25 41 57 10 26 42 58 11 27 43 59

i 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

PP(i) 12 28 44 60 13 29 45 61 14 30 46 62 15 31 47 63

Observe that the bit-permutation can be partitioned into 4 independent bit-permutations, each mapping the output of 4 S-boxes to the input of 4 S-boxes in the next round.

For convenience, we label the S-boxes in the i-th round as S^i_0, S^i_1, ..., S^i_{g−1}, where g = n/4. These S-boxes can be grouped in 2 different ways, the Quotient and Remainder groups, Qx and Rx, defined as

• Qx = {S4x, S4x+1, S4x+2, S4x+3},

• Rx = {Sx, Sq+x, S2q+x, S3q+x}, where q = g/4, 0 ≤ x ≤ q − 1.

In PRESENT, n = 64 and the output bits of Q^i_x = {S^i_{4x}, S^i_{4x+1}, S^i_{4x+2}, S^i_{4x+3}} map to the input bits of R^{i+1}_x = {S^{i+1}_x, S^{i+1}_{4+x}, S^{i+1}_{8+x}, S^{i+1}_{12+x}}; this group mapping is defined in Table 9.7. The (x, y)-entry with a 2-tuple (a, b), with indices starting from 0, denotes that the a-th output bit of the S-box corresponding to row x in the i-th round maps to the b-th input bit of the S-box corresponding to column y in the (i + 1)-st round. For example, suppose x = 2, with rows and columns starting at 0; then the (2, 3)-entry with the 2-tuple (3, 2) means that the 3rd output bit of S^i_10 maps to the 2nd input bit of S^{i+1}_14, thus PP(43) = 58 (see Table 9.6).

The PRESENT bit-permutation can be realised in hardware with wires only (no logic gates required). Further, full diffusion is achieved in 3 rounds: from 1 bit to 4, then 4 to 16 and then 16 to 64. However, if there exists a Hamming weight 1 to Hamming weight 1 differential transition, or 1−1 bit differential transition, then there exist consecutive single active bit transitions.

Table 9.7: PRESENT group mapping from Q^i_x to R^{i+1}_x

Q^i_x \ R^{i+1}_x   S^{i+1}_x   S^{i+1}_{4+x}   S^{i+1}_{8+x}   S^{i+1}_{12+x}
S^i_{4x}            (0, 0)      (1, 0)          (2, 0)          (3, 0)
S^i_{4x+1}          (0, 1)      (1, 1)          (2, 1)          (3, 1)
S^i_{4x+2}          (0, 2)      (1, 2)          (2, 2)          (3, 2)
S^i_{4x+3}          (0, 3)      (1, 3)          (2, 3)          (3, 3)
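The group mapping of Table 9.7 determines the whole PRESENT bit-permutation. As an illustrative Python sketch (the function name is hypothetical), the permutation can be rebuilt from the table, checked against the worked example PP(43) = 58, and searched for bit positions that are mapped to themselves.

```python
def present_perm_from_groups():
    """Rebuild the 64-bit PRESENT permutation from the group mapping of Table 9.7."""
    perm = [None] * 64
    for x in range(4):              # group index
        for r in range(4):          # row: S-box S_{4x+r} in round i
            for c in range(4):      # column: S-box S_{4c+x} in round i+1
                # entry (c, r): output bit c of the row S-box -> input bit r of the column S-box
                perm[4 * (4 * x + r) + c] = 4 * (4 * c + x) + r
    return perm


PP = present_perm_from_groups()
fixed_points = [i for i in range(64) if PP[i] == i]
print(PP[43], fixed_points)
```

A bit position that is fixed by the permutation stays on the same S-box and the same bit index in every round, which is exactly the situation in which a 1−1 bit transition of the S-box can be repeated round after round.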

We define a 1−1 bit DDT as a sub-table of the DDT containing Hamming weight 1 differences. Consider some example S-box with the following 1 − 1 bit DDT (see Table 9.8). Let ∆x and ∆y denote the differential in the input and output of the S-box respectively. It is evident that this S-box has branch number 2.

Table 9.8: 1 − 1 bit DDT Example 1

∆x \ ∆y   1000_2   0100_2   0010_2   0001_2
1000_2    2        0        0        0
0100_2    0        0        0        0
0010_2    0        0        0        0
0001_2    0        0        0        0

Table 9.9: 1 − 1 bit DDT Example 2

∆x \ ∆y   1000_2   0100_2   0010_2   0001_2
1000_2    0        2        2        0
0100_2    0        0        0        0
0010_2    0        0        0        0
0001_2    0        2        2        0

It is trivial to see that there exists a single active bit path which results in a differential characteristic with a single active S-box in each round. Let the input difference be at the 3rd bit of S^(i)_15. According to the 1 − 1 bit DDT (Table 9.8), there exists a differential transition from ∆1000_2 to ∆1000_2. From the group mapping (Table 9.7), the 3rd output bit of S^(i)_15 maps to the 3rd input bit of S^(i+1)_15. The differential then continues from the 3rd output bit of S^(i+1)_15 to the 3rd input bit of S^(i+2)_15 and so on. Moreover, if there exists any 1 − 1 bit transition through the S-box (not necessarily ∆1000_2 → ∆1000_2), one can verify that there always exists some differential characteristic with a single active S-box per round for at least 4 consecutive rounds.

To overcome this problem, we propose a new construction paradigm, “Bad Output must go to Good Input”, or BOGI in short. We explain this in the context of the differential behaviour of an S-box, but the analysis is the same for the linear case.

Bad Output must go to Good Input (BOGI)

The existence of the single active bit path is because the bit-permutation allows a 1 − 1 bit transition from some S-box in the i-th round to propagate to some S-box in the (i + 1)-st round that again would produce a 1 − 1 bit transition. To overcome this problem, it must be ensured that such a path does not exist.

In the 1 − 1 bit DDT, we define ∆x = x3x2x1x0 to be a good input if the corresponding row has all zero entries; otherwise it is a bad input. Similarly, we define ∆y = y3y2y1y0 to be a good output if the corresponding column has all zero entries; otherwise it is a bad output. In Table 9.8, ∆1000_2 is both a bad input and a bad output, and the other differences are good. Consider another 1 − 1 bit DDT in Table 9.9. Let GI, GO, BI, BO denote the sets of good inputs, good outputs, bad inputs and bad outputs respectively. Then, in Table 9.9, GI = {0100_2, 0010_2}, GO = {1000_2, 0001_2}, BI = {1000_2, 0001_2} and BO = {0100_2, 0010_2}. If we instead represent these binary strings by the position of the “1” (the rightmost position is 0), we may rewrite GI = {2, 1}, GO = {3, 0}, BI = {3, 0} and BO = {2, 1}. An output belonging to BO (bad output) could potentially come from a single bit transition through some S-box in this round. Thus we want to map this active output bit to some GI (good input) in the next round, which guarantees that it will not propagate to another 1 − 1 bit transition. As a result, it avoids a single active bit path in 2 consecutive rounds.

BOGI: Let σ1 : BO → GI be an injective map. For such a map to exist, it is required that |BO| ≤ |GI| (the cardinality of the set BO must be less than or equal to the cardinality of the set GI). Let σ2 : GO → σ1(BO)^C (the complement of σ1(BO)) be another injective map. The map σ1 ensures that “Bad Output must go to Good Input”. A combined map σ : BO ∪ GO → BI ∪ GI is defined as σ(e) = σ1(e) if e ∈ BO, and σ(e) = σ2(e) otherwise. For example, consider Table 9.9. The injective maps σ1 : {2, 1} → {2, 1} and σ2 : {3, 0} → {3, 0} both have 2 choices, which altogether make 4 choices for the combined map σ. An example BOGI mapping would be σ(0) = 0, σ(1) = 1, σ(2) = 2, σ(3) = 3, which happens to be the identity mapping. Any choice of σ may be used to define the bit-permutation. We call these σs differential BOGI permutations, as they are derived from the 1 − 1 bit DDT.
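The definition can be turned into a small enumeration. The Python sketch below (function names are our own) classifies the bit positions of a 1 − 1 bit DDT into good and bad inputs and outputs, then lists every permutation σ of {0, 1, 2, 3} that sends each bad output to a good input. The table is indexed by bit position, with position 0 being the rightmost bit, and the data are those of Table 9.9.

```python
from itertools import permutations


def classify(table):
    """table[i][o]: 1-1 bit DDT entry for input bit i and output bit o."""
    gi = {i for i in range(4) if all(table[i][o] == 0 for o in range(4))}
    go = {o for o in range(4) if all(table[i][o] == 0 for i in range(4))}
    bi = set(range(4)) - gi
    bo = set(range(4)) - go
    return gi, go, bi, bo


def bogi_permutations(table):
    """All permutations sigma with sigma(e) in GI for every bad output e."""
    gi, _, _, bo = classify(table)
    return [p for p in permutations(range(4)) if all(p[e] in gi for e in bo)]


# Table 9.9: input bits 3 and 0 each lead to output bits 2 and 1
example2 = [[0] * 4 for _ in range(4)]
for i in (3, 0):
    for o in (2, 1):
        example2[i][o] = 2

print(classify(example2))
print(bogi_permutations(example2))
```

For Table 9.9 this yields the 4 choices mentioned above, the identity among them; an empty list would mean that no BOGI permutation exists for the S-box.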

Remark: A similar analysis is performed for the linear case. Analogous to the 1 − 1 bit DDT, our analysis is performed on the basis of the 1 − 1 bit LAT and BOGI permutations are found for the linear case too. We call them linear BOGI permutations. We can now choose any common permutation from the set of both differential and linear BOGI permutations.

BOGI bit-permutation for GIFT

Let σ : {0, 1, 2, 3} → {0, 1, 2, 3} be a common permutation from the set of both differential and linear BOGI permutations. Table 9.10 shows the group mapping. Note that we made some left rotations to the rows of the bit mapping, because we need the inputs to each S-box in the (i + 1)-st round to come from 4 different bit positions.

Table 9.10: BOGI bit-permutation mapping from Q^i_x to R^{i+1}_x

Q^i_x \ R^{i+1}_x   S^{i+1}_x    S^{i+1}_{q+x}   S^{i+1}_{2q+x}   S^{i+1}_{3q+x}
S^i_{4x}            (0, σ(0))    (1, σ(1))       (2, σ(2))        (3, σ(3))
S^i_{4x+1}          (1, σ(1))    (2, σ(2))       (3, σ(3))        (0, σ(0))
S^i_{4x+2}          (2, σ(2))    (3, σ(3))       (0, σ(0))        (1, σ(1))
S^i_{4x+3}          (3, σ(3))    (0, σ(0))       (1, σ(1))        (2, σ(2))

In GIFT, we chose an S-box that has a common BOGI permutation that is an identity mapping, that is σ(i) = i. Figure 9.2 illustrates the group mapping from Q0 to R0 in GIFT-64. The same BOGI permutation is applied to all the q group mappings to form the final n-bit bit-permutation for both versions of GIFT.

Figure 9.2: Group mapping from Q0 to R0 in GIFT-64 (output bits 15 down to 0 of S^i_3, S^i_2, S^i_1, S^i_0 are mapped to input bits 51 to 48, 35 to 32, 19 to 16 and 3 to 0 of S^{i+1}_12, S^{i+1}_8, S^{i+1}_4, S^{i+1}_0)
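To illustrate how the group mapping of Table 9.10 yields the full bit-permutation, the Python sketch below builds the n-bit permutation from a given σ and compares it, for the identity σ, with the closed-form expression of the GIFT bit-permutation. The function names and the default argument are illustrative choices of ours.

```python
def bogi_bit_permutation(n, sigma=(0, 1, 2, 3)):
    """n-bit permutation obtained by applying Table 9.10 to all q = n/16 groups."""
    q = n // 16
    perm = [None] * n
    for x in range(q):
        for r in range(4):          # row: S-box S_{4x+r} of Q_x
            for c in range(4):      # column: S-box S_{cq+x} of R_x
                a = (r + c) % 4     # output bit listed in entry (r, c), rows left-rotated
                perm[4 * (4 * x + r) + a] = 4 * (c * q + x) + sigma[a]
    return perm


def closed_form(i, n):
    """Closed-form GIFT bit-permutation for an n-bit state."""
    return 4 * (i // 16) + (n // 4) * ((3 * ((i % 16) // 4) + i % 4) % 4) + i % 4


for n in (64, 128):
    built = bogi_bit_permutation(n)
    print(n, built == [closed_form(i, n) for i in range(n)])
```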

Some results about our bit-permutation

Let Q0, Q1, ..., Q(q−1) be the q different Quotient groups and R0, R1, ..., R(q−1) be the q different Remainder groups. Then, for 0 ≤ x ≤ q − 1,

1. The input bits of an S-box in Rx come from 4 distinct S-boxes in Qx.

2. The output bits of an S-box in Qx go to 4 distinct S-boxes in Rx.

3. The input bits of 4 S-boxes from the same Qx come from 16 different S-boxes.

4. The output bits of 4 S-boxes from the same Rx go to 16 different S-boxes.

Lemma 9.1. When the number of S-boxes in a round is 16 or 32, the proposed bit-permutation achieves the optimal full diffusion that is achievable by a bit-permutation.

Proof. Considering 4-bit S-boxes and a bit-permutation as the P-layer, an input bit to an S-box influences 4 output bits, which in turn influence the inputs of 4 S-boxes in the next round, influencing altogether 16 bits. By a similar argument, after r rounds, a single bit influences at most 4^r bits. For 4^r ≥ 64, r must be greater than or equal to 3, and for 4^r ≥ 128, r ≥ 4. If the number of S-boxes in each round is 16, by using the proposed bit-permutation, the number of active S-boxes goes from 1 to 16 in three rounds. This follows from result (2), the output bits of an S-box in Qx go to 4 distinct S-boxes in Rx, and result (4), the output bits of 4 S-boxes from the same Rx go to 16 different S-boxes. Therefore, a single bit influences 16 S-boxes and thus 64 bits in three rounds, which is optimal. It can similarly be checked that if the number of S-boxes in a round is 32, the proposed bit-permutation needs four rounds to achieve full diffusion, i.e. a single bit influences 128 bits in four rounds, which is optimal. □

Lemma 9.2. In the proposed bit-permutation, there does not exist any single active bit transition for two consecutive rounds in both differential and linear characteristics.

Proof. We prove it for the differential case only; a similar argument holds for the linear case. Let the first transition have input ∆x(1) and output ∆y(1), where both ∆x(1) and ∆y(1) have Hamming weight 1. There are two cases:

1. ∆x(1) ∈ BI. Then ∆y(1) ∈ BO. Because of BOGI, BO → GI, and therefore ∆x(2) ∈ GI. Thus, the Hamming weight of ∆y(2) will be greater than or equal to 2.

2. ∆x(1) ∈ GI. Then the Hamming weight of ∆y(1) will be greater than or equal to 2, contrary to our assumption. □

Definition 9.3. The differential (resp. linear) score of an S-box is |GI| + |GO|, observed from the 1 − 1 bit DDT (resp. LAT).

Lemma 9.4. There exists a differential (resp. linear) BOGI permutation for an S-box if and only if the differential (resp. linear) score of the S-box is at least 4.

Proof. As per BOGI, σ1 : BO → GI. Since σ1 is injective, |BO| ≤ |GI|. Furthermore, |GO| + |BO| = 4. Combining both, we get |GO| + |BO| = 4 ≤ |GO| + |GI|. Conversely, let |GO| + |GI| ≥ 4. We have |GO| + |BO| = 4, hence |GO| + |GI| ≥ |GO| + |BO|, which implies |GI| ≥ |BO|. □

It is essential that our S-box has a score of at least 4 for both the differential and linear cases, and has a common BOGI permutation. These are two of the main criteria for the selection of the GIFT S-box.
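As an illustration of Definition 9.3 and Lemma 9.4, the following Python sketch computes the 1 − 1 bit DDT and LAT of the GIFT S-box of Table 9.1, the resulting scores, and whether the identity mapping is a BOGI permutation in both cases. The LAT entries are taken as the number of agreeing inputs minus 8; this convention and the helper names are assumptions of the sketch.

```python
SBOX = [0x1, 0xa, 0x4, 0xc, 0x6, 0xf, 0x3, 0x9,
        0x2, 0xd, 0xb, 0x7, 0x5, 0x0, 0x8, 0xe]  # Table 9.1


def one_bit_ddt(sbox):
    """table[i][o] = #{x : S(x) xor S(x xor 2^i) == 2^o}."""
    table = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for x in range(16):
            diff = sbox[x] ^ sbox[x ^ (1 << i)]
            for o in range(4):
                if diff == 1 << o:
                    table[i][o] += 1
    return table


def one_bit_lat(sbox):
    """table[i][o] = #{x : bit i of x equals bit o of S(x)} - 8."""
    table = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for o in range(4):
            agree = sum(((x >> i) & 1) == ((sbox[x] >> o) & 1) for x in range(16))
            table[i][o] = agree - 8
    return table


def good_sets(table):
    gi = {i for i in range(4) if not any(table[i])}
    go = {o for o in range(4) if not any(table[i][o] for i in range(4))}
    return gi, go


def score(table):
    gi, go = good_sets(table)
    return len(gi) + len(go)


def identity_is_bogi(table):
    gi, go = good_sets(table)
    bad_outputs = set(range(4)) - go
    return bad_outputs <= gi  # sigma(e) = e sends every bad output to a good input


for name, table in (("differential", one_bit_ddt(SBOX)), ("linear", one_bit_lat(SBOX))):
    print(name, score(table), identity_is_bogi(table))
```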

Remark: A BOGI permutation is a group mapping that is independent of the number of groups. Thus, this permutation design is scalable to any bit-permutation size that is a multiple of 16. This allows us to potentially design larger state sizes, such as 256 bits, which are useful for designing hash functions.
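The diffusion claim of Lemma 9.1 can be checked experimentally. The Python sketch below tracks which S-boxes a single input bit can influence, assuming every S-box output bit depends on every input bit of that S-box, and counts the rounds until all n bits are reached. The closed form used for the permutation covers any n that is a multiple of 16; the function names are hypothetical.

```python
def bogi_perm(i, n):
    """GIFT-style bit-permutation (identity sigma) on an n-bit state."""
    return 4 * (i // 16) + (n // 4) * ((3 * ((i % 16) // 4) + i % 4) % 4) + i % 4


def rounds_to_full_diffusion(n, start_bit):
    """Number of rounds after which start_bit influences all n state bits."""
    active = {start_bit // 4}       # S-boxes touched in the current round
    rounds = 1
    while True:
        # after SubCells, all 4 output bits of an active S-box are affected
        bits = {4 * s + k for s in active for k in range(4)}
        if len(bits) == n:
            return rounds
        active = {bogi_perm(b, n) // 4 for b in bits}
        rounds += 1


for n in (64, 128):
    counts = {rounds_to_full_diffusion(n, b) for b in range(n)}
    print(n, counts)
```

The same function can be called with n = 256 to explore the larger state size mentioned in the remark.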

9.2.3 Selection of the GIFT S-box

We first introduce a metric to estimate the hardware implementation cost of S-boxes.

Heuristic S-box implementation. We use a simplified metric to estimate the implementation cost of S-boxes. We denote NOT, NAND and NOR as N-operations and XOR and XNOR as X-operations, and estimate the cost of an N-operation to be 1 unit and of an X-operation to be 2 units. We consider the following 4 types of instruction for the construction of the S-boxes: a ← NOT(a); a ← a X b; a ← a X (b N c); a ← a X ((b N c) N d)¹, where a, b, c, d are distinct bits of an S-box input. These instructions are self-inverse, thus we can implement the inverse of the S-box simply by reversing the order of the sequence of instructions. In addition, the implementation cost of the inverse S-box is the same as that of the direct S-box since the same set of instructions is used. Under this metric, we found that the PRESENT S-box requires 4N + 9X operations, a cost of 22 units, while the RECTANGLE S-box requires 4N + 7X operations, a cost of 18 units. Hence, one of the criteria for our S-box is to have an implementation cost of less than 18 units².

¹ We do not consider AND and OR because they are equivalent to some other instructions that have been taken into account. For instance, a XOR (b AND c) ≡ a XNOR (b NAND c). In addition, NAND and NOR are generally cheaper than AND and OR in hardware.

² This “unit” metric is to facilitate the S-box search; the S-boxes are later synthesised to obtain their GE in Section 9.4.1.

Search for the GIFT S-box

Our primary design criteria for the GIFT S-box are:

1. Implementation cost of at most 17 units.

2. A score of at least 4 in both differential and linear, i.e. for both the differential and linear case, |GO| + |GI| ≥ 4.

3. There exists a common BOGI permutation for both differential and linear.

From the list of 302 AE S-boxes presented in [41], we generate the PE S-boxes and check their implementation cost. The reason is that the 1 − 1 bit differential (resp. linear) transitions are the same up to some row and column permutation of the DDT (resp. LAT) under the PE class. That is to say, the score of an S-box is preserved under the PE class but not the AE class. Our heuristic search shows that there is no optimal S-box (S-box with Dmax = 2^−2 and Lmax = 2^−2) that satisfies all 3 criteria, hence we extended our search to non-optimal S-boxes. For S-boxes with Dmax = 2^−1.415 and Lmax = 2^−2, we found some S-boxes with an implementation cost of 16 units. For a cost of 15 units, the best possible S-boxes (in terms of Dmax and Lmax) that satisfy the criteria have Dmax = 2^−0.415 and Lmax = 2^−1.415. S-boxes with a cost of at most 14 units have either Dmax = 1 or Lmax = 2^−1. To maximise the resistance against differential and linear attacks while satisfying the S-box criteria, we consider S-boxes with Dmax = 2^−1.415, Lmax = 2^−2 and an implementation cost of 16 units. In order to reduce the occurrence of sub-optimal differential transitions, we impose two additional criteria:

4. $\#\{(\Delta_I, \Delta_O) \in \mathbb{F}_2^4 \times \mathbb{F}_2^4 \mid DS(\Delta_I, \Delta_O) > 2^{-2}\} \le 2$.

5. For $DS(\Delta_I, \Delta_O) > 2^{-2}$, $wt(\Delta_I) + wt(\Delta_O) \ge 4$, where wt(·) is the Hamming weight.

Criterion (5) ensures that when a sub-optimal differential transition occurs, there is a total of at least 4 active S-boxes in the previous and next round. Finally, we pick an S-box with a common BOGI permutation for differential and linear that is the identity, i.e. σ(i) = i.

Properties of GIFT S-box. Our GIFT S-box SG can be implemented with 4N + 6X operations (smaller than the S-boxes in PRESENT and RECTANGLE); an implementation is given in Table 9.14. It has Dmax = 2^−1.415 and Lmax = 2^−2, branch number 2, algebraic degree 3 and no fixed point. For the sub-optimal differential transitions with probability 2^−1.415, there are only 2 such transitions and the sum of the Hamming weights of the input and output differentials is 4. We provide the DDT and LAT of SG in Tables 9.11 and 9.12.
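Several of these properties can be verified directly from Table 9.1. The Python sketch below computes the DDT of SG and checks criteria (4) and (5), the branch number and the absence of fixed points; a DDT count above 4 corresponds to a probability above 2^−2. The variable names are our own.

```python
import math

SBOX = [0x1, 0xa, 0x4, 0xc, 0x6, 0xf, 0x3, 0x9,
        0x2, 0xd, 0xb, 0x7, 0x5, 0x0, 0x8, 0xe]  # Table 9.1


def ddt(sbox):
    """table[d_in][d_out] = #{x : S(x) xor S(x xor d_in) == d_out}."""
    table = [[0] * 16 for _ in range(16)]
    for d_in in range(16):
        for x in range(16):
            table[d_in][sbox[x] ^ sbox[x ^ d_in]] += 1
    return table


def wt(v):
    """Hamming weight of a 4-bit value."""
    return bin(v).count("1")


DDT = ddt(SBOX)
nonzero = [(di, do) for di in range(1, 16) for do in range(16) if DDT[di][do]]
d_max = max(DDT[di][do] for di, do in nonzero)
sub_optimal = [(di, do) for di, do in nonzero if DDT[di][do] > 4]
branch_number = min(wt(di) + wt(do) for di, do in nonzero)
fixed_points = [x for x in range(16) if SBOX[x] == x]

print("log2(Dmax) =", round(math.log2(d_max / 16), 3))
print("sub-optimal transitions:", [(hex(a), hex(b)) for a, b in sub_optimal])
print("branch number:", branch_number, "fixed points:", fixed_points)
```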

9.2.4 Design of the GIFT key schedule

Key state update

One of our main goals when designing the key schedule is to minimise the hardware area, and thus we chose a bit-permutation only key schedule which has no hardware area at all. For it to be also software-friendly, we consider the entire key state rotation to be in blocks of 16 bits, and bit rotations within some 16-bit blocks. Since it is redundant to apply bit rotations within key state blocks that have not been introduced to the internal state, we update the key state blocks only after they have been extracted as round keys. To introduce the entire key material into the internal state as fast as possible, the key state blocks that are extracted as the round key are chosen such that all the key materials are introduced into the internal state in the least possible number of rounds.

Table 9.11: DDT of GIFT S-box

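∆I \ ∆O   0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0        16  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1         0  0  0  0  0  2  2  0  2  2  2  2  2  0  0  2
2         0  0  0  0  0  4  4  0  0  2  2  0  0  2  2  0
3         0  0  0  0  0  2  2  0  2  0  0  2  2  2  2  2
4         0  0  0  2  0  4  0  6  0  2  0  0  0  2  0  0
5         0  0  2  0  0  2  0  0  2  0  0  0  2  2  2  4
6         0  0  4  6  0  0  0  2  0  0  2  0  0  0  2  0
7         0  0  2  0  0  2  0  0  2  2  2  4  2  0  0  0
8         0  0  0  4  0  0  0  4  0  0  0  4  0  0  0  4
9         0  2  0  2  0  0  2  2  2  0  2  0  2  2  0  0
a         0  4  0  0  0  0  4  0  0  2  2  0  0  2  2  0
b         0  2  0  2  0  0  2  2  2  2  0  0  2  0  2  0
c         0  0  4  0  4  0  0  0  2  0  2  0  2  0  2  0
d         0  2  2  0  4  0  0  0  0  0  2  2  0  2  0  2
e         0  4  0  0  4  0  0  0  2  2  0  0  2  2  0  0
f         0  2  2  0  4  0  0  0  0  2  0  2  0  0  2  2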

Table 9.12: LAT of GIFT S-box

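ΓI \ ΓO   0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0         8  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
1         0  0  0  0  2 -2 -2  2  4  0  0 -4 -2 -2 -2 -2
2         0  0  0 -4  0  4  0  0  2  2  2 -2  2 -2  2  2
3         0  0  0 -4 -2 -2  2 -2  2 -2 -2 -2  0  4  0  0
4         0  0  0  0  0  0 -4 -4  0  0  0  0  0  0  4 -4
5         0  0  0  0  2 -2  2 -2  0  4  4  0  2  2 -2 -2
6         0  0  0 -4  4  0  0  0 -2 -2 -2  2  2 -2 -2 -2
7         0  0  0  4  2  2  2 -2  2 -2 -2 -2  4  0  0  0
8         0  0  0  0  0 -4  0 -4  0  0  0  0  0 -4  0  4
9         0  0  0  0 -2 -2  2  2  4  0  0  4  2 -2  2 -2
a         0  0 -4  0  0  0  4  0 -2 -2  2 -2 -2 -2  2 -2
b         0  0  4  0  2 -2  2  2 -2  2 -2 -2  0  0  4  0
c         0  4  4  0  0  0  0  0  0 -4  4  0  0  0  0  0
d         0 -4  4  0 -2  2  2 -2  0  0  0  0 -2 -2 -2 -2
e         0 -4  0  0  4  0  0  0  2 -2  2  2 -2  2  2  2
f         0 -4  0  0 -2 -2 -2  2 -2 -2  2 -2  4  0  0  0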

Adding round keys

To optimise the hardware performance of GIFT, we XOR the key bits to only half of the internal state. This saves a significant amount of hardware area in a round-based implementation. For it to be software-friendly too, we XOR the round key at the same bit positions of each nibble, which makes the bit-slice implementation more efficient. In addition, since all nibbles contain some key material, the entire state will be dependent on the key after a SubCells operation. The positions for adding the round key and the 16-bit rotations were chosen to optimise the related-key differential bounds. However, we would like to reiterate that more rounds are advised to resist related-key attacks.

Round constants

For the round constants, instead of using a typical decimal counter, we use a 6-bit affine LFSR (as in SKINNY). It requires only a single XNOR gate per update, which is probably the smallest possible hardware area for a counter. Each of the 6 bits is XORed to a different nibble to break the symmetry. In addition, we add a 1 at the MSB to further increase this effect.

9.3 Security Analysis

In this section, we describe the cryptanalysis that we have conducted on GIFT.

9.3.1 Differential and linear cryptanalysis

Differential cryptanalysis (DC) and linear cryptanalysis (LC) are among the most powerful techniques available for block ciphers. Analysing the resistance of a cipher against DC and LC is perhaps the most common and fundamental security analysis. One of the ways to gauge the resistance of a cipher is to find the lower bound for the number of active S-boxes involved in a differential or linear characteristic. In our work, we use mixed-integer linear programming (MILP) to compute the lower bounds for the number of active S-boxes in both DC and LC for 1 to 9 rounds; the results are summarised in Table 9.13. The MILP solution provides us with the actual differential or linear characteristic, which allows us to compute the actual differential probability and absolute linear bias from the DDT (Table 9.11) and LAT (Table 9.12) of SG. Recall that one of our main goals is to match the differential bounds of PRESENT, that is, having an average of 2 active S-boxes per round, but with a lighter S-box and without the constraint of branch number 3. In addition, we aim for the same ratio for the linear bound, which was not accomplished by PRESENT. These targets were achieved for GIFT reduced to 9 rounds. Hence, our DC and LC discussions focus on 9 rounds.

Differential cryptanalysis

To compute the 9-round differential probability of GIFT, we first find a differential characteristic with the least number of active S-boxes. Next, by fixing the input and output differences, we repeatedly search for the next best possible differential characteristic and we sum up the probabilities.

Table 9.13: Lower bounds for number of active S-boxes

                     Rounds
Cipher      DC/LC    1   2   3   4   5   6   7   8   9
GIFT-64     DC       1   2   3   5   7  10  13  16  18
            LC       1   2   3   5   7   9  12  15  18
PRESENT     DC       1   2   4   6  10  12  14  16  18
            LC       1   2   3   4   5   6   7   8   9
RECTANGLE   DC       1   2   3   4   6   8  11  13  14
            LC       1   2   3   4   6   8  10  12  14
GIFT-128    DC       1   2   3   5   7  10  13  17  19
            LC       1   2   3   5   7   9  12  14  18

The search terminates when the subsequent differential characteristic has an insignificant contribution in improving the differential probability further. GIFT-64 has a 9-round differential with probability 2^−44.415; taking the average per round and propagating forward, we expect that the differential probability will be lower than 2^−63 when we have 14 rounds³. Therefore, we believe 28-round GIFT-64 is sufficient to resist DC.

Using the same methodology, we found that PRESENT, with a 9-round differential probability of 2^−40.702, is expected to require 14 rounds, while RECTANGLE, with a 9-round differential probability of 2^−38.704, is expected to require 15 rounds. Note that our estimation matches the belief of the RECTANGLE designers that it is impossible to construct an efficient 15-round differential distinguisher for RECTANGLE [132].

GIFT-128 has a 9-round differential with probability 2^−46.99, which suggests that 26 rounds are sufficient to achieve a differential probability lower than 2^−127. Therefore, we believe 40-round GIFT-128 is sufficient to resist DC.

For both GIFT versions, it is interesting to note that in most cases, the first optimal differential characteristic found is the only one with the least number of active S-boxes; subsequent differential characteristics under the same input and output differences have significantly more active S-boxes than the initial characteristic. Thus the differential probability is close to the probability of the optimal differential characteristic. This is unlike PRESENT, which, due to its symmetric structure, tends to have several optimal differential characteristics for some fixed input and output differences.

Footnote 3: Mathematically, we need 13 rounds to achieve a differential with probability lower than $2^{-63}$. However, since there is no whitening key at the beginning, the differential probability actually starts from round 2. Hence we added an additional round to the estimation.

Linear cryptanalysis

Similar to the differential case, we first find an optimal linear characteristic, then fix the input and output linear masks, find the next best linear characteristics and sum their contributions.

GIFT-64 has a 9-round linear hull bias of $2^{-25.999}$. Using the piling-up lemma (Lemma 2.28), it is expected to require 13 rounds (see footnote 4) to achieve a linear bias lower than $2^{-32}$. Therefore, we believe 28-round GIFT-64 is sufficient to resist LC. Using the same methodology, we found that PRESENT, with a 9-round linear hull bias of $2^{-14.593}$, is expected to require 22 rounds, while RECTANGLE, with a 9-round linear hull bias of $2^{-19.287}$, is expected to require 17 rounds.

GIFT-128 has a 9-round linear hull bias of $2^{-23.995}$, which means that we would need around 27 rounds to achieve a linear bias lower than $2^{-64}$. Therefore, we believe 40-round GIFT-128 is sufficient to resist LC.

As in the differential case, the first optimal linear characteristic found is the only one with the least number of active S-boxes; subsequent linear characteristics under the same input and output linear masks have significantly more active S-boxes than the initial characteristic. Thus GIFT has a negligible linear hull effect, which is not the case for PRESENT [93].

Related-key differential cryptanalysis

For GIFT-64, it takes 4 rounds for all the key material to be introduced into the internal state; hence it is trivial to see that it is possible to have no active S-boxes from round 1 to round 4. Thus we start our computation of the related-key differential bounds from round 5 onwards. For 5 to 12 rounds, the probabilities of these differential characteristics are $2^{-1.415}$, $2^{-5}$, $2^{-6.415}$, $2^{-10}$, $2^{-16}$, $2^{-22}$, $2^{-27}$ and $2^{-33}$ respectively. Even if we suppose that the probability of a 12-round characteristic is lower bounded by $2^{-33}$, it is doubtful that 28 rounds are secure against related-key differential cryptanalysis. Therefore, we strongly recommend increasing the number of rounds to achieve security against related-key attacks.

For GIFT-128, we start our computation from round 3 onwards. For 3 to 9 rounds, the probabilities are $2^{-1.415}$, $2^{-5}$, $2^{-7}$, $2^{-11}$, $2^{-20}$, $2^{-25}$ and $2^{-31}$ respectively. Similar to GIFT-64, it is doubtful that 40 rounds are secure against related-key differential cryptanalysis.

Footnote 4: Similar to the differential case, an additional round is added to the estimation.

9.3.2 Impossible differential cryptanalysis

Given that GIFT-64 achieves full diffusion after only 3 rounds, there do not exist any 6-round truncated impossible differentials. We then implemented an impossible differential search tool based on MILP [37, 108] to take into account the differential distribution through the S-box. We exhaustively tested input and output differences satisfying the following conditions:

• The input difference activates only one of the first four S-boxes.
• The output difference activates only one S-box.

For the first condition, there are 4 × 15 = 60 such input differences. For the second condition, there are 16 × 15 = 240 such output differences. Hence, we tested 60 × 240 = 14,400 pairs of input and output differences. The search results show that 11,904 of the 14,400 pairs are actually impossible. Hence, with high probability, a pair of input and output differences with 1 active nibble each is impossible after 6 rounds. We further extended this search procedure to 7 rounds and found that none of the 14,400 pairs is an impossible differential for 7 rounds. The number of rounds covered by impossible differentials is much smaller than that of the full-round cipher, thus we omit the detailed analysis of the key-recovery part. Full-round GIFT offers sufficient resistance against an impossible differential attack.
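The size of the search space above can be checked by enumerating the candidate differences directly. This is only an illustrative sketch of the counting, not the MILP tool itself; the helper name `single_nibble_differences` is hypothetical, and taking "the first four S-boxes" to mean nibble positions 0 to 3 is an assumption.

```python
NIBBLES = 16  # GIFT-64 state: sixteen 4-bit S-box inputs

def single_nibble_differences(positions):
    """All 64-bit differences with exactly one active nibble at an allowed position."""
    return [value << (4 * pos) for pos in positions for value in range(1, 16)]

# Assumption: "the first four S-boxes" are nibble positions 0..3.
inputs = single_nibble_differences(range(4))
outputs = single_nibble_differences(range(NIBBLES))

pairs = len(inputs) * len(outputs)
impossible_fraction = 11904 / pairs  # 11,904 is the count reported in the text

print(len(inputs), len(outputs), pairs)
print(f"fraction of impossible pairs after 6 rounds: {impossible_fraction:.4f}")
```

Roughly 83% of the tested pairs being impossible is what supports the "with high probability" statement for 6 rounds.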

9.3.3 Invariant subspace attacks

Since the round constant is XORed only into the MSB of several S-boxes, invariant subspace attacks [49,75,76] can be a potential threat. The attack utilises a linear subspace $A$ and a constant $u$ such that $u \oplus A$ is invariant under the round transformation. Its generalised version utilises the property that the subspace $u \oplus A$ is mapped to another subspace $v \oplus A'$ after the round transformation. Then, if the round keys and constants belong to the subspace $u \oplus v \oplus (A \cap A')$, the state value always stays in the subspace $A \cap A'$, so the cipher can be distinguished with only a single query. We searched for subspace transitions through the GIFT S-box (see footnote 5). There does not exist any subspace transition for subspaces $A$ of dimension 2 or 3. For dimension 1, there are five transitions, shown below:

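\begin{align*}
a \oplus \{0, 5\} &\xrightarrow{S_G} b \oplus \{0, 5\}, & k &\in 1 \oplus \{0, 5\},\\
0 \oplus \{0, a\} &\xrightarrow{S_G} 1 \oplus \{0, a\}, & k &\in 1 \oplus \{0, a\},\\
2 \oplus \{0, c\} &\xrightarrow{S_G} 4 \oplus \{0, c\}, & k &\in 6 \oplus \{0, c\},\\
5 \oplus \{0, d\} &\xrightarrow{S_G} 2 \oplus \{0, d\}, & k &\in 7 \oplus \{0, d\},\\
0 \oplus \{0, f\} &\xrightarrow{S_G} 1 \oplus \{0, f\}, & k &\in 1 \oplus \{0, f\},
\end{align*}

where $k$ is a 4-bit nibble of the key and all values are written in hexadecimal. In any case, XORing the constant into the MSB, i.e. XORing 0x8 into some nibble, breaks the invariant subspace, thus GIFT resists invariant subspace attacks.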
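The five dimension-1 transitions listed above can be found by a brute-force search over the S-box. The sketch below assumes the GIFT S-box lookup table as given in the GIFT specification [7] (it is not reproduced in this section), restricts itself to the case $A = A'$ of footnote 5, and uses the hypothetical helper name `dim1_transitions`.

```python
# GIFT S-box lookup table, taken from the GIFT specification [7] (assumption:
# this table is not part of the present section).
GIFT_SBOX = [0x1, 0xa, 0x4, 0xc, 0x6, 0xf, 0x3, 0x9,
             0x2, 0xd, 0xb, 0x7, 0x5, 0x0, 0x8, 0xe]

def dim1_transitions(sbox):
    """Return triples (u, v, d) with S(u + {0, d}) = v + {0, d}, where A = A' = {0, d}."""
    found = []
    for d in range(1, 16):                 # non-zero direction spanning A
        for u in range(16):
            partner = u ^ d
            if u < partner and (sbox[u] ^ sbox[partner]) == d:
                found.append((u, sbox[u], d))
    return found

transitions = dim1_transitions(GIFT_SBOX)

# Key nibbles that map the output coset back onto the input coset: k in (u ^ v) + {0, d}.
key_cosets = {(u, d): sorted({u ^ v, u ^ v ^ d}) for u, v, d in transitions}

for u, v, d in transitions:
    print(f"{u:x} + {{0,{d:x}}} -> {v:x} + {{0,{d:x}}}, "
          f"k in {[format(k, 'x') for k in key_cosets[(u, d)]]}")
```

A coset can be written with either of its two elements as representative, so the search may print `f + {0,d}` where the text writes `2 + {0,d}`; both denote the same set. None of the five directions equals 0x8, which is consistent with the remark that XORing 0x8 into a nibble leaves the coset.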

Footnote 5: For simplicity, we consider the case where $A = A'$.

9.3.4 Algebraic attacks

We show that algebraic attacks do not threaten GIFT. The S-box $S_G$ has algebraic degree 3, and from Table 9.13 we see that in a 9-round differential characteristic of GIFT there are at least 18 active S-boxes. One can easily check that $3 \cdot 18 \cdot \lfloor r/9 \rfloor > n$, where $r$ is the number of rounds of GIFT-$n$. Moreover, $S_G$ is described by 21 quadratic equations in the 8 input/output variables over the binary field. The entire system for a fixed-key GIFT permutation therefore consists of $16 \cdot r \cdot 21$ quadratic equations in $16 \cdot r \cdot 8$ variables. For example, in the case of GIFT-64, there are 9408 quadratic equations in 3584 variables. In comparison, the entire system for a fixed-key AES permutation consists of 6400 equations in 2560 variables [50]. While the applicability of algebraic attacks on AES remains unclear, those numbers tend to indicate that GIFT offers a high level of protection.

Other cryptanalysis. We also conducted security analysis on GIFT with meet-in-the-middle, improved integral (using the division property) and non-linear invariant attacks, and showed that they do not threaten the security of GIFT. Details of these attacks can be found in [7].
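The two counting arguments used for algebraic attacks are simple enough to check numerically. The following sketch uses hypothetical helper names (`degree_bound`, `system_size`) and takes the round numbers 28 and 40 for GIFT-64 and GIFT-128 from the text.

```python
def degree_bound(rounds, degree=3, active_per_9_rounds=18):
    """Left-hand side of 3 * 18 * floor(r / 9) > n."""
    return degree * active_per_9_rounds * (rounds // 9)

def system_size(rounds, sboxes=16, eqs_per_sbox=21, vars_per_sbox=8):
    """(equations, variables) for the formula 16 * r * 21 and 16 * r * 8."""
    return sboxes * rounds * eqs_per_sbox, sboxes * rounds * vars_per_sbox

gift64_bound = degree_bound(28)      # to be compared with n = 64
gift128_bound = degree_bound(40)     # to be compared with n = 128
gift64_system = system_size(28)

print(gift64_bound, gift128_bound, gift64_system)
```

With 28 rounds the formula yields the 9408 equations and 3584 variables quoted for GIFT-64.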

9.4 Performance and Comparison

9.4.1 Hardware implementations

GIFT is surprisingly efficient on ASIC platforms across various degrees of serialisation. This is mainly due to the extremely lightweight round function that performs key addition on only half of the state and uses a bit-permutation as the only diffusion mechanism.

Round-based implementation

GIFT includes various design strategies in order to minimise gate count. GIFT applies key addition to only half of the internal state, which saves silicon area. Additionally, like PRESENT, the GIFT P-layer consists of a bit-permutation instead of a lightweight MDS or almost-MDS matrix, which decreases the area further. However, the PRESENT S-box occupies around 22.5 GE when synthesised with the standard cell library of the STM 90 nm logic process, whereas the GIFT S-box occupies only 16.5 GE using the same library; see Table 9.14 for a hardware-optimised implementation. Furthermore, the key schedule function used in GIFT is also a bit-permutation, so this module is constructed by a simple wire shuffle and takes no area at all. In Table 9.15, we compare the hardware performance of GIFT with other lightweight ciphers.

Table 9.14: Area-optimised hardware implementation of the GIFT S-box.

    /* Input: (MSB) x[3], x[2], x[1], x[0] (LSB) */
    x[1] = x[1] XNOR (x[0] NAND x[2]);
    x[0] = x[0] XNOR (x[1] NAND x[3]);
    x[2] = x[2] XNOR (x[0] NOR x[1]);
    x[3] = x[3] XNOR x[2];
    x[1] = x[1] XNOR x[3];
    x[2] = x[2] XNOR (x[0] NAND x[1]);
    /* Output: (MSB) x[0], x[2], x[1], x[3] (LSB) */

Table 9.15: Comparison of performance metrics for round-based implementations synthesised with the STM 90 nm standard cell library (*Piccolo implemented in dynamic key mode).

| Cipher        | Area (GE) | Delay (ns) | Cycles | TPMAX (MBit/s) | Power (µW, @10 MHz) | Energy (pJ) |
|---------------|-----------|------------|--------|----------------|---------------------|-------------|
| GIFT-64       | 1345      | 1.83       | 29     | 1249.0         | 74.8                | 216.9       |
| PRESENT-128   | 1560      | 1.63       | 33     | 1227.0         | 71.1                | 234.6       |
| SIMON-64-128  | 1458      | 1.83       | 45     | 794.8          | 72.7                | 327.3       |
| Midori64      | 1542      | 2.06       | 17     | 1941.7         | 60.6                | 103.0       |
| Piccolo-128*  | 1868      | 2.32       | 32     | 889.9          | 79.4                | 254.1       |
| RECTANGLE-128 | 1637      | 1.61       | 27     | 1472.2         | 76.2                | 206.0       |
| LED-128       | 1831      | 5.25       | 50     | 243.8          | 131.3               | 656.5       |
| GIFT-128      | 1997      | 1.85       | 41     | 1729.7         | 116.6               | 478.1       |
| SIMON-128-128 | 2064      | 1.87       | 69     | 1006.6         | 105.6               | 728.6       |
| Midori128     | 2522      | 2.25       | 21     | 2844.4         | 89.2                | 187.3       |
| AES-128       | 7215      | 3.83       | 11     | 3038.2         | 730.3               | 803.3       |

The advantages of GIFT over PRESENT and RECTANGLE are the simple key schedule (no logic gates required), half the number of XORs required for subkey addition, and a smaller S-box. In comparison, the round function of SIMON also employs subkey addition to only half of the internal state. Although the SIMON non-linear layer is smaller than that of GIFT, it has a much heavier key-schedule function and thus its total area exceeds that of GIFT for both the 64-bit and 128-bit versions. Lastly, one of the main factors that make GIFT lighter than other lightweight ciphers such as Midori, LED and Piccolo is the omission of the costly diffusion layer.
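The gate-level description in Table 9.14 can be checked against the S-box lookup table by simulating the six gates on all 16 inputs. This is a minimal sketch for verification only: the lookup table is taken from the GIFT specification [7] rather than from this section, and the helper names (`nand`, `nor`, `xnor`, `sbox_gates`) are hypothetical.

```python
# GIFT S-box lookup table from the GIFT specification [7] (assumption: not
# reproduced in this section).
GIFT_SBOX = [0x1, 0xa, 0x4, 0xc, 0x6, 0xf, 0x3, 0x9,
             0x2, 0xd, 0xb, 0x7, 0x5, 0x0, 0x8, 0xe]

def nand(a, b):
    return 1 ^ (a & b)

def nor(a, b):
    return 1 ^ (a | b)

def xnor(a, b):
    return 1 ^ a ^ b

def sbox_gates(nibble):
    """Evaluate the six gates of Table 9.14 on one 4-bit input."""
    x = [(nibble >> i) & 1 for i in range(4)]   # x[0] is the LSB, x[3] the MSB
    x[1] = xnor(x[1], nand(x[0], x[2]))
    x[0] = xnor(x[0], nand(x[1], x[3]))
    x[2] = xnor(x[2], nor(x[0], x[1]))
    x[3] = xnor(x[3], x[2])
    x[1] = xnor(x[1], x[3])
    x[2] = xnor(x[2], nand(x[0], x[1]))
    # Output wiring: (MSB) x[0], x[2], x[1], x[3] (LSB)
    return (x[0] << 3) | (x[2] << 2) | (x[1] << 1) | x[3]

gate_table = [sbox_gates(v) for v in range(16)]
print("".join(format(v, "x") for v in gate_table))
```

Each XNOR with a NAND (or NOR) input is logically the same as an XOR with an AND (or OR), which is why such a small gate count suffices.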

Serial implementation

The serial implementation of GIFT-64 uses a mixed datapath of size 4 bits on the state side and 16 bits on the key side. GIFT-128, on the other hand, uses a 4-bit datapath on the state side and a 32-bit datapath on the key side. We also implemented bit-serial versions of GIFT as per the techniques outlined in [56]. In Table 9.16, we list the performance comparisons of GIFT with other block ciphers.

Table 9.16: Comparison of performance metrics for serial implementations synthesised with the STM 90 nm standard cell library (*AES-128 implementation figures from [6]).

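| Cipher        | Degree of serialisation | Area (GE) | Delay (ns) | Cycles | TPMAX (MBit/s) | Power (µW, @10 MHz) | Energy (nJ) |
|---------------|-------------------------|-----------|------------|--------|----------------|---------------------|-------------|
| GIFT-64       | 4/16                    | 1113      | 2.14       | 522    | 57.3           | 39.0                | 2.04        |
| GIFT-64       | 1                       | 930       | 2.67       | 2816   | 8.5            | 35.9                | 10.11       |
| PRESENT-128   | 4                       | 1158      | 1.94       | 576    | 57.3           | 58.0                | 3.34        |
| SIMON-64-128  | 1                       | 794       | 1.10       | 1536   | 37.9           | 44.7                | 6.87        |
| LED-128       | 4                       | 1225      | 2.54       | 1904   | 13.2           | 49.8                | 9.48        |
| GIFT-128      | 4/32                    | 1455      | 2.25       | 714    | 79.7           | 61.7                | 4.40        |
| GIFT-128      | 1                       | 1213      | 2.46       | 6528   | 8.0            | 40.3                | 26.30       |
| SIMON-128-128 | 1                       | 1077      | 1.17       | 4480   | 25.1           | 60.5                | 27.10       |
| AES-128*      | 8                       | 2060      | 5.79       | 246    | 88.6           | 129.7               | 3.19        |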

While the bit-serial implementation of SIMON is probably the most compact due to the nature of the design, the performance of GIFT is comparable to or better than that of other ciphers with a similar level of serialisation.
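The energy column of Table 9.16 appears to follow from the power and cycle columns at the stated 10 MHz clock, since energy is power multiplied by the time taken. The sketch below checks this relation on the table rows; the function name `energy_nj` is hypothetical, and the small tolerance allows for rounding in the published figures.

```python
CLOCK_HZ = 10e6  # power figures in the table are reported at 10 MHz

def energy_nj(power_uw, cycles, clock_hz=CLOCK_HZ):
    """Energy per encryption in nJ = power (W) * cycles / clock (Hz)."""
    seconds = cycles / clock_hz
    return power_uw * 1e-6 * seconds * 1e9

# (label, cycles, power in microwatts, reported energy in nJ) from Table 9.16
ROWS = [
    ("GIFT-64 (4/16)", 522, 39.0, 2.04),
    ("GIFT-64 (1)", 2816, 35.9, 10.11),
    ("PRESENT-128 (4)", 576, 58.0, 3.34),
    ("SIMON-64-128 (1)", 1536, 44.7, 6.87),
    ("LED-128 (4)", 1904, 49.8, 9.48),
    ("GIFT-128 (4/32)", 714, 61.7, 4.40),
    ("GIFT-128 (1)", 6528, 40.3, 26.30),
    ("SIMON-128-128 (1)", 4480, 60.5, 27.10),
    ("AES-128 (8)", 246, 129.7, 3.19),
]

max_error = max(abs(energy_nj(power, cycles) - reported)
                for _, cycles, power, reported in ROWS)

for label, cycles, power, reported in ROWS:
    print(f"{label}: computed {energy_nj(power, cycles):.2f} nJ, reported {reported} nJ")
```

The relation makes the trade-off visible: a bit-serial design draws slightly less power but needs many more cycles, so its energy per block is several times higher.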

9.4.2 Software implementations

Due to its inherent bit-slice structure, it seems natural to consider that the most efficient software implementations of GIFT will be bit-slice implementations.

Table 9.17: Bit-slice software implementations of GIFT and other lightweight block ciphers. Performance numbers are given in cycles per byte, with messages composed of 2000 64-bit blocks to obtain the results.

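| Cipher       | Speed (c/B) | Ref.  | Cipher        | Speed (c/B) | Ref.  |
|--------------|-------------|-------|---------------|-------------|-------|
| GIFT-64      | 2.10        | new   | GIFT-128      | 2.57        | new   |
| SIMON-64-128 | 1.74        | [129] | SIMON-128-128 | 2.55        | [129] |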

Benchmarks. We have produced this bit-slice implementation for AVX2 registers, and we give in Table 9.17 the benchmarking results on an Intel Haswell processor (i5-4460U). We have benchmarked the bit-slice implementations of SIMON and SKINNY (available online) on the same computer for fairness.

Comments. Bit-slice implementations can be used for any parallel mode (as is the case for most modern operating modes), but can also be used for serial modes when several users are communicating in parallel. In this setting, the implementation would be exactly the same, as our key preparation does not assume that the keys have to be the same for all blocks. In the scenario of a serial mode for a single user, a classical table-based or VPERM implementation will probably be the most efficient option [20].
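To illustrate what bit-slicing means for the S-box layer, the sketch below evaluates 16 S-boxes in parallel, with Python integers standing in for wide registers. It is not the benchmarked AVX2 code: the instruction sequence is obtained by simplifying the gates of Table 9.14 (an XNOR with a NAND input equals an XOR with an AND), the lookup table comes from the GIFT specification [7], and all function names are hypothetical.

```python
GIFT_SBOX = [0x1, 0xa, 0x4, 0xc, 0x6, 0xf, 0x3, 0x9,
             0x2, 0xd, 0xb, 0x7, 0x5, 0x0, 0x8, 0xe]

LANES = 16                 # number of S-boxes processed at once
MASK = (1 << LANES) - 1    # all-ones word, used for bitwise NOT

def to_planes(nibbles):
    """Plane j collects bit j of every nibble, one lane per nibble."""
    return [sum(((v >> bit) & 1) << lane for lane, v in enumerate(nibbles))
            for bit in range(4)]

def from_planes(planes, lanes):
    """Inverse of to_planes: rebuild one nibble per lane."""
    return [sum(((planes[bit] >> lane) & 1) << bit for bit in range(4))
            for lane in range(lanes)]

def sbox_bitsliced(x0, x1, x2, x3):
    """Apply the S-box to all lanes using only word-wide logic operations."""
    x1 ^= x0 & x2
    x0 ^= x1 & x3
    x2 ^= x0 | x1
    x3 ^= x2
    x1 ^= x3
    x3 ^= MASK             # NOT
    x2 ^= x0 & x1
    # Output bit planes, LSB first: x3, x1, x2, x0 (the MSB is x0)
    return [x3, x1, x2, x0]

inputs = list(range(16))
outputs = from_planes(sbox_bitsliced(*to_planes(inputs)), LANES)
print(outputs)
```

Seven word operations handle every lane at once, so the cost per S-box shrinks as the register width grows, which is the reason bit-slicing suits parallel modes.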

9.5 Conclusion

By designing the bit permutation in conjunction with the properties of an S-box, we can drop the huge constraint on the choice of the S-box and select a much lighter one. We end up with a very natural and clean cipher, with a simple round function and key schedule (composed of only a bit permutation, thus essentially free in hardware). Since any weaker choice of the S-box would lead to a very insecure design, we argue that GIFT is probably very close to reaching the area limit of lightweight encryption.

Further Discussion


Chapter 10

Conclusion

10.1 Summary

In this thesis, we conducted cryptanalysis on lightweight primitive proposals, searched for new efficient and secure building blocks, and designed two new BULB ciphers. The results are summarised as follows:

In Chapter 3 we conducted cryptanalysis on JAMBU in both the nonce-misuse and nonce-respecting scenarios. We showed that, in contrast to the designers' claim, we are able to break the confidentiality of JAMBU even when the messages do not have the same prefix.

In Chapter 4 we performed invariant subspace attacks on Midori. We demonstrated that Midori64 has a class of $2^{32}$ weak keys that can be distinguished with a single chosen-plaintext query. This can further lead to a key-recovery attack with just two queries.

In Chapter 5 we proved the existence of compact equivalence classes of Hadamard and cyclic matrices. We leveraged these to complete our exhaustive searches (some of which were previously infeasible) for MDS diffusion matrices, and we found a wide range of new efficient involutory and non-involutory MDS diffusion matrices suitable for lightweight primitives.

In Chapter 6 we analysed the security properties of S-boxes. We studied the relationship between the affine subspace and the DDT of an S-box. We also showed that non-involutory S-boxes can achieve a sufficiently high level of resistance against invariant subspace attacks.

After conducting cryptanalysis on symmetric-key primitives and studying the building blocks of SPN ciphers, in Chapter 7 we discussed the challenges and design strategies for designing lightweight block ciphers and set the stage for our two state-of-the-art beyond ultra-lightweight block (BULB) ciphers.

In Chapter 8 we proposed a new tweakable BULB cipher family SKINNY with high security guarantees under both single-key and related-key models. In terms of performance, SKINNY is comparable with SIMON, making SKINNY one of its main competitors.
In Chapter 9 we revisited the ultra-lightweight block cipher PRESENT and designed a new BULB cipher GIFT. GIFT improves over its precursor in several aspects, including performance and resistance against linear cryptanalysis. With very little room left for further optimisation, we believe that GIFT is approaching the limit of lightweight encryption.

10.2 Other Works

Besides the work presented in this thesis, I have been active in other research. Here are some of the ongoing and published works that are not included in this thesis:

• Studied recursive MDS matrices such as serial matrices and searched for lightweight serial-type matrices that are suitable for serial implementations. We proposed a new type of serial-type matrix, called Diagonal-Serial Invertible (DSI) matrices, with great performance and security properties. We also showed that, for diffusion matrices of order 4, our matrix achieves the least number of connecting XORs (see footnote 1). This work was published in [122].

• Presented the first pen-and-paper proof on the minimum number of active S-boxes for AES-128 under the related-key model. Our proven bounds match those found by computer-aided tools and our analysis also gave us an insight on how to design lightweight key schedules. This work was published in [67].

• Investigated the behaviour and distribution of the XOR count of finite field elements under different bases. The empirical evidence from our investigation suggested that the recommended choice of basis for constructing lightweight MDS diffusion matrices is the conventional polynomial basis. This work was published in [106].

• Developed an automated search tool called LIGHTER to search for area-optimised implementations of lightweight building blocks. For small functions like 4-bit S-boxes, our tool outperformed the state-of-the-art synthesis tool ABC almost all the time. This work was published in [60].

• Conducted analysis on SCREAM and iSCREAM, a submission to the CAESAR competition, and found a practical forgery attack on them. This attack motivated the designers to patch their designs and release the updated versions SCREAMv2 and iSCREAMv2. This result was posted on [116].

10.3 Future Research

To highlight some potential future research directions:

Footnote 1: XOR operations computing the components of the output vector, see Example 5.27.

1. On the study of the diffusion layer, we can look at other matrix types, for instance Toeplitz or Hadamard-like matrices, and investigate equivalence classes of those matrices in the hope of completing the search for higher-order matrices.

2. On the same topic, we can also investigate lightweight almost-MDS matrices, especially for diffusion matrices of higher order. For diffusion matrices of order 4, there exists a simple binary matrix with branch number 4, for instance the one used in Midori. It will be interesting to investigate whether there exists a binary matrix of order m with branch number m (or close to m) for large m.

3. While using MILP to analyse our BULB ciphers, we developed new techniques to improve and shorten its runtime. These techniques could be generalised and used for analysing other primitives.

4. Conduct analysis on Feistel-network ciphers, which could involve techniques that are very different from those used for SPN-based ciphers. For instance, we can analyse the security of SIMON under the related-key model.

5. Investigate the efficiency and security of F-functions in Feistel networks. Although one may perceive an F-function as a big S-box, the method used to analyse an 8-bit S-box may not be suitable for a 32-bit F-function due to its large parameters. Thus, we need to adopt different methods to analyse F-functions.

Bibliography

[1] Abdelkhalek, A., Sasaki, Y., Todo, Y., Tolba, M., Youssef, A.: MILP Modeling for (Large) S-boxes to Optimize Probability of Differential Characteristics. IACR Transactions on Symmetric Cryptology 2017(4), 99–129 (2017), https://tosc.iacr.org/index.php/ToSC/article/view/805

[2] Anderson, R.J., Biham, E.: Two Practical and Provably Secure Block Ciphers: BEARS and LION. In: Fast Software Encryption, Third International Workshop, Cambridge, UK, February 21-23, 1996, Proceedings. pp. 113–120 (1996), https://doi.org/10.1007/3-540-60865-6_48

[3] Andreeva, E., Bogdanov, A., Luykx, A., Mennink, B., Tischhauser, E., Yasuda, K.: Parallelizable and Authenticated Online Ciphers. In: Advances in Cryptology - ASIACRYPT 2013 - 19th International Conference on the Theory and Application of Cryptology and Information Security, Bengaluru, India, December 1-5, 2013, Proceedings, Part I. pp. 424–443 (2013), http://dx.doi.org/10.1007/978-3-642-42033-7_22

[4] Ankele, R., Banik, S., Chakraborti, A., List, E., Mendel, F., Sim, S.M., Wang, G.: Related-Key Impossible-Differential Attack on Reduced-Round Skinny. In: Applied Cryptography and Network Security - 15th International Conference, ACNS 2017, Kanazawa, Japan, July 10-12, 2017, Proceedings. pp. 208–228 (2017), https://doi.org/10.1007/978-3-319-61204-1_11

[5] Banik, S., Bogdanov, A., Isobe, T., Shibutani, K., Hiwatari, H., Akishita, T., Regazzoni, F.: Midori: A Block Cipher for Low Energy. In: Advances in Cryptology - ASIACRYPT 2015 - 21st International Conference on the Theory and Application of Cryptology and Information Security, Auckland, New Zealand, November 29 - December 3, 2015, Proceedings, Part II. pp. 411–436 (2015), http://dx.doi.org/10.1007/978-3-662-48800-3_17

[6] Banik, S., Bogdanov, A., Regazzoni, F.: Atomic-AES v 2.0. Cryptology ePrint Archive, Report 2016/1005 (2016)

[7] Banik, S., Pandey, S.K., Peyrin, T., Sasaki, Y., Sim, S.M., Todo, Y.: GIFT: A Small Present - Towards Reaching the Limit of Lightweight Encryption. In: Cryptographic Hardware and Embedded Systems - CHES 2017 - 19th International Conference, Taipei, Taiwan, September 25-28, 2017, Proceedings. pp. 321–345 (2017), https://doi.org/10.1007/978-3-319-66787-4_16

[8] Barreto, P., Rijmen, V.: The Anubis Block Cipher. Submission to the NESSIE Project (2000), http://www.larc.usp.br/~pbarreto/AnubisPage.html

[9] Barreto, P., Rijmen, V.: The Khazad Legacy-Level Block Cipher. First Open NESSIE Workshop (2000), www.larc.usp.br/~pbarreto/KhazadPage.html

[10] Barreto, P.S.L.M., Nikov, V., Nikova, S., Rijmen, V., Tischhauser, E.: Whirlwind: a new cryptographic hash function. Des. Codes Cryptography 56(2-3), 141–162 (2010)

[11] Barreto, P.S.L.M., Rijmen, V.: Whirlpool. In: Encyclopedia of Cryptography and Security (2nd Ed.), pp. 1384–1385 (2011)

[12] Beaulieu, R., Shors, D., Smith, J., Treatman-Clark, S., Weeks, B., Wingers, L.: The SIMON and SPECK Families of Lightweight Block Ciphers. IACR Cryptology ePrint Archive 2013, 404 (2013), http://eprint.iacr.org/2013/404

[13] Beaulieu, R., Shors, D., Smith, J., Treatman-Clark, S., Weeks, B., Wingers, L.: SIMON and SPECK: Block Ciphers for the Internet of Things. IACR Cryptology ePrint Archive 2015, 585 (2015), http://eprint.iacr.org/2015/585

[14] Beierle, C., Jean, J., Kölbl, S., Leander, G., Moradi, A., Peyrin, T., Sasaki, Y., Sasdrich, P., Sim, S.M.: The SKINNY Family of Block Ciphers and Its Low-Latency Variant MANTIS. In: Advances in Cryptology - CRYPTO 2016 - 36th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceedings, Part II. pp. 123–153 (2016), http://dx.doi.org/10.1007/978-3-662-53008-5_5

[15] Beierle, C., Kranz, T., Leander, G.: Lightweight Multiplication in GF(2^n) with Applications to MDS Matrices. In: Advances in Cryptology - CRYPTO 2016 - 36th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceedings, Part I. pp. 625–653 (2016), https://doi.org/10.1007/978-3-662-53018-4_23

[16] Bellare, M., Desai, A., Jokipii, E., Rogaway, P.: A Concrete Security Treatment of Symmetric Encryption. In: 38th Annual Symposium on Foundations of Computer Science, FOCS '97, Miami Beach, Florida, USA, October 19-22, 1997. pp. 394–403 (1997), http://doi.ieeecomputersociety.org/10.1109/SFCS.1997.646128

[17] Bellare, M., Namprempre, C.: Authenticated Encryption: Relations among notions and analysis of the generic composition paradigm. IACR Cryptology ePrint Archive 2000, 25 (2000), http://eprint.iacr.org/2000/025

[18] Bellare, M., Namprempre, C.: Authenticated Encryption: Relations among Notions and Analysis of the Generic Composition Paradigm. In: Advances in Cryptology - ASIACRYPT 2000, 6th International Conference on the Theory and Application of Cryptology and Information Security, Kyoto, Japan, December 3-7, 2000, Proceedings. pp. 531–545 (2000), http://dx.doi.org/10.1007/3-540-44448-3_41

[19] Bellare, M., Namprempre, C.: Authenticated Encryption: Relations among Notions and Analysis of the Generic Composition Paradigm. J. Cryptology 21(4), 469–491 (2008), http://dx.doi.org/10.1007/s00145-008-9026-x

[20] Benadjila, R., Guo, J., Lomné, V., Peyrin, T.: Implementing Lightweight Block Ciphers on x86 Architectures. In: Selected Areas in Cryptography - SAC 2013 - 20th International Conference, Burnaby, BC, Canada, August 14-16, 2013, Revised Selected Papers. pp. 324–351 (2013), https://doi.org/10.1007/978-3-662-43414-7_17

[21] Bertoni, G., Daemen, J., Peeters, M., Assche, G.V.: Keccak. In: Advances in Cryptology - EUROCRYPT 2013, 32nd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Athens, Greece, May 26-30, 2013. Proceedings. pp. 313–314 (2013), https://doi.org/10.1007/978-3-642-38348-9_19

[22] Bhargavan, K., Leurent, G.: On the Practical (In-)Security of 64-bit Block Ciphers: Collision Attacks on HTTP over TLS and OpenVPN. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, October 24-28, 2016. pp. 456–467 (2016), http://doi.acm.org/10.1145/2976749.2978423

[23] Biham, E., Biryukov, A., Shamir, A.: Cryptanalysis of Skipjack Reduced to 31 Rounds Using Impossible Differentials. In: EUROCRYPT 1999. Lecture Notes in Computer Science, vol. 1592, pp. 12–23. Springer (1999)

[24] Biham, E., Shamir, A.: Differential Cryptanalysis of DES-like Cryptosystems. In: Advances in Cryptology - CRYPTO '90, 10th Annual International Cryptology Conference, Santa Barbara, California, USA, August 11-15, 1990, Proceedings. pp. 2–21 (1990), https://doi.org/10.1007/3-540-38424-3_1

[25] Biham, E., Shamir, A.: Differential Cryptanalysis of the Full 16-Round DES. In: Advances in Cryptology - CRYPTO '92, 12th Annual International Cryptology Conference, Santa Barbara, California, USA, August 16-20, 1992, Proceedings. pp. 487–496 (1992), https://doi.org/10.1007/3-540-48071-4_34

[26] Biryukov, A., Cannière, C.D., Braeken, A., Preneel, B.: A Toolbox for Cryptanalysis: Linear and Affine Equivalence Algorithms. In: Advances in Cryptology - EUROCRYPT 2003, International Conference on the Theory and Applications of Cryptographic Techniques, Warsaw, Poland, May 4-8, 2003, Proceedings. pp. 33–50 (2003), https://doi.org/10.1007/3-540-39200-9_3

[27] Biryukov, A., Wagner, D.A.: Slide Attacks. In: Fast Software Encryption, 6th International Workshop, FSE '99, Rome, Italy, March 24-26, 1999, Proceedings. pp. 245–259 (1999), https://doi.org/10.1007/3-540-48519-8_18

[28] Bogdanov, A., Knudsen, L.R., Leander, G., Paar, C., Poschmann, A., Robshaw, M.J.B., Seurin, Y., Vikkelsoe, C.: PRESENT: An Ultra-Lightweight Block Cipher. In: Cryptographic Hardware and Embedded Systems - CHES 2007, 9th International Workshop, Vienna, Austria, September 10-13, 2007, Proceedings. pp. 450–466 (2007), http://dx.doi.org/10.1007/978-3-540-74735-2_31

[29] Bogdanov, A., Rechberger, C.: A 3-Subset Meet-in-the-Middle Attack: Cryptanalysis of the Lightweight Block Cipher KTANTAN. In: SAC 2010. Lecture Notes in Computer Science, vol. 6544, pp. 229–240. Springer (2010)

[30] Bogdanov, A., Tischhauser, E., Vejre, P.S.: Multivariate Profiling of Hulls for Linear Cryptanalysis. Cryptology ePrint Archive, Report 2016/667 (2016), http://eprint.iacr.org/2016/667

[31] Borghoff, J., Canteaut, A., Güneysu, T., Kavun, E.B., Knezevic, M., Knudsen, L.R., Leander, G., Nikov, V., Paar, C., Rechberger, C., Rombouts, P., Thomsen, S.S., Yalçin, T.: PRINCE - A Low-Latency Block Cipher for Pervasive Computing Applications - Extended Abstract. In: Advances in Cryptology - ASIACRYPT 2012 - 18th International Conference on the Theory and Application of Cryptology and Information Security, Beijing, China, December 2-6, 2012. Proceedings. pp. 208–225 (2012), http://dx.doi.org/10.1007/978-3-642-34961-4_14

[32] Buchberger, B.: An Algorithm for Finding the Basis Elements in the Residue Class Ring Modulo a Zero Dimensional Polynomial Ideal. Ph.D. thesis

[33] CAESAR: Competition for Authenticated Encryption: Security, Applicability, and Robustness (2014), https://competitions.cr.yp.to/caesar.html

[34] Canright, D.: A Very Compact S-Box for AES. In: Cryptographic Hardware and Embedded Systems - CHES 2005, 7th International Workshop, Edinburgh, UK, August 29 - September 1, 2005, Proceedings. pp. 441–455 (2005), http://dx.doi.org/10.1007/11545262_32

[35] Canteaut, A., Duval, S., Leurent, G.: Construction of Lightweight S-Boxes Using Feistel and MISTY Structures. In: Selected Areas in Cryptography - SAC 2015 - 22nd International Conference, Sackville, NB, Canada, August 12-14, 2015, Revised Selected Papers. pp. 373–393 (2015), https://doi.org/10.1007/978-3-319-31301-6_22

[36] Chaum, D., Evertse, J.: Crytanalysis of DES with a Reduced Number of Rounds: Sequences of Linear Factors in Block Ciphers. In: CRYPTO 1985. Lecture Notes in Computer Science, vol. 218, pp. 192–211. Springer (1985)

[37] Cui, T., Jia, K., Fu, K., Chen, S., Wang, M.: New Automatic Search Tool for Impossible Differentials and Zero-Correlation Linear Approximations. IACR Cryptology ePrint Archive 2016, 689 (2016), http://eprint.iacr.org/2016/689

[38] Daemen, J., Knudsen, L.R., Rijmen, V.: The Block Cipher Square. In: Fast Software Encryption, 4th International Workshop, FSE '97, Haifa, Israel, January 20-22, 1997, Proceedings. pp. 149–165 (1997), https://doi.org/10.1007/BFb0052343

[39] Daemen, J., Rijmen, V.: The Design of Rijndael: AES - The Advanced Encryption Standard. Information Security and Cryptography, Springer (2002), http://dx.doi.org/10.1007/978-3-662-04722-4

[40] Daemen, J., Rijmen, V.: Understanding Two-Round Differentials in AES. In: Security and Cryptography for Networks, 5th International Conference, SCN 2006, Maiori, Italy, September 6-8, 2006, Proceedings. pp. 78–94 (2006), https://doi.org/10.1007/11832072_6

[41] De Cannière, C.: Analysis and Design of Symmetric Encryption Algorithms. Ph.D. thesis, Katholieke Universiteit Leuven (2007), Bart Preneel (promotor)

[42] DES: Data Encryption Standard. In: FIPS PUB 46, Federal Information Processing Standards Publication. pp. 46–2 (1977)

[43] Diffie, W., Hellman, M.E.: Special Feature Exhaustive Cryptanalysis of the NBS Data Encryption Standard. IEEE Computer 10(6), 74–84 (1977), https://doi.org/10.1109/C-M.1977.217750

[44] Dinur, I., Shamir, A.: Cube Attacks on Tweakable Black Box Polynomials. In: Advances in Cryptology - EUROCRYPT 2009, 28th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, April 26-30, 2009. Proceedings. pp. 278–299 (2009), https://doi.org/10.1007/978-3-642-01001-9_16

[45] Fleischmann, E., Forler, C., Lucks, S.: McOE: A Family of Almost Foolproof On-Line Authenticated Encryption Schemes. In: Fast Software Encryption - 19th International Workshop, FSE 2012, Washington, DC, USA, March 19-21, 2012. Revised Selected Papers. pp. 196–215 (2012), http://dx.doi.org/10.1007/978-3-642-34047-5_12

[46] Gauravaram, P., Knudsen, L.R., Matusiewicz, K., Mendel, F., Rechberger, C., Schläffer, M., Thomsen, S.S.: Grøstl – a SHA-3 candidate (Updated version). Submitted to the SHA-3 competition (2011)

[47] Grassi, L., Rechberger, C., Rønjom, S.: Subspace Trail Cryptanalysis and its Applications to AES. IACR Trans. Symmetric Cryptol. 2016(2), 192–225 (2016), http://tosc.iacr.org/index.php/ToSC/article/view/571

[48] Grosso, V., Leurent, G., Standaert, F., Varici, K.: LS-Designs: Bitslice Encryption for Efficient Masked Software Implementations. In: Fast Software Encryption - 21st International Workshop, FSE 2014, London, UK, March 3-5, 2014. Revised Selected Papers. pp. 18–37 (2014), https://doi.org/10.1007/978-3-662-46706-0_2

[49] Guo, J., Jean, J., Nikolic, I., Qiao, K., Sasaki, Y., Sim, S.M.: Invariant Subspace Attack Against Midori64 and The Resistance Criteria for S-box Designs. IACR Trans. Symmetric Cryptol. 2016(1), 33–56 (2016), http://tosc.iacr.org/index.php/ToSC/article/view/534

[50] Guo, J., Peyrin, T., Poschmann, A.: The PHOTON Family of Lightweight Hash Functions. In: Advances in Cryptology - CRYPTO 2011 - 31st Annual Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2011. Proceedings. pp. 222–239 (2011), https://doi.org/10.1007/978-3-642-22792-9_13

[51] Guo, J., Peyrin, T., Poschmann, A., Robshaw, M.J.B.: The LED Block Cipher. In: Cryptographic Hardware and Embedded Systems - CHES 2011 - 13th International Workshop, Nara, Japan, September 28 - October 1, 2011. Proceedings. pp. 326–341 (2011), http://dx.doi.org/10.1007/978-3-642-23951-9_22

[52] Gupta, K.C., Ray, I.G.: On Constructions of Involutory MDS Matrices. In: Progress in Cryptology - AFRICACRYPT 2013, 6th International Conference on Cryptology in Africa, Cairo, Egypt, June 22-24, 2013. Proceedings. pp. 43–60 (2013), https://doi.org/10.1007/978-3-642-38553-7_3

[53] Gupta, K.C., Ray, I.G.: On Constructions of Circulant MDS Matrices for Lightweight Cryptography. In: ISPEC. pp. 564–576 (2014)

[54] Gupta, K.C., Ray, I.G.: Cryptographically significant MDS matrices based on circulant and circulant-like matrices for lightweight applications. Cryptography and Communications 7(2), 257–287 (2015), http://dx.doi.org/10.1007/s12095-014-0116-3

[55] Hoang, V.T., Rogaway, P.: On Generalized Feistel Networks. In: Advances in Cryptology - CRYPTO 2010, 30th Annual Cryptology Conference, Santa Barbara, CA, USA, August 15-19, 2010. Proceedings. pp. 613–630 (2010), https://doi.org/10.1007/978-3-642-14623-7_33

[56] Jean, J., Moradi, A., Peyrin, T., Sasdrich, P.: Bit-Sliding: A Generic Technique for Bit-Serial Implementations of SPN-based Primitives – Applications to AES, PRESENT and SKINNY. Cryptology ePrint Archive, Report 2017/600, to appear in Cryptographic Hardware and Embedded Systems - CHES 2017 - Taipei, Taiwan, September 25-28, 2017 (2017), http://eprint.iacr.org/2017/600

[57] Jean, J., Nikolic, I., Peyrin, T.: Tweaks and Keys for Block Ciphers: The TWEAKEY Framework. In: Advances in Cryptology - ASIACRYPT 2014 - 20th International Conference on the Theory and Application of Cryptology and Information Security, Kaoshiung, Taiwan, R.O.C., December 7-11, 2014, Proceedings, Part II. pp. 274–288 (2014), https://doi.org/10.1007/978-3-662-45608-8_15

[58] Jean, J., Nikolić, I., Peyrin, T.: Joltik v1.3 (2015), Submission to the CAESAR competition, http://www1.spms.ntu.edu.sg/~syllab/Joltik

[59] Jean, J., Nikolić, I., Peyrin, T., Seurin, Y.: Deoxys v1.4 (2016), Submission to the CAESAR competition, http://www1.spms.ntu.edu.sg/~syllab/Deoxys

[60] Jean, J., Peyrin, T., Sim, S.M.: Optimizing Implementations of Lightweight Building Blocks. IACR Cryptology ePrint Archive, to appear in IACR Trans. Symmetric Cryptol. 2017 2017, 101 (2017), http://eprint.iacr.org/2017/101

[61] Jr., J.N., Abrahão, É.: A New Involutory MDS Matrix for the AES. I. J. Network Security 9(2), 109–116 (2009), http://ijns.femto.com.tw/contents/ijns-v9-n2/ijns-2009-v9-n2-p109-116.pdf

[62] Juels, A., Weis, S.A.: Authenticating Pervasive Devices with Human Protocols. In: Advances in Cryptology - CRYPTO 2005: 25th Annual International Cryptology Conference, Santa Barbara, California, USA, August 14-18, 2005, Proceedings. pp. 293–308 (2005), https://doi.org/10.1007/11535218_18

[63] Junod, P., Vaudenay, S.: Perfect Diffusion Primitives for Block Ciphers. In: Selected Areas in Cryptography, 11th International Workshop, SAC 2004, Waterloo, Canada, August 9-10, 2004, Revised Selected Papers. pp. 84–99 (2004), http://dx.doi.org/10.1007/978-3-540-30564-4_6

[64] Katz, J., Yung, M.: Complete Characterization of Security Notions for Probabilistic Private-Key Encryption. In: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, May 21-23, 2000, Portland, OR, USA. pp. 245–254 (2000), http://doi.acm.org/10.1145/335305.335335

[65] Kavun, E.B., Lauridsen, M.M., Leander, G., Rechberger, C., Schwabe, P., Yalçin, T.: Prøst v1.1 (2014), Submission to the CAESAR competition, https://competitions.cr.yp.to/round1/proestv11.pdf

[66] Kerckhoffs, A.: La cryptographie militaire. Journal des sciences militaires, vol. IX, pp. 5–38, Jan. 1883

[67] Khoo, K., Lee, E., Peyrin, T., Sim, S.M.: Human-readable Proof of the Related-Key Security of AES-128. IACR Cryptology ePrint Archive, to appear in IACR Trans. Symmetric Cryptol. 2017 2016, 25 (2016), http://eprint.iacr.org/2016/025

[68] Khoo, K., Peyrin, T., Poschmann, A.Y., Yap, H.: FOAM: Searching for Hardware-Optimal SPN Structures and Components with a Fair Comparison. In: Cryptographic Hardware and Embedded Systems - CHES 2014 - 16th International Workshop, Busan, South Korea, September 23-26, 2014. Proceedings. pp. 433–450 (2014), https://doi.org/10.1007/978-3-662-44709-3_24

[69] Knudsen, L.: DEAL - A 128-bit Block Cipher. NIST AES Proposal (1998)

[70] Knudsen, L.R., Leander, G., Poschmann, A., Robshaw, M.J.B.: PRINTcipher: A Block Cipher for IC-Printing. In: Cryptographic Hardware and Embedded Systems, CHES 2010, 12th International Workshop, Santa Barbara, CA, USA, August 17-20, 2010. Proceedings. pp. 16–32 (2010), https://doi.org/10.1007/978-3-642-15031-9_2

[71] Knudsen, L.R., Wagner, D.: Integral Cryptanalysis. In: FSE 2002. Lecture Notes in Computer Science, vol. 2365, pp. 112–127. Springer (2002)

[72] Kölbl, S., Leander, G., Tiessen, T.: Observations on the SIMON Block Cipher Family. In: Advances in Cryptology - CRYPTO 2015 - 35th Annual Cryptology Conference, Santa Barbara, CA, USA, August 16-20, 2015, Proceedings, Part I. pp. 161–185 (2015), https://doi.org/10.1007/978-3-662-47989-6_8

[73] Kranz, T., Leander, G., Wiemer, F.: Linear Cryptanalysis: Key Schedules and Tweakable Block Ciphers. IACR Trans. Symmetric Cryptol. 2017(1), 474–505 (2017), http://tosc.iacr.org/index.php/ToSC/article/view/605

[74] Krawczyk, H.: Cryptographic Extraction and Key Derivation: The HKDF Scheme. In: Advances in Cryptology - CRYPTO 2010, 30th Annual Cryptology Conference, Santa Barbara, CA, USA, August 15-19, 2010. Proceedings. pp. 631–648 (2010), https://doi.org/10.1007/978-3-642-14623-7_34

[75] Leander, G., Abdelraheem, M.A., AlKhzaimi, H., Zenner, E.: A Cryptanalysis of PRINTcipher: The Invariant Subspace Attack. In: Advances in Cryptology - CRYPTO 2011 - 31st Annual Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2011. Proceedings. pp. 206–221 (2011), http://dx.doi.org/10.1007/978-3-642-22792-9_12

[76] Leander, G., Minaud, B., Rønjom, S.: A Generic Approach to Invariant Subspace Attacks: Cryptanalysis of Robin, iSCREAM and Zorro. In: Advances in Cryptology - EUROCRYPT 2015 - 34th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Sofia, Bulgaria, April 26-30, 2015, Proceedings, Part I. pp. 254–283 (2015), http://dx.doi.org/10.1007/978-3-662-46800-5_11

[77] Leander, G., Poschmann, A.: On the Classification of 4 Bit S-Boxes. In: Arithmetic of Finite Fields, First International Workshop, WAIFI 2007, Madrid, Spain, June 21-22, 2007, Proceedings. pp. 159–176 (2007), http://dx.doi.org/10.1007/978-3-540-73074-3_13

[78] Li, Y., Wang, M.: On the Construction of Lightweight Circulant Involutory MDS Matrices. In: Fast Software Encryption - 23rd International Conference, FSE 2016, Bochum, Germany, March 20-23, 2016, Revised Selected Papers. pp. 121–139 (2016), https://doi.org/10.1007/978-3-662-52993-5_7

[79] Lin, L., Wu, W.: Meet-in-the-Middle Attacks on Reduced-Round Midori64. IACR Trans. Symmetric Cryptol. 2017(1), 215–239 (2017), http://tosc.iacr.org/index.php/ToSC/article/view/592

[80] Liskov, M., Rivest, R.L., Wagner, D.A.: Tweakable Block Ciphers. In: Advances in Cryptology - CRYPTO 2002, 22nd Annual International Cryptology Conference, Santa Barbara, California, USA, August 18-22, 2002, Proceedings. pp. 31–46 (2002), https://doi.org/10.1007/3-540-45708-9_3

[81] Liu, G., Ghosh, M., Song, L.: Security Analysis of SKINNY under Related-Tweakey Settings. IACR Transactions on Symmetric Cryptology 2017(3), 37–72 (2017), https://tosc.iacr.org/index.php/ToSC/article/view/765

[82] Liu, M., Sim, S.M.: Lightweight MDS Generalized Circulant Matrices. In: Fast Software Encryption - 23rd International Conference, FSE 2016, Bochum, Germany, March 20-23, 2016, Revised Selected Papers. pp. 101–120 (2016), http://dx.doi.org/10.1007/978-3-662-52993-5_6

[83] Lucks, S.: Faster Luby-Rackoff Ciphers. In: Fast Software Encryption, Third International Workshop, Cambridge, UK, February 21-23, 1996, Proceedings. pp. 189–203 (1996), https://doi.org/10.1007/3-540-60865-6_53

[84] MacWilliams, F., Sloane, N.: The Theory of Error-Correcting Codes. North-Holland Publishing Company, 2nd edn. (1986)

[85] Matsui, M.: Linear Cryptanalysis Method for DES Cipher. In: Advances in Cryptology - EUROCRYPT ’93, Workshop on the Theory and Application of Cryptographic Techniques, Lofthus, Norway, May 23-27, 1993, Proceedings. pp. 386–397 (1993), https://doi.org/10.1007/3-540-48285-7_33

[86] McGrew, D., Viega, J.: The Galois/Counter Mode of Operation (GCM). Submission to NIST Modes of Operation Process, January, 2004 (2004)

[87] McKay, K.A., Bassham, L., Turan, M.S., Mouha, N.: Report on Lightweight Cryptography. https://www.nist.gov/publications/report-lightweight-cryptography (2017)

[88] Moradi, A., Poschmann, A., Ling, S., Paar, C., Wang, H.: Pushing the Limits: A Very Compact and a Threshold Implementation of AES. In: Advances in Cryptology - EUROCRYPT 2011. pp. 69–88 (2011), http://dx.doi.org/10.1007/978-3-642-20465-4_6

[89] Mouha, N., Wang, Q., Gu, D., Preneel, B.: Differential and Linear Cryptanalysis Using Mixed-Integer Linear Programming. In: Information Security and Cryptology - 7th International Conference, Inscrypt 2011, Beijing, China, November 30 - December 3, 2011. Revised Selected Papers. pp. 57–76 (2011), https://doi.org/10.1007/978-3-642-34704-7_5

[90] National Institute of Standards and Technology: Recommendation for Key Management, Part 1: General – NIST SP 800-57 Part 1 Revision 4. https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-57pt1r4.pdf

[91] Nyberg, K.: Differentially Uniform Mappings for Cryptography. In: Advances in Cryptology - EUROCRYPT ’93, Workshop on the Theory and Application of Cryptographic Techniques, Lofthus, Norway, May 23-27, 1993, Proceedings. pp. 55–64 (1993), https://doi.org/10.1007/3-540-48285-7_6

[92] Nyberg, K.: Linear Approximation of Block Ciphers. In: Advances in Cryptology - EUROCRYPT ’94, Workshop on the Theory and Application of Cryptographic Techniques, Perugia, Italy, May 9-12, 1994, Proceedings. pp. 439–444 (1994), https://doi.org/10.1007/BFb0053460

[93] Ohkuma, K.: Weak Keys of Reduced-Round PRESENT for Linear Cryptanalysis. In: Selected Areas in Cryptography, 16th Annual International Workshop, SAC 2009, Calgary, Alberta, Canada, August 13-14, 2009, Revised Selected Papers. pp. 249–265 (2009), https://doi.org/10.1007/978-3-642-05445-7_16

[94] Peyrin, T., Seurin, Y.: Counter-in-Tweak: Authenticated Encryption Modes for Tweakable Block Ciphers. In: Advances in Cryptology - CRYPTO 2016 - 36th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 14-18, 2016, Proceedings, Part I. pp. 33–63 (2016), https://doi.org/10.1007/978-3-662-53018-4_2

[95] Peyrin, T., Sim, S.M., Wang, L., Zhang, G.: Cryptanalysis of JAMBU. In: Fast Software Encryption - 22nd International Workshop, FSE 2015, Istanbul, Turkey, March 8-11, 2015, Revised Selected Papers. pp. 264–281 (2015), http://dx.doi.org/10.1007/978-3-662-48116-5_13

[96] Rivest, R.L., Shamir, A., Adleman, L.M.: A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Commun. ACM 21(2), 120–126 (1978), http://doi.acm.org/10.1145/359340.359342

[97] Robinson, D.: An Introduction to Abstract Algebra. De Gruyter textbook, Walter de Gruyter (2003), https://books.google.com.sg/books?id=Yj3ApD8TeCUC

[98] Rogaway, P.: Authenticated-encryption with associated-data. In: Proceedings of the 9th ACM Conference on Computer and Communications Security, CCS 2002, Washington, DC, USA, November 18-22, 2002. pp. 98–107 (2002), http://doi.acm.org/10.1145/586110.586125

[99] Rogaway, P.: Efficient Instantiations of Tweakable Blockciphers and Refinements to Modes OCB and PMAC. In: Advances in Cryptology - ASIACRYPT 2004, 10th International Conference on the Theory and Application of Cryptology and Information Security, Jeju Island, Korea, December 5-9, 2004, Proceedings. pp. 16–31 (2004), http://dx.doi.org/10.1007/978-3-540-30539-2_2

[100] Rogaway, P.: Let’s not Call It MR. http://web.cs.ucdavis.edu/~rogaway/beer.pdf (2014)

[101] Rogaway, P., Bellare, M., Black, J.: OCB: A Block-Cipher Mode of Operation for Efficient Authenticated Encryption. ACM Trans. Inf. Syst. Secur. 6(3), 365–403 (2003), http://doi.acm.org/10.1145/937527.937529

[102] Rogaway, P., Shrimpton, T.: A Provable-Security Treatment of the Key-Wrap Problem. In: Advances in Cryptology - EUROCRYPT 2006, 25th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28 - June 1, 2006, Proceedings. pp. 373–390 (2006), http://dx.doi.org/10.1007/11761679_23

[103] Frankel, S., Glenn, R., Kelly, S.: The AES-CBC Cipher Algorithm and Its Use with IPsec. Network Working Group, RFC 3602 (September 2003)

[104] Saarinen, M.O.: Cryptographic Analysis of All 4 x 4-Bit S-Boxes. In: Selected Areas in Cryptography - 18th International Workshop, SAC 2011, Toronto, ON, Canada, August 11-12, 2011, Revised Selected Papers. pp. 118–133 (2011), http://dx.doi.org/10.1007/978-3-642-28496-0_7

[105] Sadeghi, S., Mohammadi, T., Bagheri, N.: Cryptanalysis of Reduced round SKINNY Block Cipher. Through personal communication (2017)

[106] Sarkar, S., Sim, S.M.: A Deeper Understanding of the XOR Count Distribution in the Context of Lightweight Cryptography. In: Progress in Cryptology - AFRICACRYPT 2016 - 8th International Conference on Cryptology in Africa, Fes, Morocco, April 13-15, 2016, Proceedings. pp. 167–182 (2016), http://dx.doi.org/10.1007/978-3-319-31517-1_9

[107] Sarkar, S., Syed, H.: Lightweight Diffusion Layer: Importance of Toeplitz Matrices. IACR Trans. Symmetric Cryptol. 2016(1), 95–113 (2016), http://tosc.iacr.org/index.php/ToSC/article/view/537

[108] Sasaki, Y., Todo, Y.: New Impossible Differential Search Tool from Design and Cryptanalysis Aspects - Revealing Structural Properties of Several Ciphers. In: Advances in Cryptology - EUROCRYPT 2017 - 36th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Paris, France, April 30 - May 4, 2017, Proceedings, Part III. pp. 185–215 (2017), https://doi.org/10.1007/978-3-319-56617-7_7

[109] Schaumüller-Bichl, I.: Cryptanalysis of the Data Encryption Standard by the Method of Formal Coding, pp. 235–255. Springer Berlin Heidelberg, Berlin, Heidelberg (1983), https://doi.org/10.1007/3-540-39466-4_17

[110] Schneier, B., Kelsey, J.: Unbalanced Feistel Networks and Block Cipher Design. In: Fast Software Encryption, Third International Workshop, Cambridge, UK, February 21-23, 1996, Proceedings. pp. 121–144 (1996), https://doi.org/10.1007/3-540-60865-6_49

[111] Shannon, C.E.: Communication Theory of Secrecy Systems. Bell System Technical Journal 28(4), 656–715 (Oct 1949), http://en.wikipedia.org/wiki/Communication_Theory_of_Secrecy_Systems; http://www.alcatel-lucent.com/bstj/vol28-1949/articles/bstj28-4-656.pdf; http://www.cs.ucla.edu/~jkong/research/security/shannon1949.pdf

[112] Shibutani, K., Isobe, T., Hiwatari, H., Mitsuda, A., Akishita, T., Shirai, T.: Piccolo: An Ultra-Lightweight Blockcipher. In: Cryptographic Hardware and Embedded Systems - CHES 2011 - 13th International Workshop, Nara, Japan, September 28 - October 1, 2011. Proceedings. pp. 342–357 (2011), https://doi.org/10.1007/978-3-642-23951-9_23

[113] Shirai, T., Shibutani, K.: On the diffusion matrix employed in the Whirlpool hashing function. NESSIE Phase 2 Report NES/DOC/EXT/WP5/002/1

[114] Shirai, T., Shibutani, K., Akishita, T., Moriai, S., Iwata, T.: The 128-Bit Blockcipher CLEFIA (Extended Abstract). In: Fast Software Encryption, 14th International Workshop, FSE 2007, Luxembourg, Luxembourg, March 26-28, 2007, Revised Selected Papers. pp. 181–195 (2007), https://doi.org/10.1007/978-3-540-74619-5_12

[115] Sim, S.M., Khoo, K., Oggier, F.E., Peyrin, T.: Lightweight MDS Involution Matrices. In: Fast Software Encryption - 22nd International Workshop, FSE 2015, Istanbul, Turkey, March 8-11, 2015, Revised Selected Papers. pp. 471–493 (2015), http://dx.doi.org/10.1007/978-3-662-48116-5_23

[116] Sim, S.M., Wang, L.: Practical Forgery Attacks on SCREAM and iSCREAM, SYLLAB website: http://www1.spms.ntu.edu.sg/~syllab/m/index.php/CAESAR, http://www1.spms.ntu.edu.sg/~syllab/m/images/b/b3/ForgeryAttackonSCREAM.pdf

[117] Stewart, J., Salem, L., Husted, M.: The Bundle of Sticks. Continental Press, Incorporated (2004), https://books.google.com.sg/books?id=L6r0PAAACAAJ

[118] Stinson, D.R.: Cryptography - Theory and Practice. Discrete mathematics and its applications series, CRC Press (1995)

[119] Sun, S., Hu, L., Song, L., Xie, Y., Wang, P.: Automatic Security Evaluation of Block Ciphers with S-bP Structures Against Related-Key Differential Attacks. In: Information Security and Cryptology - 9th International Conference, Inscrypt 2013, Guangzhou, China, November 27-30, 2013, Revised Selected Papers. pp. 39–51 (2013), https://doi.org/10.1007/978-3-319-12087-4_3

[120] Todo, Y.: Structural Evaluation by Generalized Integral Property. In: EUROCRYPT 2015, Part I. Lecture Notes in Computer Science, vol. 9056, pp. 287–314. Springer (2015)

[121] Todo, Y., Leander, G., Sasaki, Y.: Nonlinear Invariant Attack - Practical Attack on Full SCREAM, iSCREAM, and Midori64. In: Advances in Cryptology - ASIACRYPT 2016 - 22nd International Conference on the Theory and Application of Cryptology and Information Security, Hanoi, Vietnam, December 4-8, 2016, Proceedings, Part II. pp. 3–33 (2016), https://doi.org/10.1007/978-3-662-53890-6_1

[122] Toh, D., Teo, J., Khoo, K., Sim, S.M.: Lightweight MDS Serial-Type Matrices with Minimal Fixed XOR Count. In: Progress in Cryptology - AFRICACRYPT 2018 - 10th International Conference on Cryptology in Africa, Marrakesh, Morocco, May 7-9, 2018, Proceedings. pp. 51–71 (2018), https://doi.org/10.1007/978-3-319-89339-6_4

[123] Tolba, M., Abdelkhalek, A., Youssef, A.M.: Impossible Differential Cryptanalysis of Reduced-Round SKINNY. In: Progress in Cryptology - AFRICACRYPT 2017 - 9th International Conference on Cryptology in Africa, Dakar, Senegal, May 24-26, 2017, Proceedings. pp. 117–134 (2017), https://doi.org/10.1007/978-3-319-57339-7_7

[124] Vaudenay, S.: On the Need for Multipermutations: Cryptanalysis of MD4 and SAFER. In: Fast Software Encryption: Second International Workshop. Leuven, Belgium, 14-16 December 1994, Proceedings. pp. 286–297 (1994), https://doi.org/10.1007/3-540-60590-8_22

[125] Velichkov, V.: Recent Methods for Cryptanalysis of Symmetric-key Cryptographic Algorithms (Recente Methoden voor de Cryptanalyse van Symmetrische-sleutel Cryptografische Algoritmen). Ph.D. thesis, Katholieke Universiteit Leuven, Belgium (2012), https://lirias.kuleuven.be/handle/123456789/335732

[126] Virtual Silicon Inc.: 0.18 µm VIP Standard Cell Library Tape Out Ready, Part Number: UMCL18G212T3, Process: UMC Logic 0.18 µm Generic II Technology: 0.18µm (July 2004)

[127] Wagner, D.A.: The Boomerang Attack. In: Fast Software Encryption, 6th International Workshop, FSE ’99, Rome, Italy, March 24-26, 1999, Proceedings. pp. 156–170 (1999), https://doi.org/10.1007/3-540-48519-8_12

[128] Wiener, M.J.: Efficient DES key search. Tech. rep., School of Computer Science, Carleton University (1994)

[129] Wingers, L.: Software for SUPERCOP Benchmarking of SIMON and SPECK. https://github.com/lrwinge/simon_speck_supercop (2015)

[130] Wu, H., Huang, T.: JAMBU Lightweight Authenticated Encryption Mode and AES-JAMBU (v1). Submitted to the CAESAR competition (March 2014)

[131] Wu, H., Huang, T.: The JAMBU Lightweight Authentication Encryption Mode (v2.1). Update version to submission for the CAESAR competition (September 2016)

[132] Zhang, W., Bao, Z., Lin, D., Rijmen, V., Yang, B., Verbauwhede, I.: RECTANGLE: a bit-slice lightweight block cipher suitable for multiple platforms. SCIENCE CHINA Information Sciences 58(12), 1–15 (2015), https://doi.org/10.1007/s11432-015-5459-7

[133] Zheng, Y., Matsumoto, T., Imai, H.: On the Construction of Block Ciphers Provably Secure and Not Relying on Any Unproved Hypotheses. In: Advances in Cryptology - CRYPTO ’89, 9th Annual International Cryptology Conference, Santa Barbara, California, USA, August 20-24, 1989, Proceedings. pp. 461–480 (1989), https://doi.org/10.1007/0-387-34805-0_42