
on a Customized Network

Ricardo Martinho Ferreira Miranda

Thesis to obtain the Master of Science Degree in

Mathematics and Applications

Examination Committee
Chairperson: Prof. Maria Cristina Sales Viana Serôdio Sernadas
Supervisor: Prof. Paulo Alexandre Carreira Mateus
Co-supervisor: Bruno Neto de Oliveira Tavares
Member of the Committee: Prof. André Nuno Carvalho Souto

November 2017

Acknowledgements

I want to thank both my supervisors, Paulo Mateus and Bruno Tavares, for all their support and guidance. I would also like to thank my dearest friend Sofia Brito, whose counseling and motivation were crucial in overcoming the most difficult moments.

Resumo

Building a secure network for use in real applications, where restrictions are imposed on the capabilities of the network elements and on the transfer of information, requires a customized cryptographic analysis in order to protect the communications and to detect and minimize the system vulnerabilities that could be exploited. In this document, a network under these conditions is presented; an optimal topological scheme is sought before choosing the cryptographic components of the network embedded in communications and storage, and their security is subsequently analyzed. Among the alternatives scrutinized, only one is chosen as the solution, by comparison in terms of performance, security and adaptation to the imposed restrictions. This solution is implemented using the C and Java programming languages. The chosen encryption schemes and protocols are shown to be highly suitable options and their use in practice is advised. These results are valid only for this specific case study, since if any of the restrictions is changed it is likely that a different and more appropriate solution exists.

Keywords: ciphertext indistinguishability; block cipher mode of operation; semantic security; symmetric encryption system.

Abstract

Building a secure network to be used in real-world applications, where constraints are strictly imposed on the capabilities of the network’s elements and on the data flow, requires a customized cryptographic analysis in order to protect the communications and to detect and minimize the system’s exploitable vulnerabilities. In this document a network under such conditions is presented and one is challenged with providing an optimal topological scheme prior to choosing the network’s cryptographic components, embedded in the communication and data storage protocols, and subsequently analyzing their security. Among the scrutinized alternatives, a single one is elected as the solution by a comparison in terms of performance, security and suitability under the enforced restrictions. This solution is implemented using the C and Java programming languages. The selected schemes and protocols are shown to be highly reasonable options and their use in practice is advised. These results are only valid for this specific case study: if any of the established constraints is lifted, a better solution is likely to emerge.

Keywords: ciphertext indistinguishability; block cipher mode of operation; semantic security; symmetric encryption system.

Glossary

$I_n$ The set $\{k \in \mathbb{N} : 1 \le k \le n\}$.
$P(A)$ Probability of occurrence of event A.
$A^*$ The Kleene star of A.
$I$ The set of unique identifiers of gathering devices.
$O$ $f \in O(g) \Leftrightarrow \exists_{M \in \mathbb{R}^+} \exists_{x_0 \in \mathbb{R}} \forall_{x \ge x_0} |f(x)| \le M|g(x)|$.
bitstring An element of $\mathbb{Z}_2^*$.
byte A unit of data storage composed of 1 octet.
kB 1 kB = 1024 bytes.
octet A sequence of 8 bits.

List of Abbreviations

$\lfloor x \rfloor$ Floor function of x, for some x ∈ R.
$0^j$ The bitstring composed of j ’0’s, for some j ∈ N.
$1^j$ The bitstring composed of j ’1’s, for some j ∈ N.
$[w]_2$ Binary representation of the word w.
$\lceil x \rceil$ Ceiling function of x, for some x ∈ R.
$w|^k$ Suffix of w of length k, for some k ∈ N.
$w|_k$ Prefix of w of length k, for some k ∈ N.
$x \| y$ Concatenation of words x and y.
$x \setminus y$ Difference of x and y.
$|w|_2$ The number of bits of the word w.
3DES Triple DES.
ACK Acknowledgement.
AES Advanced Encryption Standard.
BCMO block cipher mode of operation.
CA certificate authority.
CBC Cipher Block Chaining.
CCM Counter with CBC-MAC.
CFB Cipher Feedback.
CPU central processing unit.
CSPRNG cryptographically secure pseudo-random number generator.
CTR Counter.
CTR-H CTR mode with HMAC-256 checksum.
DAL Downstream Algorithm Lifecycle.
DB database.
DDL Downstream Data Lifecycle.
DES Data Encryption Standard.
EAP Extensible Authentication Protocol.
ECB Electronic Codebook.
ECC Elliptic Curve Cryptography.
FIFO First in first out.
GCM Galois Counter Mode.
GD Gathering device.
$GD_j$ gathering device with unique identifier j ∈ I.
GMAC Galois Message Authentication Code.
GTK Group Temporal Key.
HMAC hash-based message authentication code.
IEEE Institute of Electrical and Electronics Engineers.
IEEESA Institute of Electrical and Electronics Engineers Standards Association.
IND-CCA Indistinguishability under chosen-ciphertext attack.
IND-CPA Indistinguishability under chosen-plaintext attack.
IV initialization vector.
KDF key derivation function.
LAN local area network.
MAC message authentication code.
MIC message integrity code.
MiM Man-in-the-middle.
MnDM Mission and data manager.
MPP Middle-point party.
NIST National Institute of Standards and Technology.
PBKDF2 password-based key derivation function 2.
PCgF package ciphertext generator function.
PCuF package ciphertext unpacking function.
PMK Pairwise Master Key.
PMS pre-mission system.
POA Padding Oracle Attack.
PRF pseudo-random function.
PRNG pseudo-random number generator.
PSch packing scheme.
PSK Pre-shared key.
PTK Pairwise Transient Key.
RFC Request for Comments.
SEM-CPA Semantic security under chosen-plaintext attack.
SPN Substitution Permutation Network.
SSID Service Set Identifier.
UAL Upstream Algorithm Lifecycle.
UDL Upstream Data Lifecycle.
WLAN wireless local area network.
XOR exclusive-or operation.

List of Tables

4.1 Comparison between CTR and CFB features.

List of Figures

2.1 Encryption round of a SPN. Corresponds to the round function g from cryptosystem 4. It is used in all rounds except the last.
2.2 Network Layout.
2.3 Extensible Authentication Protocol (EAP).
2.4 WPA2 four-way handshake.
2.5 WPA2 group-key handshake.
2.6 Man in the middle attack. Eve is able to intercept the message and/or jam the communication channel at will.

3.1 General purpose and activity of the envisaged network.
3.2 General layout of the desired network.
3.3 Pre-deployment stage.
3.4 Topology of AP-based networks.
3.5 Topology of the ad-hoc network.

4.1 Key generation based on k users.

5.1 Pre-processing steps of the secret pass for the generation of the of the SHA-1 pseudo-random function.
5.2 Scatter plots of the average key generation time per number of gathering devices.
5.3 Upstream Algorithm Lifecycle.
5.4 Downstream Algorithm Lifecycle.

A.1 ECB mode encryption and decryption procedures using an arbitrary block cipher B.
A.2 CBC mode encryption and decryption procedures using an arbitrary block cipher B.
A.3 CFB mode encryption and decryption procedures using an arbitrary block cipher B.
A.4 CTR mode encryption and decryption procedures using an arbitrary block cipher B.

B.1 KeyGeneratorApp’s initial screen.
B.2 KeyGeneratorApp’s target choice screen.
B.3 KeyGeneratorApp’s file details.
B.4 KeyGeneratorApp’s key export final step.
B.5 KeyGeneratorApp’s key checker example screen.
B.6 Pre-deployment stage secret information’s revealment.

C.1 Message format $F_1^*$.
C.2 Message format $F_1$.
C.3 Message format $F_2^*$.
C.4 Message format $F_2$.
C.5 Message format $F_3^*$.
C.6 Message format $F_3$.
C.7 Message format $F_4^*$.
C.8 Message format $F_4$.
C.9 Message format $F_5^{**}$.
C.10 Message format $F_5^*$.
C.11 Message format $F_5$.

Contents

Resumo
Abstract
Glossary
List of Abbreviations
List of Tables
List of Figures

1 Introduction
  1.1 Summary

2 Basic Concepts
  2.1 Cryptanalysis
  2.2 Modern Cryptography
    2.2.1 Block Ciphers
      2.2.1.1 Linear and Differential Cryptanalysis
      2.2.1.2 DES and 3DES
      2.2.1.3 AES
    2.2.2 Block Cipher Modes of Operation
      2.2.2.1 ECB
      2.2.2.2 CBC
      2.2.2.3 CFB
      2.2.2.4 CTR
      2.2.2.5 CCM
      2.2.2.6 GCM
      2.2.2.7 Padding
    2.2.3 Asymmetric Cryptography
  2.3 Cryptographic Hash Functions
    2.3.1 SHA-256
    2.3.2 HMAC
  2.4 Randomness
    2.4.1 Key Derivation
  2.5 Communication Protocols in Wireless Networks
    2.5.1 WEP
    2.5.2 WPA/WPA2
      2.5.2.1 Initial Authentication
      2.5.2.2 4-way Handshake
      2.5.2.3 Group-key Handshake
  2.6 Known Attacks
    2.6.1 Brute Force and Dictionary Attacks
    2.6.2 Man In The Middle Attack
    2.6.3 ......
    2.6.4 ......
    2.6.5 Padding Oracle Attack
    2.6.6 Attacks
      2.6.6.1 Key Reuse
      2.6.6.2 Bit-flipping
    2.6.7 Weaknesses of Block Cipher Modes of Operation
    2.6.8 Side-Channel Attack
    2.6.9 Attacks on AES

3 Network
  3.1 The Problem
  3.2 Details
  3.3 Network Topology
  3.4 Protocol
    3.4.1 Setup
    3.4.2 Communication Protocol
  3.5 Message Formats

4 Security Analysis
  4.1 Strengths and Weaknesses
    4.1.1 Key Generation
    4.1.2 Packing Schemes and Protocols
      4.1.2.1 Semantic security
      4.1.2.2 Encryption Schemes
    4.1.3 Attacks
  4.2 Possible Solutions
    4.2.1 Chosen-plaintext attack
    4.2.2 Chosen-ciphertext attack

5 Implementation Details
  5.1 Key Generation
  5.2 Data Processing
    5.2.1 Upstream Algorithm Lifecycle
    5.2.2 Downstream Algorithm Lifecycle

6 Results
  6.1 Future Work

References

A Schemes of Block Cipher Modes of Operation

B User Manual: Key Generation Application

C Message Formats

Chapter 1

Introduction

The first approaches to securing information date back to Ancient Greece. Ever since, mankind has been continuously developing new methods for protecting secrets: while some create methods to secure information, others put a lot of effort into discovering weaknesses in order to retrieve the protected secrets. An example that reflects the tremendous relevance of cryptography in modern times is World War II. The victory of the Allies is considered to have been greatly influenced by their ability to eavesdrop on the enemy’s communications after breaking the Enigma [1] cipher and, as such, this war propelled major advancements in the fields of cipher construction and cryptanalysis. The continuous demand for protecting information is the fuel that drives the evolution of computational security.

In the current work one is presented with a network composed of several types of devices with certain restrictions with respect to memory, space and power consumption. The aim is to choose the most suitable topology for the network and to create a security mechanism, to be included in the communication protocol, which ensures that the messages travelling through the network satisfy some cryptographic properties. This work was developed for a real-world project in a business environment under the supervision of analysts and developers of the company GMVIS Skysoft, S.A.; therefore, several limitations were imposed.

1.1 Summary

This dissertation is divided into six chapters and an appendix with three sections. Chapter 2 addresses some required state-of-the-art concepts that are related to the developed work. In chapter 3 the problem is introduced, the options with regard to the network’s topology and communication protocols are discussed, the general protocol for the upstream and downstream data lifecycles is presented and its underlying message formats are defined. Chapter 4 contains a security analysis of the network defined in chapter 3. In chapter 5 the details regarding the implementation are discussed and some parts of the code are presented and analyzed as pseudocode. Chapter 6 is the last chapter and contains a general overview of the results obtained and motivation for future work on the subject.

Appendix A contains the figures regarding the construction of the block cipher modes of operation presented in chapter 2, appendix B is a user manual for the developed user-interface application for key generation and appendix C presents all the message formats introduced in chapter 3.

Chapter 2

Basic Concepts

This chapter provides an overview of several cryptographic concepts and algorithms that are applicable to the system considered in the next chapter; references are provided for various topics that are considered relevant but outside the central scope of the text. The reader is assumed to be familiar with basic cryptographic theory. Section 2.1 addresses some security definitions and the most common techniques used by adversaries, section 2.2 introduces modern cryptographic concepts with emphasis on private-key cryptosystems, section 2.3 contains the properties of cryptographic hash functions as well as a high-level description of SHA-256, section 2.4 describes the problem of generating random values, section 2.5 specifies the wireless communication protocols used in the problem at hand and the last section refers to currently known attacks on some of the systems described throughout the chapter. Public-key cryptography is treated in less detail because this option was discarded in the security solution chosen in chapter 3.

The suite of algorithms for the key generation, encryption and decryption processes forms a cryptographic system (cryptosystem), which is usually implemented to give the user the ability to conceal classified information.

Definition 2.0.1. A Cryptosystem is a 5-tuple (P, C, K, E, D) such that:

• P is the set of all possible plaintexts.

• C is the set of all possible ciphertexts.

• K is the set of all possible keys, also denoted as the key space.

• $E := \{E_k : P \to C\}_{k \in K}$ is the family of all encryption functions.

• $D := \{D_k : C \to P\}_{k \in K}$ is the family of all decryption functions.
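As a minimal illustration of Definition 2.0.1, the sketch below instantiates the 5-tuple with a one-byte XOR cipher. The cipher and the helper name `is_correct` are assumptions chosen only to make the definition concrete; the construction offers no security and is not part of the solution discussed in this work.

```python
# Toy instance of the 5-tuple (P, C, K, E, D); illustrative only, not secure.
P = range(256)  # plaintext space: single bytes
C = range(256)  # ciphertext space: single bytes
K = range(256)  # key space: single bytes


def E(k):
    """Return the encryption function E_k : P -> C."""
    return lambda x: x ^ k


def D(k):
    """Return the decryption function D_k : C -> P."""
    return lambda y: y ^ k


def is_correct():
    """Check that D_k(E_k(x)) = x for every key and every plaintext."""
    return all(D(k)(E(k)(x)) == x for k in K for x in P)
```

The only property checked here is correctness of decryption; the security notions that separate a useful cryptosystem from this toy are introduced in section 2.1.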

Two types of cryptosystems can be defined: symmetric and asymmetric. In the former, the same key is used for encryption and decryption, while in the latter there are two distinct keys, one for encryption and one for decryption. Section 2.2.3 further discusses asymmetric cryptography, also known as public-key cryptography; for now this distinction suffices. The following sections focus on symmetric cryptography.

2.1 Cryptanalysis

It is possible to break¹ cryptographic systems without knowledge of the key(s) used or even of the algorithm itself. As the name suggests, cryptanalysis is the study of cryptosystems with the objective of finding flaws or weaknesses that allow unauthorized parties to gain information, without necessarily discovering the secret key. Cryptanalysis methods can be categorized according to the information available to the attacker. The most common methods are presented below:

Definition 2.1.1 (Ciphertext-only attack). A ciphertext-only attack is one where the adversary possesses information regarding the ciphertext and is able to deduce either the corresponding plaintext or the key, without being provided any details about the plaintext itself, theoretically. Notwithstanding, in practice, the attacker usually does have access to useful information, such as the alphabet in which the plaintext is written.

Definition 2.1.2 (Known-plaintext attack). A Known-plaintext attack focuses on finding the secret key (or key stream) of the cryptosystem at hand, provided the knowledge of both the ciphertext and its corresponding plaintext.

Definition 2.1.3 (Chosen-plaintext attack). In a Chosen-plaintext attack, the adversary has access to an encryption oracle, which encrypts any plaintext given by the attacker and outputs the corresponding ciphertext.

A cryptosystem is said to be secure against chosen-plaintext attacks if and only if an adversary who is able to choose any pair of plaintexts $x_0, x_1$, whose ciphertexts are $y_0, y_1$ respectively, cannot decide which of the following is true:

$$y_i = e_k(x_i) \quad \text{or} \quad y_i = e_k(x_{i+1}) \tag{2.1}$$

with probability greater than 0.5, where the index $i + 1$ is taken modulo 2.

Definition 2.1.4 (Chosen-ciphertext attack). In a Chosen-ciphertext attack, the adversary has access to a decryption oracle, which decrypts any ciphertext given by the attacker and outputs the corresponding plaintext.

The increasing complexity of cryptosystems over the years has demanded serious development of cryptanalysis methods. A system is only as secure as its resilience to the most devious possible attack, and every cryptosystem in use today is continuously targeted by attackers who keep developing new methods to increase their success rate. So, how can one be certain that a cryptosystem will remain secure against certain types of attacks? The following definition captures such a property.

Definition 2.1.5 (Semantic Security). Let C = (X, Y, K, E, D) be a cryptosystem and $e_k \in E$. A cryptosystem is said to be semantically secure if, given $y = e_k(x)$, $V(A(y, l)) = V(B(l))$, where A and B are two polynomial-time bounded adversaries, $l = |y|$ and V is the advantage of the adversary, which is defined by

$$V(a) = P(a \text{ chooses the wrong plaintext}) - P(a \text{ chooses the correct plaintext}) \tag{2.2}$$

for every adversary a, where P(E) denotes the probability of occurrence of event E.

¹ By ”break” one means being able to recover the corresponding plaintext for any given ciphertext.

Notwithstanding, a cryptosystem may be semantically secure against some types of attacks while having flaws in its construction, which entail undesired properties that an adversary can use to exploit vulnerabilities in the scheme at hand.

The upcoming definitions are helpful for the semantic security analysis of cryptosystems. The following description is based on the results presented in [2] and some similar notation is used.

Consider the following scenario: an adversary A living in one of two worlds² (left world L or right world R) is trying to break a cryptographic system C and has access to an encryption oracle $O_e$. A does not know which world he lives in, but the world W is defined a priori and cannot be changed throughout the entire activity of A. The encryption oracle, given any two plaintexts $p_0, p_1$, always returns the ciphertext $e_k(p_b)$, where $e_k$ is the encryption function of C for some $k \in K$ and $b \in \{0, 1\}$ is picked according to the following relation:

$$b = \begin{cases} 0 & \text{if } W = L \\ 1 & \text{if } W = R \end{cases} \tag{2.3}$$

$O_e$ is known as an lr-oracle.

Definition 2.1.6 (IND-CPA). Let C = (P, C, K, E, D) be a cryptosystem with encryption and decryption functions $e_k$ and $d_k$, respectively, for some $k \in K$, let A be an adversary and O an encryption lr-oracle. Consider that A is in possession of $X = (x_1, \dots, x_n) \in P^n$ and also that $|x_i| = |x_j|$, $\forall_{i \neq j}$. Indistinguishability under chosen-plaintext attack (IND-CPA) is a game defined as follows:

1. A picks two messages $x'_0, x'_1 \in X$;

2. A queries the oracle O with $(x'_0, x'_1)$;

3. O encrypts $x'_b$ yielding $e_k(x'_b)$, according to 2.3;

4. O returns the encryption output $e_k(x'_b)$ to A;

(A can repeat steps 1 to 4 at will);

5. A chooses $b' \in \{0, 1\}$;

6. A wins if $b' = b$ and loses otherwise.

If A is able to correctly choose $b'$ with probability non-negligibly greater than 1/2, then the system at hand is not semantically secure against chosen-plaintext attacks (see property 1). The answer of the encryption oracle to the adversary’s query is called the challenge ciphertext and, analogously, the answer of the decryption oracle is called the challenge plaintext.
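The IND-CPA game can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the class `LROracle`, the toy cipher (XOR with a fixed key) and the adversary strategy are all hypothetical and do not appear in the original text. It shows why any deterministic, stateless encryption function loses the game: the adversary first obtains $e_k(x_0)$ by querying $(x_0, x_0)$ and then compares it with the answer to $(x_0, x_1)$.

```python
import secrets


class LROracle:
    """Left-or-right encryption oracle for a world fixed a priori."""

    def __init__(self, encrypt, b):
        self._encrypt = encrypt
        self._b = b  # 0 for world L, 1 for world R

    def query(self, x0, x1):
        if len(x0) != len(x1):
            raise ValueError("plaintexts must have equal length")
        return self._encrypt(x1 if self._b else x0)


def make_deterministic_cipher(key):
    """Hypothetical toy cipher: XOR with a fixed key (deterministic)."""
    def encrypt(x):
        return bytes(a ^ b for a, b in zip(x, key))
    return encrypt


def adversary(oracle):
    """Exploit determinism: equal plaintexts yield equal ciphertexts."""
    x0, x1 = b"attack at dawn!!", b"retreat at noon!"
    reference = oracle.query(x0, x0)  # e_k(x0) in both worlds
    challenge = oracle.query(x0, x1)  # e_k(x_b)
    return 0 if challenge == reference else 1


def play_ind_cpa(rounds=100):
    """Return the fraction of independent games won by the adversary."""
    wins = 0
    for _ in range(rounds):
        b = secrets.randbelow(2)
        oracle = LROracle(make_deterministic_cipher(secrets.token_bytes(16)), b)
        wins += adversary(oracle) == b
    return wins / rounds
```

In this sketch the adversary wins every game, so the toy cipher is far from satisfying property 1; randomized or stateful encryption is needed to avoid this attack.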

² A world can be seen as a binary state.

A stronger measure of security can be defined based on the previous definition. If the adversary has access not only to an encryption oracle but also to a decryption oracle, then A is granted a considerable amount of resources that threaten the security of the cryptosystem at hand, making this the most critical and undesirable type of attack to defend against.

Definition 2.1.7 (IND-CCA). Indistinguishability under chosen-ciphertext attack (IND-CCA) is a game analogous to the one from Definition 2.1.6, but herein the adversary has access to two lr-oracles: an encryption lr-oracle Oe and a decryption lr-oracle. This game has the additional requirement that the adversary A may not query the decryption oracle with challenge ciphertexts.

Let $O_d$ be the decryption lr-oracle. The adversary is not allowed to query $O_d$ with any challenge ciphertext because otherwise it would be trivial for A to win the IND-CCA game: A would immediately learn the world W. The two properties that follow are desirable when analyzing the security level of any cryptographic system.

Property 1 (IND-CPA secure). A cryptosystem is said to be IND-CPA secure if no polynomial-time bounded adversary who plays the IND-CPA game can win with probability non-negligibly greater than 1/2.

Property 2 (IND-CCA secure). A cryptosystem is said to be IND-CCA secure if no polynomial-time bounded adversary who plays the IND-CCA game can win with probability non-negligibly greater than 1/2.

According to [3], IND-CCA ⊂ IND-CPA, thus any system that is IND-CCA secure is also IND-CPA secure. Nowadays, the minimum requirement for a cryptosystem to be considered acceptably secure is to satisfy property 1.

2.2 Modern Cryptography

During World War II, Claude Shannon [4] contributed to the development of Cryptography, especially with his results presented in a 1945 classified paper [5], which influenced the development of modern day cryptography. This section describes linear and differential cryptanalysis techniques, some of the most important block ciphers with focus on AES and finally several block cipher modes of operation and their properties.

2.2.1 Block Ciphers

Block ciphers can be defined as iterated product ciphers. The notion of an iterated cipher is straightforward: its main components are a round function and a key schedule. As the name suggests, the cipher consists in performing several rounds (iterations) of a round function applied to a state and a round key, where the initial state is the plaintext, each subsequent state is the image of the previous state and a round key under the round function, and the round keys are the elements of the output of the key schedule algorithm. Cryptosystem 1 formally illustrates this description.

Cryptosystem 1. Let $X = \Sigma_P^*$ and $Y = \Sigma_C^*$ be the sets of plaintext and ciphertext bitstrings, respectively, and K the key set. Let $\mathrm{round} : S \times K_R \to S$ be the round function and $f : K \to K_R$ the key schedule, where $S \supseteq X \cup Y$ is the set of states and $K_R$ the set of round keys with $|K_R| = r \in \mathbb{N}$, such that

$$\begin{aligned} x &= s_0; \\ y &= s_r; \\ f(k) &= (k_1, \dots, k_r); \\ \mathrm{round}(s_i, k_{i+1}) &= s_{i+1}; \end{aligned} \tag{2.4}$$

An iterated cipher is defined as follows:

$$\begin{aligned} e_k(x) &= \mathrm{round}(\dots \mathrm{round}(\mathrm{round}(s_0, k_1), k_2) \dots, k_r); \\ d_k(y) &= \mathrm{round}^{-1}(\dots \mathrm{round}^{-1}(\mathrm{round}^{-1}(s_r, k_r), k_{r-1}) \dots, k_1); \end{aligned} \tag{2.5}$$

where $\mathrm{round}^{-1}$ is well defined iff $\mathrm{round}(s, k)$ is injective for fixed k.
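The structure of Cryptosystem 1 can be sketched as follows. The 16-bit state, the rotation-based round function and the key schedule are hypothetical choices made only so that the round function is injective for a fixed round key; they are not a secure design and are not taken from the original text.

```python
WIDTH = 16                 # toy state size in bits (assumption)
MASK = (1 << WIDTH) - 1


def rotl(v, n):
    """Rotate a WIDTH-bit value left by n positions."""
    n %= WIDTH
    return ((v << n) | (v >> (WIDTH - n))) & MASK


def rotr(v, n):
    """Rotate a WIDTH-bit value right by n positions."""
    return rotl(v, WIDTH - (n % WIDTH))


def key_schedule(k, r=4):
    """Hypothetical f(k) = (k_1, ..., k_r): rotations of the master key."""
    return [rotl(k, 3 * i) for i in range(1, r + 1)]


def round_fn(s, k):
    """Toy round function; injective in s for a fixed round key k."""
    return rotl(s ^ k, 5)


def round_inv(s, k):
    """Inverse of round_fn for the same round key."""
    return rotr(s, 5) ^ k


def encrypt(x, k):
    s = x                                  # s_0 = x
    for ki in key_schedule(k):             # k_1, ..., k_r
        s = round_fn(s, ki)
    return s                               # s_r = y


def decrypt(y, k):
    s = y
    for ki in reversed(key_schedule(k)):   # k_r, ..., k_1
        s = round_inv(s, ki)
    return s
```

Decryption applies the inverse round function with the round keys in reverse order, mirroring equation 2.5.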

For block ciphers, and according to the initial statement of this section, round usually contains a combination of S-Boxes and/or P-Boxes. A common technique used to increase the security of a block cipher is called key whitening and consists in performing an exclusive-or operation (XOR) using the round key in the initial and the last rounds. Whitening increases the hardness of a brute-force attack. For a block cipher to be considered robust, it must have good confusion and diffusion properties [5], otherwise it may be susceptible to simple statistical attacks, namely linear and differential cryptanalysis. Robust block ciphers are widely used in today’s cryptographic algorithms, namely in cryptographic hash functions and pseudo-random number generators.

2.2.1.1 Linear and Differential Cryptanalysis

Linear and differential cryptanalysis are the most common and devious attacks known against block ciphers. Generally speaking, both focus on finding probabilistic linear relations and exploiting them in such a manner that it becomes feasible to perform either known-plaintext or chosen-plaintext attacks. Both techniques make use of the bias of a random variable, as opposed to the raw probability, as it expresses the deviation of the true value of a random variable from its expected value. For a Bernoulli distributed random variable (X ∼ Ber(p)), this quantity is defined by

$$\epsilon(X) = p - \frac{1}{2} \tag{2.6}$$

Linear Cryptanalysis

Assume that, for large n, the attacker Victor (V) has access to $(x_1, y_1), \dots, (x_n, y_n)$ such that $e_k(x_i) = y_i$ for fixed k, where $e_k$ is the encryption function of the block cipher at hand. Moreover, suppose that V is able to relate subsets of the plaintext, ciphertext and key bits through a linear approximation of the form

$$x^{\oplus}[a_1, \dots, a_j] \oplus y^{\oplus}[b_1, \dots, b_l] = k^{\oplus}[c_1, \dots, c_m] \tag{2.7}$$

where $a_1, \dots, a_j, b_1, \dots, b_l, c_1, \dots, c_m$ are fixed bit indexes and $v^{\oplus}[d_1, \dots, d_h]$ represents $v[d_1] \oplus \dots \oplus v[d_h]$ such that $v[d_i] \in \{0, 1\}$, $\forall_{i \in I_h}$, for fixed bit index $d_i$.

The attack consists in assigning a counter $C_k$, initialized to the same value, to each possible key $k \in K$. For every pair $(x_i, y_i)$ in V’s possession, he computes the left side of equation 2.7 and, for each k, $C_k$ is incremented each time the equation holds. At the end of the whole process, the key k with the highest counter value is the key for which the bits $k[c_1], \dots, k[c_m]$ are considered to be correct. Considering T to be the random variable that represents the outcome of equation 2.7, the effectiveness of a linear cryptanalytic attack is proportional to $|\epsilon(T)|$ [6]. According to [7], the number n of plaintext-ciphertext pairs that V needs to know in order for the attack to succeed with high confidence is approximately $c\,\epsilon(T)^{-2}$, where $c \in \mathbb{R}$ is usually small. Note that $(\epsilon(T) \to 0) \Rightarrow (n \to \infty)$ and $n \to 4c$ when $\epsilon(T) \to \pm 1/2$.
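Linear approximations of a whole cipher are usually assembled from biased approximations of its S-Boxes. The sketch below computes the bias of equation 2.6 for one such approximation and the estimate $n \approx c\,\epsilon^{-2}$. The 4-bit S-Box (the first row of the DES S1 box) and the function names are assumptions made for illustration; the text does not prescribe them.

```python
# Example 4-bit S-Box (first row of DES S1); an assumption for illustration.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def parity(v):
    """XOR of all bits of v."""
    return bin(v).count("1") & 1


def bias(in_mask, out_mask, sbox=SBOX):
    """Bias of the relation (XOR of masked input bits) = (XOR of masked output bits)."""
    hits = sum(parity(x & in_mask) == parity(sbox[x] & out_mask)
               for x in range(len(sbox)))
    return hits / len(sbox) - 0.5


def pairs_needed(eps, c):
    """Estimate n = c * eps^-2 of known plaintext-ciphertext pairs."""
    return c / eps ** 2
```

For instance, `bias(0x6, 0xB)` evaluates the relation between input bits 2 and 3 and output bits 1, 3 and 4 (counting from the most significant bit), which holds for 12 of the 16 inputs. A relation with a larger absolute bias requires fewer known pairs.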

Differential Cryptanalysis

Differential cryptanalysis is very similar to the aforementioned procedure of linear cryptanalysis, with the exception that one does not look for a linear relation between the plaintext, ciphertext and key bits, but instead for a linear approximation relating differences³ of the plaintext and ciphertext bits to key bits. That is, instead of the attacker V having pairs (x, y), he is now able to choose $x, x' \in P$ and compute a differential: the pair $(\Delta_x^{x'}, \Delta_y^{y'})$, where $\Delta_x^{x'} = x \oplus x'$, $\Delta_y^{y'} = y \oplus y'$, $e_k(x) = y$ and $e_k(x') = y'$. For each of the possible keys k, V checks whether the linear approximation between the differentials holds and, if so, the counter of k is incremented. As in linear cryptanalysis, the key with the highest counter is the one for which the bits of the linear relation are most likely correct.
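The differentials that such an attack relies on can be found by tabulating, for a fixed input difference, how often each output difference occurs. The sketch below does this for a single S-Box; the 4-bit S-Box (the first row of the DES S1 box) and the function name `ddt_row` are assumptions for illustration only.

```python
# Example 4-bit S-Box (first row of DES S1); an assumption for illustration.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def ddt_row(dx, sbox=SBOX):
    """Count, for input difference dx, how often each output difference occurs."""
    counts = [0] * len(sbox)
    for x in range(len(sbox)):
        dy = sbox[x] ^ sbox[x ^ dx]  # difference taken as XOR
        counts[dy] += 1
    return counts


def best_differential(dx, sbox=SBOX):
    """Return (dy, probability) of the most likely output difference for dx."""
    row = ddt_row(dx, sbox)
    dy = max(range(len(row)), key=row.__getitem__)
    return dy, row[dy] / len(sbox)
```

An ideal S-Box would spread the output differences evenly; a highly probable pair such as the one returned by `best_differential(0xB)` is what an attacker chains across rounds.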

2.2.1.2 DES and 3DES

In 1977, DES [8] was published as an official FIPS standard and, although considered insecure nowadays, it had a major influence on modern cryptography. It was designed as a Feistel cipher [9] (cryptosystem 2).

Cryptosystem 2 (Feistel cipher). Let S be the set of states such that $s = (s^L, s^R)$, where $|s^L| = |s^R|$ and $s^L \| s^R = s$, $\forall_{s \in S}$, let $K_R$ be the set of round keys, $g : S \times K_R \to S$ the round function and f the function that (possibly) contains the non-linear operations of the block cipher. Then, the round function used in the encryption procedure is given by

$$g((s_{i-1}^L, s_{i-1}^R), k_i) = (s_{i-1}^R, s_{i-1}^L \oplus f(s_{i-1}^R, k_i)) \tag{2.8}$$

where $i = 1, \dots, N$, for some number of rounds $N \in \mathbb{N}$ defined by the key schedule. Moreover, the round function for the decryption procedure is given by

$$g^{-1}((s_i^L, s_i^R), k_i) = (s_i^R \oplus f(s_i^L, k_i), s_i^L) \tag{2.9}$$
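Equations 2.8 and 2.9 translate almost literally into code. In the sketch below the function f is a truncated SHA-256 digest, a hypothetical choice that merely stands in for the non-linear part of a real cipher; the round keys are likewise arbitrary. The point illustrated is that f does not need to be invertible for decryption to work.

```python
import hashlib


def f(half, round_key):
    """Hypothetical non-linear function: truncated SHA-256 of half || key."""
    return hashlib.sha256(half + round_key).digest()[:len(half)]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def g(state, k):
    """Encryption round (equation 2.8)."""
    left, right = state
    return right, xor(left, f(right, k))


def g_inv(state, k):
    """Decryption round (equation 2.9)."""
    left, right = state
    return xor(right, f(left, k)), left


def encrypt(block, round_keys):
    half = len(block) // 2
    state = (block[:half], block[half:])
    for k in round_keys:
        state = g(state, k)
    return state[0] + state[1]


def decrypt(block, round_keys):
    half = len(block) // 2
    state = (block[:half], block[half:])
    for k in reversed(round_keys):
        state = g_inv(state, k)
    return state[0] + state[1]
```

The block length must be even and each half may not exceed 32 bytes, the size of a SHA-256 digest.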

In 1992, after having (re)discovered differential cryptanalysis, Shamir and Biham published the first theoretical attack on DES, although it was practically infeasible at the time due to its complexity. Later on, a practical attack was indeed discovered using linear cryptanalysis and, in the years that followed, the complexity of the known attacks on DES confirmed that the standard had become deprecated. A modification of DES or the design of a new algorithm had become of utmost importance.

The abrupt growth of computational capability at the time made it clear to researchers that the 56-bit key of DES was too small for the security demanded of the algorithm. In order to increase its security against brute-force attacks without changing its core procedure, several parties came up with a straightforward solution (presented in cryptosystem 3): instead of using a single key and performing one block encryption, each block of plaintext is subject to 3 rounds of block encryption using three (possibly) distinct keys. Note that the main goal of the cryptographers at the time was to solve the problem without having to create a new algorithm, which would save time and money because there would be no need to replace all the hardware that had DES implemented.

³ By difference, one means an operation, usually XOR.

Cryptosystem 3 (Triple DES). Let $E_k$ and $D_k$ be the encryption and decryption procedures for the DES algorithm using the 56-bit key k. The encryption and decryption procedures for Triple DES are given, respectively, by:

$$\begin{aligned} e_k(x) &= E_{k_3}(E_{k_2}(E_{k_1}(x))) \\ d_k(y) &= D_{k_1}(D_{k_2}(D_{k_3}(y))) \end{aligned} \tag{2.10}$$

where $x \in P$, $y \in C$ and $k = (k_1, k_2, k_3)$ is either a 168, 112 or 56-bit key, based on the keying option.

Three options were available for the keys:

1. $(k_1 = k_2 = k_3)$ ⇒ 56-bit key;

2. $(k_1 = k_3 \neq k_2)$ ⇒ 112-bit key;

3. $(k_1 \neq k_2 \wedge k_1 \neq k_3 \wedge k_2 \neq k_3)$ ⇒ 168-bit key.
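The composition in equation 2.10 and the three keying options can be sketched as follows. The functions `toy_E` and `toy_D` are hypothetical stand-ins for the DES procedures $E_k$ and $D_k$ (they are not DES and provide no security); they are used only so that the example is self-contained.

```python
MASK64 = (1 << 64) - 1  # 64-bit blocks, as in DES


def toy_E(k, x):
    """Hypothetical stand-in for E_k (NOT DES): XOR with k, rotate left by 7."""
    v = (x ^ k) & MASK64
    return ((v << 7) | (v >> 57)) & MASK64


def toy_D(k, y):
    """Inverse of toy_E for the same key."""
    v = ((y >> 7) | (y << 57)) & MASK64
    return v ^ k


def effective_key_bits(k1, k2, k3):
    """Classify a key triple according to the three keying options."""
    if k1 == k2 == k3:
        return 56
    if k1 == k3 != k2:
        return 112
    if k1 != k2 and k1 != k3 and k2 != k3:
        return 168
    raise ValueError("not one of the three keying options listed")


def triple_encrypt(keys, x, E=toy_E):
    k1, k2, k3 = keys
    return E(k3, E(k2, E(k1, x)))


def triple_decrypt(keys, y, D=toy_D):
    k1, k2, k3 = keys
    return D(k1, D(k2, D(k3, y)))
```

Decryption undoes the three layers in the opposite order, starting with $k_3$.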

3DES is still considered to be secure because the currently known linear cryptanalytic attacks are impractical: they require an infeasible number of known plaintext-ciphertext pairs. However, this statement only holds for 168-bit keys, as options 1 and 2 are considered deprecated.

2.2.1.3 AES

Aiming to replace the encryption standard in order to cope with modern security demands, NIST launched a call for proposals for the new encryption standard, named AES. Several proposals were submitted (21 to be precise) and, after being subject to a thorough security analysis, each of the five finalists was considered to be secure. The choice of the Rijndael cipher [10] as the algorithm for the AES was based on its performance, versatility, simplicity and implementation details. In 2002, AES was admitted as the official encryption standard. It is a symmetric cryptosystem based on an iterated block cipher and, unlike DES, it does not follow a Feistel network but instead an SPN [11], which is briefly described in cryptosystem 4 and whose round function is illustrated in Figure 2.1.

Cryptosystem 4 (Substitution Permutation Network). Let $l, m \in \mathbb{Z}^+$, let $\pi_S : \{0, 1\}^l \to \{0, 1\}^l$ be an S-Box and $\pi_P : \{1, \dots, lm\} \to \{1, \dots, lm\}$ be a P-Box. Consider $K = \{(k^1, \dots, k^{n+1}) : k^j \in \{0, 1\}^{lm}\}$, where n is the number of rounds, and $P = C = \{0, 1\}^{lm}$. Let S be the set of states and consider two round functions: $g : S \times K \to S$, which is given by

$$g(s^{i-1}, k^i) = \pi_P\left(\pi_S\left((s^{i-1} \oplus k^i)|_l\right) \,\|\, \pi_S\left(((s^{i-1} \oplus k^i)|_{2l})|^l\right) \,\|\, \dots \,\|\, \pi_S\left((s^{i-1} \oplus k^i)|^l\right)\right) \tag{2.11}$$

and the round function $f : S \times K \to S$, given by

$$f(s^{i-1}, k^i, k^{i+1}) = \left(\pi_S\left((s^{i-1} \oplus k^i)|_l\right) \,\|\, \pi_S\left(((s^{i-1} \oplus k^i)|_{2l})|^l\right) \,\|\, \dots \,\|\, \pi_S\left((s^{i-1} \oplus k^i)|^l\right)\right) \oplus k^{i+1} \tag{2.12}$$

The encryption procedure consists in applying the g function n − 1 times followed by the f function, where $s^0 = x$.

Note that in the above cryptosystem the P-Box [7] is not applied in the last round, thus allowing the same algorithm to be used for decryption, provided that appropriate modifications are made to the key schedule and the S-Box is replaced by its inverse.
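As an illustration of cryptosystem 4, the following Python sketch instantiates an SPN with l = m = 4 and n = 4 rounds. The S-Box, the P-Box (a transposition of the 4 × 4 bit matrix), the round keys and the plaintext are illustrative values borrowed from a common textbook example; they are assumptions of this sketch and are not fixed by the definition above.

```python
# Toy SPN with l = 4 and m = 4 (16-bit state). All concrete values are illustrative.
L_BITS, M_BLOCKS = 4, 4
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]


def substitute(state):
    """Apply pi_S to each of the m blocks of l bits."""
    out = 0
    for j in range(M_BLOCKS):
        shift = L_BITS * j
        out |= SBOX[(state >> shift) & 0xF] << shift
    return out


def permute(state):
    """Apply pi_P: bit i (counted from the left, from 0) moves to 4 * (i % 4) + i // 4."""
    out = 0
    for i in range(16):
        bit = (state >> (15 - i)) & 1
        out |= bit << (15 - (4 * (i % 4) + i // 4))
    return out


def g(state, k):
    """Round function (2.11): key mixing, S-Boxes, then the P-Box."""
    return permute(substitute(state ^ k))


def f(state, k, k_next):
    """Last round (2.12): key mixing, S-Boxes, then the final round key."""
    return substitute(state ^ k) ^ k_next


def spn_encrypt(x, keys):
    """keys = (k^1, ..., k^{n+1}); g is applied n - 1 times, then f."""
    n = len(keys) - 1
    state = x
    for i in range(n - 1):
        state = g(state, keys[i])
    return f(state, keys[n - 1], keys[n])


keys = [0x3A94, 0xA94D, 0x94D6, 0x4D63, 0xD63F]
print(hex(spn_encrypt(0x26B7, keys)))  # 0xbcd6
```

Because the last round skips the P-Box, the final operations are an S-Box layer followed by a key addition, which mirrors the first operations of the cipher.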

The AES block size is 128 bits and the standard specifies three⁴ possible key sizes: 128, 192 and 256 bits. There is a trade-off between security and performance directly related to the size of the key, since the number of rounds of the algorithm varies according to the key length: for 128, 192 and 256-bit keys, the number of rounds is, respectively, 10, 12 and 14. Nevertheless, even for 128-bit keys, the currently known attacks and the computational capability foreseen for the near future lead to the conclusion that AES is secure as a block cipher, regardless of the key length chosen. That is the reason AES is embedded in a large majority of modern-day cryptographic schemes and protocols that need to provide secrecy.

A high-level description of the algorithm's main functions is given next, followed by the algorithm's pseudocode in Algorithm 1.

High-level Description

The 2001 FIPS publication of AES [12] introduced some functions. The same names are used herein and their informal definitions are presented:

• AddRoundKey: performs an XOR operation between the current state and the current round key;

• SubBytes: replaces each byte of the current state for its correspondence on a fixed lookup table;

• ShiftRows: shifts the bytes of each row of the state according to a (fixed for each row) permutation;

• MixColumns: multiplies each column of the state by a fixed polynomial p(x).
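A minimal Python sketch of three of these functions, assuming the state is stored as a list of four rows of four bytes. SubBytes is left out because it needs the full 256-entry lookup table; the function names and the data layout are choices of this sketch, not of the standard's pseudocode.

```python
def add_round_key(state, round_key):
    """XOR the 4x4 state with the 4x4 round key, byte by byte."""
    return [[s ^ k for s, k in zip(state_row, key_row)]
            for state_row, key_row in zip(state, round_key)]


def shift_rows(state):
    """Rotate row r of the state to the left by r positions (r = 0, 1, 2, 3)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def xtime(a):
    """Multiply a byte by x in GF(2^8), reducing modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def mix_column(col):
    """Multiply one column by the fixed polynomial 03 x^3 + 01 x^2 + 01 x + 02."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```

MixColumns applies `mix_column` to each of the four columns of the state; multiplication by 03 is computed as `xtime(a) ^ a`.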

⁴ The Rijndael cipher was more versatile in this respect, as it allowed more key and block sizes (multiples of 32 bits between 128 and 256 in both cases).

Figure 2.1: Encryption round of an SPN. Corresponds to the round function g from cryptosystem 4. It is used in all rounds except the last. [Diagram: the state s^{i−1}, split into m blocks of l bits, is XORed with the round key k^i to give u^i; π_S maps each block of u^i to the corresponding block of v^i; π_P acts on the indexes of v^i, yielding s^i.]

Algorithm 1 AES algorithm

procedure AES(K, M)                        ▷ Encrypting M with K
    state ← M
    (K_1, ..., K_{N+1}) ← KeySchedule(K)   ▷ N is the number of rounds
    AddRoundKey(K_1, state)
    for r = 1 to N do
        SubBytes(state, πS)
        ShiftRows(state, πP)
        if r ≤ N − 1 then                  ▷ MixColumns is skipped in the last round
            MixColumns(state)
        end if
        AddRoundKey(K_{r+1}, state)        ▷ K_1 was already used before the loop
    end for
    return c ← state
end procedure

2.2.2 Block Cipher Modes of Operation

Block ciphers are very useful in modern cryptography, but they are only able to encrypt or decrypt one block of data of fixed size. Block cipher modes of operation were created so that one is able to encrypt a piece of data of arbitrary length using a block cipher. The way these modes make use of the block cipher at hand is very relevant for the security of the cryptosystem, for a bad usage of the block cipher can induce flaws in the cryptosystem even if the block cipher itself is considered to be secure against all known attacks.

Throughout this section let B be an arbitrary block cipher with b-bit block size, let P be an m-bit plaintext, C an m′-bit ciphertext and let n be the number of blocks of the message at hand. Moreover, consider the following notation:

• $E_k^B$ := encryption function of the block cipher B using key k;

• $D_k^B$ := decryption function of the block cipher B using key k;

• $p_i$ := i-th b-bit block of the plaintext;

• $c_i$ := i-th b-bit block of the ciphertext;

• $\big\Vert_{i=1}^{n} x_i := x_1 \,\|\, x_2 \,\|\, \cdots \,\|\, x_n$, where $\|$ is the concatenation operator.

2.2.2.1 ECB

ECB mode is the most straightforward mode of operation for block ciphers. Its encryption and decryption functions are, respectively, the following:

$$
\begin{aligned}
e_k^{ECB}(P) &= \big\Vert_{i=1}^{n} c_i, \text{ where } c_i = E_k^B(p_i),\ \forall i \in I_n \\
d_k^{ECB}(C) &= \big\Vert_{i=1}^{n} p_i, \text{ where } p_i = D_k^B(c_i),\ \forall i \in I_n
\end{aligned}
\tag{2.13}
$$

where m = m′ is a multiple of the block cipher's block size. For this reason padding is advised; it is discussed in section 2.2.2.7. A graphic representation of the encryption and decryption procedures for the ECB mode is presented in figures A.1a and A.1b, respectively.
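The definition in (2.13) can be sketched in Python as follows. The block cipher used here is a toy 4-round Feistel network built from HMAC-SHA-256; it is only a stand-in for B so that the example runs with the standard library, and it is an assumption of this sketch, not a vetted cipher. The input is assumed to be already padded to a multiple of the block size.

```python
import hashlib
import hmac

BLOCK = 16  # block size in bytes (b = 128 bits)


def _round(key, r, half):
    # Round function of the toy Feistel cipher: a keyed hash of one half.
    return hmac.new(key, bytes([r]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def block_encrypt(key, block):
    """Toy stand-in for E_k^B (4-round Feistel network)."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for r in range(4):
        left, right = right, _xor(left, _round(key, r, right))
    return left + right


def block_decrypt(key, block):
    """Toy stand-in for D_k^B: undo the Feistel rounds in reverse order."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for r in reversed(range(4)):
        left, right = _xor(right, _round(key, r, left)), left
    return left + right


def split(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key, plaintext):
    return b"".join(block_encrypt(key, p) for p in split(plaintext))  # c_i = E(p_i)


def ecb_decrypt(key, ciphertext):
    return b"".join(block_decrypt(key, c) for c in split(ciphertext))  # p_i = D(c_i)


key = b"illustrative key"
plaintext = b"A" * 16 + b"B" * 16 + b"A" * 16
blocks = split(ecb_encrypt(key, plaintext))
print(blocks[0] == blocks[2])  # True: equal plaintext blocks give equal ciphertext blocks
```

The last line shows why ECB is rarely suitable on its own: repetitions in the plaintext remain visible in the ciphertext.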

2.2.2.2 CBC

Unlike ECB mode, CBC is a widely used mode of operation and is often suited to be used for authentication purposes due to its ripple effect. Its encryption and decryption procedures are as follows:

$$
\begin{aligned}
e_k^{CBC}(P, IV) &= \big\Vert_{i=1}^{n} c_i, \text{ where } c_i = E_k^B(c_{i-1} \oplus p_i),\ \forall i \in I_n \\
d_k^{CBC}(C, IV) &= \big\Vert_{i=1}^{n} p_i, \text{ where } p_i = c_{i-1} \oplus D_k^B(c_i),\ \forall i \in I_n
\end{aligned}
\tag{2.14}
$$

and such that $c_0 = IV$. Figure A.2 contains a graphic representation of both cases.

Both encryption and decryption functions in (2.14) have an additional argument IV, which is a random⁵ b-bit initialization vector (IV) whose role is to contribute to the XOR of the first iteration. By construction, CBC mode encryption cannot be computed in parallel, since each iteration depends on the previous ciphertext block; the decryption mechanism, on the other hand, can be parallelized, since each plaintext block $p_i$ can be obtained deterministically provided knowledge of the tuple $(k, c_{i-1}, c_i)$. Due to this chaining, CBC is susceptible to errors in transmission, mainly triggered by noise in the communication channel induced either by an adversary or by environmental conditions: on decryption, a corrupted ciphertext block $c_i$ garbles $p_i$ and the corresponding bits of $p_{i+1}$, whereas an error introduced before encryption propagates to every subsequent ciphertext block.

⁵ The predictability of the IV gives room for feasible attacks on the cryptosystem. It is further discussed in section 2.6.7.
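A minimal Python sketch of (2.14), again using a toy Feistel network built from HMAC-SHA-256 as a stand-in for the block cipher B (an assumption made only to keep the example self-contained). It also flips one bit of the first ciphertext block to observe how the damage spreads on decryption.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key, r, half):
    return hmac.new(key, bytes([r]) + half, hashlib.sha256).digest()[:BLOCK // 2]


def block_encrypt(key, block):
    """Toy stand-in for E_k^B (4-round Feistel network)."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for r in range(4):
        left, right = right, _xor(left, _round(key, r, right))
    return left + right


def block_decrypt(key, block):
    """Toy stand-in for D_k^B."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for r in reversed(range(4)):
        left, right = _xor(right, _round(key, r, left)), left
    return left + right


def cbc_encrypt(key, iv, plaintext):
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = block_encrypt(key, _xor(prev, plaintext[i:i + BLOCK]))  # c_i = E(c_{i-1} xor p_i)
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key, iv, ciphertext):
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        c = ciphertext[i:i + BLOCK]
        out.append(_xor(prev, block_decrypt(key, c)))  # p_i = c_{i-1} xor D(c_i)
        prev = c
    return b"".join(out)


key, iv = b"illustrative key", os.urandom(BLOCK)
plaintext = bytes(range(64))                 # four blocks
ciphertext = bytearray(cbc_encrypt(key, iv, plaintext))
ciphertext[0] ^= 0x01                        # flip one bit of c_1
damaged = cbc_decrypt(key, iv, bytes(ciphertext))
print(damaged[32:] == plaintext[32:])        # True: blocks 3 and 4 are unaffected
```

In this run the first plaintext block is garbled, the second differs from the original only in the flipped bit, and the remaining blocks decrypt correctly.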

2.2.2.3 CFB

CFB mode can be seen as a self-synchronizing (asynchronous) stream cipher [13], since its keystream depends on the previous ciphertext. Each plaintext block $p_i$ is encrypted by applying an XOR operation with an element $k_i$, yielding the corresponding ciphertext block $c_i$. Each element $k_i$ of the keystream is generated according to

$$k_i = E_k^B(c_{i-1}),\ \forall i \in \mathbb{N} \tag{2.15}$$

and for the first block (i = 1) one has $c_0 = IV$. This procedure is illustrated in Figure A.3a and can be interpreted as follows:

$$
\begin{aligned}
e_k^{CFB}(P) &= \big\Vert_{i=1}^{n} (p_i \oplus k_i) \\
d_k^{CFB}(C) &= \big\Vert_{i=1}^{n} (c_i \oplus k_i)
\end{aligned}
\tag{2.16}
$$

where $c_i = p_i \oplus k_i$ for $i = 1, \dots, n$. As for the CFB decryption, it is important to note that the block cipher's encryption $E_k^B$ is used instead of the block cipher's decryption $D_k^B$. Since the keystream generator function depends on the previous ciphertext block, this mode cannot be parallelized for encryption.
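The following Python sketch follows (2.15) and (2.16). Since CFB only ever calls the encryption direction of the block cipher, the sketch uses HMAC-SHA-256 truncated to b bits as a stand-in for the encryption function of B; this substitution is an assumption made to keep the example within the standard library.

```python
import hashlib
import hmac
import os

BLOCK = 16  # block size in bytes


def E(key, block):
    """Stand-in for the block cipher encryption: a keyed function with b-bit output."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def cfb_encrypt(key, iv, plaintext):
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        k_i = E(key, prev)                          # k_i = E(c_{i-1}), with c_0 = IV
        prev = xor(plaintext[i:i + BLOCK], k_i)     # c_i = p_i xor k_i
        out.append(prev)
    return b"".join(out)


def cfb_decrypt(key, iv, ciphertext):
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        c = ciphertext[i:i + BLOCK]
        out.append(xor(c, E(key, prev)))            # p_i = c_i xor k_i, using E again
        prev = c
    return b"".join(out)


key, iv = b"illustrative key", os.urandom(BLOCK)
message = b"forty-eight bytes of illustrative plaintext here"
recovered = cfb_decrypt(key, iv, cfb_encrypt(key, iv, message))
print(recovered == message)  # True
```

Decryption can process blocks independently because every c_{i-1} is already known, whereas encryption has to wait for the previous ciphertext block.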

2.2.2.4 CTR

The original modes of operation published as FIPS in 1981 did not include CTR mode. Only in 2001 was it added as a standard mode of operation, the same year of the public announcement for the consideration of AES as an effective block cipher.

The general idea behind this block cipher mode of operation (BCMO) is to handle a counter throughout the operations. A b-bit value IV is chosen as the initial counter value and thereafter every counter is computed based on the previous one. The block cipher is used to encrypt the counter block and its output is XORed with the plaintext block, yielding the ciphertext block. This procedure is illustrated in Figure A.4 and, as one can easily observe, CTR mode can be seen as a synchronous stream cipher [13]. More formally,

$$
\begin{aligned}
e_k^{CTR}(P, IV) &= \big\Vert_{i=1}^{n} (E_k^B(t_i) \oplus p_i), \text{ where } t_1 = IV \\
d_k^{CTR}(C, IV) &= \big\Vert_{i=1}^{n} (E_k^B(t_i) \oplus c_i), \text{ where } t_1 = IV
\end{aligned}
\tag{2.17}
$$

where $E_k^B$ is the encryption procedure of the block cipher B, $p_i$ and $c_i$ are the i-th plaintext and ciphertext blocks, respectively, and $t_i$ is the i-th counter block such that

$$t_i = ctr(t_{i-1}) \tag{2.18}$$

where $ctr : \{0,1\}^b \to \{0,1\}^b$ is the counter function. There are several possibilities for the behaviour of the aforementioned ctr function, but the NIST recommendation [14] goes for the Standard Incrementing Function, which is given by

$$ctr(x) = x|_{b-m} \,\|\, (x|^m + 1 \bmod 2^m) \tag{2.19}$$

where x is a b-bit word, $x|^m$ represents the last m bits of x, $x|_{b-m}$ represents the first b − m bits of x and $m \in \mathbb{N}$, $m \le b$, is the counter length.

In contrast with CBC mode, the IV herein used does not need to be random; it just needs to be unique for each encryption under the same key. In other words, CTR mode's security lies in the uniqueness of the pair $(t_i, k)$ for all the encryptions performed. Therefore, there is an upper bound $u_l$ for the length (in bits) of a message to be given as input to the CTR encryption scheme, which is given by

$$u_l = b \times 2^m \tag{2.20}$$

because if the number of blocks exceeds the cardinality of the set of possible counters ($n > 2^m$), then $t_{2^m+i} = t_i$, $\forall i \le 2^m$, which cannot happen, otherwise CTR's security becomes compromised⁶.

CTR mode is very suitable for situations where the time complexity of the encryption algorithm is of the essence, as it can be fully parallelized. The only pre-processing needed for this mode is the computation of the counter blocks, which is done in time linear in the length of the input data, because the number of blocks is the input length divided by the fixed block length b and each increment is done in time O(1).
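A short Python sketch of (2.17) to (2.20), with b = 128 and an illustrative counter length of m = 32. As CTR only uses the encryption direction of B, HMAC-SHA-256 truncated to b bits is used as a stand-in for it; this substitution and the chosen parameters are assumptions of the sketch.

```python
import hashlib
import hmac

B_BITS, M_BITS = 128, 32   # block size b and counter length m (illustrative values)
BLOCK = B_BITS // 8


def E(key, block):
    """Stand-in for the block cipher encryption: a keyed function with b-bit output."""
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def ctr(x):
    """Standard incrementing function (2.19) on a b-bit integer x."""
    mask = (1 << M_BITS) - 1
    return (x & ~mask) | ((x + 1) & mask)   # keep the first b - m bits, increment the last m


def ctr_crypt(key, iv, data):
    """Encryption and decryption are the same operation in CTR mode."""
    t, out = int.from_bytes(iv, "big"), bytearray()
    for i in range(0, len(data), BLOCK):
        keystream = E(key, t.to_bytes(BLOCK, "big"))
        out += bytes(a ^ b for a, b in zip(data[i:i + BLOCK], keystream))
        t = ctr(t)                           # t_i = ctr(t_{i-1})
    return bytes(out)


UPPER_BOUND_BITS = B_BITS * 2 ** M_BITS      # u_l = b * 2^m

key = b"illustrative key"
iv = (7 << 32).to_bytes(BLOCK, "big")        # must be unique per encryption under this key
ciphertext = ctr_crypt(key, iv, b"counter mode needs no padding")
print(ctr_crypt(key, iv, ciphertext))        # b'counter mode needs no padding'
```

Each keystream block depends only on the key and the counter value, so all blocks can be processed in parallel once the counters are known.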

2.2.2.5 CCM

The block cipher modes of operation discussed so far provide secrecy to the data at hand. Notwithstanding, there are modes which, apart from secrecy, also provide authentication. CCM is one of those modes and combines the CTR mode with the CBC-MAC mode, the former for secrecy purposes and the latter for authentication. CBC-MAC is very similar to CBC mode's encryption (see Figure A.2a), with the exception that instead of the algorithm returning C, it returns only the last block $c_n$, i.e.

$$\text{CBC-MAC}(x, k) = e_k^{CBC}(x)|^b \tag{2.21}$$

CCM [15] mode interleaves the authentication and confidentiality steps, taking as input a 3-tuple (N, H, P) such that N is a nonce (number used only once) intended to be used as the IV by the CTR mode of operation, H is the header, which is data to be authenticated but not encrypted, and P is the plaintext that is going to be subject not only to authentication but also to encryption. The algorithm has several pre-requisites, namely all the operations are done using the same key k and there is a formatting function that takes as input a 3-tuple (N, H, P) as above and returns a sequence of bitstring blocks.

There are some situations in which pieces of data may be of public knowledge and therefore there is no need to encrypt them, as one would only be increasing the memory and computational overhead. CCM provides a thorough solution to this potential problem, as it allows the authentication of non-encrypted data without extending the length of the ciphertext.

⁶ Section 2.6.7 further discusses this topic.

2.2.2.6 GCM

Galois Counter Mode (GCM) [16] is another mode of operation that provides both authentication and confidentiality. This combined mode makes use of an adapted version of CTR to encrypt the data, while integrity and authenticity are granted by the Galois mode of authentication. The latter is known as Galois Message Authentication Code (GMAC) and is based on a keyed hash function which, even though it does not qualify as a cryptographic hash function (see section 2.3), is well suited for the job. With this mode it is imperative that the pair (v, k) is never reused for any given input data, where v is the IV and k is the key. The uniqueness requirement on the IVs is necessary for the authentication mechanism to grant the system immunity to malleability.

2.2.2.7 Padding

For the ECB and CBC modes, the length of the plaintext must be a multiple of the block size, thus one must⁷ pad the plaintext prior to encryption. There are several padding techniques, used by distinct types of algorithms.

Bit Padding

One of the most used padding techniques for BCMO is called bit padding and consists in appending a bit 1 to the end of the plaintext and filling the remaining $r = n \times b - (m + 1)$ bit fields with the bit value 0, such that the length of the n-th block is b, where $n = \lfloor m/b \rfloor + 1$.
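A minimal Python sketch of bit padding, assuming byte-aligned messages so that the appended bit 1 followed by zeros becomes the byte 0x80 followed by zero bytes. The function names and the default block size of 16 bytes are illustrative.

```python
def bit_pad(data, block=16):
    """Append a 1 bit and then 0 bits until the length is a multiple of the block size."""
    n = len(data) // block + 1                      # n = floor(m / b) + 1 blocks
    return data + b"\x80" + b"\x00" * (n * block - len(data) - 1)


def bit_unpad(padded):
    """Drop everything from the last 0x80 byte onwards."""
    return padded[:padded.rindex(b"\x80")]


print(bit_pad(b"abc").hex())   # 61626380000000000000000000000000
```

A message whose length is already a multiple of the block size receives a full extra block of padding, so that removal is never ambiguous.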

PKCS7

PKCS7 padding [18] is another widely used padding technique. It consists in checking how many bytes remain until the end of the block, $k = (n \times b - m)/8$ with $n = \lfloor m/b \rfloor + 1$, and padding the message with k bytes, each valued k. Note that if m is already a multiple of the block size, the padding must be performed either way, because the recipient of the message is always expecting a padded message; thus, a new block must be added to the end of the plaintext. Since each padding byte must be able to hold the value k, which is at most 255, this padding technique cannot be used with block ciphers whose block size is 256 bytes (2048 bits) or greater.
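A minimal Python sketch of PKCS7 padding for byte strings. The function names and the default block size are illustrative assumptions.

```python
def pkcs7_pad(data, block=16):
    """Append k bytes, each of value k, where k is the distance to the next block boundary."""
    if not 1 <= block <= 255:
        raise ValueError("block size must fit in one byte")
    k = block - len(data) % block        # 1 <= k <= block; a whole block if already aligned
    return data + bytes([k]) * k


def pkcs7_unpad(padded, block=16):
    """Check and remove the padding."""
    if not padded or len(padded) % block:
        raise ValueError("invalid length")
    k = padded[-1]
    if not 1 <= k <= block or padded[-k:] != bytes([k]) * k:
        raise ValueError("invalid padding")
    return padded[:-k]


print(pkcs7_pad(b"abc", block=8))   # b'abc\x05\x05\x05\x05\x05'
```

How a receiver reacts to the "invalid padding" case matters in practice: if an adversary can tell that case apart from other failures, the check may act as the oracle exploited by a padding oracle attack.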

⁷ Although out of the scope of this text, there are methods [17] that avoid the use of padding for ECB and CBC. They allow the ciphertext to have exactly the same length as the plaintext, at the cost of increasing the complexity of the algorithm.

Usually, by padding a message, one gains the advantage of hiding the true length of the plaintext. However, if the padding is not executed properly, some vulnerabilities may arise and an adversary may be able to successfully exploit them. In particular, when using a padding scheme, the plaintext may become vulnerable to a Padding Oracle Attack (POA), which is explained in section 2.6.5. Therefore, one should be careful whenever applying a padding algorithm.

2.2.3 Asymmetric Cryptography

There are clearly some issues related to symmetric cryptographic systems, since all the users that can either encrypt or decrypt data must know the unique key k a priori. If it is physically impossible for the users to share this information and if there is no secure channel to transmit the key, then they cannot use a symmetric key cryptographic system in order to exchange confidential information. The concept of asymmetric cryptography emerged to successfully work around this problem.

A public-key cryptographic system is based on a key-pair (kpub, kpriv), where kpub stands for the public key and kpriv for the private key, the first being known publicly and the latter known only by the user. Herein, there is a slightly distinct mode of operation when compared to a symmetric cryptosystem, since the owner of the private key kpriv uses it only to decrypt received information, which has been encrypted with his public key kpub and could have been sent by anyone. For example, when Alice wants to send Bob a message, she encrypts it with Bob's public key Bpub and then he decrypts it with his private key Bpriv. Asymmetric cryptosystems rely upon the infeasibility of certain mathematical problems. The major shortcoming of asymmetric cryptographic systems is their high computational complexity when compared with symmetric cryptosystems.

2.3 Cryptographic Hash Functions

A hash function maps an input of arbitrary length to a fixed-length output. It can therefore be a very useful tool in modern cryptography, which has led many researchers to study the properties of such functions. For a given hash function to be considered a cryptographic hash function it must satisfy the following properties:

(i) Efficiency: The hash value must be fast to compute.

(ii) One-Way Function: It’s infeasible to invert.

(iii) Avalanche Effect: A small change in the input of the hash function produces a very distinct output.

(iv) Collision Resistance: It is very hard to find two distinct inputs with the same image.⁸

⁸ Note that the complexity of a birthday attack, $O(2^{m/2})$ for an m-bit message digest, is an upper bound for the best collision resistance.

Several hashing algorithms were created and among the most popular are the Message Digest (MD) and Secure Hashing Algorithm (SHA) families. The most widely used hash function from the first family is MD5, which was deprecated as soon as a successful attack was found within a considerably short time frame. As for the second family, the most widely used function is SHA-1 [19], which is considered to be insecure due to a successful collision attack [20] published by Google in February of 2017. Several theoretical attacks had already been found and thus this cryptographic hash function was considered to be on the edge of failure, foreseeing that a collision would be found soon enough. Hence the creation of SHA-2 [19] was mandatory, a version with four variants (SHA-224, SHA-256, SHA-384 and SHA-512) that extends the set of possible hashes to a point where the presently known collision attacks become infeasible.

The most straightforward use one can give to a hash function is for integrity purposes, i.e., given a message m and a cryptographic hash function h, the computation of h(m) yields an n-bit digest that can be used to check the integrity of the message m. Due to the collision resistance property of cryptographic hash functions, it is most likely that h(m) ≠ h(m′) for any m ≠ m′ and cryptographic hash function h. Thus, for any word y, if h(y) = h(m) one can consider with a high level of trust that y is equal to m. This capability of providing integrity to the messages at hand is illustrated in Example 2.3.1.

Example 2.3.1 (Integrity). Let h be a cryptographic hash function of public knowledge and consider that Alice sends Bob a message x along with its digest h(x). Bob receives the pair (m, d), where m is the message and d the message digest and wants to verify whether m is in fact x, the message that was sent by Alice. Thus he computes h(m) and accepts the message as valid if and only if h(m) = d.

The reader should not be misled, for the example above is only successful in an unrealistic situation. Since h is of public knowledge, any adversary capable of interfering with the communications would be able to replace (m, d) with (m′, h(m′)), for some malicious message m′, and Bob would successfully conclude the integrity verification without noticing that the message had been tampered with. The use of cryptographic hash functions requires a deep understanding of the involved components in order to satisfy the desired properties and strengthen the system at hand against malicious attacks. Most modern-day cryptographic algorithms make use of cryptographic hash functions, as their properties provide extremely advantageous behaviours to prevent eventual vulnerabilities.
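The scenario of Example 2.3.1 and its weakness can be sketched in a few lines of Python. SHA-256 from the standard library plays the role of h, and the messages are made up for illustration.

```python
import hashlib


def h(message):
    """Public cryptographic hash function (SHA-256 here)."""
    return hashlib.sha256(message).digest()


def bob_accepts(m, d):
    """Bob's check: accept if and only if h(m) = d."""
    return h(m) == d


x = b"pay 10 EUR to Carol"
print(bob_accepts(x, h(x)))                 # True: the honest pair (x, h(x)) is accepted

forged = b"pay 9000 EUR to Mallory"
print(bob_accepts(forged, h(x)))            # False: changing only the message is detected
print(bob_accepts(forged, h(forged)))       # True: replacing both message and digest is not
```

The last line is the substitution described above: because h involves no secret, the adversary can recompute the digest for any message of its choice.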

2.3.1 SHA-256

A member of the SHA family, SHA-256 is able to generate a 256-bit message digest of any message with binary length b satisfying $0 \le b < 2^{64}$, for the padding scheme associated with the algorithm's construction requires b to be written as a 64-bit number. A high-level description of the steps of the SHA-256 algorithm [19] follows, for an arbitrary b-bit input message x:

1. Pad and parse x into $x_1, \dots, x_n$;

2. Initialize i = 1 and set the 32-bit hash values $h_1, \dots, h_8$ to the initial constants fixed by the standard [19];

3. Build the message schedule $m_i$ based on $x_i$;

4. Build working variables $\{v_k\}_{k \in I_8}$, each based on the value of $h_k$;

5. Update the values of the working variables using $m_i$ and $v_k$, $\forall k \in I_8$;

6. Update the hash values $h_k$, $\forall k \in I_8$, using the values of the variables obtained in the previous step;

7. Compute i = i + 1 and if $i \le n$ then go to step 3, otherwise return the value $h = h_1 \,\|\, \cdots \,\|\, h_8$.
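Step 1 is the only place where the 64-bit length limit appears, and it can be sketched in Python as follows, assuming a byte-aligned message. The helper name `sha256_pad_and_parse` is hypothetical; the remaining steps are left to the standard library's `hashlib.sha256`.

```python
import hashlib


def sha256_pad_and_parse(x):
    """Pad x and split it into 512-bit (64-byte) blocks x_1, ..., x_n."""
    bit_length = 8 * len(x)
    padded = x + b"\x80"                                # a single 1 bit, then zeros
    padded += b"\x00" * ((56 - len(padded)) % 64)       # leave room for the length field
    padded += bit_length.to_bytes(8, "big")             # b written as a 64-bit number
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]


blocks = sha256_pad_and_parse(b"abc")
print(len(blocks), blocks[0][-8:].hex())    # 1 0000000000000018
print(hashlib.sha256(b"abc").digest_size)   # 32 bytes, i.e. a 256-bit digest
```

A 56-byte message no longer leaves room for the length field in its first block, so the padding spills into a second block.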

2.3.2 HMAC

The use of cryptographic hash functions has been associated with authentication purposes since the first definition of a hash-based message authentication code (HMAC) [21] in 1997.

Definition 2.3.1. Let m be a message, k an l-bit key and h a cryptographic hash function whose compression function's block size is n bits. The following function f defines the HMAC-h:

$$f(k, m) = h\big((k' \oplus opad) \,\|\, h((k' \oplus ipad) \,\|\, m)\big) \tag{2.22}$$

where ipad and opad are fixed strings and k′ is the resulting key such that, for j = n − l:

$$
k' =
\begin{cases}
k \,\|\, 0^j & \text{if } l < n \\
h(k) & \text{if } l > n \\
k & \text{otherwise}
\end{cases}
\tag{2.23}
$$

HMAC grants both integrity and authenticity to the input data, but while the first follows trivially from the use of a cryptographic hash function, the latter requires the key k to be shared solely between the two involved parties. Clearly, if there are more than two parties with access to k, the recipient of the messages will never be able to authenticate the sender.
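The definition can be checked against Python's standard `hmac` module with the sketch below, taking h to be SHA-256. One detail is an assumption beyond (2.23): following the HMAC specification, a key that had to be hashed is also padded with zeros up to the block size, which is what the library does.

```python
import hashlib
import hmac


def hmac_from_definition(k, m, h=hashlib.sha256):
    """Compute f(k, m) = h((k' xor opad) || h((k' xor ipad) || m))."""
    n = h().block_size                      # block size of the compression function, in bytes
    if len(k) > n:
        k = h(k).digest()                   # long keys are hashed first
    k_prime = k.ljust(n, b"\x00")           # then zero-padded to n bytes
    inner_key = bytes(b ^ 0x36 for b in k_prime)    # k' xor ipad
    outer_key = bytes(b ^ 0x5C for b in k_prime)    # k' xor opad
    return h(outer_key + h(inner_key + m).digest()).digest()


key, message = b"shared secret", b"message to authenticate"
tag = hmac_from_definition(key, message)
print(hmac.compare_digest(tag, hmac.new(key, message, hashlib.sha256).digest()))  # True
```

In application code the library call is preferable; the hand-written version is only meant to make the structure of the definition visible.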

2.4 Randomness

This section introduces the concept of randomness and some useful definitions regarding this topic. Randomness is a desired property for several algorithms, as it is a measure of uniqueness and unpredictability, very suitable for solving various problems in the field of cryptography. The highest known level of randomness is theoretical; the next best thing is extracted from physical phenomena, for instance the movement of electrons. Throughout the years, researchers have been trying to develop software algorithms that behave like a true random generator, but to no avail: true randomness is a property still out of reach of modern algorithms. This entails the well-known fact that hardware randomness is better than software randomness.

Definition 2.4.1. A random number generator (RNG) is an algorithm that generates an unpredictable⁹ sequence of values, i.e., if one uses a RNG to generate a sequence $a_1 \cdots a_n$ for $a_i \in \Sigma$, then a third party cannot guess $a_i$ with probability non-negligibly greater than $\frac{1}{|\Sigma|}$.

⁹ Infeasible to be computed by a polynomial time algorithm.

When the hardware at hand lacks a RNG and one needs to implement a random behaviour in a certain algorithm, the only solution is to implement a RNG based on the entropy generated by features available in software. The problem is that there is no such mechanism providing true randomness: in computer programming, the random number generators are pseudo-random number generators (PRNGs), since the stream of values produced by these algorithms is only seemingly non-deterministic, for the whole process requires an input value, called seed, which makes the algorithm deterministic. So, whenever using a PRNG it is required that an adversary cannot feasibly obtain the seed used, which means that not all PRNGs are suitable to be used in cryptographic algorithms. In order to make use of a PRNG for cryptographic primitives, it must satisfy two very important properties:

1. Given an initial state of a sequence of numbers generated by the PRNG, say the first k bits of the sequence, it is infeasible to compute the (k + 1)th bit with probability of success non-negligibly greater than 1/2 (see next-bit test [22]).

2. It is infeasible to reconstruct the stream of numbers previously generated by the PRNG based on a known internal state of the algorithm.

A PRNG satisfying the above properties and therefore suitable for cryptographic applications is named a cryptographically secure pseudo-random number generator (CSPRNG). It is, however, extremely difficult to find a CSPRNG, since most PRNGs are either vulnerable to extended personalized statistical attacks or leak information upon the unveiling of some internal state. Many cryptographic algorithms are very sensitive with respect to predictability, meaning that a CSPRNG is usually used in steps where randomness is of the essence, as for instance the generation of cryptographic keys or salts. There is yet another definition [23] that needs to be addressed in order for the reader to understand the concepts described in section 2.4.1.
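The distinction can be observed with Python's standard library, as sketched below. The `random` module is a general-purpose seeded PRNG, whereas the `secrets` module draws from the operating system's randomness source and is the interface the Python documentation recommends for keys and salts. The seed value and the byte lengths are arbitrary choices for illustration.

```python
import random
import secrets

# A seeded general-purpose PRNG is deterministic: whoever learns the seed
# can reproduce the whole stream.
a, b = random.Random(1234), random.Random(1234)
stream_a = [a.getrandbits(32) for _ in range(4)]
stream_b = [b.getrandbits(32) for _ in range(4)]
print(stream_a == stream_b)          # True

# Keys and salts should instead come from a cryptographically secure source.
key = secrets.token_bytes(16)        # 128-bit key
salt = secrets.token_bytes(16)       # 128-bit salt
print(len(key), len(salt))           # 16 16
```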

Definition 2.4.2 (PRF). A family of functions $\{F_k : X \to Y\}_{k \in \{0,1\}^*}$ is a pseudo-random function (PRF) if, for a randomly chosen instance function $F_k$, its output is indistinguishable (for a polynomial-time algorithm) from the output of a random function $R : X \to Y$, where X and Y are the domain and range sets of the functions of the family, respectively.

PRFs are applicable in a wide variety of solutions owing to these strong properties.

2.4.1 Key Derivation

Regardless of the level of security of an underlying cryptographic algorithm, if one is able to obtain the secret key used in the process then the algorithm becomes unreliable, with the danger of compromising all the data that has been and/or is to be processed by it. In fact, it is possible for a cryptographic key associated with a cryptosystem to be compromised without compromising any of the prior messages encrypted under that cryptosystem. Systems with this property are said to provide forward secrecy. Nevertheless, one always wants to prevent adversaries from discovering the envisaged keys.

Throughout the years researchers have been using CSPRNGs to create and refine algorithms for the generation of stronger cryptographic keys. These methods are called key derivation functions (KDFs) and output an enhanced key for a given input secret. The increased resistance to attacks of the resulting cryptographic keys makes them suitable for most real-world applications.

In 2000, RSA Laboratories published a specification [24] in which the PBKDF2 key derivation function was introduced; it became quite popular and is one of the most widely used nowadays. The password-based key derivation function 2 (PBKDF2) takes as input five parameters:

$$\text{PBKDF2}(\text{PRF}, pass, salt, iter, len) \tag{2.24}$$

where PRF is a pseudo-random function, pass and salt are octet strings such that the former is the secret password and the latter the cryptographic salt, both to be used in the inherent PRF; iter is an integer value corresponding to the number of iterations of the PRF and len is the length, in octets, of the envisaged output key. The number of iterations is directly related to the level of security of the procedure. The steps of the PBKDF2 algorithm are described in more detail in [25].

There are several known KDFs but, among the most secure of its kind, PBKDF2 is considered to be the best suited for use in real-world applications, for it is the one with the best performance [26].
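Python's standard library exposes this function as `hashlib.pbkdf2_hmac`, where the PRF is fixed to HMAC over a named hash function. The sketch below maps the five parameters of (2.24) onto that call; the password, salt size, iteration count and output length are illustrative values, not recommendations from this text.

```python
import hashlib
import secrets

password = b"correct horse battery staple"   # pass: the secret (made-up value)
salt = secrets.token_bytes(16)               # salt: random and stored alongside the result
iterations = 100_000                         # iter: higher values slow down guessing attacks
length = 32                                  # len: output length in octets

# PRF = HMAC-SHA-256, selected by the hash name.
derived_key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, length)
print(len(derived_key))                      # 32

# With the same inputs the derivation is reproducible, which is what allows
# both sides of a protocol to arrive at the same key.
print(derived_key == hashlib.pbkdf2_hmac("sha256", password, salt, iterations, length))  # True
```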

2.5 Communication Protocols in Wireless Networks

The Institute of Electrical and Electronics Engineers Standards Association (IEEE-SA) is an association that develops standards for several technological fields, namely telecommunication and information technology. It has been developing standards for over ten decades and among the published works is a family of network protocols for parties trying to connect to a local area network (LAN) or wireless local area network (WLAN), specified by the set S = {IEEE 802.1X : X is a unique identifier for the standard}.

The most relevant elements of S are going to be discussed, as one of them (WPA2) is considered the most suitable protocol for wireless communication and is used nowadays throughout the world to provide indirect access to the Internet to either personal or corporate devices without a cable connection.

2.5.1 WEP

The standard IEEE 802.11 ∈ S [27] contains the description of WEP [28], an algorithm to provide data secrecy and integrity to wireless networks such that the level of security granted would be equivalent to that of a wired network. WEP was proved insecure mainly because the IV space is too small for busy networks, given that the key k is usually fixed in practice¹⁰ (recall that a stream cipher is vulnerable to key reuse attacks). After the break of WEP's security was published, automated tools were developed to recover the key used in the algorithm, and nowadays a WEP encryption can be broken in less than a minute.

¹⁰ A personal computer is usually connected to a router acting as an AP and the password for the router is fixed, unless the user changes it manually.

2.5.2 WPA/WPA2

IEEE 802.11i [29], an amendment to IEEE 802.11, was put into effect due to WEP's exploitable flaws. The standard includes two new security protocols for communicating over a wireless channel: WPA and WPA2, which were intended to replace WEP. Both provide authentication either by a PSK or by EAP, the latter requiring an authentication server. WPA's encryption process differs from WEP's, so the former does not suffer from the same fragilities as its predecessor. Nevertheless, it was deprecated in 2012, due to its vulnerability to a message integrity code (MIC) recovery attack [30]. WPA was mainly created as an interim measure for hardware that could not support WPA2, the most recent version of WPA, which includes a CCM-AES-based encryption mode named CCMP [29] as a replacement for the TKIP [27] encryption and grants data confidentiality, authentication and access control. The WPA2 communication protocol is composed of three main stages:

1. Initial authentication;

2. 4-way handshake;

3. Group-key handshake;

There is an entity called Authenticator whose role is to authenticate the parties that intend to join the network. If WPA2-PSK is in effect then it solely communicates with the client, but for EAP mode it acts as an intermediate point between each supplicant and the authentication server. This is the general layout of wireless networks that make use of the WPA2 communication protocol and it is illustrated in Figure 2.2.

Figure 2.2: Network Layout. [Diagram: Client, Authenticator and Server.]

As mentioned above, there are two distinct modes for the WPA2 protocol: WPA2-PSK and WPA2-EAP. These two modes only differ in the first stage of the protocol, the initial authentication step, whose objective is to derive the PMK, an intermediate key that is used in the 4-way handshake to derive the PTK and GTK. These two unique-per-session keys contain sub-keys that are necessary for encrypting and decrypting the data flow between the client and the authenticator. The group-key handshake is used for updating the GTK in such a way that the authenticator can securely distribute it to all the authenticated clients in the network; this key is used by the clients to decrypt multicast or broadcast data sent by the authenticator.

2.5.2.1 Initial Authentication

The PSK mode of the WPA2 protocol has the advantage of being faster than EAP because it does not need to go through an initial authentication step. In fact, the authenticator has a pre-defined password pass and an SSID ssid (usually the name of the network) and computes PBKDF2(HMAC-SHA-1, pass, ssid, 4096, 32) in order to build the 256-bit PSK. The client must follow the same procedure, but in order to do so it must possess the (private) pass and the (public) ssid. This option is usually chosen in personal networks where each client trusts any other client that is able to successfully authenticate itself and connect to the network.

On the other hand, there may be situations where a client does not completely trust the other clients that may connect to the network. Consider, for instance, the case of a corporate wireless network in which a router authenticates several employees who do not trust each other; there must be a way to prevent each and every one of them from tampering with data that is not intended for them. EAP mode, illustrated in simplified form in Figure 2.3, must be used in those situations, as its initial authentication step provides pairwise authentication by deriving a PMK for each client. Independently of the chosen mode for WPA2, at the end of the initial authentication step both the client and the authenticator possess the PMK, which is the PSK in the case of WPA2-PSK.

Figure 2.3: Extensible Authentication Protocol (EAP). [Message flow among Client, Authenticator and Server: request and response; authentication protocol between Client and Server, during which Client and Authenticator derive the PMK; the Server accepts the Client and the acceptance is confirmed.]
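The PSK computation of WPA2-PSK can be sketched with Python's `hashlib.pbkdf2_hmac`, which fixes the PRF to HMAC over the named hash. The passphrase and SSID below are made up; the 4096 iterations come from the text and the output length of 32 octets corresponds to the 256-bit PSK.

```python
import hashlib


def wpa2_psk(passphrase, ssid):
    """PSK = PBKDF2(HMAC-SHA-1, pass, ssid, 4096, 32)."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)


# Authenticator and client run the same computation on the same inputs.
authenticator_psk = wpa2_psk("hypothetical passphrase", "ExampleNetwork")
client_psk = wpa2_psk("hypothetical passphrase", "ExampleNetwork")
print(authenticator_psk == client_psk)   # True
print(len(authenticator_psk) * 8)        # 256
```

Because the SSID acts as the salt, the same passphrase yields different keys on networks with different names.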

2.5.2.2 4-way Handshake

After the initial authentication step, the authenticator will confirm that the client possesses the correct PMK by asking for the decryption of certain data. Moreover, the GTK is also transmitted to the client. The 4-way handshake is depicted in Figure 2.4 and comprises the following steps:

1. Client and Authenticator each generate a nonce nonce1 and nonce2, respectively;

2. Authenticator sends nonce2 to Client;

3. Client derives the PTK such that

$$PTK = \text{PRF}(gen), \quad \text{with } gen = PMK \,\|\, nonce_1 \,\|\, nonce_2 \,\|\, MACADDRESS^1 \,\|\, MACADDRESS^2 \tag{2.25}$$

where PRF is a pseudo-random function and $MACADDRESS^1$ and $MACADDRESS^2$ are the MAC addresses of the client and authenticator, respectively.

4. Client sends nonce1 and a MIC of that nonce to Authenticator.

5. Authenticator derives the PTK and generates the GTK.

6. Authenticator encrypts the GTK with PTK (eGTK), computes a MIC of the encrypted GTK (mGTK) and sends the pair (eGTK, mGTK) to Client.

7. Client decrypts eGTK and sends an acknowledgement to Authenticator consisting of a MIC of the decrypted GTK.

Figure 2.4: WPA2 four-way handshake. (Diagram with parties Client and Authenticator and the steps: Client generates nonce1 and Authenticator generates nonce2; nonce2 is sent; Client derives the PTK; (nonce1, MIC) is sent; Authenticator derives the PTK and generates the GTK; (eGTK, mGTK) is sent; Client decrypts eGTK; Acknowledgement.)

After performing the four-way handshake, both client and authenticator have the PTK and GTK without ever disclosing these two keys, and each of them knows that the other party also possesses the same keys.
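The key derivation and MIC check at the core of the four-way handshake can be sketched as follows. This is a minimal illustration under stated assumptions: HMAC-SHA256 stands in for the PRF and for the MIC, the PMK is used as the HMAC key while the remaining fields of gen form the message, and the MAC addresses are made up. The standard itself specifies a different PRF and key hierarchy.

```python
import hashlib
import hmac
import os

def derive_ptk(pmk: bytes, nonce1: bytes, nonce2: bytes, mac1: bytes, mac2: bytes) -> bytes:
    # gen = PMK || nonce1 || nonce2 || MACADDRESS^1 || MACADDRESS^2.
    # Assumption: the PMK keys an HMAC over the remaining fields of gen.
    return hmac.new(pmk, nonce1 + nonce2 + mac1 + mac2, hashlib.sha256).digest()

def mic(ptk: bytes, data: bytes) -> bytes:
    # Assumption: the MIC is a truncated HMAC-SHA256 under the PTK.
    return hmac.new(ptk, data, hashlib.sha256).digest()[:16]

pmk = os.urandom(32)                          # shared after initial authentication
client_mac = bytes.fromhex("0242ac110002")    # hypothetical addresses
auth_mac = bytes.fromhex("0242ac110003")

nonce1, nonce2 = os.urandom(32), os.urandom(32)                      # steps 1 and 2
ptk_client = derive_ptk(pmk, nonce1, nonce2, client_mac, auth_mac)   # step 3
message = (nonce1, mic(ptk_client, nonce1))                          # step 4

# Step 5: the authenticator derives the PTK from the received nonce1
# and checks the MIC, which confirms that the client holds the PMK.
ptk_auth = derive_ptk(pmk, message[0], nonce2, client_mac, auth_mac)
mic_ok = hmac.compare_digest(mic(ptk_auth, message[0]), message[1])
print(mic_ok)
```

Neither the PMK nor the PTK crosses the channel; only the nonces and the MIC do.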

2.5.2.3 Group-key Handshake

The GTK needs to be updated every time a client disconnects from the AP (authenticator) or upon the expiry of a timer, as a security measure. The group-key handshake is a two-way handshake, depicted in Figure 2.5, and comprises the following steps:

1. Authenticator updates GTK;

2. Authenticator encrypts GTK with PTK (eGTK) and generates a MIC (mGTK);

3. Authenticator sends the pair (eGTK, mGTK) to Client;

4. Client decrypts eGTK and sends an acknowledgement message to Authenticator consisting of a MIC of the decrypted GTK.

Figure 2.5: WPA2 group-key handshake. (Diagram with parties Client and Authenticator and the steps: Authenticator encrypts the GTK and generates a MIC; (eGTK, mGTK) is sent; Client verifies the MIC and decrypts the GTK; Acknowledgement reply.)

2.6 Known Attacks

Cryptographic systems are only as trustworthy as their robustness to attacks: a cryptosystem that has survived a large number of distinct attacks is considered reliable for practical use, while systems that have not undergone such tests do not inspire the same confidence.

2.6.1 Brute Force and Dictionary Attacks

Given a certain cryptographic algorithm, a brute force attack consists in trying all possible inputs to the algorithm and checking whether each input leads to a desired output. For example, a hacker might try to break into a third party's personal computer by trying all possible passwords, one at a time. A dictionary attack is no more than a brute force attack which narrows the space of possible words by only considering specific words, or subsets of those words, based on some alphabet. Dictionary attacks have proven to be highly effective against some cryptographic systems, especially for password-hacking purposes, and are one of the main threats to passwords that are dictionary-based.

Countermeasures

In order to prevent brute force attacks, the number of words that can be written with alphabet Σ must be as high as possible without compromising the computational capability of the system at hand. As for dictionary attacks, not only must the previous requirement be met, but the passwords or secret keys must also be long enough and as random as possible.
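To make the size requirement concrete, the following sketch computes the size of the search space for words of a given length over an alphabet Σ, and the worst-case duration of an exhaustive search. The function names and the guessing rate are hypothetical, chosen only for illustration.

```python
import math

def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy of a uniformly random word of `length` symbols over Σ."""
    return length * math.log2(alphabet_size)

def worst_case_seconds(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Time to try every word of the given length at a fixed guessing rate."""
    return alphabet_size ** length / guesses_per_second

# A 4-digit PIN against an attacker making 1000 guesses per second.
print(keyspace_bits(10, 4), worst_case_seconds(10, 4, 1000))
# A 128-bit key is a word of length 128 over the binary alphabet.
print(keyspace_bits(2, 128))
```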

2.6.2 Man In The Middle Attack

In a man-in-the-middle (MiM) attack the adversary is able not only to eavesdrop on the communications, but also to actively participate in the exchange of messages, in such a way that the other parties are not aware of the adversary's true identity.

Example 2.6.1. Consider the scenario represented in Figure 2.6: Alice and Bob are communicating through an insecure channel C, which is being eavesdropped by Eve, a third party to whom none of the messages sent in the channel are addressed. Since C is not secure, Eve can listen to the communication and may be able to impersonate either Alice or Bob (or even both of them, in a worst case scenario). Alice may send a message $M$ intended for Bob, but Eve is able to intercept the message $M$, replace it with a malicious message $M'$ and send $M'$ to Bob, who has no idea that the original message was $M$ instead of $M'$.

A MiM attack can even be performed by an adversary who does not gain any additional information on the ciphertext and whose sole purpose is to disrupt all the communications by jamming the channel(s) with junk data, thereby preventing the reception of any message by the targeted end user. Nevertheless, most MiM attackers intend to extract information by acting as the other end party for each of the communicating entities.

Figure 2.6: Man in the middle attack. Eve is able to intercept the message and/or jam the communication channel at will. (Diagram with parties Alice, Eve and Bob, with the messages $M$ and $M'$ passing through Eve.)

Countermeasures

It is very hard to detect every intrusion in a wireless network, especially one where the elements of the network are under restrictions of power consumption and activity extent. The most common procedure for preventing an active MiM attack is to always verify the integrity and authentication of each received message. Even though there are intrusion detection systems for alerting to such undesired interference in wireless networks, and methods that grant extra layers of security such as VPN connections, they are out of the scope of this text, since their usage is not suitable for the discussed problem due to the constraints imposed on some parties.

2.6.3 Birthday Attack

Classified as a collision attack, a birthday attack is no more than a brute force attack where the attacker has some useful probabilistic insight that reduces the number of outputs that must be computed for the same bit security, making it more efficient than a simple brute force.

Problem 2.6.1 (Birthday Problem). Given a room with n people, what is the probability that k of those people have the same birthday?

Let $P_k(n, d)$ be the value of the probability that holds the answer to the Birthday Problem, where $d$ is the number of possible values for each element, i.e., $d = 365$ for this specific case. Birthday attacks for cryptographic hash functions are based on Problem 2.6.1 with $k = 2$. In fact, the probability that any two out of $n$ people have the same birthday is given by

$$P_2(n, 365) = 1 - \frac{365!}{(365 - n)! \times 365^n}$$

which follows trivially from a probabilistic analysis of Problem 2.6.1.

Consider an arbitrary hash function $h : X \to Y$ such that $\forall y \in Y : |y|_2 = b$, where $|y|_2$ is the number of bits of $y$. One can adapt the Birthday Problem to ask the following question: "Providing $n$ randomly chosen inputs $\{x_1, \dots, x_n\} =: X_n$ such that $x_i \in X$ for all $1 \le i \le n$, what is the probability that $h(x_{i_1}) = h(x_{i_2})$ for some distinct $x_{i_1}, x_{i_2} \in X_n$?". The answer to the previous question is simply the value of $P_2(n, 2^b)$, and from [7, 31] one can conclude that $P_2(n, 2^b) \approx 1 - e^{-n^2/2^{b+1}}$, which is no more than the probability of finding a collision for a cryptographic hash function whose outputs are $b$-bit words. Let $m_b$ be the expected number of distinct outputs of $h$ such that $P_2(m_b, 2^b) \ge 0.5$. Then $m_b \approx 2^{b/2}$ (solving the approximation above for probability 0.5 gives $m_b \approx 1.18 \cdot 2^{b/2}$), which represents the number of outputs of $h$ to be computed before a collision is expected to occur and is usually referred to as the birthday bound.

This probabilistic approach entails a reduction of every cryptographic hash function's bit security and, in order to prevent these types of attacks, one has to ensure that it is computationally infeasible to compute $2^{b/2}$ distinct outputs, for hash functions that return $b$-bit digests.
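The two expressions for $P_2$ used in this subsection can be compared numerically. The sketch below evaluates the exact product form and the exponential approximation; the function names are illustrative, and the exact form is computed as a running product to avoid huge factorials.

```python
import math

def birthday_exact(n: int, d: int) -> float:
    """P_2(n, d): probability that at least two of n uniform samples
    out of d possible values coincide, i.e. 1 - d! / ((d - n)! * d^n)."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (d - i) / d
    return 1.0 - p_distinct

def birthday_approx(n: int, d: int) -> float:
    """Approximation 1 - exp(-n^2 / (2d)); with d = 2^b this is 1 - exp(-n^2 / 2^(b+1))."""
    return 1.0 - math.exp(-n * n / (2 * d))

print(birthday_exact(23, 365), birthday_approx(23, 365))
# For a 32-bit digest, 2^16 outputs already give a sizeable collision probability.
print(birthday_approx(2 ** 16, 2 ** 32))
```

With $d = 365$, 23 people are enough to exceed probability 0.5, which is the classical statement of the birthday problem.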

2.6.4 Replay Attack

A replay attack is a special case of a MiM attack. Here an adversary eavesdrops on and saves a transmitted message, or part of a message, and later re-transmits it, either in another protocol or in another run of the same communication protocol that was eavesdropped. This attack is therefore based on re-transmission of data.

Countermeasures

There are several possible procedures to prevent a system from being vulnerable to a replay attack. In general, for a cryptographic system to be resistant to this type of attack, each communication session must be uniquely identified, which can be achieved by granting each message a session identifier. Another method for preventing replay attacks relies on timestamps. Suppose that Bob has a clock from which he periodically broadcasts its real time value $t$, together with a message authentication code (MAC), for authentication purposes. Whenever Alice wants to send Bob a message $x$, she encrypts $x$ into $y$ with some cipher and then generates a guess $timestamp_{t'}$ of Bob's real current time $t'$, based on $t$, which is then authenticated with a MAC. Upon receiving the whole package at time $t''$, Bob only accepts the message for further checking of its authentication and integrity if $t'' - timestamp_{t'} < \epsilon$, for some $\epsilon > 0$. Note that if Eve wants to replay a message she is able to do it for as long as $t'' - timestamp_{t'} < \epsilon$, i.e., this procedure is not completely reliable for replay attack protection: if $\epsilon$ is not small enough, or if an attacker can replay quickly enough regardless of the value of $\epsilon$, then the cryptographic scheme is compromised.
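Bob's acceptance rule can be sketched as follows. This is only an illustration under assumptions: HMAC-SHA256 plays the role of the MAC, the timestamp is packed as an 8-byte float in front of the ciphertext, and the window `EPSILON` is an arbitrary value. None of these choices come from the text.

```python
import hashlib
import hmac
import struct

EPSILON = 2.0  # acceptance window in seconds (assumed value)

def make_packet(key: bytes, ciphertext: bytes, timestamp: float) -> bytes:
    """Alice: attach her guess of Bob's time and authenticate the whole body."""
    body = struct.pack(">d", timestamp) + ciphertext
    return body + hmac.new(key, body, hashlib.sha256).digest()

def accept(key: bytes, packet: bytes, now: float) -> bool:
    """Bob: verify the MAC, then check t'' - timestamp < epsilon."""
    body, tag = packet[:-32], packet[-32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False
    (timestamp,) = struct.unpack(">d", body[:8])
    return now - timestamp < EPSILON

key = b"k" * 32
packet = make_packet(key, b"encrypted payload y", timestamp=100.0)
print(accept(key, packet, now=101.0))  # first delivery, inside the window
print(accept(key, packet, now=101.9))  # replay inside the window
print(accept(key, packet, now=103.0))  # replay after the window
```

The second call shows the limitation discussed above: a replay that arrives within the window is indistinguishable from the original delivery.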

2.6.5 Padding Oracle Attack

To perform a padding oracle attack one must have access to a padding oracle$^{11}$ O. Among the BCMO discussed in section 2.2.2, the modes that require padding (ECB and CBC) are vulnerable to a padding oracle attack, but while this attack does not completely break ECB, for CBC it is lethal due to its encryption and decryption mechanisms. Nevertheless, note that if padding is used in either CFB, OFB or CTR, then these encryption schemes also become vulnerable to this type of attack, provided that no authentication layer is associated with them. A padding oracle attack on CBC encryption is exemplified and discussed throughout this section. Consider the following situation:

• Alice (A) and Bob (B) want to communicate through a communication channel and share a common secret s, which they decide to use as the key for a block cipher of their choice, whose block size is b, together with CBC mode to provide secrecy to their messages.

• A and B agree on padding the messages with padding as in [32].

• Victor (V), an adversary, has access to the communication channel and is able to perform a MiM attack. He also has access to a padding oracle that answers whether an encrypted message y is correctly padded.

$^{11}$ A padding oracle is a system that gives a yes or no answer to the question: "Is this message properly padded?".

Consider that A encrypts a message x and sends the encrypted result y, along with the IV, through the communication channel, where it is intercepted by V. Assume w.l.o.g. that $n = 2$, i.e., $y = y_1 \,\|\, y_2$, where

$$y_i = \Big\Vert_{j=1}^{b} y_{ij}, \quad \forall i \in I_2 \tag{2.26}$$

and $O(y) = \text{true}$.

Recall CBC decryption, writing $C$ for the ciphertext $y$:

$$d_s^{CBC}(C) = \Big\Vert_{i=1}^{n} P_i$$

where $P_i = d_s^b(C_i) \oplus C_{i-1}$ and $C_0 = IV$. V now decides to change the last byte of $C_1$ in the following way: $C_{1b}^* = C_{1b} \oplus z_b \oplus \mathtt{0x01}$, where $z_b$ is a guess for the last byte of $P_2$ and $\mathtt{0x}h$ is the hexadecimal representation of a byte such that $h \in \{00, 01, \dots, FE, FF\}$. After having replaced $C_{1b}$ with $C_{1b}^*$, V has $C^* = C_1^* \,\|\, C_2$, where $C_1^* = C_{11} \,\|\, \dots \,\|\, C_{1(b-1)} \,\|\, C_{1b}^*$. Then V makes use of the oracle in order to know if $C^*$ is properly padded by calling $O(C^*)$. Note that

$$d_s^b(C_1^*) \oplus IV = P_{11}^* \,\|\, \dots \,\|\, P_{1b}^* \neq P_1 \tag{2.27}$$

due to the avalanche effect of the block cipher; on the other hand,

$$d_s^b(C_2) \oplus C_1^* = P_{21} \,\|\, \dots \,\|\, P_{2(b-1)} \,\|\, P_{2b}^* \tag{2.28}$$

meaning that only the last byte of $P_2$ is changed while the remaining $b - 1$ bytes have not been altered. Let $d_b$ correspond to the last byte of $d_s^b(C_2)$. Since $P_{2b}^* = C_{1b} \oplus z_b \oplus \mathtt{0x01} \oplus d_b$ and $C_{1b} \oplus d_b = P_{2b}$, then $P_{2b}^* = P_{2b} \oplus z_b \oplus \mathtt{0x01}$ by the associativity and commutativity of the XOR operation. At this point, there are two possible cases:

1. P was not padded prior to encryption.

2. P was padded prior to encryption.

In case 1, the conclusion is straightforward:

(a) $(z_b = P_{2b}) \Rightarrow (P_{2b}^* = \mathtt{0x01})$, which corresponds to correct padding for PKCS7 and in this case $O(C^*) = \text{true}$. This means that V has found the last byte of $P_2$.

(b) $(z_b \neq P_{2b}) \Rightarrow (P_{2b}^* \neq \mathtt{0x01})$, meaning that $O(C^*) = \text{false}$, so V chooses a fresh$^{12}$ guess $z_b$ and repeats the process.

Thus $O(C^*) = \text{true}$ if and only if $z_b = P_{2b}$.

In case 2, situation (b) is still valid, but $O(C^*) = \text{true} \not\Rightarrow z_b = P_{2b}$, because now $P_2 = x_1 \,\|\, \dots \,\|\, x_r \,\|\, x'_1 \,\|\, \dots \,\|\, x'_k$ where $r + k = b$, $k \in (\mathbb{Z}_{256} \setminus \{0\})$ and the $x'_i$ represent the padded bytes such that $x'_i = k$, $\forall i \in I_k$. For $k > 1$ there are two possible values of $z_b$ that entail an acceptance of the modified word by the oracle, which are:

(2a) $(z_b = P_{2b}) \Rightarrow (P_{2b}^* = \mathtt{0x01})$;

(2b) $(z_b = t) \Rightarrow (P_{2b}^* = k)$, for some $t \in (\mathbb{Z}_{256} \setminus \{P_{2b}\})$;

Therefore, in order for V to differentiate which of these two acceptances gives the true last byte of $P_2$, V modifies the second to last byte of $C_1$ by flipping a positive arbitrary number of bits, which will definitely yield a value distinct from the previous one. After doing so, V runs the previous procedure for the iterated guess $z_b$ until he finds $z_b \in \mathbb{Z}_{256}$ such that $O(C^*) = \text{true}$, in which case V is sure to have found the value of $P_{2b}$.

After discovering the last byte P2b, V proceeds in trying to find the next (second to last) byte of P2, i.e., P2(b−1). The same arguments can be applied to this situation, as follows:

$$\begin{aligned} C_{1b}^* &= C_{1b} \oplus P_{2b} \oplus \mathtt{0x02} \\ C_{1(b-1)}^* &= C_{1(b-1)} \oplus z_{(b-1)} \oplus \mathtt{0x02} \end{aligned} \tag{2.29}$$

thus $C_1^* = C_{11} \,\|\, \dots \,\|\, C_{1(b-2)} \,\|\, C_{1(b-1)}^* \,\|\, C_{1b}^*$ and V keeps making calls to the oracle $O(C^*)$ until it accepts the input, in which case V has found the second to last byte of $P_2$. Note that in this case $d_s^{CBC}(C^*) = P_1^* \,\|\, P_2^*$ where $P_1^* \neq P_1$ due to the avalanche effect, but $P_2^* = P_{21} \,\|\, \dots \,\|\, P_{2(b-2)} \,\|\, P_{2(b-1)}^* \,\|\, P_{2b}^*$, where $P_{2b}^* = \mathtt{0x02}$ is fixed independently of the guess $z_{(b-1)}$ and, on the other hand, $P_{2(b-1)}^* = C_{1(b-1)} \oplus z_{(b-1)} \oplus \mathtt{0x02} \oplus d_{(b-1)}$, where $d_{(b-1)}$ is the second to last byte of $d_s^b(C_2)$. Again, the same arguments 1 and 2 can be applied to $P_{2(b-1)}^*$ and the attacker only needs 511 attempts in a worst case scenario in order to find the correct byte $P_{2(b-1)}$.

Algorithm 2 contains the pseudocode for the whole procedure and one can easily see that V is able to recover the whole plaintext P in time $O(mb)$, where m is the size of the ciphertext and b the block size. Since these values are usually not large, this algorithm runs in efficient time. Lastly, note that the algorithm does not take into account case 2, but one can easily adapt it for this situation.
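The attack can be tried end to end with the following self-contained sketch. It rests on several assumptions that are not part of this text: the block cipher is a toy four-round Feistel network built from SHA-256 (illustrative only, not secure), the block size is 16 bytes, the padding is PKCS7, and the oracle receives the forged previous block and the target block as two arguments instead of the single word $C^*$. The extra probe for the last byte handles case 2.

```python
import hashlib

BLOCK = 16  # block size b, in bytes

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher (stand-in for a real block cipher).
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:BLOCK // 2]

def enc_block(key: bytes, blk: bytes) -> bytes:
    left, right = blk[:BLOCK // 2], blk[BLOCK // 2:]
    for rnd in range(4):
        left, right = right, xor(left, _f(key, rnd, right))
    return left + right

def dec_block(key: bytes, blk: bytes) -> bytes:
    left, right = blk[:BLOCK // 2], blk[BLOCK // 2:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _f(key, rnd, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, msg: bytes) -> bytes:
    k = BLOCK - len(msg) % BLOCK              # PKCS7 adds between 1 and BLOCK bytes
    data, prev, out = msg + bytes([k]) * k, iv, b""
    for i in range(0, len(data), BLOCK):
        prev = enc_block(key, xor(data[i:i + BLOCK], prev))
        out += prev
    return out

def make_oracle(key: bytes):
    """Padding oracle O: is `blk`, decrypted after `prev`, properly padded?"""
    def oracle(prev: bytes, blk: bytes) -> bool:
        plain = xor(dec_block(key, blk), prev)
        k = plain[-1]
        return 1 <= k <= BLOCK and plain[-k:] == bytes([k]) * k
    return oracle

def recover_block(oracle, prev: bytes, blk: bytes) -> bytes:
    plain = bytearray(BLOCK)
    for j in range(BLOCK - 1, -1, -1):
        x = BLOCK - j                         # padding value being forced
        forged = bytearray(prev)
        for k in range(j + 1, BLOCK):         # set the already known bytes to x
            forged[k] = prev[k] ^ plain[k] ^ x
        for z in range(256):                  # guess z for byte j
            forged[j] = prev[j] ^ z ^ x
            if not oracle(bytes(forged), blk):
                continue
            if j == BLOCK - 1:                # case 2: rule out a longer padding
                probe = bytearray(forged)
                probe[j - 1] ^= 0xFF
                if not oracle(bytes(probe), blk):
                    continue
            plain[j] = z
            break
    return bytes(plain)

def padding_oracle_attack(oracle, iv: bytes, ct: bytes) -> bytes:
    blocks = [iv] + [ct[i:i + BLOCK] for i in range(0, len(ct), BLOCK)]
    padded = b"".join(recover_block(oracle, blocks[i - 1], blocks[i])
                      for i in range(1, len(blocks)))
    return padded[:-padded[-1]]               # strip the PKCS7 padding

key, iv = b"sixteen byte key", bytes(range(BLOCK))
secret = b"attack at dawn, retreat at dusk"
ciphertext = cbc_encrypt(key, iv, secret)
recovered = padding_oracle_attack(make_oracle(key), iv, ciphertext)
print(recovered)
```

The attacker code (`recover_block` and `padding_oracle_attack`) never touches the key; it only observes the oracle's yes or no answers.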

2.6.6 Stream Cipher Attacks

The underlying security of stream ciphers is based on their correct usage, as an adversary can take advantage if certain precautions are not taken. In general, stream ciphers are considered to be very secure, provided that one does not reuse the key and runs an authenticity check on every encrypted message.

$^{12}$ By fresh, one means a value in the set of possible values that has not yet been chosen.

Algorithm 2 Padding Oracle Attack on CBC encryption
procedure POA(C, O)    ▷ Discovering P without s
    Initialize zero valued array P with length nb bytes;
    for i = n to 1 do
        x ← 1;
        for j = b to 1 do
            z ← −1;    ▷ so that the first guess is 0
            A ← false;
            C′ ← C_{i−1};    ▷ fresh copy of the previous block, with C_0 = IV
            for k = j + 1 to b do
                C′_k ← C_{(i−1)k} ⊕ P_{ik} ⊕ x;
            end for
            while A == false do
                z ← z + 1;
                C′_j ← C_{(i−1)j} ⊕ z ⊕ x;
                A ← Ask oracle O if C′ ‖ C_i is properly padded;
            end while
            P_{ij} ← z;
            x ← x + 1;
        end for
    end for
    return P;
end procedure

Let S be a stream cipher with encryption and decryption functions ek and dk, for a given key k ∈ K and assume that Eve is an adversary.

2.6.6.1 Key Reuse

Suppose that Eve is able to perform a MiM attack and let $m_1$ and $m_2$ be two messages such that $m_1 \neq m_2$ and assume w.l.o.g. that $|m_1| = |m_2| = l$. Since S is a stream cipher, there is a keystream generator function g which produced the keystream $k_1 k_2 \cdots$ based on some internal state. Let $k'$ be the substring $k_1 \dots k_n$, $n \ge 1$, such that $y_1 = e_k(m_1) = m_1 \oplus k'$ and $y_2 = e_k(m_2) = m_2 \oplus k'$. Upon intercepting $y_1$ and $y_2$, Eve is able to compute $y_1 \oplus y_2 = m_1 \oplus m_2$ due to the commutative and self-inverse properties of the XOR operator.

Statistical analysis can now be applied to recover $m_1$ and $m_2$ with a high degree of confidence. Let Σ be the alphabet at hand, let $m_3 := m_1 \oplus m_2 = m_{31} \dots m_{3l}$ such that $m_{3i} \in \Sigma$, $\forall\, 1 \le i \le l$, let $X_i$ be a random variable representing the value of the $i$th element of an arbitrary plaintext x and consider $P(X_i = m_{3i}) = p_i$, $\forall i \in \{1, \dots, l\}$. The set

$$C_i = \{(a, b) : a \oplus b = m_{3i} \wedge a, b \in \Sigma\} \tag{2.30}$$

contains the (possibly many) pairs whose XOR yields the intended $i$th element of $m_3$. Assuming that the probability distribution of the alphabet elements is not homogeneous, a simple approach is to choose the pair $(a, b) \in C_i$ that satisfies

$$\max_{(a,b) \in C_i} P(X_i = a)\,P(X_i = b) \tag{2.31}$$

By applying this method, Eve is able to recover the plaintexts with high confidence without knowing the secret key used in the stream cipher's encryption procedure. More complex probabilistic relations may be required to increase the degree of confidence in the choice of the pairs $(a, b)$ that satisfy equation 2.31, for all $1 \le i \le l$.
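A small sketch of both observations follows. The first part shows that reusing the keystream cancels it out of $y_1 \oplus y_2$; the second part implements the selection rule of equation 2.31 for one position. The messages, the toy probability table and the function name `best_pair` are illustrative assumptions, and the keystream is simply random bytes rather than the output of a real stream cipher.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Two messages of equal length encrypted with the same keystream k'.
m1 = b"attack at dawn!!"
m2 = b"retreat at dusk!"
keystream = os.urandom(len(m1))
y1, y2 = xor(m1, keystream), xor(m2, keystream)

leak = xor(y1, y2)               # what Eve computes from the two ciphertexts
recovered_m2 = xor(leak, m1)     # if Eve ever learns m1, m2 follows at once

def best_pair(m3_i: int, prob: dict) -> tuple:
    """Pick (a, b) in C_i = {(a, b): a XOR b = m3_i} maximising P(a) * P(b)."""
    candidates = [(a, a ^ m3_i) for a in prob if (a ^ m3_i) in prob]
    return max(candidates, key=lambda pair: prob[pair[0]] * prob[pair[1]])

# Toy symbol distribution over a three-letter alphabet (made-up numbers).
prob = {ord("e"): 0.5, ord("t"): 0.3, ord("a"): 0.2}
pair = best_pair(ord("e") ^ ord("t"), prob)
print(leak == xor(m1, m2), recovered_m2, pair)
```

Note that the rule only identifies the unordered pair: it cannot tell which symbol belongs to $m_1$ and which to $m_2$ without further context.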

Countermeasures

The only countermeasure for this situation is to never use a key more than once. For ciphers that include an IV as part of their input, the pair (IV, k) can be seen as the general key to the cryptosystem and the key k may be used more than once, as long as the initialization vector IV does not repeat, which is done in practice by randomly choosing an IV out of the set of possible IVs. Given the generally high cardinality of the latter, CSPRNGs are the most common choice in order to maximize the underlying algorithm’s performance.

2.6.6.2 Bit-flipping

Suppose that Alice and Bob communicate through a communication channel C on which Eve is able to perform a MiM attack. Moreover, assume that Alice wants to send Bob a message m such that $[m]_2 = m_1 \dots m_n$ and that $\exists\, i, j \in \mathbb{N} : 1 \le i \le j \le n$ for which Eve knows $m_i \dots m_j$. Alice encrypts m using S encryption and sends it to Bob through C, and Eve, upon intercepting the message $y = m \oplus k'$, where $k'$ is the most significant n-bit substring of the resulting keystream produced by the keystream generator function, computes

$$y' := y \oplus (m_E \oplus v) \tag{2.32}$$

where $[m_E]_2 = 0^{i-1} m_i \dots m_j 0^{n-j}$ and $[v]_2 = 0^{i-1} v_i \dots v_j 0^{n-j}$, such that the bitstring $v_i \dots v_j$ is an evil bitstring chosen by Eve. After performing the operation in (2.32) she sends the resulting message through C to Bob, who upon receiving $y'$ decrypts it as follows

$$\begin{aligned} d_k(y') &= k' \oplus y' \\ &= k' \oplus (m \oplus k') \oplus (m_E \oplus v) \\ &= (m \oplus m_E) \oplus v \\ &= \bar{m} \oplus v \end{aligned} \tag{2.33}$$

where $[\bar{m}]_2 = m_1 \dots m_{i-1} 0^{j-i+1} m_{j+1} \dots m_n$, thus

$$[\bar{m} \oplus v]_2 = m_1 \dots m_{i-1} v_i \dots v_j m_{j+1} \dots m_n \tag{2.34}$$

Note that Eve does not know the secret key k shared between Alice and Bob but the knowledge of bits of the message m being sent makes it possible to alter them at will, without Bob noticing.
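The computation in (2.32) to (2.34) can be reproduced with a few lines of code. The sketch works on bytes instead of single bits, uses random bytes as the keystream, and the message contents are invented for the example; these are all assumptions made for illustration.

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

m = b"PAY 0100 EUR TO BOB"
keystream = os.urandom(len(m))       # k', known only to Alice and Bob
y = xor(m, keystream)                # ciphertext sent by Alice

# Eve knows the bytes at positions 4..7 ("0100") and picks her replacement.
i, j = 4, 8
known, evil = b"0100", b"9999"
m_E = bytes(i) + known + bytes(len(m) - j)   # zeros outside the known range
v = bytes(i) + evil + bytes(len(m) - j)

y_forged = xor(y, xor(m_E, v))       # equation (2.32), no key required
received = xor(y_forged, keystream)  # Bob's decryption
print(received)
```

Bob's decryption succeeds and yields a well-formed message, so nothing in the cipher itself signals the tampering.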

Countermeasures

At first glance one might think that an integrity check would suffice to prevent this attack. However, since Eve is assumed to know bits of the message being sent (possibly the whole message), producing a simple digest of the message is not enough: if she knows the entire message m she can easily compute h(m) for any cryptographic hash function h. Therefore an authentication tag is needed in this situation, given that Eve will not be able to silently tamper with the message without causing a mismatch between the tag and the corrupted message.
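The difference between a plain digest and an authentication tag can be seen in the following sketch. SHA-256 and HMAC-SHA256 are used as example primitives, and the messages and helper names are hypothetical; the text does not prescribe a particular hash function or MAC.

```python
import hashlib
import hmac
import os

def digest_ok(message: bytes, digest: bytes) -> bool:
    """Unkeyed integrity check: anyone can recompute it."""
    return hashlib.sha256(message).digest() == digest

def tag_ok(key: bytes, message: bytes, tag: bytes) -> bool:
    """Keyed authentication check: requires the shared secret."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

key = os.urandom(32)                       # shared by Alice and Bob only
original = b"PAY 0100 EUR TO BOB"
forged = b"PAY 9999 EUR TO BOB"

# Eve replaces the message and simply recomputes the digest herself.
digest_passes = digest_ok(forged, hashlib.sha256(forged).digest())

# Without the key, the best Eve can do is reuse Alice's tag, which fails.
alice_tag = hmac.new(key, original, hashlib.sha256).digest()
tag_passes = tag_ok(key, forged, alice_tag)
print(digest_passes, tag_passes)
```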

2.6.7 Weaknesses of Block Cipher Modes of Operation

Even if a block cipher is considered to be secure and there is seemingly no way to break the cryptosystem by analysing the cipher itself, one can use that block cipher repeatedly to encrypt or decrypt messages larger than the input block size in a way that compromises the security of the whole system. This subsection discusses the advantages and disadvantages of some of the BCMOs.

ECB

ECB mode is considered to be the least secure block cipher mode of operation, as it is not semantically secure. An adversary can indeed gain information about the plaintext based solely on the ciphertext, since a given plaintext block is always encrypted to the same ciphertext block.

CBC

Recall the CBC block encryption function present in Figure A.2 and according to equation 2.14 for an n-block message $p = p_1 \dots p_n$. Allowing the IV to be predicted by an adversary gives room to a feasible chosen-plaintext attack on the cryptosystem at hand, where the adversary can efficiently recover any previously sent message.

Assume that Alice and Bob communicate through a channel C and that Eve is an adversary eavesdropping C with access to an encryption oracle O, such that $O(p) = e_k^{CBC}(p)$ for any b-bit block p. Moreover, the oracle has an intrinsic IV generator function (equal to Alice's) that produces a new initialization vector used in each call. Now, Alice intends to send a word $m = m_1 \dots m_n$ to Bob and in order to do so, she computes an initialization vector IV and encrypts m as in equation 2.14, yielding the encrypted message y. Then she sends the pair $(y, IV)$ through C such that Bob is able to decrypt the message. Consider that Eve is able to predict the next IV used by Alice (and therefore by the oracle as well); upon intercepting $(y, IV)$, she can recover m according to Algorithm 3 by applying n calls to procedure PREDICT($y_i$, $y_{i-1}$), where $y = y_1 y_2 \dots y_n$ and $|y_i|_2 = b$ for all $1 \le i \le n$, pictured below:

procedure PREDICT($y_i$, $y_{i-1}$)    ▷ $y_i$, $y_{i-1}$ are b-bit values
    Initialize a b-bit value $y'$;
    while $y' \neq y_i$ do
        Predict the initialization vector used in the next encryption: $IV_p$;
        Guess b-bit value $m'$;
        Compute $M := y_{i-1} \oplus IV_p \oplus m'$;
        Call oracle: $y' \leftarrow O(M)$
    end while
    return $m'$;
end procedure

The capability of Eve to predict the IV successfully is the key to the feasibility of the attack. Indeed, suppose that it is highly probable that Eve is not able to correctly predict the IV, i.e., her guess $IV_p$ is such that $P(IV_p \neq IV_{new}) \to 1$, where $IV_{new}$ is the new random IV generated by the oracle O. Note that for any b-bit block x, the query $O(x)$ returns $E_k^B(IV_{new} \oplus x)$. By taking a guess $m'$, Eve wants to compute M such that $m' \oplus IV_{old} = IV_p \oplus M \Rightarrow M = m' \oplus IV_{old} \oplus IV_p$, where $IV_{old}$ is either the value of the IV of the pair $(y, IV)$ when Eve is trying to find the first plaintext block, or the value of $y_{i-1}$ when Eve is trying to find the $i$th plaintext block. Then, by calling the oracle with input M, the following holds:

$$\begin{aligned} O(M) &= E_k^B(IV_{new} \oplus M) \\ &= E_k^B(IV_{new} \oplus m' \oplus IV_{old} \oplus IV_p) \end{aligned} \tag{2.35}$$

however, since $[IV_{new} \oplus IV_p]_2 \neq 0^b$, even if equation 2.35 yields the same result as the intended ciphertext block $y_i$ one cannot conclude that $m'$ is the original plaintext block $m_i$, because it only implies that $IV_{new} \oplus m' \oplus IV_p = m_i$. On the other hand, if Eve is able to correctly predict the IV used in the next encryption, then

$$\begin{aligned} O(M) &= E_k^B(IV_{new} \oplus M) \\ &= E_k^B(IV_{new} \oplus m' \oplus IV_{old} \oplus IV_p) \\ &= E_k^B(m' \oplus IV_{old}) \end{aligned} \tag{2.36}$$

where the last equality holds because $IV_p = IV_{new}$. Lastly, note that $E_k^B(m' \oplus IV_{old}) = y_i \Rightarrow m' = m_i$.

Algorithm 3 CBC Predictable IV attack
procedure PREDICTATTACK(y, IV)    ▷ Discovering x : $e_k^{CBC}(x, IV) = y$
    Split y into n blocks $y_1, \dots, y_n$;
    Initialize empty bitstring x;
    for i = 1 to n do
        a ← PREDICT($y_i$, $y_{i-1}$);    ▷ $y_0 = IV$
        x ← x ‖ a;
    end for
    return x;
end procedure
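The procedure PREDICT can be exercised with the sketch below, for a single block and a small set of candidate plaintexts. Everything specific in it is an assumption made for illustration: the IV generator is a simple counter (which makes it trivially predictable), a truncated HMAC-SHA256 stands in for $E_k^B$ (the attack only needs the encryption direction, so invertibility is not required), and the candidate messages are made up.

```python
import hashlib
import hmac

BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def E(key: bytes, block: bytes) -> bytes:
    # Stand-in for the block cipher E_k^B: a keyed deterministic function.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]

class CbcOracle:
    """Single-block CBC encryption whose IV is a predictable counter."""
    def __init__(self, key: bytes):
        self._key, self._count = key, 0

    def predict_iv(self) -> bytes:
        # What Eve is assumed to be able to compute: the next IV.
        return self._count.to_bytes(BLOCK, "big")

    def encrypt(self, block: bytes):
        iv = self.predict_iv()
        self._count += 1
        return iv, E(self._key, xor(block, iv))

def predict(oracle: CbcOracle, y_i: bytes, iv_old: bytes, candidates):
    for guess in candidates:                      # guess m'
        iv_p = oracle.predict_iv()                # predicted next IV
        _, y = oracle.encrypt(xor(xor(guess, iv_old), iv_p))
        if y == y_i:                              # E(m' XOR IV_old) = y_i
            return guess
    return None

oracle = CbcOracle(b"secret key unknown to Eve")
candidates = [word.ljust(BLOCK) for word in (b"yes", b"no", b"maybe")]
iv, y1 = oracle.encrypt(candidates[1])            # Alice sends "no"
found = predict(oracle, y1, iv, candidates)
print(found)
```

With an unpredictable IV the same loop would fail, because the term $IV_{new} \oplus IV_p$ in (2.35) would not cancel.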

Apart from IV predictability, CBC mode is also susceptible to POA: provided the absence of ciphertext stealing methods, every message must be padded prior to encryption. Algorithm 2 describes the procedure for attacking CBC given a padding oracle.

CTR

Let $t_i$, $p_i$ and $c_i$ be the $i$th counter block, plaintext block and ciphertext block, respectively. Due to CTR's construction, changing the last byte of $c_i$ results in changing only that last byte of $p_i$, and the same attack using a padding oracle for CBC can be herein applied. Let x be an m-bit message and y an n-bit message, with $n < m$. Then the following holds:

$$x \oplus y = x|_n \oplus y$$

where $x|_n$ represents x truncated to its first n bits. This observation makes it clear that there is no need for padding messages that are encrypted via CTR and the resulting ciphertext will have exactly the same length as the original plaintext, as the XOR operation is performed bitwise.

As already mentioned in section 2.2.2.4, the pair $(t_i, k)$ needs to be unique for all $i \in \mathbb{N}$, otherwise CTR mode's security is compromised. Consider the following scenario: using CTR mode with an arbitrary block cipher B and standard incrementing function, Alice encrypts two messages p and m (of the same length, w.l.o.g.) such that the nonces chosen for each encryption, $nonce_1$ and $nonce_2$, satisfy

$$nonce_1 + i = nonce_2 + j \tag{2.37}$$

for some $i, j \le n$, where n is the number of b-bit blocks, yielding the ciphertexts w and z such that

$$\begin{aligned} w &= e_k^{CTR}(p, nonce_1) \\ z &= e_k^{CTR}(m, nonce_2) \end{aligned} \tag{2.38}$$

that are available to Eve. Given the nonce equality, the following holds

$$E_k^B(nonce_1 + i) = E_k^B(nonce_2 + j) \tag{2.39}$$

meaning that Eve, who is in possession of the ciphertexts w and z, is able to compute

$$\begin{aligned} w_i \oplus z_j &= (E_k^B(nonce_1 + i) \oplus p_i) \oplus (E_k^B(nonce_2 + j) \oplus m_j) \\ &= p_i \oplus m_j \end{aligned} \tag{2.40}$$

where the last equality follows from (2.39) and from the XOR properties of commutativity and self-inverse. Now, a statistical analysis technique would be the most straightforward approach to find both $p_i$ and $m_j$.
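The counter overlap in (2.37) to (2.40) can be checked with a short sketch. As assumptions for the illustration, a truncated HMAC-SHA256 stands in for $E_k^B$ (CTR mode only uses the encryption direction), the nonces are small integers, blocks are indexed from 0, and the messages are invented.

```python
import hashlib
import hmac

BLOCK = 16

def xor(a: bytes, b: bytes) -> bytes:
    # zip stops at the shorter input, so a partial last block needs no padding.
    return bytes(x ^ y for x, y in zip(a, b))

def E(key: bytes, counter: int) -> bytes:
    # Stand-in for E_k^B applied to the counter block.
    return hmac.new(key, counter.to_bytes(BLOCK, "big"), hashlib.sha256).digest()[:BLOCK]

def ctr_encrypt(key: bytes, nonce: int, msg: bytes) -> bytes:
    out = b""
    for offset in range(0, len(msg), BLOCK):
        out += xor(msg[offset:offset + BLOCK], E(key, nonce + offset // BLOCK))
    return out

def block(data: bytes, index: int) -> bytes:
    return data[index * BLOCK:(index + 1) * BLOCK]

key = b"shared secret key"
p = b"first message p0" + b"first message p1"
m = b"other message m0" + b"other message m1"

nonce1, nonce2 = 1000, 1001          # nonce1 + 1 == nonce2 + 0, so i = 1, j = 0
w, z = ctr_encrypt(key, nonce1, p), ctr_encrypt(key, nonce2, m)

leak = xor(block(w, 1), block(z, 0))  # computed from the ciphertexts alone
print(leak == xor(block(p, 1), block(m, 0)))
print(len(ctr_encrypt(key, 5, b"abc")))  # ciphertext length equals plaintext length
```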

CFB

Generally speaking, CFB suffers from the same fragilities as CTR mode: the IV must be unique and it may be susceptible to a POA. Furthermore, the construction of each block of CFB encryption is fully dependent on the previous one, hence there is no way of parallelizing the process.

2.6.8 Side-Channel Attack

Whenever the cryptographic systems are embedded within devices that are physically exposed in such a way that third parties can extract information from its electromagnetic field, temperature, sound, energy consumption, or any kind of physical element variation, one says that they are vulnerable to side-channel attacks.

Countermeasures

The countermeasures for side-channel attacks can be categorized into two main activity clusters:

1. Prevent the leak of information;

2. Remove or smoothen the relation between secret data and environmental changes.

Both of these actions demand a considerable amount of resources, especially access to and knowledge of the hardware development, hence it is not usually easy to prevent side-channel threats.

2.6.9 Attacks on AES

Since it was published as a standard in 2001, AES has been the target of unceasing attempts to break it. It remains unbroken in terms of direct security, i.e., there is no known efficient practical attack on the cipher itself. AES overcomes the weaknesses of DES that were exploited by differential cryptanalysis, but the development of the concept of [33], which instead of XOR differences is based on sets of chosen plaintexts that have some common fixed part, gave rise to the first attack on this robust standard apart from the brute-force approach. The latter is simply infeasible for any of the possible key lengths. The first theoretical key recovery attack on AES [34] was published in 2011 and it was approximately four times faster than a brute-force attack. Even with the improvements made to this attack since then, it is not yet possible to mount these attacks in practice due to their time complexity. There are many possible ways to break a cipher and some of the most devious attacks do not directly target the cipher but instead work around it and try to gain information leaked by the behaviour of external components. The type of attack that deviates the most from the cryptographic features of the encryption scheme is the side-channel attack, and in 2016 a very efficient attack of this kind was created [35] that relies on aspects of the central processing unit (CPU)'s cache memory and can break AES in less than a minute. Notwithstanding, most modern-day CPUs are already resilient to this category of time-based side-channel attacks.

Chapter 3

Network

The present technological advancements entail an increasing complexity in computational security. Given a network under certain restrictions on its elements' autonomy, capacity and connectivity, the problem arises of transmitting data with integrity, authenticity and non-repudiation, using current cryptographic standards. The goal of this chapter is to provide a topological solution together with a communication protocol for a specific network. The problem being studied is addressed in section 3.1, followed by a description of the components and restrictions imposed on the network in section 3.2. Then, some possible solutions for the network's topology are compared and a choice is made in section 3.3, and section 3.4 contains a step-by-step description of the communication protocol under the chosen option. Lastly, in section 3.5 some concepts are introduced for the global characterization of the network's inherent encryption and decryption mechanisms.

3.1 The Problem

The main purpose of this work is to develop an optimal solution for the topological and cryptographical components of a restricted network. Basically, upon being provided with network requirements, which are constraints either on the network's elements and their connections or on the capabilities of the communication channels, one is to choose the encryption and authentication schemes and analyze their level of security against state-of-the-art properties and definitions. The goal of these schemes is to provide the data with cryptographic properties that will strengthen the resilience of the data stored in the network elements' memory against possible threats. The security layer of any of the protocols used for the transmission of data between any two network parties is also a relevant subject of study, for it will determine the level of security of the communications.

Consider the scenario depicted in Figure 3.1. Certain measurable elements from the environment are processed into digital data by a specific type of device, which stores the information after processing. Then, the data is to be transmitted to a secure database, where it is stored and used as required. The problem is to come up with a secure means of transmitting the data from the device to the database, given restrictions on the device's lifetime, autonomy, processing capacity and memory space. Thus, a network for the transmission of the data is to be constructed, which consists of distinct clusters such that each is composed of a fixed number of elements, the elements of the same group share a set of features, and the connections between clusters are restrained under some pre-defined rules. The previously mentioned parties' features range from the computational power scope and the available cryptographic algorithms and their respective keys to the assigned mission of the network element. More specifically, the purpose of the network is to gather real-time data and transmit it to a secure database while granting the collected evidence confidentiality, integrity, authenticity and non-repudiation properties.

Figure 3.1: General purpose and activity of the envisaged network. (Diagram with the entities Measurable element, Device and Database and the activities: Detects activity; Processes and stores information; Data transmission; Data storage.)

3.2 Details

Let the network be composed by three main components:

• Gathering devices (GDs): field-deployable parties that gather the raw data (with a maximum threshold of 256kB/s), process it and subsequently send it to an authorized party via an asyn- chronous channel. The length of each of the generated messages is a multiple of a minimum

defined length l1 and is maximized by 256 octets. These elements are restricted with respect to memory (256kB RAM) and autonomy, as their energy source is a non-rechargeable battery and remain in the same geographic location throughout the extent of their lifetime.

• Middle-point party (MPP): a gateway party who is near the deployed gathering devices in order to wirelessly receive the data and/or send command messages. May also possess a serial connec- tivity option for posteriorly physically transmitting the sensitive data to an authorized party.

• Mission and data manager (MnDM): a device positioned at headquarters that receives the data from the middle-point party, makes the necessary verifications and stores it in a secure centralized database. It is also capable of generating and sending command messages, whose length is a multiple of a pre-defined minimum length $l_2$.

Figure 3.2 summarizes the interactions between the abovementioned components. Note that the GDs do not communicate directly with the MnDM and vice-versa. The data flow from the GDs to the MPP and subsequently to the MnDM is denoted by Upstream Data Lifecycle (UDL), and the flow in the inverse direction is denoted by Downstream Data Lifecycle (DDL).

Figure 3.2: General layout of the desired network (Upstream Data Lifecycle: data collection, gathering devices, data transmission, middle-point party, data transmission, mission and data manager, data storage; the Downstream Data Lifecycle runs in the opposite direction).

The description of the network entities introduces the term command message. These are messages with a pre-defined format whose contents are intended to give an instruction to another network party; they can be generated by the MPP and the MnDM. It is important to note that the (binary) lengths of the messages generated by the GDs and of the command messages herein introduced are multiples of values $l_1$ and $l_2$, respectively, for some $l_1, l_2 \in \mathbb{N} : l_1 \leq 2048 \wedge l_2 \leq 2048$. With respect to the gathering process, having one message of length $L = m \times l$ is equivalent to having $m$ messages of length $l$. That is, for every message $x$ and command message $m$

$$\exists_{L_1, L_2 \in \mathbb{N}} : (L_1 = m_1 l_1 \wedge L_2 = m_2 l_2) \wedge (|x| = L_1 \wedge |m| = L_2) \qquad (3.1)$$

for some $m_1, m_2 \in \mathbb{N} : m_i \leq 2^{11}/l_i, \forall_{i \in I_2}$.

For secrecy, authentication and non-repudiation purposes some cryptographic algorithms are going to be used, which require cryptographic keys. Prior to deployment there must be a setup stage, in which the required keys are generated, transmitted to the envisaged target and stored in solid memory. These keys are generated inside secure headquarters, called the pre-mission system (PMS), which will be discussed further on. Figure 3.3 contains a simple diagram that depicts the whole step: at the PMS, a family of keys $K = (K_1, K_2, K_3)$ is generated such that $K_2$ and $K_3$ are the sets of keys transmitted to the MPP and the MnDM, respectively, and $K_1 = \bigcup_{j=1}^{n} K_1^j$, where $n$ is the total number of GDs and $K_1^j$ is the set of keys transmitted to $GD_j$, $\forall_{j \in \mathbb{N}} : 1 \leq j \leq n$. When the setup stage is concluded, the devices meet the required constraints for the setup of the network.

Figure 3.3: Pre-deployment stage (the pre-mission system generates the key family $K$ and sends $K_1^j$ to $GD_j$, $K_2$ to the MPP and $K_3$ to the MnDM; each party stores its keys and is then ready for the deployment stage).

The GDs, once deployed, can be in one of two states: active mode or sleep mode. When the devices are in active mode they keep gathering evidence according to their data collecting schedule and send the processed data to the envisaged end party via an asynchronous communication channel [36] according to the data flow schedule. The sleep mode, in turn, is a low power consumption state in which the devices perform no activity other than periodically searching for a connection to an asynchronous communication channel. There must be an activity schedule on which the GDs’ actions rely, and it can be one of the components affected by the command messages that either the MPP or the MnDM send to the GDs.

Due to the restrictions imposed on the gathering devices, there must be a device within range that generates the WLAN on which the devices share information. There are two options for this situation, which are discussed in detail in the next section:

1. The network is generated by a field-deployed AP;

2. The network is generated by the MPP directly.

When in active mode, the gathering devices are intended to collect and internally store relevant data around the clock, but the transmission of this information to another party does not need to be performed at the same time. The WLAN can therefore be created periodically, and during that timespan the transmission of data should be prioritized. For this reason, the data stored in the solid memory of the gathering devices is expected to be encrypted. The GDs are connected to the MPP via Wi-Fi, which is a security vulnerability: since the messages are transmitted as radio signals, an attacker may be able to eavesdrop on the communication channel and attempt to break the cryptosystem. For these reasons, WPA2-PSK has been chosen as the protocol for the communication between the GDs and the MPP, since WPA2 is the most robust option for Wi-Fi communication channels. The choice of PSK over EAP follows from two facts: firstly, the gathering devices do not need to hide information from one another; secondly, the EAP mode of the WPA2 protocol requires more computations for the initial authentication step and therefore increases the energy consumption when compared with PSK. As for the Internet communication between the MPP and the MnDM, it has been decided that TCP [37] should be used for the Transport Layer along with IP for the Internet Layer.

3.3 Network Topology

There are several possibilities to be considered for the network’s topology, and their suitability depends on which purpose weighs most, since favouring one purpose tends to make the remaining options cumbersome. In what follows, some proposals are presented and their practicality is discussed.

AP-based network

This approach considers that the gathering devices solely communicate with the access point. There are two possibilities for this infrastructure mode, both represented in Figure 3.4.

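Figure 3.4: Topology of AP-based networks. (a) Including a deployed router on the field: GD1, GD2 and GD3 connect via Wi-Fi to the access point, which connects via Wi-Fi to the MPP, followed by the MnDM (via Internet) and the DB. (b) Middle-point party performs the AP role: GD1, GD2 and GD3 connect via Wi-Fi directly to the MPP, followed by the MnDM (via Internet) and the DB.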

Proposal A: The first choice is represented in Figure 3.4a and considers the deployment of a fixed AP on the field, which would be able to maintain the WLAN in effect continuously. The main advantage of this option lies in the reduction of the gathering devices’ energy consumption: the encryption and subsequent internal storage of the gathered data would be performed by the AP itself, i.e., the GD would only need to spend energy on the communication protocol and not on the storing mechanism. However, the AP would not only require a high amount of energy to run, but would also be very difficult to hide due to its dimensions, which would allow an adversary to easily detect it on the field and eventually destroy it or try to crack the communications. Since the AP is required to be a centralized element of the network, these two very strong arguments lead to this possible solution being set aside.

Proposal B: The second possible option is represented in Figure 3.4b for a total of 3 gathering devices. The arrowed edge represents the uni-directional data flow between the MnDM and the DB, whilst the simple edges represent a bi-directional data flow. It features the MPP as the access point of the network. Since the MPP is considered to be a versatile element in the sense that it is not deployed on the field, this option is considered to be very suitable for the features at hand.

So far, the topology presented in proposal A has been deemphasized, leaving B as the only viable option. Notwithstanding, another possibly highly reliable solution is going to be discussed.

Ad-hoc network

Proposal C: Considering the case where the gathering devices may communicate with one another, one is presented with an ad-hoc network (Figure 3.5). The dashed lines in that figure represent a possible communication, i.e., upon agreeing on a certain frequency for the communication channel, the devices can either broadcast a message or send it to a number of targets of their choice. An advantage of this option over the previously mentioned AP-based case is the network’s scalability and self-management.

This layout is indeed a strong option for the topology of the network, since it may allow the GDs to never broadcast and therefore save energy. More explicitly, each GD can communicate with a finite chosen number of other GDs and/or the MPP. This would imply that the GD would have previously set up targets, allowing low energy consumption communications. However, the more possible connections there are, the more keys one needs to store in each GD, due to the required secrecy layer on the data stored within the gathering devices. Moreover, in the absence of broadcast, the end-to-end transmission in an ad-hoc network is usually slower, given that the message has to be transmitted from one party to the next until it arrives at the desired target. This drawback may have a relevant impact on the system at hand, not due to the time needed for the MPP to gather the data, but due to the energy spent by the GDs in the transmission process (recall that the GDs must save as much energy as possible for an extended autonomy on the field).

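Figure 3.5: Topology of the ad-hoc network (GD1, GD2 and GD3 may communicate with one another and with the MPP via Wi-Fi; the MPP connects to the MnDM via Internet, and the MnDM to the DB).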

In this case, the use of Elliptic Curve Cryptography (ECC) would be very useful, since the memory savings (a result of the smaller key sizes), the lower computational complexity of both the encryption and authentication processes and the reduction of the GDs’ power consumption are all desirable features, given the restrictions at hand. However, as opposed to standardized cryptosystems, it is not yet common to find embedded systems hardware-programmed with ECC, which is why proposal B has been chosen over C in practice.

3.4 Protocol

This section describes the protocol that specifies the key generation stage and the data processing that occurs on the network’s data lifecycle in both directions, i.e., UDL and DDL. Recall that the pre-deployment stage follows Figure 3.3 and that the network’s topology is as presented in Figure 3.4b. Let $n$ be the number of gathering devices and $GD_i$ be the GD at hand for some fixed index $i \in I$. Moreover, consider the following notation:

• $(k_A^{GMP})^i$: The 128-bit key shared between $GD_i$ and the MPP, used by the AES cipher.

• $k_A^{MPM}$: The 128-bit key shared between the MPP and the MnDM, used by the AES cipher.

• $(k_H^{GMP})^i$: The 256-bit key shared between $GD_i$ and the MPP, used in the HMAC-SHA-256 algorithm.

• $(k_H^{GMD})^i$: The 256-bit key shared between $GD_i$ and the MnDM, used in the HMAC-SHA-256 algorithm.

• $k_H^{MPM}$: The 256-bit key shared between the MPP and the MnDM, used in the HMAC-SHA-256 algorithm.

These abbreviations are to be considered throughout the entire text.

3.4.1 Setup

The setup stage must take place in a secure location and be performed by trusted users, given that herein all the necessary keys are generated and inserted into the corresponding targets. There are three types of keys that need to be generated and distributed: keys for encryption, keys for authentication and a single key for the Wi-Fi communication protocol. The key generation protocol occurs at PMS and comprises the following steps:

1. The user inputs $(n, pass)$, a tuple consisting of the number of gathering devices that are going to be deployed and a password, respectively;

2. The key generation algorithm is applied with input $(n, pass)$ and outputs $K = (K_1, K_2, K_3)$¹;

3. Export the keys to the envisaged devices according to the following distribution:

• $K_1^j = \{(k_A^{GMP})^j, (k_H^{GMP})^j, (k_H^{GMD})^j\}$;

• $K_2 = \{k_A^{MPM}, k_H^{MPM}\} \cup \{(k_A^{GMP})^j\}_{j \in I}$;

• $K_3 = K_1 \cup K_2$;

4. The keys in $K_1^j$ are inserted into $GD_j$, $\forall j \in \mathbb{N} : 1 \leq j \leq n$;

5. The keys in $K_2$ are inserted into the MPP;

6. The keys in $K_3$ are inserted into the MnDM.

¹ As defined in section 3.2.
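The distribution in steps 3 to 6 can be illustrated with a minimal sketch. It assumes, purely for illustration, that each key is derived with one PBKDF2-HMAC-SHA-1 call whose salt is a textual label; the labels, the iteration count and the function names `derive` and `generate_key_family` are hypothetical and are not taken from the implementation described in this work.

```python
import hashlib

ITERATIONS = 10_000  # assumed value, chosen only for the example


def derive(password: bytes, label: str, length: int) -> bytes:
    # Assumption: the label plays the role of the salt.
    return hashlib.pbkdf2_hmac("sha1", password, label.encode(), ITERATIONS, length)


def generate_key_family(n: int, password: bytes):
    # K1: one key set per gathering device GD_j, j = 1..n.
    k1 = {
        j: {
            "kA_GMP": derive(password, f"kA_GMP/{j}", 16),  # 128-bit AES key
            "kH_GMP": derive(password, f"kH_GMP/{j}", 32),  # 256-bit HMAC key
            "kH_GMD": derive(password, f"kH_GMD/{j}", 32),  # 256-bit HMAC key
        }
        for j in range(1, n + 1)
    }
    # K2: keys held by the MPP, as listed in step 3.
    k2 = {
        "kA_MPM": derive(password, "kA_MPM", 16),
        "kH_MPM": derive(password, "kH_MPM", 32),
        "kA_GMP": {j: keys["kA_GMP"] for j, keys in k1.items()},
    }
    # K3: the MnDM holds the union of K1 and K2.
    k3 = {"K1": k1, "K2": k2}
    return k1, k2, k3
```

Calling `generate_key_family(3, b"pass")` returns the three key sets for a network of three gathering devices; the same input always yields the same keys, which is the property that makes the password so sensitive.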

After all the steps are concluded the devices are ready for the deployment stage, in which the GDs are distributed among the desired initial locations $l_0^i$, $\forall_{i \in I}$. Each device $GD_i$ will remain in $l_0^i$ for its entire lifetime without any key schedule algorithm to update the keys.

3.4.2 Communication Protocol

In this section, only the steps of the protocol are described. The utility of each of the components of the ciphertexts is explained in chapter 4. Subsequent to the setup stage, the GDs are deployed into the field of action and start gathering the data to be sent to the MPP. This data is encrypted and saved in the GD solid memory, waiting to be sent through the communication channel to the MPP, via Wi-Fi and using the WPA2-PSK protocol. The communication protocol of the whole network encompasses the two directions of the data flow: UDL and DDL.

Consider the devices to be already deployed in the field. Let $GD_i$ be one of the deployed gathering devices for some $i \in \{1, \ldots, n\}$, let $f_{id} \in \mathbb{Z}_{256}$ be a unique identifier of $GD_i$ stored in its solid memory and consider the following abbreviations:

• $e_1^{IV}$ ≡ Encryption mode AES-CTR-128 with key $(k_A^{GMP})^i$ using $IV$ as the initialization vector.

• $d_1^{IV}$ ≡ Decryption mode AES-CTR-128 with key $(k_A^{GMP})^i$ using $IV$ as the initialization vector.

• $e_2^{IV}$ ≡ Encryption mode AES-CTR-128 with key $k_A^{MPM}$ using $IV$ as the initialization vector.

• $d_2^{IV}$ ≡ Decryption mode AES-CTR-128 with key $k_A^{MPM}$ using $IV$ as the initialization vector.

• $h_1$ ≡ HMAC-SHA-256 with key $(k_H^{GMP})^i$.

• $h_2$ ≡ HMAC-SHA-256 with key $(k_H^{GMD})^i$.

• $h_3$ ≡ HMAC-SHA-256 with key $k_H^{MPM}$.

Upstream Data Lifecycle

The data is gathered by GDi, encrypted and stored in solid memory. Then, subject to the wireless channel’s communication protocol, it is sent to the MPP where its integrity and authenticity are verified and another layer of security is applied prior to being saved in the solid memory of the MPP. Lastly, the package is sent from the MPP to the MnDM either via Internet or serial connection and if all the verifications succeed at the MnDM then the plain data is sent to a secure database (DB). The operations that are carried out in each device will now be listed.

Gathering Devices

1. $GD_i$ transforms the gathered analog raw data into digital data $D$ and assigns to it a 4-octet message identifier $m_{id}$;

2. Compute $h_1(D)$;

3. Compute $h_2(D)$;

4. Compute $inner\_pack := h_1(D) \| h_2(D) \| D$;

5. Generate a 16-octet initialization vector $IV_1$;

6. Perform an encryption: $e_1^{IV_1}(inner\_pack)$;

7. Build the final package $Pack_1 := f_{id} \| m_{id} \| IV_1 \| e_1^{IV_1}(inner\_pack)$;

8. Store $Pack_1$ in solid memory.
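Steps 2 to 7 can be sketched as follows. The Python standard library provides no AES, so the function `ctr_stand_in` below is only a placeholder for AES-CTR-128: it produces a keystream with HMAC-SHA-256 in counter mode and is not the cipher used in this work. Drawing the initialization vector from `os.urandom` is likewise an assumption, as are all function names.

```python
import hashlib
import hmac
import os


def hmac256(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def ctr_stand_in(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Placeholder for AES-CTR-128 (NOT AES): XOR the data with a keystream
    # made of HMAC-SHA-256(key, iv || counter) blocks of 32 octets.
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(4, "big")
        stream = hmac256(key, iv + counter)
        out.extend(b ^ s for b, s in zip(data[offset:offset + 32], stream))
    return bytes(out)


def build_pack1(f_id, m_id, data, k_a, k_h_gmp, k_h_gmd, iv=None):
    # Step 5: 16-octet IV (random generation is an assumption of this sketch).
    iv = os.urandom(16) if iv is None else iv
    # Steps 2-4: inner_pack = h1(D) || h2(D) || D.
    inner_pack = hmac256(k_h_gmp, data) + hmac256(k_h_gmd, data) + data
    # Steps 6-7: Pack1 = f_id || m_id || IV1 || e1(inner_pack).
    header = bytes([f_id]) + m_id.to_bytes(4, "big") + iv
    return header + ctr_stand_in(k_a, iv, inner_pack)
```

Because the counter-mode keystream is simply XORed with the data, applying `ctr_stand_in` again with the same key and initialization vector recovers `inner_pack`, which mirrors the relation between $e_1^{IV}$ and $d_1^{IV}$.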

The MPP gets in range and starts listening to incoming requests. GDi attempts to connect to the WLAN hosted by the MPP and as soon as the connection is established, the messages that the GDi had stored in memory are sent, subject to the WPA2-PSK protocol.

Middle-Point Party

9. $Pack_1$ is parsed into its main components: $f_{id}$, $m_{id}$, $IV_1$ and $e_1^{IV_1}(inner\_pack)$;

10. Decrypt $e_1^{IV_1}(inner\_pack)$, i.e., compute $d_1^{IV_1}(e_1^{IV_1}(inner\_pack))$ in order to obtain $inner\_pack$;

11. Parse $inner\_pack$ into its main components: $h_1(D)$, $h_2(D)$ and $D$, such that $h_1(D) = inner\_pack|_{256}$, $h_2(D) = (inner\_pack \setminus inner\_pack|_{256})|_{256}$ and $D = inner\_pack \setminus inner\_pack|_{512}$;

12. Verify the message’s integrity, i.e., compute $h_1(D)'$ and check whether $h_1(D)' = h_1(D)$, where $h_1(D)'$ is a new instance of the function $h_1$ applied to the data $D$ found in the decrypted package. If the verification is unsuccessful, then consider the message at hand as compromised and abort at this step by clearing all the memory associated with it.

13. Send the 32-bit word message identifier $m_{id}$ as an acknowledgement² related to the message at hand back to $GD_i$;

14. Compute $Pack_2 := f_{id} \| IV_1 \| h_1(D) \| h_2(D) \| e_1^{IV_1}(inner\_pack)$, where $h_1(D)$ and $h_2(D)$ are extracted from step 11;

15. Generate a 16-octet initialization vector $IV_2$;

16. Encrypt the previously built package: $e_2^{IV_2}(Pack_2)$;

17. Generate a digest of the encrypted data: $h_3(e_2^{IV_2}(Pack_2))$;

18. Build the final package $Pack_3 := h_3(e_2^{IV_2}(Pack_2)) \| IV_2 \| e_2^{IV_2}(Pack_2)$;

19. Store $Pack_3$ in solid memory.

² The message acknowledgement is subject to the WPA2-PSK protocol and thus is protected while travelling through the network. Upon receiving this information, $GD_i$ will trust that the message has been successfully delivered to the intended party.
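The parsing and verification of steps 11 and 12 can be sketched as follows, under the assumption that the decrypted package is octet-aligned, so that the first 256 bits correspond to the first 32 octets. The function names are hypothetical.

```python
import hashlib
import hmac


def split_inner_pack(inner_pack: bytes):
    # Step 11: h1(D) is the first 256 bits, h2(D) the next 256 bits,
    # and D is whatever remains after the first 512 bits.
    return inner_pack[:32], inner_pack[32:64], inner_pack[64:]


def verify_h1(inner_pack: bytes, k_h_gmp: bytes) -> bool:
    # Step 12: recompute h1 over the received data and compare it with the
    # tag carried in the package, using a constant-time comparison.
    h1, _h2, data = split_inner_pack(inner_pack)
    recomputed = hmac.new(k_h_gmp, data, hashlib.sha256).digest()
    return hmac.compare_digest(h1, recomputed)
```

The MPP cannot check $h_2(D)$ at this point, since the corresponding key is shared only between $GD_i$ and the MnDM; that tag is simply carried along into $Pack_2$.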

After the data has been gathered the MPP closes the WLAN and connects to the MnDM via Internet, transmitting all the recently stored data subject to the TCP/IP protocol.

Mission and Data Manager

20. Parse $Pack_3$ into its main components: $h_3(e_2^{IV_2}(Pack_2))$, $IV_2$ and $e_2^{IV_2}(Pack_2)$, where

$h_3(e_2^{IV_2}(Pack_2)) = Pack_3|_{256}$;

$IV_2 = (Pack_3 \setminus Pack_3|_{256})|_{128}$;

$e_2^{IV_2}(Pack_2) = Pack_3 \setminus Pack_3|_{384}$;

21. Verify the integrity of the encrypted data by computing a new HMAC instance $h_3(e_2^{IV_2}(Pack_2))'$ and checking whether $h_3(e_2^{IV_2}(Pack_2))' = h_3(e_2^{IV_2}(Pack_2))$. If successful, proceed to the next step, otherwise consider the message as compromised and abort at this step.

22. Perform the decryption $d_2^{IV_2}(e_2^{IV_2}(Pack_2))$ in order to obtain $Pack_2$;

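23. Parse $Pack_2$ into its main components, following the layout built in step 14 (8-bit $f_{id}$, 128-bit $IV_1$, two 256-bit digests and the ciphertext):

$f_{id} = Pack_2|_{8}$;

$IV_1 = (Pack_2|_{136})|^{128}$;

$h_1(D) = (Pack_2|_{392})|^{256}$;

$h_2(D) = (Pack_2|_{648})|^{256}$;

$e_1^{IV_1}(inner\_pack) = Pack_2 \setminus Pack_2|_{648}$;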

24. Perform the decryption $d_1^{IV_1}(e_1^{IV_1}(inner\_pack))$ in order to obtain $inner\_pack$;

25. Parse $inner\_pack$ into its main components: $h_1(D)^*$, $h_2(D)^*$ and $D^*$.

26. Check whether $h_i(D) = h_i(D)^*, \forall_{i \in I_2}$. If successful, then proceed to the integrity check in the next step. Otherwise, consider this message to be incorrect and abort the execution at this step;

27. Compute two new HMAC instances of the data $D$: $h_1(D)'$ and $h_2(D)'$ and check whether $h_i(D)' = h_i(D), \forall_{i \in I_2}$. If successful, send $Pack_3$ to the DB, for it can be assumed with a high level of trust that the message has not been tampered with during the whole course. Otherwise, consider the message as compromised and abort the execution.
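Steps 20 and 21 verify the digest before any decryption takes place. A minimal sketch of that check is given below, assuming octet-aligned packages; the function name `open_pack3` and the choice of returning `None` on failure are hypothetical.

```python
import hashlib
import hmac


def open_pack3(pack3: bytes, k_h_mpm: bytes):
    # Step 20: the first 256 bits are the digest, the next 128 bits are IV2
    # and the remainder is the ciphertext e2(Pack2).
    tag, iv2, ciphertext = pack3[:32], pack3[32:48], pack3[48:]
    # Step 21: h3 is computed over the ciphertext only, as in step 17.
    recomputed = hmac.new(k_h_mpm, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, recomputed):
        return None  # message considered compromised, abort
    return iv2, ciphertext
```

Only when this function returns a pair would the MnDM go on to step 22 and decrypt the ciphertext with $d_2^{IV_2}$.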

Downstream Data Lifecycle

Both the MnDM and the MPP can generate messages, usually called command messages, whose purpose is to give an instruction to another network element; for instance, they can order a GD to change its data gathering time frame. The format of the command message is pre-defined and varies according to the type of command.

All messages generated and sent by the MnDM fall into the category of command messages, and one can distinguish two clusters: the commands intended for the MPP and the commands intended for the GDs. Either way, the messages originating at the MnDM are ciphered prior to being stored in the MnDM’s solid memory and sent to the MPP. Then the data is sent via Internet to the MPP and subject to a verification process upon arrival, after which it is either encrypted and saved in the MPP’s solid memory while waiting to be sent to the envisaged GD via Wi-Fi, or read and applied on the fly. Moreover, the MPP can also generate local instructions intended for $GD_i$, thus any command message that leaves the MPP via Wi-Fi must be flagged according to its sender. When the command message reaches the target $GD_i$, its authenticity and integrity are verified and it is saved in a queue while waiting to be read and applied by the internal command manager protocol. The list of steps that are carried out in each device is now presented.

Mission and Data Manager

1. Generate a message $M$;

2. If $M$ is a command for a GD then generate a HMAC of the message, $h_2(M)$, and proceed to step 4. Otherwise proceed to step 3;

3. Consider $inner\_pack := M$ and $f_{id}$ a zero-valued 8-bit identifier. Proceed to step 5;

4. Prepend the HMAC to the message: $inner\_pack := h_2(M) \| M$ and choose a GD identifier $f_{id} \in (\mathbb{Z}_2)^8 : f_{id} \neq 0$. Proceed to step 5;

5. Generate a pseudo-random 16-octet initialization vector $IV_1$;

6. Encrypt $inner\_pack$ by computing $e_1^{IV_1}(inner\_pack)$;

7. Generate a HMAC of the encrypted package: $h_3(e_1^{IV_1}(inner\_pack))$;

8. Build the package $Pack_0 = h_3(e_1^{IV_1}(inner\_pack)) \| f_{id} \| IV_1 \| e_1^{IV_1}(inner\_pack)$. If the command is not intended for a GD, then the device’s identifier field holds an all-zero 8-bit array, which flags that the recipient of the message is the MPP;

9. Generate a pseudo-random 16-octet initialization vector $IV_2$;

10. Perform the encryption $e_2^{IV_2}(Pack_0)$;

11. Build $Pack_1 = IV_2 \| e_2^{IV_2}(Pack_0)$;

12. Store $Pack_1$ in solid memory.

P ack1 is now sent via Internet to the MPP subject to the TCP/IP protocol.

Middle-Point Party

Steps 13 to 27 (block of execution A) represent the phase of the protocol where the MPP receives the MnDM’s command message, proceeds to the necessary verifications and processes it accordingly, whereas steps 28 to 36 (block of execution B) describe the case where the MPP generates the command message to be sent directly to $GD_i$. The two blocks need not be executed synchronously; for instance, the MnDM may send a message to the MPP while the latter is processing its own command. Nevertheless, at the end of either execution block (A or B) the protocol proceeds to step 37.

13. Parse $Pack_1$ into its main components:

$IV_2 = Pack_1|_{128}$;

$e_2^{IV_2}(Pack_0) = Pack_1 \setminus Pack_1|_{128}$;

14. Perform the decryption $d_2^{IV_2}(e_2^{IV_2}(Pack_0))$ in order to obtain $Pack_0$;

15. Parse $Pack_0$ into its main components:

$h_3(e_1^{IV_1}(inner\_pack))$;

$f_{id}$;

$IV_1$;

$e_1^{IV_1}(inner\_pack)$;

16. Compute a new HMAC instance $h_3(e_1^{IV_1}(inner\_pack))^*$;

17. If $h_3(e_1^{IV_1}(inner\_pack))^* \neq h_3(e_1^{IV_1}(inner\_pack))$ then assume the message to be compromised, discard it and abort the execution. Otherwise continue;

18. Perform the decryption $d_1^{IV_1}(e_1^{IV_1}(inner\_pack))$ in order to obtain $inner\_pack$;

19. If $f_{id} = 0$ then apply the corresponding command and finish the execution at this step, otherwise continue³.

20. Initialize a single-bit $flag \in \mathbb{Z}_2 : flag = 1$;

21. Parse $inner\_pack$ and generate a HMAC of the message: $h_1(M)$;

22. Build $inner\_pack_1 := flag \| h_1(M) \| h_2(M) \| M$;

23. Generate a pseudo-random 16-octet initialization vector $IV_3$;

24. Perform the encryption $e_1^{IV_3}(inner\_pack_1)$;

25. Generate a unique message identifier $m_{id}^*$;

26. Build the final package $Pack_2 := m_{id}^* \| IV_3 \| e_1^{IV_3}(inner\_pack_1)$;

27. Store $Pack_2$ in solid memory;

As already stated, the following steps 28 to 36 are related with the case in which the command message M is generated at the MPP instead of the MnDM. This block of execution does not necessarily follow from the previous one (steps 13 to 27) and may be triggered either by a user interaction on the MPP or by a scheduled command.

³ Note that in this case $inner\_pack := h_2(M) \| M$.

28. Generate a command message $M$;

29. Compute a HMAC of $M$: $h_1(M)$;

30. Initialize a single-bit $flag \in \mathbb{Z}_2 : flag = 0$;

31. Build $inner\_pack_2 := flag \| h_1(M) \| M$;

32. Generate a pseudo-random 16-octet initialization vector $IV_4$;

33. Perform the encryption $e_1^{IV_4}(inner\_pack_2)$;

34. Generate a unique message identifier $m_{id}^{**}$;

35. Build $Pack_2 := m_{id}^{**} \| IV_4 \| e_1^{IV_4}(inner\_pack_2)$;

36. Store $Pack_2$ in solid memory;

At the end of either block of execution A or block of execution B, the package P ack2 is sent to the GDi via Wi-Fi and subject to the WPA2-PSK protocol.

Gathering Devices

37. Parse $Pack_2$ into its main components:

$m_{id} = Pack_2|_{32}$;

$IV = (Pack_2|_{160})|^{128}$;

$e_1^{IV}(inner\_pack) = Pack_2 \setminus Pack_2|_{160}$;

38. Perform the decryption $d_1^{IV}(e_1^{IV}(inner\_pack))$ in order to obtain $inner\_pack$;

39. Read $flag = inner\_pack|_{1}$. If $flag = 0$ go to step 40, if $flag = 1$ go to step 44, otherwise abort the execution and consider the message as corrupted.

40. Parse $inner\_pack$ into its main components:

$h_1(M) = (inner\_pack|_{257})|^{256}$;

$M = inner\_pack \setminus inner\_pack|_{257}$;

41. Compute a new HMAC of $M$: $h_1(M)'$;

42. If $h_1(M)' \neq h_1(M)$ consider the message to have been tampered with and abort the execution, otherwise continue;

43. Store $M$ in the commands’ FIFO queue, waiting to be applied as soon as possible. Successfully exit the downstream data protocol after applying the envisaged command.

44. Parse $inner\_pack$ into its main components:

$h_1(M) = (inner\_pack|_{257})|^{256}$;

$h_2(M) = (inner\_pack|_{513})|^{256}$;

$M = inner\_pack \setminus inner\_pack|_{513}$;

45. Compute two new HMAC instances of $M$: $h_1(M)'$ and $h_2(M)'$;

46. If $h_i(M)' \neq h_i(M)$ for some $i \in \{1, 2\}$ then consider the message to have been tampered with and abort the execution, otherwise acknowledge the integrity of $M$ and continue;

47. Store $M$ in the commands’ FIFO queue, waiting to be applied as soon as possible. Successfully exit the downstream data protocol after applying the envisaged command.
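Since the flag occupies a single bit, the decrypted package of steps 39 to 47 is not octet-aligned. The sketch below therefore handles it as a string of '0' and '1' characters, which is a representation chosen only for readability; the helper names are hypothetical, and the command $M$ is assumed to be a whole number of octets.

```python
import hashlib
import hmac


def to_bits(data: bytes) -> str:
    return "".join(f"{octet:08b}" for octet in data)


def from_bits(bits: str) -> bytes:
    return int(bits, 2).to_bytes(len(bits) // 8, "big") if bits else b""


def read_command(inner_bits: str, k_h_gmp: bytes, k_h_gmd: bytes):
    # Step 39: the first bit tells who generated the command
    # (0: the MPP, one tag; 1: the MnDM, two tags).
    flag, rest = inner_bits[0], inner_bits[1:]
    keys = [k_h_gmp] if flag == "0" else [k_h_gmp, k_h_gmd]
    # Steps 40 and 44: one 256-bit tag per key, then the message M.
    tags = [from_bits(rest[256 * i:256 * (i + 1)]) for i in range(len(keys))]
    message = from_bits(rest[256 * len(keys):])
    # Steps 41-42 and 45-46: every tag must match a freshly computed HMAC.
    for key, tag in zip(keys, tags):
        fresh = hmac.new(key, message, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, fresh):
            return None  # tampered message, abort
    return message  # ready to be queued (steps 43 and 47)
```

A command relayed from the MnDM is thus accepted only if both the MPP tag and the MnDM tag verify, whereas a command created at the MPP needs only the first.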

3.5 Message Formats

This section aims to identify the distinct message formats built in the protocol description of section 3.4.2 and formally define the network’s packing and unpacking mechanisms. The five distinct message formats comprised in the protocol are visually presented in Appendix C according to the following specification:

F1 : encrypted by a GD and decrypted by the MPP.

F2 : encrypted by the MPP and decrypted by a GD (plaintext generated by the MPP).

F3 : encrypted by the MPP and decrypted by a GD (plaintext originally generated by the MnDM).

F4 : encrypted by the MPP and decrypted by the MnDM.

F5 : encrypted by the MnDM and decrypted by the MPP.

Hereinafter this notation is to be considered. All the abovementioned message formats are built based on two other categories of message formats, $F_i^*$ and $F_i^{**}$, which are, respectively, the formats corresponding to the outer and inner layers of encryption within $F_i$ for every $i \in I_5$. These can also be consulted in Appendix C. The following definitions are useful for the upcoming discussion.

Definition 3.5.1 (Confidential plaintext). The raw data gathered by the GDs as well as the command messages are called confidential plaintext.

The previous definition highlights the piece of information within the packages that is of utmost importance and is intended to be transmitted to and read by the desired parties. The aforementioned figures may clarify the subject.

The only cryptosystem involved in the processing of the packages is the AES cipher. Since it is a 128-bit block cipher and the data to be encrypted does not necessarily have 128 bits, a BCMO is required and, as already stated, the CTR mode of operation has been chosen. Let $G$ be the ciphertext generator operator such that

$$G(C, M, v, k, x) = y \qquad (3.2)$$

where $y$ is the result of encrypting $x$ via block cipher $C$ using mode of operation $M$ with initialization vector $v$ (if applicable) and key $k$. Analogously, $G^{-1}$ is the inverse operator and returns the corresponding plaintext:

$$G^{-1}(C, M, v, k, y) = x \qquad (3.3)$$

Consider the set of all words with format $F_i$,

$$S_i = \{y \in \Sigma^* : y \text{ is of format } F_i\} \qquad (3.4)$$

Moreover consider a family of functions

$$E_i : I \times \Sigma^* \times \Sigma^* \times K \times P \to S_i \qquad (3.5)$$

such that $E_i(j, a, b, k, x)$ represents the instance that outputs an element of $S_i$, for some confidential plaintext $x$, key $k$, GD’s identifier $j$ and global parameters $a$ and $b$. For instance, for the format $F_1$ the following expression is satisfied

$$E_1(j, m, v, k, x) = j \| m \| v \| G(AES, CTR, v, k, h_1(x) \| h_2(x) \| x) \qquad (3.6)$$

where the HMAC functions $h_1$ and $h_2$ are defined according to section 3.4. This function is called the package ciphertext generator function (PCgF) and hereinafter will be addressed accordingly. For a fixed identifier $j$, the function

$$E_i^j : \Sigma^* \times \Sigma^* \times K \times P \to S_i \qquad (3.7)$$

is an instance function from the family of PCgF with the same expression.

Definition 3.5.2 (Package ciphertext). Let x be a confidential plaintext, j ∈ I, k a key and a, b ∈ Σ∗ two parameters of choice. The data resulting from the computation of Ei(j, a, b, k, x) is designated as package ciphertext.

The inverse function for the PCgF is defined by

$$D_i : I \times K \times S_i \to P \qquad (3.8)$$

where $K$ is the set of keys and $P$ is the set of all confidential plaintexts, and is called the package ciphertext unpacking function (PCuF). This means that for every $z \in P$:

$$D_i^j(k, E_i^j(a, b, k, z)) = z \qquad (3.9)$$

for every $j \in I$, where

$$D_i^j : K \times S_i \to P \qquad (3.10)$$

is an instance of the family $D_i$ of PCuFs. The following definition is based on the previously presented ones and will be useful to the security analysis presented in chapter 4.

Definition 3.5.3 (Packing Scheme). A packing scheme (PSch) is defined by a 3-tuple (E,D,K) where E is a family of PCgF, D is a family of PCuF and K is the key set.

Let $PS^i$ be the PSch associated with message format $F_i$ such that $\forall_{i \in I_5}$:

$$PS^i = (E_i, D_i, K) \quad \text{and} \quad E_i = \bigcup_{j=1}^{n} E_i^j \quad \text{and} \quad D_i = \bigcup_{j=1}^{n} D_i^j \qquad (3.11)$$

where $E_i^j$ and $D_i^j$ are according to expressions 3.7 and 3.10, respectively. For every $i \in I_5$, the packing schemes $PS^i$ define the distinct message formats and their security will be thoroughly studied in section 4.1.

Consider $x$ to be an $n$-bit confidential plaintext generated by a given GD. The construction of the CTR mode of operation entails restrictions on the pair $(v, k)$ used, where $v$ stands for the initialization vector and $k$ for the key. Even though the initialization vector $v$ is not required to be unpredictable by an adversary, it is mandatory that the pair $(v, k)$ does not repeat for the same block of plaintext. Thus, there is an upper bound on the length of the message to be encrypted using the CTR mode: the number of blocks of the message must not be greater than $2^m$, where $m$ is the block size of the block cipher at hand, in bits⁴. Since AES is a 128-bit block cipher, the bit-length $n$ of the plaintext must satisfy

$$n \leq 128 \times 2^{128} = 2^{135} \qquad (3.12)$$

which is an extremely large number (roughly $4.4 \times 10^{40}$) and does not restrain the set of possible plaintexts in practice. However, since the GDs are assumed to have their RAM upper bounded by 256 kB, the plaintext messages’ length is bounded by this value, i.e.,

$$n < 2^{18} - a_i \qquad (3.13)$$

where $a_i$ represents the binary length of the data that was added in order to build the message with format $F_i$, that is

$$a_i = |E_i^j(u, v, x)|_2 - n, \quad \forall_{j \in I,\, i \in I_5} \qquad (3.14)$$

where $u$ and $v$ are the required external variables for the construction of the package $F_i$. Moreover, note that for fixed $i$, $|E_i^j(u, v, x)|_2 = |E_i^k(u, v, x)|_2, \forall_{j,k \in I}$. The strict inequality is due to the usage of some of the RAM by internal processes of the system at hand, which are needed for it to properly execute certain required background tasks.

⁴ Recall that there are $2^m$ distinct values for a bitstring of length $m$.
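The orders of magnitude discussed above can be checked with a few lines of arithmetic. The sketch reads the RAM bound in octets (256 kB is $2^{18}$ octets), which is an assumption about the unit intended in (3.13), and takes the overhead of format $F_1$ from the layout of $Pack_1$ in section 3.4.2; the constant and function names are hypothetical.

```python
BLOCK_BITS = 128  # AES block size in bits

# Bound (3.12): at most 2^128 blocks of 128 bits each.
MAX_CTR_BITS = BLOCK_BITS * 2 ** BLOCK_BITS

# 256 kB of RAM expressed in octets (assumed unit for the bound in (3.13)).
RAM_OCTETS = 256 * 1024

# Overhead of format F1 in octets: f_id (1) + m_id (4) + IV1 (16)
# + h1(D) (32) + h2(D) (32), following the layout of Pack1.
OVERHEAD_F1_OCTETS = 1 + 4 + 16 + 32 + 32


def fits_in_ram(plaintext_octets: int, overhead_octets: int) -> bool:
    # Strict inequality, mirroring n < 2^18 - a_i.
    return plaintext_octets < RAM_OCTETS - overhead_octets
```

With these values a 256-octet message, the largest a GD generates, is far below the RAM bound, so in this scenario neither bound is the binding constraint on the message length.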

Chapter 4

Security Analysis

In real-world projects where computer security is of the essence, there are always limitations caused by one or more of many factors, such as available funding or ethical restrictions. This means that one is not provided with unlimited resources in practice, and choosing the best possible scenario is both an unavoidable consequence and an arduous task. In general there is an inverse relation between security and performance, but each system must be analyzed individually, as its features may entail that this relation does not hold, in which case an optimal solution in terms of security may imply a greater amount of computational resources.

This chapter aims to analyze, with respect to security, the cryptographic properties considered most important of the packing schemes described in section 3.5. The strengths and weaknesses of the proposed message formats and protocols are scrutinized, followed by suggested solutions for the observed flaws.

4.1 Strengths and Weaknesses

The network considered in section 3.3 is, as one would expect, not perfectly secure, and there are some ingrained fragilities induced by the chosen topology or by the communication protocol described in section 3.4. This section discusses some of those flaws and key aspects related to the key generation stage and the packing schemes built within the scope of the communication protocol.

4.1.1 Key Generation

In the key generation stage, as described in section 3.4.1, all the required cryptographic keys are generated. In order to increase the resilience of a key against key-recovery attacks, it must be generated as randomly as possible. Based on the results presented in [26], PBKDF2 is a good option for the generation of the keys, namely with HMAC-SHA-1, because it is the keyed-hash function with better performance and provides enough security [38], even though the underlying hash function is not strong with respect to collision resistance [20]. The password serves both as the seed for the pseudo-random generator that constructs the salt byte array and as the password passed as argument to the PBKDF2 algorithm.

Ideally, the key generation process would not depend solely on a single password input by the user, because this clearly lowers its security level: breaking the key generator comes down to finding a single password and replicating the process. Nevertheless, this approach was the one agreed upon because of its simplicity and the high level of trust placed in the user U operating the MnDM. One very straightforward way to increase the security level of this very important stage of the protocol would be for U to provide two passwords: one to be used in the construction of the seed for the pseudo-random algorithm and the other to be used as the password for the PBKDF2 algorithm.

However, both these approaches require complete trust in the user that is generating the keys; if U is ill-intentioned, then the whole network becomes compromised. The answer to this problem lies in the Two-Person Concept, a mechanism based on the following requirement: to launch a nuclear missile, there must be two distinct individuals, each possessing a distinct key that is not known by the other party, inserting their credentials in the launching computer at the same time. By adapting this concept, one could build a similar behaviour for the generation of keys in the PMS, where a pre-processing stage would construct a master key out of the keys of each of the k chosen parties, for some k > 1 (e.g., by means of an XOR operation). This key would then be used by the key generation process, and no party could ever single-handedly replicate the construction of the whole key set and attack the system. Figure 4.1 briefly depicts this procedure for k = 3.

[Figure: User1, User2 and User3 each supply a secret (secret1, secret2, secret3) to the Key Constructor Algorithm, which outputs the master key passed to the Key Generation Algorithm.]

Figure 4.1: Key generation based on k users
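The pre-processing stage of Figure 4.1 can be illustrated with a minimal Python sketch. It assumes the XOR combination mentioned in the text; hashing each secret with SHA-256 first (so that all inputs have equal length) and the name `combine_secrets` are assumptions made for illustration and are not taken from the thesis.

```python
import hashlib
from functools import reduce

def combine_secrets(party_secrets):
    """Build a master key from k >= 2 party secrets (hypothetical helper).

    Assumption: each secret is hashed with SHA-256 so that all inputs have
    the same length, and the digests are then XOR-ed byte by byte.
    """
    if len(party_secrets) < 2:
        raise ValueError("the Two-Person Concept needs at least two parties")
    digests = [hashlib.sha256(s).digest() for s in party_secrets]
    # zip(*digests) yields the i-th byte of every digest; XOR them together.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*digests))

# Example with k = 3, as in Figure 4.1 (the secrets are placeholders).
secret1, secret2, secret3 = b"secret-of-user-1", b"secret-of-user-2", b"secret-of-user-3"
master_key = combine_secrets([secret1, secret2, secret3])
```

With this construction, no single party can recompute `master_key` without the other secrets. One caveat of a plain XOR is that two identical secrets cancel each other out, so the parties' secrets should be chosen independently.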

One can also exploit flaws embedded in the distribution of the keys. Theoretically speaking, one does not even question the integrity of the MnDM, but in practice all situations must be considered. Let Tom be a malicious user monitoring both the sent data and the results received at the MnDM. Since he is in possession of all the keys, he is able to generate a corrupted message $M$ and produce a package $E_4(M)$ to be inserted into the DB, whose lack of authenticity is not traceable by anyone who attempts to make the verifications. That is, using the keys $(k_A^{GMP})^j$, $k_A^{MPM}$, $(k_H^{GMP})^j$, $(k_H^{GMD})^j$ and $k_H^{MPM}$, for some $1 \leq j \leq n$, Tom would be able to produce a package $P$ with format $F_4$, holding $M$ as its core message and such that any party attempting to trace the message would conclude that $P$ is a package sent by the MPP to the MnDM whose principal components originated in GD$_j$ without being tampered with in the process. A possible solution for this issue would be to not provide the MnDM with the keys $(k_H^{GMP})^i$, $\forall\, 1 \leq i \leq n$. This way, Tom would still be able to read and send messages from and to the MPP, respectively, but he would not be able to produce a malicious package $P$ and send it to the DB without being flagged as compromised by a trusted authority.

Apart from the aforementioned problem, the current key distribution does not give the MPP complete control of the network, on the grounds that this device does not contain any of the keys $(k_H^{GMD})^i$, $\forall\, 1 \leq i \leq n$. Even if the MPP becomes compromised, it does not possess all the necessary knowledge to trick the GDs or the MnDM into accepting corrupted messages. Nevertheless, the possible compromise of any of the active elements of the network is a subject of utmost concern.

4.1.2 Packing Schemes and Protocols

Apart from the IVs and HMACs, there are two elements that may occur in the packages' headers: the gathering device identifier did and the message identifier mid. The former is intended to provide the MPP with a means to know which key of the set $K_1 = \{(k_A^{GMP})^i : 1 \leq i \leq n \wedge i \in \mathbb{N}\}$ was used in the encryption, so that the same key is used in the decryption, and/or to let the MPP know to which GD the message must be sent, while the latter is helpful in the GD's memory management. According to the protocol described in section 3.4.2, upon receiving a message, the MPP checks its integrity and authenticity and, if this verification succeeds, an acknowledgment message containing mid is sent back to the envisaged GD. When the latter receives the ACK, it can release the memory associated with the message having identifier mid, using a look-up table for instance.

Even though neither did nor mid is immune to tampering or unintentional errors, the following proposition holds.

Proposition 4.1.1. It is infeasible for an adversary to perform replay attacks or trick the MPP into assuming the gathering material is located elsewhere.

Proof. Let $i \neq j$ and let $F_j$ and $F_i$ be two gathering devices with identifiers $did^j$ and $did^i$, located at positions $X$ and $Y$, respectively. Consider an adversary who wants to carry out malicious activities at location $X$, which would be detected by $F_j$. If he is aware of the presence of $F_j$, he might attempt to tamper with the reports on the location by changing $did^j$ to $did^i$; this way, the information transmitted to the MPP would be that the malicious activity is in effect at location $Y$ instead of $X$. However, changing the GD identifier value to $did^i$ will result in the MPP calling the PCuF of $PS^1$ using the keys shared with $F_i$, meaning that the resulting decrypted text would be distinct from the original plaintext, due to the injectivity of AES. Thus such an attack becomes infeasible, since the adversary has a very thin margin of success. The resistance to replay attacks follows directly from the WPA2 protocol [29].

Hereinafter, for simplicity, let $E_1^j(x)$ represent the PCgF with omitted global parameters $a$ and $b$ and for which the keys used in the encryption scheme are associated with GD$_j$.

Proposition 4.1.2. PSch $PS^i$ grants secrecy, integrity and authenticity to the confidential plaintext, for every $i \in I_5$.

Proof. Let $A$ be an adversary and $j \in I$ a fixed identifier representing the operational gathering device GD$_j$. For some unknown confidential plaintext $x$, suppose that $A$ is in possession of $y := E_i^j(x)$, the associated package ciphertext.

• $i = 1$: The secrecy of the plaintext follows directly from the secrecy property of the AES-CTR encryption scheme; $A$ cannot obtain $x$ simply with the knowledge of $y$, because the former is encrypted with AES-CTR using the key $(k_A^{GMP})^j \in K_1$, which is known only to the MPP and GD$_j$. Based on figure C.2, recall that $E_1^j(x) = did \,\|\, mid \,\|\, IV \,\|\, g(\mathrm{AES}, \mathrm{CTR}, IV, (k_A^{GMP})^j, w)$, where $w = h_1(x) \,\|\, h_2(x) \,\|\, x$. Clearly there is neither integrity nor authenticity protection on the encrypted message itself, and the malleability property of the AES-CTR encryption scheme gives the attacker an opportunity to tamper with the ciphertext. If $A$ interferes with the ciphertext, then after decryption one ends up with the word $w^* = h_1(x)^* \,\|\, h_2(x)^* \,\|\, x^*$, due to the properties of AES-CTR. It is very unlikely that $\forall i \in I_2 : h_i(x)^* = h_i(x^*)$ because of the avalanche property of cryptographic hash functions, which means that the adversary has a negligible probability of corrupting an encrypted message without failing the plaintext's integrity check. The integrity and authenticity follow from the fact that the HMAC keys $(k_H^{GMP})^j$ and $(k_H^{GMD})^j$ are uniquely distributed among the pairs (GD$_j$, MPP) and (GD$_j$, MnDM), respectively.

• The same arguments can be applied analogously for i = 2,..., 5.
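The argument for $i = 1$ can be illustrated with a small Python sketch: a single flipped ciphertext bit flips exactly the same plaintext bit after decryption, and the tag recovered from $w^*$ no longer matches the recovered plaintext. Two points are assumptions, not the thesis's implementation: the keystream is generated with SHA-256 in counter mode as a stand-in for AES-CTR (the Python standard library has no AES), and $h_1$, $h_2$ are modelled as HMAC-SHA-256 under two distinct keys. All function names are hypothetical.

```python
import hashlib
import hmac

def ctr_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with a counter-mode keystream (SHA-256 stand-in for AES)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def tag(key: bytes, x: bytes) -> bytes:
    """Keyed hash of the plaintext (models h_1 or h_2)."""
    return hmac.new(key, x, hashlib.sha256).digest()

def pack(k_enc, k_h1, k_h2, iv, x):
    """Encrypt w = h1(x) || h2(x) || x, mirroring the body of format F_1."""
    w = tag(k_h1, x) + tag(k_h2, x) + x
    return ctr_xor(k_enc, iv, w)

def unpack(k_enc, k_h1, iv, body):
    """Decrypt the body and check the first tag against the recovered plaintext."""
    w = ctr_xor(k_enc, iv, body)
    t1, x = w[:32], w[64:]
    return x, hmac.compare_digest(t1, tag(k_h1, x))

k_enc, k_h1, k_h2 = b"E" * 16, b"G" * 32, b"M" * 32   # placeholder keys
iv = bytes(16)
x = b"confidential reading"

body = pack(k_enc, k_h1, k_h2, iv, x)
x_ok, valid = unpack(k_enc, k_h1, iv, body)

# The adversary flips the last bit of the ciphertext.
tampered = body[:-1] + bytes([body[-1] ^ 0x01])
x_star, valid_star = unpack(k_enc, k_h1, iv, tampered)
```

Here `x_star` differs from `x` only in its last bit, which is the malleability of the stream cipher, while `valid_star` is false because the decrypted tag was computed over the original plaintext.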

The previous proposition expresses that the data going through the UDL is authenticated and not tampered with in the process. However, once the data has reached the DB, how can an umpire prove to a third party that the data is legitimate? The following proposition addresses this question.

Proposition 4.1.3 (Non-repudiation). The data stored in the DB is granted the non-repudiation property, provided complete trust on the user of the PMS.

Proof. Let $U$ be the user of the PMS, $J$ the umpire, $V$ the entity asking for the verification and $y$ an arbitrary message stored in the DB. In order for $J$ to show $V$ that $y$ corresponds to a package ciphertext of some message that originated at GD$_j$, he requires of $U$ a simulation of the key generation stage. Because $U$ is the only one with access to the password used in that process, $V$, having access to the correct PCuF $f$, just needs to execute the function $f$ with the keys that were assigned to GD$_j$. If the output is a valid message, then all the verifications of the unpacking procedure succeeded and $J$ has proved to $V$ that $y$ is legitimate, as well as its secret contents.

Even though an adversary is not able to directly find the exact plaintext from its corresponding ciphertext, this does not mean that the cryptosystem provides full security to the message's secrecy. It could be the case that $A$ would gain information on the plaintext if the IV is reused$^1$, since the key remains static for the entire device's lifetime. At the GDs the IVs are generated through the standard incrementing function, which means that the leak of information would only occur if the threshold of the state space had been reached. Given that the IVs are fixed-size words of 16 octets (128 bits), the total number of distinct IVs is $2^{128} - 1$, which is a very large number. Every GD is limited to processing at most 256 kB of data per second and each plaintext's length is upper bounded by 256 octets; therefore the maximum number of data packages that a GD can process per second is 1024. Continuously gathering data at such a rate, a GD would exhaust the space of possible message identifiers (mid) in around 48 days. Thus, the recommended lifetime for a GD under these conditions is 1 month on average, which means that the danger of reusing an (IV, key) pair under the current encryption scheme is negligible. This conclusion confirms that the absence of a key scheduling algorithm for the GDs was a good option, for it would be a waste of energy to perform the computations to update the keys when there is virtually no threat from a brute-force attack.

$^1$ See section 2.6.6.1 for the key-reuse attack.
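The lifetime estimate above can be checked with a few lines of Python. The width of mid is not stated in this section; the sketch assumes a 32-bit identifier, which is an assumption chosen because it is consistent with the figure of around 48 days. The variable names are illustrative.

```python
# Figures taken from the text.
RATE_BYTES_PER_SECOND = 256 * 1024   # 256 kB of data per second
MAX_PLAINTEXT_BYTES = 256            # upper bound on a plaintext's length
IV_COUNT = 2**128 - 1                # number of distinct IVs

# Assumption: the message identifier mid is a 32-bit value.
MID_BITS = 32

packages_per_second = RATE_BYTES_PER_SECOND // MAX_PLAINTEXT_BYTES

# Time needed to run through every possible mid at the maximum rate.
seconds_to_exhaust_mid = 2**MID_BITS / packages_per_second
days_to_exhaust_mid = seconds_to_exhaust_mid / 86400

# Time needed to run through every IV at the same rate, in years.
years_to_exhaust_iv = IV_COUNT / packages_per_second / (86400 * 365)

print(packages_per_second, round(days_to_exhaust_mid, 1), years_to_exhaust_iv)
```

Under these assumptions the mid space lasts roughly a month and a half, whereas the IV space outlasts any realistic device lifetime by many orders of magnitude, which is why IV reuse is not a practical concern here.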

4.1.2.1 Semantic security

The leak of partial information about the plaintext from the ciphertext is an undesired property that must be seen as a very dangerous threat when exploited by a capable adversary. Ruling out such leaks is the notion of semantic security introduced in chapter 2. The following proposition is very important, for it expresses the level of security of the message formats inherent to the packing schemes $PS^i$, $\forall i \in I_5$.

Proposition 4.1.4. For every $i \in I_5$, the PSch $PS^i$ is semantically secure against chosen-plaintext attacks.

Proof. The fields did and mid within message format $F_1$ are independent of the plaintext, and so $A$ cannot infer any relation with the associated plaintext. As has already been stated, the IV must be exposed as plaintext for the security of the AES-CTR encryption scheme and it does not leak any information for the adversary to exploit. Therefore, for $i = 1, \ldots, 5$ the semantic security of $PS^i$ follows from the IND-CPA security of AES-CTR proved in [2] and from the equivalence between IND-CPA and SEM-CPA proved in [39].

Informally, and assuming every confidential plaintext to be of equal size, this means that the package ciphertext does not reveal any information whatsoever about the confidential plaintext. That is, for any two adversaries $A$ and $B$ who are given two confidential plaintexts $x_0$ and $x_1$ where $(x_0 \neq x_1) \wedge (|x_0| = |x_1|)$, and such that $B$ is also given the package ciphertext $E_i^j(x)$ for any $i \in I_5$ and $j \in I$, $B$ has no advantage over $A$ when trying to discover relevant information about $x_b$, $\forall b \in \mathbb{Z}_2$.

However, the confidential plaintexts are not all of equal length and no padding method is used in the packing schemes. Thus POAs are infeasible and the process is more efficient, but the ciphertext transmits the plaintext's length to the attacker. This unfortunate leak of information makes the cryptosystem vulnerable to chosen-plaintext attacks in which the adversary is able to pick plaintexts of non-equal length: an adversary with knowledge of two plaintexts $p_0$ and $p_1$ such that $|p_0| \neq |p_1|$ would be able to decide which of them corresponds to the oracle's answer $c_b$, for some random $b \in \{0, 1\}$, with probability deviating from 1/2, given that $|c_b| = h + |p_b|$, where $h$ is the length of the message header.
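The length-based distinguisher described above is simple enough to sketch in Python. The oracle below models an idealized cipher whose output is indistinguishable from random bytes, so the only usable signal is the length $|c_b| = h + |p_b|$. The header length `HEADER_LEN` and the function names are hypothetical values chosen for illustration.

```python
import secrets

HEADER_LEN = 85  # hypothetical header length h, in bytes

def encryption_oracle(p0: bytes, p1: bytes):
    """Pick b at random and return (b, package) for plaintext p_b.

    The package is modelled as random bytes of length h + |p_b|: without
    padding, a stream cipher preserves the plaintext's length.
    """
    b = secrets.randbelow(2)
    chosen = (p0, p1)[b]
    return b, secrets.token_bytes(HEADER_LEN + len(chosen))

def adversary(p0: bytes, p1: bytes, package: bytes) -> int:
    """Guess b from the package length alone."""
    return 0 if len(package) == HEADER_LEN + len(p0) else 1

# Two chosen plaintexts of different lengths.
p0, p1 = b"A" * 16, b"B" * 32

wins = 0
for _ in range(200):
    b, package = encryption_oracle(p0, p1)
    wins += int(adversary(p0, p1, package) == b)
```

Since the two lengths differ, the adversary guesses correctly in every round, whereas against equal-length plaintexts the same strategy would do no better than a coin flip.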

Let $A$ be the event corresponding to the storage in memory of package ciphertexts$^2$ whose corresponding confidential plaintexts have lengths $L_j$ for $j \in I_k$ according to equation 3.1, and let $B$ be the event analogous to $A$ but for which the confidential plaintexts are all partitioned into equal-length subwords of size $l$ prior to being ciphered and stored. Let $\Delta$ represent the difference between the memory payoff for $k$ messages in event $A$ and the memory payoff associated with event $B$. Then

$$\Delta = h \sum_{j=1}^{k} (m_j - 1) \qquad (4.1)$$

The memory restriction imposed on the GDs is very unforgiving, as shown in the following analysis of a worst-case situation. First, note that the PSchs associated with the GDs are $PS^i$ for $i = 1, 2, 3$, and let $h = \max_{i \in I_3} h_i$, where $h_i = |y|_2 - |x|_2$ for any $y \in S_i$ and $x \in \Sigma^*$. Furthermore, consider a worst-case scenario where $l = 1$ byte and $L_j = 2^{11}$ bits, $\forall j \in I_k$, thus $\forall j \in I_k : m_j = 2^8$. This means that $\Delta = (2^8 - 1)kh$. The GDs' total allocated memory for storing the messages equals $2^{21}$ bits, according to section 3.2. If event $A$ is executed instead of $B$, then the maximum number of messages that the GD at hand can store at a time in its solid memory is

$$k = \frac{2^{21}}{2^{11}} = 2^{10} \qquad (4.2)$$

On the other hand, on the occurrence of event $B$, one has

$$2^{21} = \Delta + k\,2^{11} \Rightarrow k \simeq 11.95 \qquad (4.3)$$

which means that the maximum number of messages that the GD can store in this case is 11, under the same restrictions. This example highlights the relevance of memory optimization within the GD.

Next, an analysis with respect to a stronger type of attack is performed: the adversary not only has access to an encryption oracle but also to a decryption oracle. This is considered to be the strongest level of semantic security [40].
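Before moving on, the storage figures of equations 4.2 and 4.3 can be reproduced numerically. The header length $h$ is not given in this section; the sketch below assumes $h = 680$ bits, a value chosen only because it reproduces the quoted result $k \simeq 11.95$, so it should be read as an assumption rather than as the thesis's parameter.

```python
TOTAL_BITS = 2**21      # memory allocated for message storage
L_BITS = 2**11          # length L_j of every confidential plaintext
SUBWORD_BITS = 8        # l = 1 byte

# Assumption: header length h in bits (not stated in this section).
H_BITS = 680

m = L_BITS // SUBWORD_BITS                   # subwords per message, m_j

# Event A (equation 4.2): messages stored whole.
k_event_a = TOTAL_BITS // L_BITS

# Event B (equations 4.1 and 4.3): Delta = (m - 1) * k * h, hence
# 2^21 = k * ((m - 1) * h + 2^11).
extra_bits_per_message = (m - 1) * H_BITS
k_event_b = TOTAL_BITS / (extra_bits_per_message + L_BITS)

print(m, k_event_a, round(k_event_b, 2))
```

Under this assumption the partitioned layout stores roughly one hundredth of the messages that the unpartitioned layout can hold, which is the point the worst-case example makes.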

Proposition 4.1.5. For every $i \in I_5$, PSch $PS^i$ is not semantically secure against chosen-ciphertext attacks.

Proof. Let $A$ be an adversary playing the IND-CCA game [2] with the words $w_0 = 0^n$ and $w_1 = 1^n$ for some $n \in \mathbb{N}$ and, w.l.o.g., fix $j \in I$. Let $O_e$ and $O_d$ be the oracles with access to the PCgF and PCuF of $PS^1$. The following strategy grants $A$ a non-negligible IND-CCA advantage:

1. $A$ queries $O_e$ with $(w_0, w_1)$;

2. $O_e$ encrypts $w_b$ into $y := E_1^j(w_b)$, for $b \in \mathbb{Z}_2$, and returns it to $A$;

3. $A$ flips the last bit of $y$ and obtains $y'$;

4. $A$ queries $O_d$ with $y'$;

5. $O_d$ returns $w_b'$;

6. If $w_b' = 0^{n-1}1$ then $A$ chooses $b' = 0$. If $w_b' = 1^{n-1}0$ then $A$ chooses $b' = 1$.

The last step is only feasible due to the error propagation properties of the CTR mode of operation [14]. Thus, $A$ can tell to which confidential plaintext the package ciphertext belongs with probability far from 1/2, meaning that $PS^1$ is not IND-CCA secure. The result follows from the equivalence between semantic security and ciphertext indistinguishability in [3].

For $i = 2, 3$ and $5$ the proof is analogous, with a single remark for the case $i = 5$, in which the associated PCgF makes a double call to the ciphertext generator operator $G$ whose cryptosystem and BCMO arguments are AES and CTR, respectively (see figures C.7 to C.11). In this case, there are two layers of encryption on the confidential plaintext, but the result is the same as in the previous cases due to the direct error transmission property of CTR. That is, flipping the last bit of the package ciphertext entails that the last bit of the decryption of the outer layer of CTR encryption is also flipped, which implies that the last bit of the plaintext is wrong after the final decryption, leading to the same outcome.

For $i = 4$ one faces an encrypt-then-MAC procedure, which is the only layout that could be IND-CCA secure. However, the fact that the IV is not covered by the HMAC entails that $PS^4$ is also IND-CCA insecure. In fact, note that if an adversary has access to a decryption oracle and the HMAC does not include the IV, then the adversary can change the value of the IV at will in order to obtain the keystream for the new IVs. Hence $A$ will be able to decrypt any message that was encrypted using any of these new IVs.

$^2$ In this case it is irrelevant which PCgF was used to generate the package ciphertext.

The previous result states that an adversary with (temporary) access to a decryption oracle may gain knowledge to perform a partial or, in a worst-case scenario, complete break of the cryptosystem. This could be prevented if an encrypt-then-MAC mechanism was adopted and the header of the message was targeted by the HMAC.
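Both halves of this remark can be sketched in Python. The first function plays the bit-flipping strategy against a bare counter-mode cipher with a decryption oracle; the second pair shows an encrypt-then-MAC layout in which the HMAC covers the header, the IV and the ciphertext, so that modified packages are rejected before decryption. As assumptions for illustration only: a SHA-256 counter-mode keystream stands in for AES-CTR, the header is treated as a fixed-length byte string, and the names `cca_distinguisher`, `seal` and `unseal` are hypothetical.

```python
import hashlib
import hmac
import secrets

def ctr_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """XOR data with a counter-mode keystream (SHA-256 stand-in for AES)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))

def cca_distinguisher(n_bytes: int = 8) -> bool:
    """Play one IND-CCA round against unauthenticated CTR; True if A wins."""
    key, iv = secrets.token_bytes(16), secrets.token_bytes(16)
    w0, w1 = b"\x00" * n_bytes, b"\xff" * n_bytes
    b = secrets.randbelow(2)
    y = ctr_xor(key, iv, (w0, w1)[b])          # encryption oracle
    y_prime = y[:-1] + bytes([y[-1] ^ 0x01])   # flip the last bit
    w_prime = ctr_xor(key, iv, y_prime)        # decryption oracle, y' != y
    guess = 0 if w_prime == w0[:-1] + b"\x01" else 1
    return guess == b

def seal(k_enc, k_mac, header, iv, x):
    """Encrypt-then-MAC: the tag covers header || IV || ciphertext."""
    c = ctr_xor(k_enc, iv, x)
    t = hmac.new(k_mac, header + iv + c, hashlib.sha256).digest()
    return header, iv, c, t

def unseal(k_enc, k_mac, header, iv, c, t):
    """Verify the tag first; return None without decrypting on failure."""
    expected = hmac.new(k_mac, header + iv + c, hashlib.sha256).digest()
    if not hmac.compare_digest(t, expected):
        return None
    return ctr_xor(k_enc, iv, c)

k_enc, k_mac = secrets.token_bytes(16), secrets.token_bytes(32)
header, iv, c, t = seal(k_enc, k_mac, b"\x01\x00\x00\x00\x2a", bytes(16), b"reading")
other_iv = bytes(15) + b"\x01"
flipped_c = c[:-1] + bytes([c[-1] ^ 0x01])
```

In this sketch the distinguisher wins every round against the bare stream cipher, while `unseal` returns `None` both for `other_iv` and for `flipped_c`, so a decryption oracle built on it gives the adversary nothing to work with.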

4.1.2.2 Encryption Schemes

Now that the details of the chosen packing schemes have been presented, it is time to discuss whether the encryption schemes involved in their construction were the right choices for the job.

In order to make use of asymmetric cryptosystems in the development of digital signatures, there is the need for a certificate authority (CA) whose role is to evaluate the authenticity of the messages travelling throughout the nodes of the network. The need for such an entity immediately rules out this option, because this trusted entity would be required to provide all the requested certificates on the fly and, as discussed in chapter 3, the chosen network topology does not include any permanent party in the field other than the GD. RSA is the only asymmetric cryptosystem implemented in the device's hardware and there is currently no middleware developed for calling this mechanism from software applications. Moreover, the memory overhead entailed by the usage of asymmetric cryptographic systems and the increased key size make these systems unsuitable under the current restrictions, either for authentication or privacy purposes, when compared with methods based on cryptographic hash functions. Hence the choice of symmetric methods is considered to be the best option.

| | CTR | CFB |
|---|---|---|
| Encryption | Parallelizable | Non-parallelizable |
| Decryption | Parallelizable | Parallelizable |
| Transmission errors | Only the wrong bits are affected | Affects the wrong bits in the current block; completely destroys the following blocks |
| IV | May be predictable, must be unique | May be predictable, must be unique |

Table 4.1: Comparison between CTR and CFB features.

AES was the cipher chosen to provide secrecy to the confidential messages. It is a long-established standard published by NIST and a very fast algorithm, and it is hardware-implemented in the GDs, which makes it the most suitable choice for the case. CTR was the mode of operation chosen to deal with messages of variable length, and it turns the encryption scheme into a stream cipher$^3$. Since ECB is not semantically secure, choosing CTR over ECB was the right decision. As for the CBC mode, it requires an unpredictable IV because it is susceptible to the predictable IV attack described in algorithm 3, while CTR only requires the IV to be unique, a relevant fact for the decision made because it improves the execution time of the encryption procedures in devices for which power consumption is critical. Table 4.1 lists some features of CTR and CFB in order to discuss their applicability to the presented problem and, as one can observe, the critical consequences of transmission errors and the non-parallelizable encryption lead to the rejection of CFB in favour of CTR.

As discussed so far, the CTR mode of operation is the best option among the classical block cipher modes of operation ECB, CBC and CFB under the constraints at hand. The choice of CTR mode with an HMAC-256 checksum (CTR-H) over CCM and GCM is not a straightforward outcome, since the latter are two authenticated modes which have been well studied and perform better, especially in terms of memory optimization; for CTR to achieve the same level of security as these authenticated modes it will perform poorly with respect to memory optimization, due to the added length of the MAC. Notwithstanding, it is in fact faster to execute the CTR-H mode of operation than either of the other two modes, because the former is implemented in the chosen device's hardware, unlike the latter; CCM is in fact a very slow mode due to its double block encryption procedure. Also, note that in this case an encryption algorithm with lower execution time is preferable to one with lower memory usage, because of the device's energy consumption: there is an upper bound on the data gathering rate and the memory associated with the gathered messages is released upon confirmation of their arrival at the recipient, meaning that the available 256kB suffices for the storage of the messages. Moreover, the deployment of the devices in the field requires not only financial but also human resources, hence the durability of the devices is of utmost importance.

Another strong argument for the choice of CTR mode alongside an HMAC is that it makes use of an extra key (the authenticity and privacy keys are distinct) when compared with the authenticated modes of operation GCM and CCM, which use a single key for the whole process. The devices' short lifetime makes the system highly resilient against brute-force attacks and is the feature that justifies the absence of a key scheduling algorithm.

$^3$ See section 2.6.6 for attacks on stream ciphers.

With this in mind, the choice of CTR-H over all the other presented modes is considered to be suitable for the situation, provided the authentication mechanism is used properly, i.e., in a way that prevents an attacker from exploiting weaknesses of the encryption scheme at hand, such as malleability.

4.1.3 Attacks

This subsection discusses some of the aspects that may be exploitable by an adversary in practice, under the assumption that all the restrictions imposed on the network and its elements hold.

In practice, the GDs are deployed at a fixed location in order to keep gathering data. It is then possible for an adversary to physically tamper with the devices, which is why the data is encrypted in solid memory in the first place. Let $A$ be a polynomial-time bounded adversary with physical access to the GDs. If $A$ tries to read the memory with the aim of directly retrieving the secret data gathered by the GDs, then he/she will not succeed, because the data is encrypted using the PSch $PS^i$ defined in equation 3.11 and, as seen in section 4.1, these schemes are semantically secure except under chosen-ciphertext attacks. Moreover, the adversary will not be able to retrieve the keys from memory because these are stored in a memory location of restricted access [41].

Now assume that the GDs are deployed at time 0 and that at time $t$ there is an event $E$ which will directly or indirectly provide information to be gathered by some GD. The latter will get to know this information upon collecting the data within the time interval $T = [t, t + \epsilon[$, where $\epsilon > 0$ is the duration of the intelligence leak of the event $E$. Suppose also that $A$ is aware of both $E$ and $\epsilon$. Then, the adversary just needs to interfere with the envisaged GDs in the time interval $T$ in order to prevent them from collecting any of the relevant data. Thus, for an intelligent adversary who is able to physically interfere with the GDs, this technique is less likely to be spotted by a tamper detection mechanism, while allowing $A$ to minimize the effort spent on the attack.

Another attack that could be performed by a capable adversary is the reading of volatile memory [42]. No message is ever stored as plaintext in solid memory, but both before encryption and after decryption the confidential plaintext is automatically stored in volatile memory, even if for a short time frame. Nonetheless, this time frame may suffice for the adversary to harvest the secret information.

Assuming that $A$ cannot physically harm or interfere with the GDs, a straightforward attack would be for the adversary to perform a MiM attack known as a wireless denial-of-service attack, which consists of jamming the communication channel with junk data, making the exchange of data between the GDs and the MPP impossible, provided that he finds the correct frequency. This method can be seen as a last resort for an adversary who is unable to partially or completely break the cryptosystem, for it would otherwise be in his interest to stealthily eavesdrop on the communications in order to eventually tamper with the messages or acquire information and change his strategy accordingly.

4.2 Possible Solutions

Some suggested solutions for the problems discussed in the previous section are now presented. It is important to note that these are just suggestions that aim to improve the selected choices. Even though they may seem better from a theoretical point of view, it could happen that in practice these approaches are not a good fit, for a variety of possible reasons. The reader should notice that the possible solutions to the issues related with the key generation stage have already been discussed in section 4.1.1.

A reasonable approach to prevent an attack where the adversary takes advantage of the physical exposure of the device is to choose a device whose hardware provides a tamper detection mechanism and memory management [43] such that, in case of memory compromise, it clears all the memory associated with the confidential data. This is a last resort and leads to the disablement of the device at hand.

With respect to the jamming attack on the wireless communication channel, there is no way to prevent it from happening. The only solution would be to wire the connection, in which case the adversary would be unable to jam the channel without being targeted by the MPP.

4.2.1 Chosen-plaintext attack

Let $x_1, \ldots, x_k$ be confidential plaintexts for some $k \in \mathbb{N}$ and $y_1, \ldots, y_k$ their corresponding package ciphertexts, respectively. Recall that $\forall i \in I_5$, $PS^i$ is not secure against chosen-plaintext attacks in which the adversary is able to pick plaintexts of distinct length. It is known from equation 3.1 that the length $L_j$ of confidential plaintext $x_j$ is a multiple of a minimum length $l$, i.e.,

$$L_j = l\,m_j, \quad \forall j \in I_k \qquad (4.4)$$

Furthermore, let $h$ be the length of the header of a package ciphertext, which is assumed to be fixed for any PSch, for simplicity. If one does not consider the header's overhead, then it is equivalent to partition each message of length $L_j$ into $m_j$ messages of length $l$, and the system would become resistant to variable-length chosen-plaintext attacks. This is only a theoretical solution to the problem, because the assumption of the absence of the header does not hold in practice. The only way to overcome this issue is to define all confidential plaintexts to have the same length.

4.2.2 Chosen-ciphertext attack

Proposition 4.1.5 refers to the fragility of the PSchs $PS^i$ with respect to chosen-ciphertext attacks. That result holds under the assumption that the adversary $A$ has access to a decryption oracle. However, in practice there is no feasible way for $A$ to be granted such resources. The only way $A$ would be able to succeed would be to impersonate an element of the network, use the correct$^4$ PCgF, send the resulting package to the end party and then extract the plaintext from its volatile memory at the moment of decryption. This approach is infeasible in practice because no element of the network can be impersonated by a polynomial-time bounded adversary, as shown in proposition 4.1.2.

$^4$ By correct one means the correct function using the correct keys.

Chapter 5

Implementation Details

This chapter covers the majority of the developed code by illustrating some pieces of pseudocode and, in some cases, performing their complexity analysis. Two mock-up application examples were created: one is related to the key generation step and the other to the data flow of the discussed network. Appendix B contains a user guide for the first. The application regarding the communication is not presented in this text due to the confidential nature of its execution requirements. The code for the GDs was developed in the C programming language, whilst all the remaining code is in Java. Clearly, low-level programming allows a more versatile working environment, but this advantage is offset by the inherent problems of performance and memory optimization.

5.1 Key Generation

As described in section 3.4.1 all the keys are generated in the PMS. This process is triggered by a user-input on the key generation program specified by Algorithm 5. The latter depends on Algorithm 4, which is the atomic procedure for securely generating keys of a given size. The PSK for the WPA2 protocol is generated as in section 2.5.2:

$$\mathrm{PBKDF2}(\text{HMAC-SHA-1},\ wipass,\ ssid,\ 4096,\ 32) \qquad (5.1)$$

where ssid is the wireless network's SSID and wipass is the password shared by the authorized parties, i.e., the GD and the MPP. The primitives within these two devices used for the WPA2 secured wireless network require the SSID and password as input, since the generation of the PMK occurs internally. Hence, for the generation of the Wi-Fi pre-shared key it suffices for the tuple (ssid, wipass) to be provided. Therefore the trusted user starts by inputting the SSID and password for the WLAN as well as the number of GDs and a seed. The seed can be seen as a password that makes the pseudo-random key generation process behave deterministically, meaning that the knowledge of that seed may serve as proof of the authenticity of the keys and validate any future report, if needed.

Let $\Sigma_K = \mathbb{Z}_{256}$ be the key alphabet and consider $A = \{k : k \in \Sigma_K^{16}\}$ and $H = \{k : k \in \Sigma_K^{32}\}$ to be the sets of keys generated for encryption and authentication, respectively. The key generation process for the elements of $A$ and $H$ is very similar, with the exception of the key length. According to [44], the length of the keys in $H$ must be at least equal to the length of the output of the hash function used, i.e., 32 octets.

Algorithm 4 contains a high-level representation (in pseudocode) of the generation step for the encryption and authentication keys. The main layout of the algorithm is as follows: a customized process is applied to the input secret in order to output a value that is used as a seed to the cryptographic hash function SHA-1. This function is used as a PRF in order to produce the salt used in the PBKDF2 function that generates the key. In what follows, two important methods of the algorithm are explained in detail.

InitRand(pass) is a customized deterministic procedure that makes use of the user input $pass \in \Sigma^*$ in order to seed a CSPRNG (namely the SHA-1 function), which produces the salt used in the password-based key derivation function. The general idea behind it is to process the input so as to increase the password complexity, in such a way that an adversary who is able to obtain the password chosen by the user does not immediately know the seed used in the generation of the salt. Let $n = \lfloor l_2/32 \rfloor$ be the number of 32-bit blocks of pass, where $l_2$ is the binary length of pass, let $p_i$ be the binary representation of the $i$-th block of 32 bits, $\forall i \in I_n$, and let $p_{n+1}$ be the last block, with $k$ bits, for some $0 \leq k < 32$. It is useful to convert each of the blocks $p_i$ into its decimal representation $x_i \in \mathbb{Z}_{256}$ to perform the necessary addition operations. The seed $s$ of the SHA-1 pseudo-random function is computed out of two sub-seeds $s_1$ and $s_2$. The first is given by:

n X ∗ s1 = xi + xn+1 (5.2) i=1

∗ m ∗ where xn+1 is the decimal representation of pn+1 k 0 for m = 32 − k whenever k > 0, or xn+1 = 0 if k = 0. Figure 5.1 describes the aforementioned method.

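[Figure 5.1 (diagram): the character string $pass$ is converted into a binary string and split into the 32-bit blocks $p_1, p_2, \dots, p_n$ and a last block $p_{n+1}$ of $k$ bits ($0 < k < 32$); $p_{n+1}$ is padded to the 32-bit block $p_{n+1} \,\|\, 0^m$; the blocks are converted into the decimal values $x_1, x_2, \dots, x_n, x^*_{n+1}$, which are added to obtain $s_1$.]

Figure 5.1: Pre-processing steps of the secret pass for the generation of the seed of the SHA-1 pseudo-random function.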
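The pre-processing depicted in Figure 5.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sub_seed_s1`, the ASCII encoding of the password (one octet per character) and the big-endian interpretation of each 32-bit block are choices made here and are not fixed by the text.

```python
def sub_seed_s1(password: str) -> int:
    """Sum the 32-bit blocks of the password; a short last block is right-padded with zero bits."""
    data = password.encode("ascii")  # assumption: one octet per character
    total = 0
    for start in range(0, len(data), 4):  # 4 octets = one 32-bit block p_i
        # p_{n+1} || 0^m for a short last block; a no-op on full blocks
        block = data[start:start + 4].ljust(4, b"\x00")
        total += int.from_bytes(block, "big")  # x_i (assumption: big-endian)
    return total
```

When the password length is a multiple of four characters there is no partial block, which corresponds to the case $k = 0$ with $x^*_{n+1} = 0$.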

As for the computation of the sub-seed $s_2$, let $l$ be the length of $pass$ with respect to elements of $\Sigma$ and consider

$$X = \Big( \big\Vert_{i=1}^{n} x_i \Big) \,\Vert\, x^*_{n+1} \tag{5.3}$$

Clearly, $X \in \mathbb{Z}$ is a value whose number of digits is greater than or equal to $l$ because in the worst case each $x_i$ contains a single digit, which means that one can compute the second sub-seed as $s_2 = X|_l$, and the value used for seeding the SHA-1 cryptographic hash function is given by

$$s = s_1 + s_2 \tag{5.4}$$

GenerateSalt() is a procedure that calls the previously seeded SHA-1 CSPRNG in order to produce a value composed of 16 octets. The value output by this method is used as the salt for the PBKDF2 function.

Algorithm 4 Key generation for a fixed length
1: procedure KEYGEN(n, pass, len)    ▷ Generating n keys with len bits under the password pass
2:     keystream ← null;
3:     InitRand(pass);
4:     for r = 1 to n do
5:         salt ← GenerateSalt();
6:         key ← PBKDF2(HMAC-SHA-1, pass, salt, 10000, len);
7:         keystream ← keystream ‖ key;
8:     end for
9:     return keystream;
10: end procedure
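A minimal Python sketch of the loop in Algorithm 4 is given below, using the standard library function `hashlib.pbkdf2_hmac`. The salt generator is a hypothetical stand-in: the text does not specify how the seeded SHA-1 CSPRNG is iterated, so hashing the seed together with a counter is an assumption made purely for illustration, as are the names `make_salt_generator` and `keygen` and the extra `seed` parameter (which replaces the call to InitRand).

```python
import hashlib

def make_salt_generator(seed: int):
    """Hypothetical stand-in for InitRand/GenerateSalt: SHA-1 over (seed, counter)."""
    seed_bytes = seed.to_bytes((seed.bit_length() + 7) // 8 or 1, "big")
    counter = 0

    def generate_salt() -> bytes:
        nonlocal counter
        counter += 1
        digest = hashlib.sha1(seed_bytes + counter.to_bytes(8, "big")).digest()
        return digest[:16]  # 16-octet salt, as stated in the text

    return generate_salt

def keygen(n: int, password: bytes, length_bits: int, seed: int) -> bytes:
    """Concatenate n PBKDF2-HMAC-SHA-1 keys of length_bits bits each."""
    generate_salt = make_salt_generator(seed)
    keystream = b""
    for _ in range(n):
        salt = generate_salt()
        # 10000 iterations, as in step 6 of Algorithm 4; dklen is given in octets
        keystream += hashlib.pbkdf2_hmac("sha1", password, salt, 10000, length_bits // 8)
    return keystream
```

Because every step is deterministic, calling `keygen` twice with the same arguments yields the same keystream, which is the property the seed is meant to provide.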

After calling GLOBALKEYGEN(n, pass) there are $3n + 2$ keys in stack memory, out of which the first $n + 1$ are designated to be used by the AES algorithm and the remaining $2n + 1$ to be used in the HMAC-SHA-256 algorithm. Let $k_1, \dots, k_{3n+2}$ be all the keys generated, ordered as they were generated, i.e., $k_1$ was the first and $k_{3n+2}$ the last. Then, the following correspondences have been defined:

• $(k_A^{GMP})^i := k_i$ for $1 \le i \le n$.

• $k_A^{MPM} := k_{n+1}$.

• $(k_H^{GMP})^i := k_j$ for all $i, j$ such that $n + 2 \le j \le 2n + 1$ and $i = j - (n + 1)$.

• $(k_H^{GMD})^i := k_j$ for all $i, j$ such that $2n + 2 \le j \le 3n + 1$ and $i = j - (2n + 1)$.

• $k_H^{MPM} := k_{3n+2}$.

Algorithm 5 Key generation algorithm
1: procedure GLOBALKEYGEN(n, pass)    ▷ Generating all 3n + 2 keys under the password pass
2:     a ← KEYGEN(n + 1, pass, 128);
3:     b ← KEYGEN(2n + 1, pass, 256);
4:     return a ‖ b;
5: end procedure
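To make the layout of the returned keystream concrete, the following Python sketch slices the output of GLOBALKEYGEN into its $3n + 2$ keys, following the ordering described above. The function name `split_keys` and the dictionary labels are hypothetical; the two groups of per-GD authentication keys are labelled neutrally as first and second group.

```python
def split_keys(keystream: bytes, n: int) -> dict:
    """Slice the output of GLOBALKEYGEN(n, pass) into named keys (1-based GD index)."""
    aes_len, mac_len = 16, 32  # 128-bit AES keys, 256-bit HMAC keys
    expected = (n + 1) * aes_len + (2 * n + 1) * mac_len
    if len(keystream) != expected:
        raise ValueError("unexpected keystream length")
    aes = [keystream[i * aes_len:(i + 1) * aes_len] for i in range(n + 1)]
    rest = keystream[(n + 1) * aes_len:]
    mac = [rest[i * mac_len:(i + 1) * mac_len] for i in range(2 * n + 1)]
    return {
        "aes_gd": {i + 1: aes[i] for i in range(n)},         # k_1 .. k_n
        "aes_mpm": aes[n],                                    # k_{n+1}
        "mac_first": {i + 1: mac[i] for i in range(n)},       # k_{n+2} .. k_{2n+1}
        "mac_second": {i + 1: mac[n + i] for i in range(n)},  # k_{2n+2} .. k_{3n+1}
        "mac_mpm": mac[2 * n],                                # k_{3n+2}
    }
```

The length check reflects the fact that the keystream consists of exactly $n + 1$ keys of 16 octets followed by $2n + 1$ keys of 32 octets.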

Considering $n$ to be the total number of GDs, the time complexity of Algorithm 4 is $O(n)$. In fact, note that InitRand(pass) $\in O(m)$ for $m = |pass|$. However, the restriction imposed on the password ($8 \le |pass| \le 63$) entails an upper bound on the run time of this procedure, independently of the chosen password. Thus, under this constraint, the running time of InitRand is bounded by a constant, i.e., it is $O(1)$. The password-based key derivation function PBKDF2 also runs in time $O(1)$, therefore the $n$-step loop runs in time $O(n)$.

The plots presented in Figures 5.2a and 5.2b show how the time spent in the key generation process behaves as the total number of GDs increases. The y-axis contains the average time for the generation of the key set and the x-axis the number of GDs; each point in Figures 5.2a and 5.2b is the average of 100 and 20 measurements, respectively. The first plot shows the results obtained for a real-world situation, which would become impractical if the number of deployed devices exceeded the maximum presented magnitude (100), and the second shows the results for a theoretical large number of GDs (5000).

(a) Up to 100 GDs. (b) Up to 5000 GDs.

Figure 5.2: Scatter plots of the average key generation time per number of gathering devices.

Note that there is a slight variation in the average key generation time when the number of GDs is near 85, depicted in Figure 5.2a. This variation can be explained by the usage of the CPU by some external processes (e.g. operating system processes). Both scatter plots are in agreement with the stated time complexity for the generation of the keys. Of course, these plots do not prove the asymptotically linear time complexity, but they give an insight into what happens in practice.

5.2 Data Processing

In order to make use of the keys generated according to the previous section, all the elements of the network must agree on algorithms that pack and unpack the data into each of the formats $F_i$, $i \in I_5$, while making the required verifications. These can be seen as implementations based on the protocol described in section 3.4.2.

Let $I = \{k \in \mathbb{Z} : k \text{ is a GD identifier}\}$ and consider the function $\mathrm{valid} : \mathbb{Z} \to \mathbb{Z}_2$ given by

$$\mathrm{valid}(f) = \begin{cases} 1 & \text{if } f \in I \\ 0 & \text{otherwise} \end{cases} \tag{5.5}$$

Clearly $|I|$ equals the total number of deployed GDs, meaning that there are only $|I|$ values in the domain of valid for which this function returns 1. This function is very useful in the upcoming discussion because it determines whether a given 4-octet field represents a valid gathering device's identifier; the associated message should be discarded whenever it returns 0. Moreover, consider $\text{CTR-ENC}^{C}_{v}(k, x)$ to be the result of the encryption of $x$ with the block cipher mode of operation CTR using the block cipher $C$, initialization vector $v$ and key $k$, where the initial blocks given as input to $C$ are built according to equation 2.18. Analogously, $\text{CTR-DEC}^{C}_{v}(k, y)$ represents the decryption operation on $y$. Lastly, let HMAC-SHA-256$(k, m)$ be the result of the HMAC-SHA-256 function applied to $m$ using $k$ as key, in accordance with equation 2.22.

5.2.1 Upstream Algorithm Lifecycle

The algorithms that implement the data processing throughout the UDL are now illustrated and discussed. After gathering the relevant data, the GDs proceed to process it as described in Algorithm 6.

GenIvGd takes as input $n \in \mathbb{N}$ and generates an array of $n$ bits according to the standard incrementing function, starting from a hardcoded value for the first initialization vector. That is, whenever a new message is processed by the GD, the new IV $v_{new}$ is computed as follows:

$$[v_{new}]_2 = [v_{old}]_2 + 1 \tag{5.6}$$

This procedure runs in time $O(m)$, where $m$ is the size of the initialization vector. Since the latter is fixed, independently of the size of the confidential plaintext given as input to the procedure PACKGD, the GenIvGd method runs in constant time, i.e., $O(1)$.
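The update in equation 5.6 amounts to treating the IV as an unsigned integer and adding one. A minimal Python sketch follows; the name `next_iv`, the big-endian byte order and the wrap-around modulo $2^n$ when all bits are set are assumptions made here for illustration.

```python
def next_iv(iv: bytes) -> bytes:
    """Return the IV that follows `iv` under the incrementing function."""
    n_bits = len(iv) * 8
    # interpret the IV as an integer, add one and wrap around modulo 2^n (assumption)
    value = (int.from_bytes(iv, "big") + 1) % (1 << n_bits)
    return value.to_bytes(len(iv), "big")
```

For a 128-bit IV the input and output are 16 octets long, so the cost of the update does not depend on the size of the plaintext.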

PACKGD runs in time O(n) where n is the size of its input, because the methods HMAC-SHA-256 and CTR-ENC run in time O(n), as well as the concatenation operator.

Algorithm 6 GD's data packing algorithm
Input: x    ▷ Confidential plaintext
Output: $E_1^j(x)$    ▷ For some j ∈ I
1: procedure PACKGD(x)    ▷ Pack the data into format F1
2:     $h_1$ ← HMAC-SHA-256($(k_H^{GMP})^j$, x);
3:     $h_2$ ← HMAC-SHA-256($(k_H^{GMD})^j$, x);
4:     inner ← $h_1$ ‖ $h_2$ ‖ x;
5:     v ← GenIvGd(128);
6:     enc ← $\text{CTR-ENC}^{AES}_{v}((k_A^{GMP})^j, inner)$;
7:     return fid ‖ mid ‖ v ‖ enc;
8: end procedure

Upon taking as input a confidential plaintext, the algorithm PACKGD transforms it into an element of $S_1$ using the keys that are stored in the GD's solid memory with restricted access. After the data has been packed it is sent to the MPP and, upon reception, the algorithm PROCESSMPPGD (Algorithm 10) is triggered. The latter calls each of the procedures described in Algorithms 7 to 9.

Algorithm 7 MPP's data unpacking algorithm: sender GD
1: procedure UNPACKMPPGD(y)    ▷ y ∈ S1
2:     t ← Parse(y, F1);    ▷ t is a 4-tuple
3:     if !CheckMsgId(t[1]) or valid(t[0]) == 0 then
4:         return error;
5:     else
6:         p ← $\text{CTR-DEC}^{AES}_{t[2]}((k_A^{GMP})^j, t[3])$;    ▷ The key $(k_A^{GMP})^j$ follows from t[0] in time O(1)
7:     end if
8:     return (t, p);
9: end procedure

The first procedure called by Algorithm 10 is UNPACKMPPGD, which receives as input $y \in S_1$ and returns a 2-tuple $(t, p)$ such that $t$ is the 4-tuple whose elements are the parsed principal components of $y$ (assembled in step 7 of Algorithm 6) and $p$ is the result of the decryption of the body of $y$ (by 'body' one means everything apart from the header).

Parse(x, f) is a function that parses the main components of $x$ according to format $f$. The first input parameter $x$ is the piece of data to be parsed and $f$ is the message format to be considered, i.e., $f \in F$, where $F = \{F_i\}_{i \in I_5} \cup \{F_i^*\}_{i \in I_5}$. It returns a tuple whose $j$-th element corresponds to the $j$-th element of the format $f$. The time complexity of parsing the word $x$ is linear in the size of the input, that is, $O(n)$ where $n$ is the length of $x$.

CheckMsgId(m) is a function that looks for the value $m$ in a look-up table; if the value is found it returns false, otherwise it returns true. This table contains the identifiers of all the messages already received and serves a memory management purpose, especially useful at the GDs for memory optimization. The time complexity of searching for a value in the table is $O(1)$ on average.

Moreover, valid is also executed in constant time. Therefore, Algorithm 7 runs in time $O(n)$.
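The behaviour of CheckMsgId can be illustrated with a hash-based set, which gives the average $O(1)$ look-up mentioned above. The class below is a hypothetical sketch: the name `MessageIdTable` is invented for illustration, and recording an identifier at the moment it is first accepted is an assumption, since the text does not say when the table is updated.

```python
class MessageIdTable:
    """Hypothetical look-up table of the message identifiers already received."""

    def __init__(self):
        self._seen = set()  # hash set: average O(1) membership test

    def check_msg_id(self, mid: int) -> bool:
        """Return False for an identifier already in the table, True otherwise."""
        if mid in self._seen:
            return False
        self._seen.add(mid)  # assumption: an accepted identifier is recorded immediately
        return True
```

With this convention, a message carrying a repeated identifier is rejected by the test `!CheckMsgId(t[1])` in step 3 of Algorithm 7.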

Algorithm 8 Authenticity and integrity verification algorithm
Input: (h, x, k)
Output: true or false    ▷ Success or failure, respectively
1: procedure VERIFY(h, x, k)    ▷ Verifies whether h is an HMAC of x using key k
2:     h* ← HMAC-SHA-256(k, x);
3:     if h* == h then
4:         return true;
5:     else
6:         return false;
7:     end if
8: end procedure
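A possible Python rendering of the VERIFY procedure is sketched below with the standard `hmac` and `hashlib` modules. It returns `True` when the received tag matches the recomputed one. The use of `hmac.compare_digest`, a constant-time comparison, is a choice made in this sketch and is not something the pseudocode prescribes.

```python
import hashlib
import hmac

def verify(h: bytes, x: bytes, k: bytes) -> bool:
    """Check whether h is the HMAC-SHA-256 tag of x under the key k."""
    expected = hmac.new(k, x, hashlib.sha256).digest()
    # constant-time comparison, to avoid leaking how many leading octets match
    return hmac.compare_digest(expected, h)
```

Any change to the message, the key or the tag makes the comparison fail, which covers both the authenticity and the integrity checks.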

Upon performing the required decryption and splitting the package into its main components, it is necessary for the MPP to call VERIFY in order to make the required authenticity and integrity checks. If this verification succeeds then it can proceed with building the final message $E_4^j(x)$, where $x$ is the confidential plaintext sent by the GD with identifier $j \in I$. To this end, a call to PACKMPPMNDM is made.

Algorithm 9 MPP's data packing algorithm: recipient MnDM
1: procedure PACKMPPMNDM(z)    ▷ Desired recipient: MnDM
2:     v ← PrGenIv(128);
3:     e ← $\text{CTR-ENC}^{AES}_{v}(k_A^{MPM}, z)$;
4:     h ← HMAC-SHA-256($k_H^{MPM}$, e);
5:     return h ‖ v ‖ e;
6: end procedure

PrGenIv(n) is a method that uses the function SHA-1 as a PRNG to construct a unique $n$-bit value (up to a negligible probability of collision), and its execution time is linear in the size of the input. PACKMPPMNDM calls this method in order to generate the IV used in the AES-CTR encryption. As one can easily observe, PACKMPPMNDM $\in O(n)$, where $n$ is the size of the input.

Thus, the whole process of unpacking and verifying the data that is sent from a GD and reaches the MPP is performed in time $O(n)$, where $n$ is the size of the data given as input to PROCESSMPPGD.

Algorithm 10 MPP's data processing algorithm: recipient MnDM
Input: y    ▷ y ∈ S1
Output: z    ▷ z ∈ S4
1: procedure PROCESSMPPGD(y)
2:     (t, p) ← UNPACKMPPGD(y);
3:     u ← Parse(p, $F_1^*$);
4:     if VERIFY(u[0], u[2], $(k_H^{GMP})^j$) then
5:         r ← t[0] ‖ t[2] ‖ u[0] ‖ u[1] ‖ t[3];
6:         return PACKMPPMNDM(r);
7:     else
8:         return error;
9:     end if
10: end procedure

After the package has been constructed, it is delivered to the envisaged recipient. Upon the arrival of the message, the MnDM executes the steps specified in Algorithm 13 in order to process and extract the confidential plaintext. This procedure comprises two methods, UNPACKMNDM and VERIFYMNDM, specified in Algorithms 11 and 12, respectively.

Algorithm 11 MnDM's data unpacking algorithm
Input: y    ▷ y ∈ S4
Output: z    ▷ z ∈ $S_4^*$
1: procedure UNPACKMNDM(y)
2:     u ← Parse(y, F4);
3:     if VERIFY(u[0], u[2], $k_H^{MPM}$) then
4:         return $\text{CTR-DEC}^{AES}_{u[1]}(k_A^{MPM}, u[2])$;
5:     else
6:         return error;
7:     end if
8: end procedure

In the UNPACKMNDM method, first the data is unpacked into its principal components (see figure C.8), then the authenticity and integrity verifications take place and lastly a decryption is performed to retrieve the plain data within the outer layer of encryption, with format $F_4^*$ (see figure C.7). This procedure runs in time $O(n)$.
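The parse-then-verify order of UNPACKMNDM can be sketched in Python as follows. The field lengths are taken from the packing side (a 32-octet HMAC-SHA-256 tag and a 128-bit IV, followed by the ciphertext); the function names `parse_f4` and `check_f4` are hypothetical, and the final AES-CTR decryption is deliberately left out because the Python standard library does not provide AES.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA-256 output, in octets
IV_LEN = 16   # 128-bit IV, in octets

def parse_f4(y: bytes):
    """Split y = h || v || e into its three fields."""
    if len(y) < TAG_LEN + IV_LEN:
        raise ValueError("message too short for format F4")
    return y[:TAG_LEN], y[TAG_LEN:TAG_LEN + IV_LEN], y[TAG_LEN + IV_LEN:]

def check_f4(y: bytes, mac_key: bytes):
    """Return (v, e) if the tag authenticates e, otherwise None."""
    h, v, e = parse_f4(y)
    expected = hmac.new(mac_key, e, hashlib.sha256).digest()
    # only an authenticated ciphertext would be handed over to the decryption step
    return (v, e) if hmac.compare_digest(expected, h) else None
```

Verifying the tag before decrypting means that a forged or corrupted ciphertext is discarded without ever being processed by the cipher.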

Algorithm 12 MnDM's data verification algorithm
Input: y    ▷ y ∈ $S_4^*$
Output: t    ▷ 3-tuple corresponding to the decrypted part of $F_4^*$
1: procedure VERIFYMNDM(y)
2:     u ← Parse(y, $F_4^*$);
3:     d ← $\text{CTR-DEC}^{AES}_{u[1]}((k_A^{GMP})^j, u[4])$;
4:     t ← Parse(d, $F_1^*$);
5:     if t[0] != u[2] or t[1] != u[3] or !VERIFY(t[0], t[2], $(k_H^{GMP})^j$) or !VERIFY(t[1], t[2], $(k_H^{GMD})^j$) then
6:         return error;
7:     else
8:         return t;
9:     end if
10: end procedure

The VERIFYMNDM procedure expects an input with format $F_4^*$; any other input results in an error message and the input data is discarded. This function implements the behaviour presented in steps 23 to 27 within the UDL in section 3.4.2. Moreover, VERIFYMNDM $\in O(n)$.

Algorithm 13 MnDM's data processing algorithm
Input: y    ▷ y ∈ S4
Output: x    ▷ Confidential plaintext
1: procedure PROCESSMNDM(y)
2:     o ← UNPACKMNDM(y);
3:     t ← VERIFYMNDM(o);
4:     return t[2];
5: end procedure

As the name states, PROCESSMNDM $\in O(n)$ combines Algorithms 11 and 12 in order to process the data that arrives at the mission and data manager entity. Figure 5.3 contains a high-level visualization of the interaction between the specified algorithms used in the UDL; it has thus been named the Upstream Algorithm Lifecycle (UAL).

PACKGD → PROCESSMPPGD → PROCESSMNDM

• PROCESSMPPGD: UNPACKMPPGD, VERIFY, PACKMPPMNDM

• PROCESSMNDM: UNPACKMNDM, VERIFYMNDM

Figure 5.3: Upstream Algorithm Lifecycle

5.2.2 Downstream Algorithm Lifecycle

In this section the algorithms that implement the data processing throughout the DDL are described. The MnDM generates a command message, processes it according to Algorithm 14 and sends it to the envisaged recipient (MPP).

Algorithm 14 MnDM's data packing algorithm
Input: x    ▷ Confidential plaintext
Output: y    ▷ y ∈ S5
1: procedure PACKMNDM(x)
2:     $h_1$ ← HMAC-SHA-256($(k_H^{GMD})^j$, x);
3:     inner ← $h_1$ ‖ x;
4:     $v_1$ ← PrGenIv(128);
5:     e ← $\text{CTR-ENC}^{AES}_{v_1}((k_A^{GMP})^j, inner)$;
6:     $h_2$ ← HMAC-SHA-256($k_H^{MPM}$, e);
7:     $v_2$ ← PrGenIv(128);
8:     outer ← $h_2$ ‖ fid ‖ $v_1$ ‖ e;
9:     r ← $\text{CTR-ENC}^{AES}_{v_2}(k_A^{MPM}, outer)$;
10:     return $v_2$ ‖ r;
11: end procedure

PACKMNDM treats the input $x$ as the confidential plaintext and builds $E_5^j(x)$. This algorithm runs in time $O(n)$, where $n$ is the size of $x$, due to the time complexity of all the methods involved, which are described in section 5.2.1.

Algorithm 15 MPP's data unpacking algorithm: sender MnDM
Input: y    ▷ y ∈ S5
Output: p or error    ▷ p ∈ $S_5^{**}$
1: procedure UNPACKMPPMNDM(y)
2:     t ← Parse(y, F5);
3:     d ← $\text{CTR-DEC}^{AES}_{t[0]}(k_A^{MPM}, t[1])$;
4:     f ← Parse(d, $F_5^*$);
5:     if !VERIFY(f[0], f[3], $k_H^{MPM}$) or valid(f[1]) == 0 then
6:         return error;
7:     else
8:         return $\text{CTR-DEC}^{AES}_{f[2]}((k_A^{GMP})^j, f[3])$;
9:     end if
10: end procedure

Upon receiving the data, the MPP calls the first procedure represented in Algorithm 17. PROCESSMPPMNDM $\in O(n)$ calls two methods (presented in Algorithms 15 and 16), both with time complexity $O(n)$, where $n$ is the size of the input. Algorithm 15 contains the pseudocode for the steps required to unpack the data with format $F_5$ into the plaintext with format $F_5^{**}$, and Algorithm 16 contains the packing method for the MPP, which takes as input a 2-tuple and packs the first entry of that tuple into format $F_2$ or $F_3$, depending on the second element of the input.

Algorithm 16 MPP's data packing algorithm: recipient GD
Input: (x, flag)
Output: y    ▷ y ∈ S2 or y ∈ S3, depending on flag
1: procedure PACKMPPGD(x, flag)
2:     if flag == 1 then
3:         t ← Parse(x, $F_5^{**}$);
4:         $h_1$ ← HMAC-SHA-256($(k_H^{GMP})^j$, t[1]);
5:     else
6:         $h_1$ ← HMAC-SHA-256($(k_H^{GMP})^j$, x);
7:     end if
8:     inner ← flag ‖ $h_1$ ‖ x;
9:     v ← PrGenIv(128);
10:     e ← $\text{CTR-ENC}^{AES}_{v}((k_A^{GMP})^j, inner)$;
11:     mid++;    ▷ Global value
12:     return mid ‖ v ‖ e;
13: end procedure

However, it could also happen that the MPP generates the command message $x$ itself, instead of acting solely as a communication bridge between the GD and the MnDM. In this situation, the procedure PROCESSMPP $\in O(n)$ described in Algorithm 17 is called and returns a packaged ciphertext belonging to the set $S_2$.

Algorithm 17 MPP's data processing algorithm
Input: x
Output: y    ▷ y ∈ S2 or y ∈ S3, depending on the procedure called
1: procedure PROCESSMPPMNDM(x)    ▷ Sender MnDM
2:     p ← UNPACKMPPMNDM(x);
3:     return PACKMPPGD(p, 1);    ▷ y ∈ S3
4: end procedure

1: procedure PROCESSMPP(x)    ▷ MPP generates the confidential plaintext x
2:     return PACKMPPGD(x, 0);    ▷ y ∈ S2
3: end procedure

Regardless of the method used by the MPP to process the data, it sends the result to the envisaged GD and, upon reaching the recipient $GD_i$, the procedure described in Algorithm 20 is triggered. PROCESSGD $\in O(n)$, where $n$ is the size of the input data that was received through the asynchronous communication channel established between the MPP and the GD. This method calls the procedures described in Algorithms 18 and 19.

Algorithm 18 GD's data unpacking algorithm
Input: y
Output: d or error    ▷ d ∈ $S_2^*$ or d ∈ $S_3^*$, depending on y
1: procedure UNPACKGD(y)
2:     t ← Parse(y, F2);    ▷ Parsing with format F3 would have the same effect
3:     if !CheckMsgId(t[0]) then
4:         return error;
5:     end if
6:     return $\text{CTR-DEC}^{AES}_{t[1]}((k_A^{GMP})^j, t[2])$;
7: end procedure

UNPACKGD parses the input with format $F_2$ into its main components, decrypts the ciphered component and returns the obtained plaintext; it runs in time $O(n)$. Note that parsing for $F_3$ achieves the same result, since these two formats only differ in the inner format layers $F_2^*$ and $F_3^*$.

Algorithm 19 GD's verification algorithm
Input: d
Output: x or error    ▷ Confidential plaintext
1: procedure VERIFYGD(d)
2:     if msb(d) == 1 then
3:         t ← Parse(d, $F_3^*$);
4:         if VERIFY(t[1], t[3], $(k_H^{GMP})^j$) and VERIFY(t[2], t[3], $(k_H^{GMD})^j$) then
5:             return t[3];
6:         end if
7:     else
8:         t ← Parse(d, $F_2^*$);
9:         if VERIFY(t[1], t[2], $(k_H^{GMP})^j$) then
10:             return t[2];
11:         end if
12:     end if
13:     return error;
14: end procedure

After the ciphered contents are revealed, the system proceeds to the required verification. To this end, the procedure VERIFYGD $\in O(n)$ described in Algorithm 19 is called. msb(d) returns the most significant bit of the word $d$ and is performed in constant time ($O(1)$) because it only needs to extract the first bit in memory from the required field.

Algorithm 20 Gathering device's data processing algorithm
Input: y
Output: x or error    ▷ Confidential plaintext
1: procedure PROCESSGD(y)
2:     d ← UNPACKGD(y);
3:     return VERIFYGD(d);
4: end procedure

Figure 5.4 contains a high-level visualization of the interaction between the specified algorithms used in the DDL; it has thus been named the Downstream Algorithm Lifecycle (DAL).

PROCESSGD ← PROCESSMPPMNDM ← PACKMNDM

• PROCESSGD: UNPACKGD, VERIFYGD

• PROCESSMPPMNDM: UNPACKMPPMNDM, PACKMPPGD

Figure 5.4: Downstream Algorithm Lifecycle

Chapter 6

Results

With this work one can observe that what is better in theory may not always be more suitable for the specific practical case at hand, where the real constraints must be thoroughly taken into account. The study, decisions and analysis of this specific network were performed under the supervision of analysts and developers of the company GMVIS Skysoft, S.A.

The key generation process is very reliable, in the sense that it is not only performed within secured headquarters but is also a very efficient method regarding security and time. More specifically, it is a linear-time process with respect to the number of gathering devices that are to be deployed.

The selected network topology is considered to be the one that best suits the practical needs of the mission, whilst in theory an ad-hoc network might have a better performance when combined with elliptic curves [45].

The set of chosen packing schemes is considered to be a robust and secure option for the case, but it would achieve a higher level of security by adopting an encrypt-then-MAC method of encryption and authentication, in addition to including the header in the input to the HMAC; this approach would ensure that the system is IND-CCA secure. However, even though it would strengthen the theoretical level of security, it would have no impact in practice because the variable size of the plaintexts induces an inexorable fragility. As for the encryption scheme, AES-GCM should be preferred over AES-CTR-HMAC in order to grant authenticity, integrity and privacy to the plaintexts from a theoretical point of view. The former was ruled out simply because it is not implemented in the hardware of this particular type of device. Had any other devices with distinct characteristics been chosen, the outcome would certainly differ from the one presented. All packing schemes are vulnerable to chosen ciphertext attacks, which is a fact of some concern because an attacker with access to a decryption oracle might be able to break the system, even if just partially. There is virtually no way of preventing an adversary from performing a lunchtime attack [40] when the devices are in sleep mode.
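The encrypt-then-MAC variant mentioned above, with the header included in the HMAC input, can be illustrated by the following Python sketch. It is not part of the implemented system: the function names are hypothetical, the ciphertext is treated as opaque bytes produced elsewhere, and the header and IV are assumed to have fixed lengths so that the concatenation is unambiguous.

```python
import hashlib
import hmac

def tag_with_header(mac_key: bytes, header: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA-256 over header || iv || ciphertext (encrypt-then-MAC)."""
    # assumption: header and iv have fixed lengths, so the concatenation is unambiguous
    return hmac.new(mac_key, header + iv + ciphertext, hashlib.sha256).digest()

def accept(mac_key: bytes, header: bytes, iv: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Accept the message only if the tag covers the header, the IV and the ciphertext."""
    expected = tag_with_header(mac_key, header, iv, ciphertext)
    return hmac.compare_digest(expected, tag)
```

Under this construction a modification of any header field, of the IV or of the ciphertext invalidates the tag, so the receiver can reject the message before decrypting anything.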

All the data processing methods are linear in the size of the input. Note that the size of each of the inputs to the data processing algorithms described in section 5.2 depends on the size of the confidential plaintext. It is indeed the only dependency, since the size of the confidential plaintext is the only variable term when computing the size of each of the message formats. Given the GDs' memory limitation, the size of the confidential plaintexts generated by these elements has an upper bound according to equation 3.13, which implies that the running time of the previously mentioned data processing algorithms is also bounded, since their time complexity is $O(n)$. Thus, in practice, these algorithms are time-efficient.

6.1 Future Work

One possible improvement to the scale of the given network would be to allow the parallel activity of more than one MPP. This would require more keys to be generated, not only for privacy and integrity purposes on the data, but also so that all the MPPs are uniquely recognizable by the network parties (that is, to provide an authentication mechanism). Another subject with good prospects is the hardware improvement of the devices, such that their capabilities allow more efficient and secure packing schemes. By efficient one means both in terms of time and space complexity. A good example is to implement in hardware a randomized primitive that makes use of analogue entropy sources in order to obtain fairly randomized values. This feature would be extremely useful for the IV scheduler within the GDs and, in the event of increasing the devices' battery lifetime, it would also be very fruitful for the development and maintenance of a key scheduler algorithm. In addition, a potential improvement would be to implement in hardware standardized authenticated modes of operation such as GCM. Adopting AES-GCM instead of AES-CTR-HMAC would optimize the system's memory usage and therefore allow the GDs to store more messages, as well as shorten the GDs' sleep mode time-frame.

References

[1] G. Bertrand. Enigma: ou, La plus grande énigme de la guerre 1939-1945. Plon, 1973.

[2] Bellare, Mihir and Rogaway, Phillip. Course Notes: Introduction to Modern Cryptography. University of California, San Diego.

[3] Yodai Watanabe, Junji Shikata, and Hideki Imai. Equivalence between Semantic Security and Indistinguishability against Chosen Ciphertext Attacks. RIKEN Brain Science Institute, 2003.

[4] Claude Shannon. https://en.wikipedia.org/wiki/Claude_Shannon.

[5] Shannon, Claude. Communication Theory of Secrecy Systems. 1949.

[6] Matsui, Mitsuru. Linear Cryptanalysis Method for DES Cipher. Computer and Information Systems Laboratory.

[7] Douglas Stinson. Cryptography: Theory and Practice, Third Edition. CRC/C&H, 3rd edition, 2005.

[8] National Institute of Standards and Technology. FIPS PUB 46-3: Data Encryption Standard (DES). National Institute of Standards and Technology, Gaithersburg, MD, USA, October 1999. Supersedes FIPS PUB 46-2 1993 December 30.

[9] Michael Luby and Charles Rackoff. How to construct pseudorandom permutations from pseudorandom functions. SIAM Journal on Computing, 17(2):373–386, 1988.

[10] Rijmen, Vincent Daemen, Joan. AES Proposal: Rijndael. April 2003.

[11] Douglas Stinson. Substitution-permutation networks. In Cryptography: Theory and Practice, Third Edition, pages 74–79. CRC/C&H, 2005.

[12] National Institute of Standards and Technology. FIPS PUB 197: Advanced Encryption Standard (AES). National Institute of Standards and Technology, Gaithersburg, MD, USA, November 2001.

[13] M.J.B. Robshaw. Stream ciphers. Technical report, RSA Data Security, Inc. ftp://ftp.rsasecurity.com/pub/pdfs/tr701.pdf.

[14] National Institute of Standards and Technology. Recommendation for Block Cipher Modes of Operation. National Institute of Standards and Technology, Gaithersburg, MD, USA, 2001.

[15] Dworkin, Morris. Recommendation for Block Cipher Modes of Operation: The CCM Mode for Authentication and Confidentiality. National Institute of Standards and Technology, Gaithersburg, MD, USA, May 2014.

[16] Dworkin, Morris. Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC. National Institute of Standards and Technology, Gaithersburg, MD, USA, November 2007.

[17] Phillip Rogaway, Mark Wooding, and Haibin Zhang. The Security of Ciphertext Stealing, pages 180–195. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.

[18] R. Housley. Cryptographic Message Syntax (CMS). STD 70, RFC Editor, September 2009. http://www.rfc-editor.org/rfc/rfc5652.txt.

[19] National Institute of Standards and Technology. FIPS PUB 180-4: Secure Hash Standard. National Institute of Standards and Technology, Gaithersburg, MD, USA, April 1995. Supersedes FIPS PUB 180-3 2012 March 6.

[20] Stevens, Marc and Bursztein, Elie and Albertini, Ange and Markov, Yaric. The first collision for full SHA-1. 2017.

[21] Hugo Krawczyk, Mihir Bellare, and Ran Canetti. HMAC: Keyed-hashing for message authentication. RFC 2104, RFC Editor, February 1997. http://www.rfc-editor.org/rfc/rfc2104.txt.

[22] Andrew Chi-Chih Yao. Theory and applications of trapdoor functions. In 23rd IEEE Symposium on Foundations of Computer Science, 1982.

[23] Oded Goldreich. Pseudorandom functions. In Foundations of Cryptography: Volume 1, pages 106–113, New York, NY, USA, 2006. Cambridge University Press.

[24] B. Kaliski. PKCS #5: Password-Based Cryptography Specification Version 2.0. RFC 2898, RFC Editor, September 2000. http://www.rfc-editor.org/rfc/rfc2898.txt.

[25] B. Kaliski. PBKDF2. In PKCS #5: Password-Based Cryptography Specification Version 2.0, pages 9–11. RFC Editor, 2000.

[26] Ertaul, Levent and Kaur, Manpreet and Gudise, V. A. K. R. Implementation and Performance Analysis of PBKDF2, Bcrypt, Scrypt Algorithms. http://www.mcs.csueastbay.edu/~lertaul/PBKDFBCRYPTCAMREADYICWN16.pdf.

[27] Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. IEEE Std. 802.11i-2004, August 1999.

[28] National Institute of Standards and Technology. FIPS PUB 800-48: Guide to Securing Legacy IEEE 802.11 Wireless Networks. National Institute of Standards and Technology, Gaithersburg, MD, USA, July 2008.

[29] Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications Amendment 6: Medium Access Control (MAC) Security Enhancements. IEEE Std. 802.11i-2004, July 2004.

[30] Tews, Erik Beck, Martin. Practical attacks against WEP and WPA. November 2008.

[31] Kohno, Tadayoshi Bellare, Mihir. Hash Function Balance and its Impact on Birthday Attacks. May 2004.

[32] Burt Kaliski. PKCS #7: Cryptographic message syntax version 1.5. RFC 2315, RFC Editor, March 1998. http://www.rfc-editor.org/rfc/rfc2315.txt.

[33] Lars Knudsen and David Wagner. Integral Cryptanalysis, pages 112–127. Springer Berlin Heidelberg, Berlin, Heidelberg, 2002.

[34] Andrey Bogdanov, Dmitry Khovratovich, and Christian Rechberger. Biclique Cryptanalysis of the Full AES. August 2011.

[35] A. C, R. P. Giri, and B. Menezes. Highly efficient algorithms for AES key retrieval in cache access attacks. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 261–275, March 2016.

[36] Z. Manna and A. Pnueli. The Temporal Logic of Reactive and Concurrent Systems: Specification. Springer New York, 2012.

[37] Jon Postel. Transmission control protocol. STD 7, RFC Editor, September 1981. http://www.rfc-editor.org/rfc/rfc793.txt.

[38] Bellare, Mihir. New Proofs for NMAC and HMAC: Security without Collision-Resistance. June 2006.

[39] A.C.A. Nascimento and P. Barreto. Information Theoretic Security: 9th International Conference, ICITS 2016, Tacoma, WA, USA, August 9-12, 2016, Revised Selected Papers. Lecture Notes in Computer Science. Springer International Publishing, 2016.

[40] P. Rogaway, D. Pointcheval, A. Desai, and M. Bellare. Relations Among Notions of Security for Public-Key Encryption Schemes. June 2001.

[41] Douglas Cook. Measuring memory protection. In Proceedings of the 3rd International Conference on Software Engineering, ICSE ’78, pages 281–287, Piscataway, NJ, USA, 1978. IEEE Press.

[42] Kristine Amari. Techniques and Tools for Recovering and Analyzing Data from Volatile Memory. March 2009.

[43] Grand, Joe. Practical Secure Hardware Design for Embedded Systems. In Proceedings of the 2004 Embedded Systems Conference. CMP Media, April 2004.

[44] Krawczyk, Hugo and Bellare, Mihir and Canetti, Ran. RFC 2104: HMAC: Keyed-Hashing for Message Authentication. 1997.

[45] Douglas Stinson. Elliptic curves. In Cryptography: Theory and Practice, Third Edition, pages 254–266. CRC/C&H, 2005.

[46] The Java Tutorials. https://docs.oracle.com/javase/tutorial, 1995.

[47] The ASCII Character Set. http://ee.hawaii.edu/~tep/EE160/Book/chap4/subsection2.1.1.1.html, August 1994.

Appendix A

Schemes of Block Cipher Modes of Operation

(a) ECB encryption. (b) ECB decryption.

Figure A.1: ECB mode encryption and decryption procedures using an arbitrary block cipher B.

(a) CBC encryption. (b) CBC decryption.

Figure A.2: CBC mode encryption and decryption procedures using an arbitrary block cipher B.

(a) CFB encryption. (b) CFB decryption.

Figure A.3: CFB mode encryption and decryption procedures using an arbitrary block cipher B.

(a) CTR encryption. (b) CTR decryption.

Figure A.4: CTR mode encryption and decryption procedures using an arbitrary block cipher B.

Appendix B

User Manual: Key Generation Application

An application has been developed with the objective of providing the reader a close look on how the keys are generated in practice. KeyGeneratorApp is the executable JAR file [46] that contains the mock-up program for the key generation. This application is able to upload the keys directly into a serial- connected device or into an encrypted file with a pre-defined data format. The first case is out of the scope of this text and thus only the second is going to be discussed step-by-step.

When running KeyGeneratorApp, one should see a window similar to Figure B.1a so that the user can fill the required fields. Figure B.1b contains a suggestive filling of the fields and it is going to be considered hereinafter given that these fields define the produced keystream. Upon clicking on the ”Generate Keys” button the program generates the whole keystream and saves it in volatile memory while waiting for the next order. A pop-up window should appear similarly to Figure B.1c; by clicking ”Yes” the program proceeds.

Figure B.1: KeyGeneratorApp’s initial screen. Panels: (a) empty form; (b) form filled with the suggested values; (c) confirmation pop-up.

The interface should now change to the screen in Figure B.2a. Several options are available. The “Generate another key set” button returns to the previous key generation step and overwrites the current keystream with a new one based on new inputs; it is the advisable option if the user wants to change any of the previous inputs. The same figure shows three options for the destination target, i.e., where the keys are uploaded. The first two depend on a serial-connected device, a scenario that, as previously mentioned, is not discussed here, so the only viable choice in this situation is “Encrypted File”. Figure B.2b details the chosen sequence: the keys in the program’s memory associated with the GD with identifier did = 6 are exported to a file.

Figure B.2: KeyGeneratorApp’s target choice screen. Panels: (a) available options; (b) chosen sequence.

Upon clicking “Next”, a window similar to Figure B.3a should appear. The user can now choose the name and file-system path of the (encrypted) file that will hold the keys, as well as a password. This password is the input of the password-based key derivation function, whose output is the key used by the AES cipher in the CTR mode of operation. The length n of the password must satisfy 8 ≤ n ≤ 63. The field values in this figure are merely illustrative; the same values can nevertheless be reused, apart from the file path, which must be chosen according to the user’s local file system. Once the fields are filled in, the “Generate File” button creates the encrypted .enc file, protected with the chosen password, in the desired location and containing the intended keys.
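The chain from password to encrypted file can be sketched in Java as follows. Only the length rule 8 ≤ n ≤ 63 and the use of AES in CTR mode come from the text above; the choice of PBKDF2 with HMAC-SHA-256, the 128-bit key size, the salt, the iteration count and all names are assumptions made for illustration, and the sketch is not KeyGeneratorApp's actual code.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// Illustrative sketch; not KeyGeneratorApp's actual code.
final class FilePasswordSketch {

    // Length rule stated in the text: 8 <= n <= 63.
    static boolean validLength(String password) {
        int n = password.length();
        return 8 <= n && n <= 63;
    }

    // ASSUMPTION: PBKDF2 with HMAC-SHA-256 and a 128-bit output. The salt and
    // iteration count are parameters because the text does not state them.
    static byte[] deriveKey(char[] password, byte[] salt, int iterations) {
        try {
            PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 128);
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES in CTR mode. Applying it twice with the same key and IV returns
    // the original data, so one method serves both directions.
    static byte[] aesCtr(byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new IvParameterSpec(iv));
            return c.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real tool the salt and the IV would be drawn from java.security.SecureRandom and stored alongside the ciphertext, since both are needed again at decryption time.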

Figure B.3: KeyGeneratorApp’s file details (panels (a) and (b)).

After all these steps are completed, the “Finish” button is enabled; clicking it opens the screen depicted in Figure B.4. Here, the user has several options:

• Communication Application Test: starts the communication mock-up application for the message exchange between the GD and the MPP. This application is outside the scope of this text, since running its executable has additional requirements that only the developers possess;

• Key Checker Application Test: decrypts a previously encrypted file in the file system and exports its keys to a file with extension .txt;

• Export to another target: goes back to the selection of the destination target for the current key set (Figure B.2);

• Generate another key set: resets the program by clearing the memory associated with the current keystream and goes back to the initial screen (Figure B.1);

• Exit: safely exits the program.

Figure B.4: KeyGeneratorApp’s key export final step.

Selecting “Key Checker Application Test” as the next step opens a window similar to Figure B.5a. In the upper right corner, the “Show Help” button drops down a description of the program’s behaviour. The user can now choose one of the previously created files in the file system and enter the password that was used to encrypt it. The type of file being decrypted must also be selected, since the program needs it to parse the file’s contents. The parameters chosen throughout this guide are the following:

• WLAN SSID: MISSION2801WINET;

• WLAN password: secretpassword;

• Seed: myrandomseed;

• Number of GD: 73;

• Target file type: GD;

• ID: 6;

• File name: GD6Keys.enc;

• File path: C:\Users\Ricardo\Desktop;

• Password for encrypting the file: fileEncPass.

Figure B.5: KeyGeneratorApp’s key checker example screen (panels (a) and (b)).

Figure B.6: Pre-deployment stage secret information’s revealment.

Therefore, the options chosen for this case should match Figure B.5a. The file with extension .txt is created in the same file-system location as the file with extension .enc. Figure B.6 illustrates the contents of the GD6Keys.txt file for the above-mentioned parameters; the file can be opened with any tool of the reader’s choice, and in this case the source code editor Notepad++ was used. Each entry corresponds to an element of the keystream and is represented by an array of byte values, that is, each element $a$ of the array satisfies $a \in \mathbb{Z}_{256}$, according to the ASCII character set [47].
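Since Java's byte type is signed, printing key material as values in $\mathbb{Z}_{256}$ requires a small conversion. The sketch below shows one way to do it; the bracketed, comma-separated output layout and the names are hypothetical and need not match the actual layout of GD6Keys.txt.

```java
import java.util.Arrays;

// Illustrative sketch; the output layout is an assumption.
final class ByteValuesSketch {

    // Maps Java's signed bytes (-128..127) to representatives in 0..255.
    static int[] toUnsigned(byte[] key) {
        int[] out = new int[key.length];
        for (int i = 0; i < key.length; i++) {
            out[i] = key[i] & 0xFF; // masking drops the sign extension
        }
        return out;
    }

    // Renders one keystream element as a list of values in 0..255.
    static String format(byte[] key) {
        return Arrays.toString(toUnsigned(key));
    }

    public static void main(String[] args) {
        System.out.println(format(new byte[] {0, 127, (byte) 128, (byte) 255}));
    }
}
```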

Appendix C

Message Formats

Figure C.1: Message format $F_1^*$. Each row is 64 bits wide (bits 0 to 63). Fields, from top to bottom:

• Header: 256-bit HMAC $h_1(D)$, computed with HMAC-SHA-256 with key $(k_H^{FH})_i$;

• Header: 256-bit HMAC $h_2(D)$, computed with HMAC-SHA-256 with key $(k_H^{FM})_i$;

• $D$: confidential plaintext, whose length may be variable.

Figure C.2: Message format $F_1$. Each row is 64 bits wide, with ruler marks at bits 0, 7, 8, 39 and 63. Fields, from top to bottom:

• Header: fid and mid, followed by a gray field;

• Header: 128-bit initialization vector IV;

• Data with format $F_1^*$, encrypted with AES-CTR with key $(k_A^{FH})_i$ and IV.

The gray field in Figure C.2 represents the absence of elements in that position. It is pictured this way for a better visualization of the fields.
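The nesting of Figures C.1 and C.2 can be sketched in Java as follows: the two HMAC-SHA-256 tags are computed over D, concatenated with D, and the result is encrypted with AES-CTR. The names F1Sketch, seal, open, kA, kFH and kFM are hypothetical. The fields fid and mid are omitted because their encoding is not specified here, and the receiver checking only $h_1(D)$ is an assumption of this sketch.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;

// Illustrative sketch of the F1 layering; fid and mid are left out.
final class F1Sketch {
    static final int TAG = 32; // HMAC-SHA-256 output: 256 bits
    static final int IV = 16;  // 128-bit initialization vector

    static byte[] hmac(byte[] key, byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES-CTR; the same call encrypts and decrypts.
    static byte[] aesCtr(byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new IvParameterSpec(iv));
            return c.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Inner format (Figure C.1): h1(D) || h2(D) || D.
    static byte[] inner(byte[] kFH, byte[] kFM, byte[] d) {
        return ByteBuffer.allocate(2 * TAG + d.length)
                .put(hmac(kFH, d)).put(hmac(kFM, d)).put(d).array();
    }

    // Outer format (Figure C.2 without fid and mid): IV || AES-CTR(inner).
    static byte[] seal(byte[] kA, byte[] kFH, byte[] kFM, byte[] iv, byte[] d) {
        byte[] enc = aesCtr(kA, iv, inner(kFH, kFM, d));
        return ByteBuffer.allocate(IV + enc.length).put(iv).put(enc).array();
    }

    // Decrypts and checks h1(D); returns D, or null when the check fails.
    static byte[] open(byte[] kA, byte[] kFH, byte[] msg) {
        if (msg.length < IV + 2 * TAG) {
            return null; // too short to hold the IV and both tags
        }
        byte[] iv = Arrays.copyOfRange(msg, 0, IV);
        byte[] in = aesCtr(kA, iv, Arrays.copyOfRange(msg, IV, msg.length));
        byte[] h1 = Arrays.copyOfRange(in, 0, TAG);
        byte[] d = Arrays.copyOfRange(in, 2 * TAG, in.length);
        return MessageDigest.isEqual(h1, hmac(kFH, d)) ? d : null;
    }
}
```

The tag comparison uses MessageDigest.isEqual rather than Arrays.equals to avoid leaking, through timing, how many leading bytes of the tag matched.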

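Figure C.3: Message format $F_2^*$. Each row is 32 bits wide, with ruler marks at bits 0, 1 and 31. Fields, from top to bottom:

• f;

• HMAC-SHA-256 $h_1(D)$ with key $(k_H^{FH})_i$;

• $D$: confidential plaintext.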

Figure C.4: Message format $F_2$. Each row is 32 bits wide (bits 0 to 31). Fields, from top to bottom:

• Header: mid;

• Header: 128-bit IV;

• Data with format $F_2^*$, encrypted with AES-CTR with key $(k_A^{FH})_j$.

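Figure C.5: Message format $F_3^*$. Each row is 32 bits wide, with ruler marks at bits 0, 1 and 31. Fields, from top to bottom:

• f;

• HMAC-SHA-256 $h_1(D)$ with key $(k_H^{FH})_i$;

• HMAC-SHA-256 $h_2(D)$ with key $(k_H^{FM})_i$;

• $D$: confidential plaintext.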

Figure C.6: Message format $F_3$. Each row is 32 bits wide (bits 0 to 31). Fields, from top to bottom:

• Header: mid;

• Header: 128-bit initialization vector IV;

• Data with format $F_3^*$, encrypted with AES-CTR with key $(k_A^{FH})_j$ and IV.

Figure C.7: Message format $F_4^*$. Each row is 64 bits wide, with ruler marks at bits 0, 7, 8 and 63. Fields, from top to bottom:

• Header: fid;

• Header: 128-bit initialization vector IV;

• Header: HMAC-SHA-256 $h_1(D)$ with key $(k_H^{FH})_i$;

• Header: HMAC-SHA-256 $h_2(D)$ with key $(k_H^{FM})_i$;

• Data with format $F_1^*$, encrypted with AES-CTR with key $(k_A^{FH})_i$ and IV.

Figure C.8: Message format $F_4$. Each row is 64 bits wide (bits 0 to 63). Fields, from top to bottom:

• Header: HMAC-SHA-256 $h_3(\text{enc pack})$ with key $k_H^{HM}$;

• Header: 128-bit initialization vector IV;

• enc pack: data with format $F_4^*$, encrypted with AES-CTR with key $k_A^{HM}$ and IV.
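In contrast with the inner formats, the tag $h_3$ in Figure C.8 is computed over the encrypted package, so a receiver can reject a tampered message before decrypting anything. A minimal Java sketch of this outer layer follows. The inner $F_4^*$ content is treated as opaque bytes, the field order tag, IV, enc pack follows the figure, and the names F4Sketch, wrap, unwrap, kH and kA are hypothetical.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;

// Illustrative sketch of the outer F4 layer; the inner bytes are opaque.
final class F4Sketch {
    static final int TAG = 32; // HMAC-SHA-256 output: 256 bits
    static final int IV = 16;  // 128-bit initialization vector

    static byte[] hmac(byte[] key, byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // AES-CTR; the same call encrypts and decrypts.
    static byte[] aesCtr(byte[] key, byte[] iv, byte[] data) {
        try {
            Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                   new IvParameterSpec(iv));
            return c.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds h3(enc pack) || IV || enc pack.
    static byte[] wrap(byte[] kH, byte[] kA, byte[] iv, byte[] inner) {
        byte[] encPack = aesCtr(kA, iv, inner);
        return ByteBuffer.allocate(TAG + IV + encPack.length)
                .put(hmac(kH, encPack)).put(iv).put(encPack).array();
    }

    // Verifies h3 first and decrypts only on success; null means rejected.
    static byte[] unwrap(byte[] kH, byte[] kA, byte[] msg) {
        if (msg.length < TAG + IV) {
            return null; // too short to hold the tag and the IV
        }
        byte[] tag = Arrays.copyOfRange(msg, 0, TAG);
        byte[] iv = Arrays.copyOfRange(msg, TAG, TAG + IV);
        byte[] encPack = Arrays.copyOfRange(msg, TAG + IV, msg.length);
        if (!MessageDigest.isEqual(tag, hmac(kH, encPack))) {
            return null;
        }
        return aesCtr(kA, iv, encPack);
    }
}
```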

Figure C.9: Message format $F_5^{**}$. Each row is 64 bits wide (bits 0 to 63). Fields, from top to bottom:

• HMAC-SHA-256 $h_2(D)$ with key $(k_H^{FM})_i$;

• $D$: confidential plaintext.

Figure C.10: Message format $F_5^*$. Each row is 64 bits wide, with ruler marks at bits 0, 7, 8 and 63. Fields, in the order in which they appear:

• Header: $h_3(\text{enc pack})$;

• Header: fid;

• Header: 128-bit initialization vector IV;

• enc pack: data with format $F_5^{**}$, encrypted with AES-CTR with key $(k_A^{FH})_i$ and IV.

Figure C.11: Message format $F_5$. Each row is 64 bits wide (bits 0 to 63). Fields, from top to bottom:

• Header: 128-bit initialization vector IV;

• Data with format $F_5^*$, encrypted with AES-CTR with key $k_A^{HM}$ and IV.
