
REDUCED VECTOR TECHNIQUE HOMOMORPHIC ENCRYPTION WITH VERSORS

A SURVEY AND A PROPOSED APPROACH

by SUNEETHA TEDLA B.Sc, Osmania University, India 1993 M.C.A, Osmania University, India 1998

A dissertation submitted to the Graduate Faculty of the

University of Colorado Colorado Springs

in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy

Department of Computer Science

2019

© 2019

SUNEETHA TEDLA

ALL RIGHTS RESERVED


This dissertation for the Doctor of Philosophy degree by

Suneetha Tedla

has been approved for the

Department of Computer Science

by

Carlos Araujo, Co-Chair

C. Edward Chow, Co-Chair

T.S. Kalkur

Jonathan Ventura

Yanyan Zhuang

Date: 3 May 2019


Tedla, Suneetha (Ph.D., Security)

Reduced Vector Technique Homomorphic Encryption with Versors

A Survey and a Proposed Approach

Dissertation directed by Professors Carlos Araujo and C. Edward Chow

ABSTRACT

In this research, a new type of homomorphic encryption technique based on geometric algebra and versors, called Reduced Vector Technique Homomorphic Encryption (RVTHE), is designed, developed, and analyzed. This new method is optimized to be faster and more compact in ciphertext length while preserving security strength.

Performance criteria are proposed to generate benchmarks for evaluating homomorphic encryption in a fair comparison to the benchmarks used for non-homomorphic encryption. The basic premise behind these performance criteria is to establish a baseline for measuring the performance variations between different encryption methods on Cloud Storage type Solid State Drives (SSDs). Significant differences in throughput, with decreases of 20-50%, are observed among encryption software methods on Cloud storage SSDs or encrypted SSDs.

The central thesis of the research is to verify that homomorphic encryption is better accomplished with the use of versors instead of multi-vectors. Using the properties of versors, it is possible to design a homomorphic cipher that has a simple structure and versatility of assignments while achieving speeds that rival existing non-homomorphic encryption.


In this thesis, I demonstrate that versor-based homomorphic encryption is faster than an existing non-homomorphic encryption tool based on AES. It is shown that RVTHE is a symmetric somewhat homomorphic encryption supporting addition, subtraction, scalar multiplication, and scalar division. The evaluation of the implementation shows that a file can be edited/appended in 0.001 seconds. It also shows that, in the case of full-file encryption, RVTHE is 75% faster on encryption and 25% slower on decryption compared with the AES-Crypt encryption software, which implements the AES standard. The ciphertext sizes of RVTHE are found to be reduced on average by 25% from those of previous approaches using multi-vectors and Clifford Geometric Algebra. RVTHE has the potential for use as an encryption method on real workloads.

Keywords: Encryption, Homomorphic, AES, SSD, AES-Crypt, Vectors, Versors.


DEDICATION

I wish to dedicate this body of research to my husband and my best friend Shravan

Tedla; with him everything is possible for me.


ACKNOWLEDGEMENTS

I am blessed with beautiful people in my life. I am very thankful to all who supported me on my journey of schooling. I really appreciate all the support, encouragement, love, and understanding provided by my family, friends, colleagues, and Advisory Committee.

A special thank you to Dr. Carlos Araujo and Dr. C. Edward Chow for their support, for sharing their knowledge, and for guiding me over the last several years. Dr. Xiaobo Charles Zhou advised me prior to Dr. Carlos Araujo, and I am very thankful to Dr. Xiaobo for providing me the skills and insight needed to pursue my Ph.D. I very much enjoyed and admired Dr. Carlos Araujo's knowledge and the way he shapes his thinking to create a new way of doing security, which helped me tremendously in my research. I really appreciate Dr. Chow's support and knowledge while discussing ideas and analyzing how to put my thoughts and ideas into action. I am very thankful to both of you. I appreciate my Advisory Committee members, Dr. Jonathan Ventura, Dr. Yanyan Zhuang, and Dr. T. S. Kalkur, for providing their feedback and support. Many thanks to Ali Langfels, who helps all the students with a great smile while managing all the administrative work. I am very thankful to my parents and my in-laws; one gave me a beautiful life and the other provided me a beautiful life partner, both with their unconditional love and support. I am blessed with a beautiful friend, my husband Shravan Tedla, and my kids SaiKiran and Siddhartha, and my gratitude goes to them for supporting me in all aspects of my life, including my Ph.D. I am very thankful to my friend Tim Murphy for spending so many hours helping me write this thesis.


TABLE OF CONTENTS

CHAPTER ONE INTRODUCTION ...... 1

1.1 Security Terminology ...... 2

1.2 Security systems...... 4

1.3 Cloud Storage Security ...... 5

1.4 Design Criteria for Cryptographic Algorithm...... 6

1.5 Encryption ...... 7

1.6 Homomorphic Encryption ...... 8

1.7 Possible fully homomorphic encryption method ...... 10

1.8 Vector product spaces with Clifford Geometric Algebra ...... 13

1.9 Dissertation Contributions ...... 14

1.10 Dissertation Organization ...... 15

CHAPTER TWO BACKGROUND ...... 17

2.1 Cloud Storage SSD ...... 17

2.1.1 Data Reliability and Integrity ...... 18

2.1.2 Sanitization and Secure Deletion of SSD ...... 18

2.2 Survey of Various Encryption Approaches ...... 19

2.2.1 Block Ciphers ...... 20


2.2.2 Modes...... 29

2.2.3 Encryption Methods for SSD ...... 36

2.2.4 Comparable Encryption for Evaluations ...... 40

2.2.5 Homomorphic Encryption ...... 40

2.3 Mathematical Foundation ...... 41

2.3.1 Geometric Algebra Overview ...... 45

2.3.2 Inner Product ...... 47

2.3.3 Outer Product ...... 48

2.3.4 Geometric Product...... 49

2.3.5 Inverse of Vector ...... 52

2.3.6 Versors ...... 52

CHAPTER THREE PROBLEMS AND LIMITATIONS ...... 55

3.1 Defining the Problem ...... 55

3.1.1 Encryption Security Limitations and Problem ...... 56

3.1.2 Encryption Limitations:...... 57

3.2 Other problems contributed for research motivation ...... 57

3.2.1 Cyber Attacks ...... 59

3.2.2 Real Randomness ...... 60

3.2.3 Storage Security Limitations ...... 61

3.2.4 SSD System Level Induced Limitations ...... 62


3.2.5 Existing research to mitigate the software limitations ...... 70

CHAPTER FOUR STORAGE ENCRYPTION ANALYSIS ...... 78

4.1 Measurement Environment ...... 78

4.1.1 Selection of Encryption methods ...... 80

4.1.2 Experimental Tools and Workloads ...... 82

4.2 SSD performance without Encryption ...... 83

4.2.1 Performance differences between Amazon EC2 VMs ...... 83

4.2.2 Did various block sizes significantly affect I/O throughput? ...... 84

4.2.3 Did various levels of parallelism affect I/O throughput? ...... 85

4.2.4 Did random and sequential jobs have a different IOPS? ...... 86

4.2.5 SSD Random Workload Analysis on t2.micro VM ...... 87

4.3 SSD performance with Encryption ...... 89

4.3.1 Did various block sizes significantly affect IOPS ...... 90

4.3.2 Did various block sizes affect Performance Throughput ...... 92

4.3.3 Encryption Methods Versus Performance Throughput ...... 93

4.3.4 Reads, Writes and Mixed workloads Versus Block Sizes...... 95

4.4 Fully Homomorphic Encryption Limitations...... 96

4.4.1 FHE with Vector Space ...... 96

4.4.2 Previous homomorphic encryption using multi-vector technique...... 97

CHAPTER FIVE RVTHE ...... 99


5.1 Design of RVTHE...... 99

5.1.1 RVTHE Encryption and Decryption ...... 100

5.1.2 Encryption of RVTHE ...... 100

5.1.3 Decryption of RVTHE ...... 100

5.2 Mathematical Implementation of RVTHE Using Versors...... 101

5.3 Homomorphism of RVTHE ...... 102

5.3.1 Addition ...... 103

5.3.2 Subtraction ...... 104

5.3.3 Multiplication ...... 105

5.3.4 Division ...... 105

5.4 Security of RVTHE...... 106

5.4.1 RVTHE Key Strength ...... 106

5.4.2 High Level Evaluation ...... 107

CHAPTER SIX IMPLEMENTATION AND EVALUATION OF RVTHE ...... 110

6.1 Implementation of RVTHE...... 110

6.2 Experimental Systems ...... 112

6.3 Experimental Evaluations ...... 112

6.3.1 Time measurements on various key sizes ...... 113

6.3.2 Time measurements on various file sizes ...... 114

6.3.3 Size measurements on Encrypted Files ...... 115


CHAPTER SEVEN LESSONS LEARNED AND FUTURE WORK ...... 116

7.1 Challenges and Lessons Learned ...... 116

7.2 Future Work ...... 119

CHAPTER EIGHT CONTRIBUTIONS ...... 122

CONCLUSION ...... 124

REFERENCES ...... 127

Appendix A – Cloud Storage SSD ...... 137

Appendix B – Cloud Storage and Encryptions ...... 139

Appendix C – Multi-Vector Based Encryption ...... 146

Appendix D – Demonstrate RVTHE ...... 148



LIST OF FIGURES

Figure 1 - Data Encryption Standard [27]...... 21

Figure 2 - TDEA [27] ...... 22

Figure 3 – AES encryption process [35] ...... 23

Figure 4 - Blowfish Algorithm [38] ...... 25

Figure 5 – Twofish process [40] ...... 26

Figure 6 – Serpent Algorithm [44] ...... 28

Figure 7 - CBC Encryption and Decryption [27]...... 29

Figure 8 - CFB mode with 8 bits [27] ...... 31

Figure 9 - XTS mode [27] ...... 33

Figure 10 - GCM mode [49] ...... 34

Figure 11 - Outer Product [59] ...... 48

Figure 12 - Address Mapping between physical to logical [11] ...... 63

Figure 13 - Flashes and their parallel architecture[11] ...... 67

Figure 14 - Consumer Vs Enterprise SSD [11] ...... 69


LIST OF GRAPHS

Graph 1 - IOPS Vs Block Size ...... 85

Graph 2 -Parallelism Vs Throughput ...... 86

Graph 3 - Random Versus Sequential Operations ...... 87

Graph 4 - t2.micro Block Size Versus IOPS...... 88

Graph 5 - t2.micro Block Size Versus KB/Sec ...... 88

Graph 6 - Encrypted SSD Block Size Versus IOPS ...... 90

Graph 7 - Best Crypt Block Size Vs IOPS ...... 91

Graph 8 - Dm-crypt Block Size Vs IOPS ...... 91

Graph 9 - Encrypted EBS SSD Volume Block Size Versus throughput ...... 92

Graph 10 - BestCrypt Block Size Versus Throughput...... 92

Graph 11 -Dm-Crypt Block Size Versus Throughput ...... 93

Graph 12 - Encryption Methods versus IOPS...... 94

Graph 13 - Encryption Methods versus Throughput ...... 94

Graph 14 - Read workloads for various Block Sizes ...... 95

Graph 15 – Write workloads IOPS for various Block Sizes ...... 95

Graph 16 - Mixed Workloads IOPS for Various Block Sizes ...... 96

Graph 17 – Multi-vector Based Homomorphic Encryption ...... 97

Graph 18 – Multi-vector based encrypted file sizes ...... 98

Graph 19 - Key Size Vs Encryption/Decryption time in Sec ...... 113

Graph 20 - File size and Encryption/Decryption times ...... 114

Graph 21-Key Size and Time on Regular SSD ...... 114


Graph 22 - Encrypted file sizes in MB ...... 115


LIST OF TABLES

Table 1 - AES Key Size and Number of Rounds [35] ...... 24

Table 2 - Key and data location in versors ...... 99

CHAPTER 1

INTRODUCTION

Rapid changes in information technology, specifically the need to use data from anywhere, are leading users to Cloud environments with the expectations of availability (able to provide data access as needed), reliability, solid integrity (maintaining data accuracy throughout its life cycle), and full security (assuring the data is accessed only by authorized parties with the authorized level of access). In this digital age, protecting PII (Personally Identifiable Information) is imperative. Tax IDs, medical information, credit information, and other extremely sensitive data need to be secured at the highest level, because they can be used for identity theft and other information crimes [1].

Various methods and processes are implemented to secure data; among these methods, encryption techniques are the most commonly used. Scholars have been implementing different cryptographic algorithms and methods such as the following: Public-Key encryption, Digital Signatures, and PKI. Cryptographic algorithms consist of Block Ciphers (DES, AES, Serpent, and Twofish), Block Cipher Modes (ECB, CBC, Fixed IV, Counter IV, Random IV, Nonce-Generated IV, OFB, CTR), Combined Encryption and Authentication, and Hash functions (MD5, SHA-1, SHA-2, SHA-256, SHA-512). But even with all these encryption methods, each one requires full decryption of all the data, including the sensitive data, before it can be used. Also, I observed significant throughput penalties, up to 20-50%, using encryption software methods on Cloud storage SSDs or encrypted SSDs, as described in the abstract. FHE (Fully Homomorphic Encryption) allows computing on encrypted data without decrypting it, keeping sensitive data encrypted and thus not exposed [2] [3].

This thesis is organized as follows: Chapter 1 presents the introduction. Chapter 2 discusses the background, including the most common techniques used to secure systems or data. Chapter 3 presents the problems and limitations motivating the research. Chapter 4 shows the proof of performance penalties of cloud storage SSDs and encryption software methods. Chapter 5 introduces the mathematics and the design of RVTHE. Chapter 6 presents the implementation and evaluation of RVTHE on real workloads. Chapter 7 discusses lessons learned and future work. Chapter 8 concludes the thesis with the contributions.

This chapter introduces the research with a general survey of overall security and storage. It discusses the terminology of storage, the cloud, security systems, and various encryption methods and ciphers.

1.1 Security Terminology

The word “security” originated in Late Middle English, from Old French “securite” or Latin “securitas”, from “securus”, ‘free from care’, as in “check to ensure that all nuts and bolts are secure” [4]. The following are some of the most used terms in the field of cyber security. They help to clearly define each concept's role in Information Technology system security [5].

• Assurance: A specific security method implementation that has adequately met these four security goals: integrity, availability, confidentiality, and accountability.

• Integrity: Ensuring the data is intact, with all modifications made only with proper authorization.

• Availability: The ability to provide timely, reliable access to an entity.

• Confidentiality: Ensuring that data is accessible only to authorized parties.

• Accountability: The principle that an authorized individual is responsible for following the safeguard controls of the system.

• Asymmetric Encryption: An encryption method that uses two unique keys, a public key for encryption and a private key for decryption. It is computationally infeasible to derive the private key from the public key.

• Authentication: The ability to verify the identity of an individual or system accessing an entity.

• Block Cipher: An encryption algorithm operating on arrays of bytes, in the form of binary bits, that are used as input, output, state, and round key in the encryption process.

o State: An intermediate cipher value of the encryption process.

o Round Key: Values derived from the Cipher Key.

• Cipher and Ciphertext: A cipher is a procedure containing a series of operations that convert plaintext to ciphertext; ciphertext is the output generated by applying a cipher to plaintext.

• Classified Information: Information requiring the highest level of security and mandating authorized access.

• Cloud Computing: A way to provide network access to shared resources that can be rapidly provisioned with minimal effort.

• Cryptography: The study that incorporates the foundations, mechanisms, and methods used to hide data and protect it from unauthorized access.

• Cyber Attack: Intentionally disrupting the assurance of a system or its data.

• Decryption: A technique for converting ciphertext to plaintext.

• Encryption: A technique for converting plaintext to ciphertext.

• Key: A secret code needed to perform encryption and decryption.

• Private Key: The key needed for the decryption process of asymmetric encryption.

• Public Key: The key needed for the encryption process of asymmetric encryption.

• Reliability: The property that a system consistently performs with quality.

• Symmetric Encryption: A form of encryption that uses the same key for the encryption and decryption processes.

• User: An individual who has the proper level of authorization to access the system.

1.2 Security systems

“A security system is only as strong as its weakest link.” [6]

The level of security we can guarantee for a system depends on how strongly we secure its weakest links. Creating an attack tree for any real system provides insight into the possible lines of attack [7]. If we leave one single weak link, the rest of the system is just as vulnerable, even with the strongest security elsewhere.

Secrecy systems can be broken into three categories:

• Concealment Systems hide the existence of a message, for example behind a fake cover.

• Privacy Systems need special equipment to recover the original message.

• “True” Secrecy Systems use a cipher, so that a key is needed to recover the message.

To build a “True” secrecy system, one must follow the design criteria for a cryptographic algorithm [8].


1.3 Cloud Storage Security

Cloud environments use SSDs, and there has been a lot of research related to SSD characteristics, internal design, and performance for different types of workloads [9] [10]. Previous studies have shown that an SSD outperforms an HDD in data access speed [11], but that research did not consider encryption on the SSD. There has also been a lot of research related to different types of encryption methods, attacks, and secure methods [6] [12]. These existing algorithms were suitable for regular HDDs, but they may not be optimal for SSDs: the physical structure of an SSD is different, so encryption algorithms designed for HDDs might not be ideal, or even compatible, for SSDs.

There is a need for research to make sure these encryption methods are good enough; they can be measured by calculating their impact on an SSD in terms of performance and security. Existing encryption algorithms need to be rethought for SSDs, or new algorithms developed that accommodate new environments like the Cloud. The best encryption method can be found through an assessment of already-existing encryption methods against new ones. For this, I first study the SSD's physical and logical limitations.

The research showed that workload performance improved when SSDs were added to, or used as, the storage. An SSD is faster than an HDD [39] [13], so adding it to the storage system is expected to improve performance. Very little research has shown the performance impact of different types of workloads with different encryption methodologies. When exploring what type of encryption is better for the cloud, we need to consider data at all stages, which means data in transit and data at rest [14]. This can be accomplished by using fully data-centric security [15]. This can also be accomplished by using homomorphic encryption methods.

1.4 Design Criteria for Cryptographic Algorithm

Encryption is a small component of a system but provides a higher level of security during cyber-attacks [6]. Encryption is the original goal of cryptography. Encryption converts plaintext into unreadable data called ciphertext. A good encryption method makes it impossible to find the plaintext from the ciphertext without knowing the key. With good encryption, the only information that remains accessible is the plaintext length and the time stamp [16]. The following are some of the design principles that help to generate a stronger cipher [6].

• Algorithm should provide effective security, should be easy to use, and should be completely stated.

• Security should depend on the key secrecy, not on the algorithm secrecy.

• Algorithm should be available to users, adaptable to applications and systems.

• Algorithm must be implementable on a targeted system.

• Algorithm should be efficient, verifiable, and portable between systems.

The cipher must be dependent on the key, and modifications in the message should not mask the key. Randomness of the key is critical for the security of the system, and the key should be hard to generate or guess [8]. In 1999, NIST evaluated the AES candidate encryption methods against defined security criteria.

The evaluation criteria were divided into three major categories [17]:

Security:
Resistance of the algorithm to cryptanalysis.
Soundness of its mathematical basis.
Randomness of the algorithm output.

Cost:
Licensing requirements.
Computational efficiency on various platforms.

Algorithm and Implementation Characteristics:
Flexibility.
Hardware and software suitability.
Algorithm simplicity.
Key and block size agility.

1.5 Encryption

Encryption is the principal technical derivative of cryptography, and building an optimal encryption technique is still very important. In the encryption process the “key” plays the central role for encrypting and decrypting data; without the key, the data cannot be interpreted. The strength of the key depends on its secrecy, randomness, length (size), and complexity. Over the years, encryption processes have become more complex during each iteration of ciphertext generation, and various encryption methods use a unique key generated for each iteration. However, Kerckhoffs' principle states that “security of the encryption depends on the secrecy of the key, not the algorithm,” meaning that everybody may know how the key is applied in the algorithm, so the strength of that key is all that matters. Most of the common cryptographic methods follow this principle.

In 1997, NIST (the National Institute of Standards and Technology) received fifteen new security algorithms from twelve countries. Out of these encryption methods, MARS, RC6, Rijndael, Serpent, and Twofish were selected as finalists [18]. Out of these finalists, Rijndael, Serpent, and Twofish took the top three places, respectively. The winning algorithm, Rijndael, also called AES (Advanced Encryption Standard), is still in use by different encryption methods [19]. All these methods are symmetric encryption ciphers.

1.6 Homomorphic Encryption

The idea of homomorphism for cryptography was first theorized in 1978 by Rivest, Adleman, and Dertouzos [20]. Homomorphism can be defined in abstract algebra in terms of functions and algebraic structures: when such a function (map) is applied to an algebraic structure, the result preserves the algebraic structure from the domain to the range. In group theory, homomorphism theorems are developed on subgroups and quotient groups. Ideals, introduced in the 19th century, played a parallel role in defining quotient rings and the comparable homomorphism theorems in ring theory [21]. In algebra, if A and B are the same type of algebraic structure, a structure-preserving function f from A to B is a homomorphism from A to B.

A map f: A → B is a homomorphism if, for every operation µ of arity k and all elements a1, a2, …, ak in A:

f(µA(a1, a2, …, ak)) = µB(f(a1), f(a2), …, f(ak))   (1.1)


When the map from A to B is an epimorphism (onto), B is a homomorphic image of A. When a homomorphism maps a structure onto itself, it is called an endomorphism, noted as A = B.

The same notion of homomorphism can also be defined for lattices, groups, modules, and monoids [22]. In group theory, a homomorphism that is also a bijection is an isomorphism. If A and B are two rings and f is a function from A to B, where A is the domain of f and B is the range of f, then each element a belongs to A and f(a) belongs to B. Such a homomorphism preserves the algebraic operations (∗) of addition, subtraction, and multiplication, as shown below:

f(a ∗ a′) = f(a) ∗ f(a′)   (1.2)

If f satisfies the above, the following are also true:

f(0) = 0   (1.3)

f(1) = 1   (1.4)

f(−a) = −f(a)   (1.5)

f(a) = f(b) implies a = b   (1.6)

If the properties of homomorphism are incorporated in an encryption method or cipher, then it is a homomorphic encryption method. Homomorphic encryption can be organized into three categories: partial, somewhat, and fully homomorphic. Partial Homomorphic Encryption allows only one operation with unlimited iterations. Somewhat Homomorphic Encryption allows more than one, but not all, types of operations and limits the number of iterations. Fully Homomorphic Encryption allows all types of operations with unlimited iterations [23].
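To make properties (1.2) through (1.5) concrete, the short Python check below (an illustrative sketch, not part of RVTHE) verifies them for the familiar ring homomorphism that reduces integers modulo n. Note that property (1.6) additionally requires the map to be one-to-one, which reduction modulo n is not.

    # Reduction modulo n is a ring homomorphism f: Z -> Z_n.
    # This quick check verifies properties (1.2)-(1.5) for a few sample values.
    n = 12

    def f(a):
        return a % n

    for a, b in [(25, 7), (-4, 100), (13, 13)]:
        assert f(a + b) == (f(a) + f(b)) % n      # property (1.2) with * as addition
        assert f(a * b) == (f(a) * f(b)) % n      # property (1.2) with * as multiplication
        assert f(-a) == (-f(a)) % n               # property (1.5)

    assert f(0) == 0 and f(1) == 1                # properties (1.3) and (1.4)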


FHE (Fully Homomorphic Encryption) can be defined as follows: apply an encryption method E to data items D1 and D2, where ‘⨳’ represents any operation (addition, subtraction, multiplication, or division). This is the mathematical representation:

E(D1 ⨳ D2) = E(D1) ⨳ E(D2)

The first feasible form of FHE was proposed by Craig Gentry in 2009 using ideal lattices with “bootstrappable” encryption methods [2].
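As a concrete illustration of the property above (a minimal Python sketch, not the RVTHE scheme and not Gentry's construction), textbook unpadded RSA is partially homomorphic: multiplying ciphertexts corresponds to multiplying plaintexts. The tiny parameters below are classic textbook values and provide no security.

    # Textbook (unpadded) RSA is multiplicatively homomorphic:
    # E(D1) * E(D2) mod n decrypts to (D1 * D2) mod n.
    p, q = 61, 53
    n = p * q                 # 3233
    e, d = 17, 2753           # toy public/private exponents; e*d = 1 mod lcm(p-1, q-1)

    def encrypt(m):
        return pow(m, e, n)

    def decrypt(c):
        return pow(c, d, n)

    d1, d2 = 7, 11
    c1, c2 = encrypt(d1), encrypt(d2)

    combined = (c1 * c2) % n                      # operate on ciphertexts only
    assert decrypt(combined) == (d1 * d2) % n     # recovers 77 without decrypting the inputs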

1.7 Possible fully homomorphic encryption method

In 2009, Craig Gentry introduced the first plausible fully homomorphic encryption method, able to evaluate an arbitrary-depth circuit (composed of additions and multiplications) on encrypted data. This research provided the blueprint for FHE. Its starting point is referred to as SwHE (Somewhat Homomorphic Encryption), which uses a limited-depth circuit of additions and multiplications for evaluation [2]. This research helped develop later encryption methods that are lattice-based, integer-based, LWE-based (learning-with-errors), and RLWE-based (ring-learning-with-errors). Further research on SwHE and FHE showed promise for potential usage in cloud computing environments and other MPC (multi-party computation) settings [24] [25]. In the Gentry method, using a lattice-based scheme, key generation takes too long (ranging from 2.5 seconds to 2.2 hours), the implementation is complex, noise can exceed thresholds, and large key sizes (17 MB to 2.25 GB) require high memory resources; all of this becomes impractical in real systems [3]. Fully Homomorphic Encryption (FHE) is on the “bleeding edge” of encryption technology, but currently there is no FHE available for real-time applications [26]. There is still a lot of work to be done to have a “production ready” version of FHE.


Gentry defines a public-key encryption scheme ε as consisting of three algorithms: KeyGenε, Encryptε, and Decryptε. KeyGenε takes a security parameter λ as input and is implemented as a randomized algorithm; it outputs a secret key sk and a public key pk, where pk defines the plaintext space P and the ciphertext space C. Gentry's Encryptε is also a randomized algorithm; it takes pk and a plaintext π ∈ P as input and outputs a ciphertext ψ ∈ C. His decryption technique Decryptε takes sk and ψ as input and outputs the plaintext π. The computation time of all of these algorithms must be polynomial in λ [26].

Algorithm correctness is: if (sk, pk) ← KeyGenε(λ), π ∈ P, and ψ ← Encryptε(pk, π), then Decryptε(sk, ψ) → π.

A homomorphic encryption scheme additionally has a (possibly randomized) efficient algorithm Evaluateε, which takes as input the public key pk, a circuit C from a permitted set of circuits Cε, and a tuple of ciphertexts Ψ = ⟨ψ1, …, ψt⟩ for the input wires of C; it generates a ciphertext ψ. Informally, the functionality we want from Evaluateε is that, if each ψi “encrypts πi”, then ψ ← Evaluateε(pk, C, Ψ) “encrypts C(π1, …, πt)”, the output of C on inputs π1, …, πt. For this encryption the minimal requirement is correctness. The following are a couple of different ways to formalize Gentry's homomorphic encryption methods. Gentry defined them as follows [26].

“Definition 1: (Correctness of Homomorphic Encryption). Gentry says a homomorphic encryption scheme ε is correct for circuits in Cε if, for any key pair (sk, pk) output by KeyGenε(λ), any circuit C ∈ Cε, any plaintexts π1, …, πt, and any ciphertexts Ψ = ⟨ψ1, …, ψt⟩ with ψi ← Encryptε(pk, πi), it is the case that: if ψ ← Evaluateε(pk, C, Ψ), then Decryptε(sk, ψ) → C(π1, …, πt), except with negligible probability over the random coins in Evaluateε [26].

By itself, mere correctness fails to exclude trivial schemes. Suppose Evaluateε(pk, C, Ψ) is defined to just output (C, Ψ) without “processing” the circuit or ciphertexts at all, and Decryptε to decrypt the component ciphertexts and apply C to the results.

Definition 2: (Compact Homomorphic Encryption). We say that a homomorphic encryption scheme E is compact if there is a polynomial f such that, for every value of the security parameter λ, E’s decryption algorithm can be expressed as a circuit DE of size at most f(λ) [26].

Definition 3: (“Compactly Evaluates”). We say that a homomorphic encryption scheme E “compactly evaluates” circuits in CE if E is compact and correct for circuits in CE [26].

Definition 4: (Fully Homomorphic Encryption). We say that a homomorphic encryption scheme E is fully homomorphic if it compactly evaluates all circuits [26].

Definition 5: (Leveled Fully Homomorphic Encryption). We say that a family of homomorphic encryption schemes {E(d) : d ∈ Z+} is leveled fully homomorphic if, for all d ∈ Z+, they all use the same decryption circuit, E(d) compactly evaluates all circuits of depth at most d (that use some specified set of gates), and the computational complexity of E(d)’s algorithms is polynomial in λ, d, and (in the case of EvaluateE) the size of the circuit C [26].

Definition 6: ((Statistical) Circuit Private Homomorphic Encryption). We say that a homomorphic encryption scheme E is circuit-private for circuits in CE if, for any keypair (sk, pk) output by KeyGenE(λ), any circuit C ∈ CE, and any fixed ciphertexts Ψ = ⟨ψ1, …, ψt⟩ that are in the image of EncryptE for plaintexts π1, …, πt, the following distributions (over the random coins in EncryptE and EvaluateE) are (statistically) indistinguishable:

EncryptE(pk, C(π1, …, πt)) ≈ EvaluateE(pk, C, Ψ)

The obvious correctness condition must still hold [26].

Definition 7: (Leveled Circuit Private Homomorphic Encryption). Like circuit private homomorphic encryption, except that there can be a different distribution associated to each level, and the distributions only need to be equivalent if they are associated to the same level (in the circuit) [3] [26].”

The above definitions give a high-level view of Gentry's work and the thinking behind homomorphic encryption. Gentry's scheme is an asymmetric encryption scheme, and his work was revolutionary in bringing homomorphic encryption back to the world's attention, which is why the definitions are quoted in this thesis; the details of his construction are out of scope of this research. The mathematics used in his scheme has some shortcomings because the primitive itself is not homomorphic, but his circuit computation algorithm provides the homomorphic properties. His algorithm organizes the data and manipulates circuits to achieve computations on encrypted data [26].

1.8 Vector product spaces with Clifford Geometric Algebra

Abstract mathematics has been used as the basis for various encryption methods. For example, RSA (Rivest–Shamir–Adleman) relies on factoring, with key sizes based on large prime numbers, which makes the factoring problem behind RSA computationally complex [27]. AES uses mathematics in the form of bit manipulations to increase the “diffusion” of the ciphertext and register-based operations to increase “confusion” with respect to the shared key.

Applying Clifford Geometric Algebra operations in vector product spaces gives results that are intractable to reverse without the right information, because the output is a vector in a different direction, plane, or volume. The geometric product, the core Clifford Geometric Algebra operation, is an extension of the inner product of vectors and can represent geometric objects of all dimensions in a vector space. A versor is the geometric product of multiple vectors and holds the properties of vectors in the vector space. Selecting multiple vectors of smaller dimension and performing a geometric product on them results in a vector-space element that is intractable to decompose.
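To ground these ideas (an illustrative Python sketch only, not the RVTHE implementation), the snippet below builds the smallest useful geometric algebra, G(2), stores a multivector as its four coefficients [scalar, e1, e2, e12], implements the geometric product, and shows that a versor formed as the product of two vectors is easily inverted through the vector inverse v^-1 = v / |v|^2.

    # Minimal 2D geometric algebra: multivectors are [scalar, e1, e2, e12], with
    # e1*e1 = e2*e2 = 1, e1*e2 = e12, e2*e1 = -e12, e12*e12 = -1.
    class MV:
        def __init__(self, s=0.0, e1=0.0, e2=0.0, e12=0.0):
            self.c = [s, e1, e2, e12]

        def __mul__(self, other):                     # the geometric product
            a0, a1, a2, a3 = self.c
            b0, b1, b2, b3 = other.c
            return MV(a0*b0 + a1*b1 + a2*b2 - a3*b3,  # scalar part
                      a0*b1 + a1*b0 - a2*b3 + a3*b2,  # e1 part
                      a0*b2 + a1*b3 + a2*b0 - a3*b1,  # e2 part
                      a0*b3 + a1*b2 - a2*b1 + a3*b0)  # e12 part

        def __repr__(self):
            return "MV(%.3f, %.3f e1, %.3f e2, %.3f e12)" % tuple(self.c)

    def vector(x, y):
        return MV(0.0, x, y, 0.0)

    def vector_inverse(v):
        # For a vector, v*v is the scalar |v|^2, so v^-1 = v / |v|^2.
        norm_sq = v.c[1]**2 + v.c[2]**2
        return MV(0.0, v.c[1] / norm_sq, v.c[2] / norm_sq, 0.0)

    a, b = vector(3.0, 1.0), vector(-2.0, 4.0)
    versor = a * b                                    # geometric product of two vectors
    inverse = vector_inverse(b) * vector_inverse(a)   # (ab)^-1 = b^-1 a^-1
    print(versor * inverse)                           # approximately MV(1, 0, 0, 0)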

1.9 Dissertation Contributions

This dissertation contributes to the state of the art in the following areas:

1. A new primitive for a homomorphic encryption method leveraging Clifford geometric algebra, mainly the geometric product, versors, and the inverse of a vector, named the Reduced Vector Technique Homomorphic Encryption. RVTHE is a cryptographic cipher powered by Clifford Geometric Algebra and versors. Versors are products of vectors in the geometric algebra which have simple inverse characteristics. This approach is an efficient method for encryption, decryption, and real-time usage.

2. The cipher is designed to use a flexible number of keys and flexible key positions, and it generates encrypted files roughly 25% smaller than previous multi-vector approaches. The use of multi-vectors for homomorphic encryption had been demonstrated by David Williams Honorio Araujo Da Silva in his master's thesis; that algorithm was designed using a concept invented by Dr. Carlos Paz de Araujo in 2017. The design of RVTHE provides agility and flexibility.

3. The Reduced Vector Technique Homomorphic Encryption was implemented with various key sizes, file sizes, and file types. Evaluated against the AES-Crypt encryption method, the results showed it is faster for encryption and slower for decryption.

4. Securing data involves two stages: data at rest and data in transit. “Data at rest” describes the data before or after it is sent to a server, storage, or the cloud. “Data in transit” refers to data being sent between the client and a server, storage, or the cloud. I will refer to these two stages in this paper as ESD security (Every Stage of Data). Enterprises have been securing networks, servers, and storage, but the data has not always been stored in a secure state; therefore there is a need for fully data-centric security. Data-centric encryption is a way to achieve data-centric security. RVTHE is a data-centric encryption cipher which is simple to implement and provides ESD security for the entire system. This method requires fewer resources to encrypt and decrypt, and it offers real-time data updates. It is also scalable and adaptable from small devices to large enterprise storage.

1.10 Dissertation Organization

Chapter 2 presents background on SSDs, a survey of various encryption methods, and the mathematical foundation. Chapter 3 defines the problems and limitations that motivate the research. Chapter 4 demonstrates those problems with experiments on encryption methods in the cloud. Chapter 5 covers the design of the new primitive, the Reduced Vector Technique Homomorphic Encryption cipher, its homomorphic properties, and its security. Chapter 6 shows the implementation and evaluation of the new cipher. Chapter 7 presents lessons learned and future work. Chapter 8 concludes with the contributions.


CHAPTER 2

BACKGROUND

This chapter discusses the background research related to storage and security methods, mainly SSD storage device characteristics, various encryption approaches, and the mathematical foundation of the new cipher presented in this research.

2.1 Cloud Storage SSD

Most Cloud environments use SSDs as data storage or as a flash cache to increase performance. An SSD retains stored data regardless of power availability and does not contain an actual disk (platter) as a traditional HDD (Hard Disk Drive) does. SSD technology uses electronic interfaces such as SATA (Serial ATA) Express to remain compatible with any host. It also uses the typical traditional block input/output (I/O) interface provided by any host, thus permitting simple replacement of traditional hard disk drive technology in common applications. SSDs are used as the primary data storage for communication devices, storage systems, modern computers, etc. [11].

From the security perspective, an SSD strives to achieve the best data reliability, integrity, secure deletion, and encryption, given the unique physical nature of the device. These aspects depend on ECCs, reliably erasing data from the storage media (leaving no digital footprint), and proper encryption methods. An SSD's built-in commands are effective for ECCs and deletion, but manufacturers sometimes implement them incorrectly. Previous research has tried to solve some of these issues by implementing a variety of approaches for achieving better ECCs and encryption methods, but it did not consider the performance sacrificed to encryption. This thesis considers those factors in the form of SSD performance, in IOPS, for different workloads while performing the encryption process.

2.1.1 Data Reliability and Integrity

ECC (Error Correction Code) handling is one of the functions of the FTL (Flash Translation Layer). ECC schemes are implemented to ensure the raw reliability of data. ECC usually contributes overhead on resources and thus impacts performance. The reliability of conventional ECCs, such as the commonly used BCH (Bose-Chaudhuri-Hocquenghem) code, degrades as SSD capacity grows. It is important to implement a powerful ECC engine with an LDPC (low-density parity-check) code to improve the reliability of SSDs [28]. Previous research has proposed different ECC approaches to increase data reliability; one such approach is a lightweight EDC (Error Detection Code) per block to achieve better cache performance [29].

2.1.2 Sanitization and Secure Deletion of SSD

The physical architecture of a non-encrypted SSD has limitations for sanitizing the disk or securely deleting a file. If the vendor did not implement the host interface's built-in commands correctly, sanitization of the SSD will not be achieved. No full sanitizing technique that works for an HDD is guaranteed to work for an SSD.

Usually we can approximate sanitization of an SSD by writing to the visible address space twice using FTL procedures. But this is a time-consuming process and it is not true sanitization, because it does not take care of the invisible address space (files marked as deleted whose physical data still exists). The data on an SSD can be erased with erasure-based sanitization techniques (overwriting the disk with multiple I/O operations) that may be able to sanitize the SSD, but these techniques have shortcomings and fail to do a real sanitization [30].

Completely deleting and securely erasing data on an SSD is challenging. For that reason, storing unencrypted data creates a risk of exposing that data to unauthorized access. And although erasing files via sanitization methods will make the SSD more secure, it also creates a lot of wear and tear on the device, which will shorten its lifespan. To avoid these problems, the best option is encrypting the data on an SSD. Previous research created a couple of methods for encrypting files on SSDs: node-level and password-based file-level encryption. In node-level encryption, the nodes are encrypted and the keys are stored in a dedicated KSA (Key Storage Area). The concern is that KSA blocks can turn into bad blocks, and at that point they can be read [31]. In password-based file-level encryption, files are encrypted using passwords, but encryption and deletion of the files is slow and accessing the files each time is tedious [32].

Even with all the challenges of encryption, it is still the best option for securing the data on an SSD.

2.2 Survey of Various Encryption Approaches

This section describes the encryption methods and algorithms used in encryption software for SSDs, and looks at their strengths and weaknesses in detail. It also discusses real randomness for creating keys and the most common types of existing encryption methods.


2.2.1 Block Ciphers

An encryption function on fixed-size blocks of data is called a block cipher. “A secure block cipher is one for which no attack exists” [6]. A block cipher is an encryption function that takes a fixed-size block of plaintext and generates a same-sized block of ciphertext using a secret key (the key size can be different from the block size). Without the secret key, no plaintext can be produced from the ciphertext. The security of a block cipher is also defined in terms of attacks: non-generic methods that differentiate the block cipher from an ‘ideal block cipher’ [6]. “Block cipher written in terms of E(K,p) or Ek(p) for encryption of plaintext p with key K and D(K,c) or Dk(c) for decryption of ciphertext c with key K” [6]. In block cipher encryption, the key is a critical component and its integrity is absolute; changing a single bit in the key value can result in a completely different ciphertext [6].

A block cipher on k-bit values is a permutation of the 2^k possible values, and each key value selects one such permutation; a single permutation on 128-bit values corresponds to a lookup table of 2^128 values (each 128 bits). The ideal cipher has an independently random permutation for each key value, which amounts to choosing the lookup table at random. A “distinguisher” is an algorithm that is given black-box access to either the block cipher or an ideal block cipher and tries to tell which one it is interacting with; it has no knowledge of the internal workings of the black-box function. The amount of computation a distinguisher may do is limited; otherwise the process becomes more complicated than an acceptable level of efficiency allows. A practical block cipher should be designed so that, for each key, the encryption function appears to be a randomly chosen invertible function [6].

A block cipher is an ‘ideal block cipher’ if it can withstand attacks like known-plaintext, ciphertext-only, related-key, chosen-plaintext, and other types of attacks. SSD encryption software uses one or more of the following block ciphers; the following sections will address them and some of their attacks.
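Before looking at specific ciphers, the following toy Python sketch illustrates the E(K, p) / D(K, c) interface and the Feistel structure used by ciphers such as DES and Blowfish. The SHA-256-based round function is an assumption made here to keep the sketch short; it is an illustration only and provides no real security.

    # Toy 4-round Feistel block cipher on 64-bit blocks (two 32-bit halves).
    import hashlib

    ROUNDS = 4

    def _round_function(half, round_key):
        # Any keyed pseudo-random function works for a sketch; SHA-256 keeps it short.
        return hashlib.sha256(round_key + half).digest()[:4]

    def _round_keys(key):
        return [hashlib.sha256(key + bytes([i])).digest()[:4] for i in range(ROUNDS)]

    def encrypt_block(key, plaintext):                # E(K, p)
        left, right = plaintext[:4], plaintext[4:8]
        for rk in _round_keys(key):
            # Feistel round: swap halves and XOR the round function into one half.
            left, right = right, bytes(x ^ y for x, y in zip(left, _round_function(right, rk)))
        return left + right

    def decrypt_block(key, ciphertext):               # D(K, c): run the rounds backwards
        left, right = ciphertext[:4], ciphertext[4:8]
        for rk in reversed(_round_keys(key)):
            right, left = left, bytes(x ^ y for x, y in zip(right, _round_function(left, rk)))
        return left + right

    key, block = b"demo key", b"8 bytes!"
    assert decrypt_block(key, encrypt_block(key, block)) == block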

2.2.1.1 DES (Data Encryption Standard)

The DES standard specifies both enciphering and deciphering operations, which are based on a binary number called a key. DES uses a Feistel cipher design with 16 rounds.

Figure 1 - Data Encryption Standard [27]


In Figure 1, DES starts with a 64-bit input block (binary digits). DES applies a 56-bit key derived from a 64-bit key: of the 64 key bits, 56 are used directly by the algorithm and the other 8 are used for error detection. These 8 bits set the parity of each 8-bit byte so that every byte has an odd number of “1”s. XOR operations performed between the round keys and the data, along with permutations, produce the final cipher [18].
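As a small illustration of this parity convention (a Python check written for this thesis discussion, not part of the DES algorithm itself), every byte of a well-formed 64-bit DES key should contain an odd number of one bits:

    # Check the DES key parity convention: each of the 8 key bytes has odd parity.
    def has_odd_parity(byte):
        return bin(byte).count("1") % 2 == 1

    def des_key_parity_ok(key):
        return len(key) == 8 and all(has_odd_parity(b) for b in key)

    print(des_key_parity_ok(bytes([0x01] * 8)))   # True: 0x01 has a single 1 bit
    print(des_key_parity_ok(bytes([0x03] * 8)))   # False: 0x03 has two 1 bits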

Figure 2 - TDEA [27]

In Figure 2, a 3DES (TDEA) key is made up of three DES keys, referred to as a key bundle; the keys inside the bundle are different from each other. This key bundle is used for encryption and decryption. The encryption process encrypts with the first key, decrypts with the second key, and then encrypts with the third key. The decryption process follows the reverse order of the encryption process. The encryption algorithms specified in this standard are commonly known among those using standard encryption [33].


3DES was heavily used by organizations until researchers discovered active collision attacks on its different modes (e.g. CBC, CTR, GCM, OCB, etc.) [34]. The small key size, the 64-bit data block size, and the use of the same key for encryption became a vulnerability, because the same ciphertext is expected to repeat after about 2^32 blocks (two to the power of half the data block size). These matching collision ciphertexts expose the system to attacks such as birthday attacks: due to the XOR operation, one can find the XOR of the plaintexts behind the colliding ciphertexts. A cipher collision alone is not enough to discover the plaintext, but together with reuse of the same secret key and some fraction of known plaintext, it makes successful attacks easier to perform. Due to ever-increasing computing power, these attacks are more easily carried out through methods like the man-in-the-browser attack [34].

2.2.1.2 AES (Advanced Encryption Standard)

Figure 3 – AES encryption process [35]


AES is a symmetric encryption algorithm that was created to replace DES and 3DES. Joan Daemen and Vincent Rijmen developed the AES cipher, and it was adopted as the NIST encryption standard in 2001 [35]. Figure 3 shows the AES encryption process.

AES supports key lengths of 128, 192, and 256 bits. It encrypts a 128-bit block of plaintext and generates a 128-bit block of ciphertext, and it uses a single key for both encryption and decryption. AES encryption consists of repeated rounds of the following steps: sub bytes (replacing bytes using the S-box table), shift rows, mix columns, and add round key. The last round performs all functions except mix columns. AES decryption reverses the encryption process.

Key Size Total Rounds

128 10

192 12

256 14

Table 1 - AES Key Size and Number of Rounds [35].

Table 1 shows the total rounds performed based on key size [35] [36]. In recent years, most AES implementations use the 256-bit key length instead of the 192-bit key.

Even though AES 256 has a longer key, the way its key schedule is designed makes it more vulnerable to sub-key (related-key) attacks. AES is subject to a theoretical brute-force attack, but even with current technology it would take on the order of a quintillion years to break the encryption key. There are some additional theoretical attacks documented: cryptanalytic attacks, related-key attacks on AES 192 and 256, meet-in-the-middle attacks on AES 128, and the first key-recovery attack on full AES. Exploits of AES 256 have received more focus from the security community than AES 128 and AES 192. Despite this, all AES versions are considered unbreakable by today's technology [36] [37].

2.2.1.3 Blowfish

Blowfish is a symmetric block cipher that was designed by Bruce Schneier to replace

DES in 1993.

Figure 4 - Blowfish Algorithm [38]

In Figure 4, the original design of Blowfish manipulates data in 64-bit blocks, operating on 32-bit halves, with a variable key size scaling from 32 bits up to 256 bits. As shown in Figure 4, the algorithm uses the XOR operation, table lookups (S-boxes), and modular addition, and it has the same Feistel structure as the DES algorithm. The algorithm uses precomputed sub-keys to expedite the speed of encryption. A year later, to increase security, the maximum key size was increased from 256 bits to 448 bits and published in Dr. Dobb's Journal [38].

If a small key length is chosen, Blowfish can exhibit weak keys, which makes it vulnerable to chosen-key and related-key attacks. Due to its Feistel structure and key-dependent S-box substitution, it is also prone to slide attacks and simple power attacks. Because Blowfish is a block cipher, it is vulnerable to the attacks block ciphers are generally prone to, such as side-channel, exhaustive search, and birthday attacks, to name a few [39].

2.2.1.4 Twofish

The Twofish symmetric encryption algorithm is similar to AES. It uses key lengths of 128, 192, and 256 bits and is a 128-bit block cipher.

Figure 5 – Twofish process [40]


The National Institute of Standards and Technology selected it as one of the top five finalists, but it was not selected for standardization in the end. Still, recently developed encryption software for storage and file systems incorporates this algorithm (e.g. TrueCrypt, BestCrypt, Dm-crypt, and DiskCryptor). The Twofish algorithm is one of the ciphers included in the OpenPGP standard, and it is free with no restrictions.

Figure 5 shows that the Twofish algorithm uses a predefined, key-dependent S-box and a key schedule similar to AES. The first half of the key is used for encryption and the second half is used for the S-box lookup, modifying the encryption algorithm. Twofish's design looks like a mix of DES and AES: one half is like DES in that it uses a Feistel structure, and the other half is like AES in that it uses S-boxes and a Maximum Distance Separable matrix. Twofish's 128-bit key encryption is slower than its AES counterpart, but its 256-bit key encryption is faster [40].

Researchers had claimed that, for certain key pairs, the Twofish cipher might be vulnerable to partial chosen-key and related-key attacks, but it was determined that the existence of these key pairs was not realistic, so the proposed attacks would not work [41]. With time, scholars found that there are vulnerabilities in the Twofish cipher after all. One attack, SPA (Simple Power Analysis), revealed the secret key of the cipher: because Twofish uses an S-box with an 8-bit predefined permutation and round operations, it is prone to a side-channel attack that can discover the encryption key in one iteration [42].


2.2.1.5 Serpent

Serpent is also a block cipher, and it was published in 1998 by Ross Anderson, Eli

Biham and Lars Knudsen. This algorithm was selected as one of the finalists by the US

National Institute of Standards and Technology [43].

Figure 6 – Serpent Algorithm [44]

Figure 6 shows the process of Serpent. Serpent uses 128-, 192-, and 256-bit key lengths and operates on 32-bit words, using a substitution–permutation network with 4-bit S-boxes and a key mixing operation over 32 rounds [43].

In 2011, a cryptographic analysis using a multidimensional linear method was performed to find vulnerabilities in Serpent. Researchers showed that a reduced 11-round version of Serpent with a 128-bit key and its key mixing operations can be broken to find the encryption key [45].


2.2.2 Block Cipher Modes

A block cipher mode repeatedly applies a cipher's single-block operation to data larger than one block in order to achieve confidentiality and, in some modes, authenticity. Modes adapt the cipher to different operating environments and requirements. The following are some of the most commonly used block cipher modes.

2.2.2.1 CBC (CIPHER BLOCK CHAINING) MODE:

CBC mode uses an IV (Initialization Vector), XORed with the first plaintext block before encryption. The method then uses the previous ciphertext block to encrypt the next plaintext block: each ciphertext block is stored in a feedback register and XORed with the next plaintext block before it is encrypted. This process repeats until all the plaintext has been processed. From the second block onwards, every block depends on the previous blocks.

In the decryption process, the same idea is applied in reverse order. To decrypt a ciphertext block, decrypt it with the key and XOR the result with the previous ciphertext block to recover the plaintext block. After each decryption cycle, the ciphertext block is stored in the feedback register.

Figure 7 - CBC Encryption and Decryption [27]


Figure 7 shows the CBC encryption and decryption process. The CBC structure may be exposed to some vulnerabilities; for example, in CBC mode the encryption process cannot start until there is enough plaintext data to fill the entire block being processed.

Encryption: Ci = Ek(Pi ⊕ Ci−1)   Decryption: Pi = Ci−1 ⊕ Dk(Ci)

In secured network communications, terminals need to immediately send each character or string of bytes to the destination host; they cannot wait until a block is full. But when the string of bytes is smaller than a block, CBC mode cannot handle the encryption. Another weakness comes from the birthday paradox: because of chaining, identical patterns of the plaintext are exposed about every 2^(m/2) blocks (m = block size). There are ways to mitigate these issues, for example taking care of the message starting point and endpoint, and including controlled redundancy and authentication [27].

If an attacker adds some bits to a ciphertext block and it goes undetected during the decryption cycle, that block will decrypt to gibberish. Sometimes this may not be an issue, but other times it can cause problematic situations. Altering the ciphertext by even one bit gives the subsequent block the wrong input, and that affects the decryption of that block. The combination of SSL v3 or TLS v1 with CBC is not recommended, as it uses a single set of cryptographic parameters for the entire traffic of the communication. This exposes the targeted block to a padding oracle attack, where an attacker can figure out the padding information and then determine plaintext bytes from the ciphertext by running multiple queries [46]. This was addressed in TLS 1.2, which checks for such repeated queries and stops the connection to prevent them, and it was recommended to upgrade all secure communications to implement this change.
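The chaining equations above can be written out directly. The following Python sketch builds CBC by hand on top of a single-block AES primitive from the third-party cryptography package (an assumption of this sketch); in practice one would use the library's own CBC mode together with padding and authentication.

    # Hand-rolled CBC: C_i = Ek(P_i XOR C_{i-1}) and P_i = C_{i-1} XOR Dk(C_i).
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    BLOCK = 16

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def cbc_encrypt(key, iv, plaintext):
        enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()   # single-block E_k
        prev, out = iv, b""
        for i in range(0, len(plaintext), BLOCK):
            block = enc.update(xor(plaintext[i:i + BLOCK], prev))    # Ek(P_i XOR C_{i-1})
            out += block
            prev = block
        return out

    def cbc_decrypt(key, iv, ciphertext):
        dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()   # single-block D_k
        prev, out = iv, b""
        for i in range(0, len(ciphertext), BLOCK):
            block = ciphertext[i:i + BLOCK]
            out += xor(dec.update(block), prev)                      # C_{i-1} XOR Dk(C_i)
            prev = block
        return out

    key, iv = os.urandom(32), os.urandom(16)
    message = b"sixteen byte blk" * 4        # length must be a multiple of the block size
    assert cbc_decrypt(key, iv, cbc_encrypt(key, iv, message)) == message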


2.2.2.2 CFB (CIPHER-FEEDBACK) MODE

Usually block ciphering cannot start until a full block of data is received. As mentioned in the CBC section, CBC cannot handle a string of bytes smaller than a block; CFB mode, on the other hand, can. The process derives the next keystream block by encrypting the previous ciphertext, and this keystream is used in the next iteration to encrypt the next plaintext bytes.

Figure 8 - CFB mode with 8 bits [27]

Figure 8 shows the encryption and decryption for an n-bit block. Encryption and decryption use the block cipher together with shifting and XOR operations.

Encryption: Ci = Pi ⊕ Ek(Ci−1)   Decryption: Pi = Ci ⊕ Ek(Ci−1)

CFB uses synchronized keystream generation on both the encryption side and the decryption side: the encryption and decryption generators need to derive exactly the same keystream on corresponding iterations. If either of them misses a cycle, it can result in generating the wrong ciphertext or plaintext. CFB mode is like CBC mode in that one incorrect bit can propagate into all subsequent processing [27].

2.2.2.3 LRW (Liskov, Rivest, and Wagner) mode

To prevent the attacks possible against CBC mode, LRW mode was introduced. This is a tweakable narrow-block encryption: a keyed permutation applied with a known tweak I to the plaintext P, the result of which is the ciphertext block C. This method uses two keys: the first key K is used to encrypt the plaintext (after an XOR), and the second key F is used for a finite field multiplication. The key F is the same size as a block; it is multiplied in the finite field with the tweak as a precomputation. The outcome X is used in the encryption process [47].

Encryption:

C = Ek(P ⊕ X) ⊕ X, where X = F ⊗ I

Decryption:

P = Dk(C ⊕ X) ⊕ X

The XOR and multiplication are performed using keys K and F on the plaintext and the finite field (GF(2^128) for AES), with a precomputed tweak:

F ⊗ I = F ⊗ (I0 ⊕ δ) = (F ⊗ I0) ⊕ (F ⊗ δ)

where δ represents the possible offset values in the binary finite field GF(2^128).


This method protects against the CBC mode attacks, but it still has its own leak: if an attacker changes a single block, it affects only that cipher block and not all the subsequent cipher blocks.

2.2.2.4 XTS Mode

Figure 9 - XTS mode [27]

In Figure 9, XTS mode is the Advanced Encryption Standard used with an XEX (XOR-Encrypt-XOR) tweakable code value and ciphertext stealing. The simplified tweaked AES with the XEX method XORs the tweak with the plaintext to generate a tweaked input; AES encryption is then applied and the tweak is XORed again to generate the final ciphertext. Ciphertext stealing is a block cipher technique that allows encryption of messages whose size is not divisible by the block size, producing ciphertext of the same size as the plaintext, at the cost of added complexity.

X = Ek(I) ⊗ α^j

C = Ek(P ⊕ X) ⊕ X

P – the plaintext.
I – the number of the sector.
α – the primitive element of GF(2^128) defined by the polynomial.
j – the number of the block within the sector.

XTS mode has vulnerabilities similar to CBC mode. For example, tampering with the data can go unrecognized, which will generate gibberish when decryption occurs. The system must be built to recognize this potential threat and be able to protect the data using checksums and authentication tags. This mode is also prone to other vulnerabilities such as replay attacks and randomization attacks: if attackers have access to ciphertext blocks, they can analyze them and use them for replay and randomization attacks [48].
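The tweak arithmetic above amounts to repeated multiplication by α in GF(2^128). The Python sketch below shows that step; the little-endian byte-order convention is an assumption here, since implementations differ, and Ek(I) is taken as an already-encrypted sector number computed elsewhere.

    # Multiply an XTS tweak by alpha^j in GF(2^128), reduced by x^128 + x^7 + x^2 + x + 1.
    MODULUS_LOW_BITS = 0x87                    # x^7 + x^2 + x + 1

    def multiply_by_alpha(t):
        t <<= 1                                # multiply by x
        if t >> 128:                           # degree reached 128: reduce
            t = (t ^ (1 << 128)) ^ MODULUS_LOW_BITS
        return t

    def xts_tweak(encrypted_sector_number, j):
        # encrypted_sector_number stands in for Ek(I), the encrypted sector number.
        x = int.from_bytes(encrypted_sector_number, "little")
        for _ in range(j):                     # X = Ek(I) * alpha^j
            x = multiply_by_alpha(x)
        return x.to_bytes(16, "little")

    # Tweak for block j = 3 of a sector whose encrypted number is sixteen 0x01 bytes.
    print(xts_tweak(bytes([1] * 16), 3).hex())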

2.2.2.5 GCM (Galois/Counter Mode)

Figure 10 - GCM mode [49]

In Figure 10, GCM is a symmetric-key cryptographic block cipher mode. It is derived from GMAC (Galois Message Authentication Code), an authenticated, incremental message authentication scheme. All blocks are numbered and then encrypted using an XOR operation (like a stream cipher, with the order of operation given by counters). GCM uses a hash key H, which is a string of 128 zero bits encrypted using the block cipher. For encryption, along with the hash key, it uses a unique, arbitrary-length initialization vector for each stream [49].

The key in GCM mode is used in a similar way to the one in LRW mode (multiplication in a Galois field) for each 128-bit cipher block (GF(2^128) for AES). The field polynomial is defined as:

x^128 + x^7 + x^2 + x + 1

Feeding the blocks of data into the GHASH function and encrypting the output generates the authentication tag.

GHASH(H, A, C) = X_(m+n+1)

H - The hash key. A - The additional authenticated data.

C - The ciphertext. m - The number of 128-bit blocks in A.

n - The number of 128-bit blocks in C.

This encryption method has been shown to be secure and efficient. Currently, Google uses GCM as the cipher mode for securing its website traffic.
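As an illustration of authenticated encryption in this mode (hypothetical key and data, using the Python `cryptography` package's AESGCM interface):

```python
# Illustrative AES-GCM usage: the nonce is the per-stream initialization vector,
# and the 16-byte authentication tag is appended to the returned ciphertext.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)                      # unique IV for each encryption
aad = b"sector-header"                      # authenticated but not encrypted

ciphertext = AESGCM(key).encrypt(nonce, b"secret payload", aad)
plaintext = AESGCM(key).decrypt(nonce, ciphertext, aad)   # raises InvalidTag if tampered
assert plaintext == b"secret payload"
```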


2.2.3 Encryption Methods for SSD

SSD serves as a typical alternative to HDD. In fact, SSD considerably emulates the technology of HDD, such as the communication protocol and hardware interfaces, so HDD technology can quickly be adapted to SSD. However, the methods that SSDs employ to process data are different from HDDs in storing, managing, accessing, and securing. Because of the differences between the two technologies, it is possible that processing the same commands on an HDD will produce different results on an SSD [11].

When it comes to encryption, we need to consider these differences. There are a couple of encryption techniques that have been used for SSD. This chapter will discuss those methods.

2.2.3.1 Dm-crypt

Dm-crypt is a disk encryption method compatible with Linux kernel version 2.6 or later. It uses the kernel's Crypto API routines, and devices are mapped to encrypted containers using the device mapper [50]. This API supports the AES-256 cryptographic method along with other methods. Dm-crypt uses the Linux Unified Key Setup (LUKS) to create encrypted containers which are independent from outside platforms. LUKS was developed by Clemens Fruhwirth in 2004 [51]. Using this method, a user can even encrypt the root device. A passphrase is required to create encrypted containers.

There has been some research around the drawbacks of dm-crypt. For example, it has been discovered that hackers can sidestep the passphrase to access encrypted containers by hitting the 'Enter' key a couple of times. They can also delete the containers, because deleting a container does not require the passphrase. Utilizing disk commands on the system, an intruder can determine critical components of the hidden containers relatively easily [52] [53].

2.2.3.1.1 Process Method

Dm-crypt uses the device mapper and the Linux kernel's Crypto API routines. This API is built with a cryptographic method using an AES-256 algorithm, and dm-crypt supports XTS, LRW, and other modes for the encryption. The encrypted containers are stored as files inside a folder. Users can create these containers (volumes) with the LUKS (Linux Unified Key Setup) encryption specification, protected by a passphrase. Using the system device mapper, it mounts the encrypted containers on top of existing devices. Clemens Fruhwirth created LUKS in 2004; dm-crypt uses this method to create encrypted containers which are independent from the existing platform and allow compatibility from system to system.

2.2.3.1.2 Weaknesses

Using this method, a user can encrypt the root device, but they may need a smart device attached to the system so that they can boot the primary system. When creating containers a passphrase is required, but to delete a container a passphrase is not even requested. This method is mainly used for Linux-like systems. Some of the research showed that the passphrase can be bypassed to access the containers by just pressing the Enter key a couple of times. The file system information displays the sizes of volumes, which may allow someone to infer information about the hidden containers.

38

2.2.3.2 BestCrypt

BestCrypt is encryption software first released in 1995 and still in use. It creates, mounts, and manages encrypted volumes called containers. Because this encryption software is still in use, it will be selected for evaluation.

2.2.3.2.1 Process Method

This encryption method stores files in encrypted containers and keeps them safe from unauthorized access. The benefit of BestCrypt is that system disk volumes can be mounted, and stored as encrypted files when not in use. This method can be applied to removable media, network shares, archived storage, and email attachments on Windows or Linux OS.

It uses the following cryptographic methods: AES, Blowfish, DES, Triple DES, Twofish,

Serpent, and GOST 28147-89. All these cryptographic methods use LRW and CBC modes.

AES, Twofish and Serpent also use XTS mode [54].

2.2.3.2.2 Weaknesses

It seems a viable option, but like any software it can have bugs; these errors can be as severe as damaging entire partitions.

2.2.3.3 FDE (Full Disk Encryption)

FDE is a hardware encryption method; implementations began appearing in 2009 and it is still in use. It encrypts all the partitions, system files, and the operating system using a hardware component. This technique is used by Samsung SSDs, which are commonly deployed. Applying FDE to an SSD produces an SED (Self-Encrypting Drive). Self-encrypting SSDs provide better performance than SSDs where encryption software is installed [12]. This encryption implementation method will be selected for evaluation.

39

When Full Drive Encryption (FDE) is applied on an SSD it is called a Self-Encrypting Drive (SED). FDE was developed in 2009; it is a literal encryption of the entire system, which includes all the partitions, system files, and the operating system. This encryption method delegates the process to the hardware component of the drive, which helps to enhance security by utilizing the Opal Storage Specification (a set of specification features for SEDs) [12]. An SED needs a master password for the drive and a user password for each user; they are stored in the BIOS and handled by the hard disk controller. SED uses AES-128 and AES-256.

Researchers have found the following vulnerabilities of this method: Hot Plug Attack, Hot

Unplug Attack, Forced Restart Attack, and Key Capture Attack. They have also shown that attackers can bypass the encryption and access data; this undermines the purpose of securing the data [55].

2.2.3.3.1 Process Method

This encryption method delegates the process logic to a dedicated hardware component of the drive, using the Opal storage specification (a set of specification features for SEDs) to enhance security. The hard disk controller handles key management, which enhances security and protects the data from unauthorized access. An SED has two passwords, a User password and a Master password, both stored in the BIOS. The Master password is generated by the SED and the user password is generated by users for system access. If the user password is lost or forgotten, the Master password can be used to unlock the system. It uses the following cryptographic methods: AES-128 and AES-256. A BIOS password is used for pre-boot authentication of the system.


2.2.3.3.2 Weaknesses

There are some attacks that are related to this method: Hot Plug Attack, Hot Unplug Attack,

Forced Restart Attack, and Key Capture Attack. Research has shown the attacker can bypass the encryption and access data; this undermines the purpose of securing the data

[55].

2.2.4 Comparable Encryption for Evaluations

“AES Crypt is a file encryption software available on several operating systems that uses the industry standard Advanced Encryption Standard (AES) to easily and securely encrypt files.” Represented in this paper as AES-Crypt [56].

2.2.5 Homomorphic Encryption

The first practical and feasible version of homomorphic encryption was introduced by Craig Gentry in 2009, applying addition and multiplication to the encrypted data over circuits [57] [2]. Research had shown that there were advantages to leveraging homomorphic encryption in the Cloud and in Multi-Party Computing environments [24] [25]. Most of the previous implementations were asymmetric homomorphic methods, but researchers observed behaviors that were not practical for real-world usage:

• Key Sizes: Ranged from 17MB to 2.25GB

• Key Generation Time: Ranged from 2.5secs to 2.2hours

• Cipher Text size: Much larger cipher texts

• Noise: Creation exceeding thresholds

• Time: Very long execution times


These weaknesses made homomorphic encryption impractical to use in the cloud or in real-time systems [3] [58]. Currently there is no encryption method in production which can take advantage of homomorphic features for any system [26].

There must be a way we could create an encryption methodology that derives great value from the unique features of homomorphic encryption. Using versors from Clifford Algebra, I developed a symmetric homomorphic encryption scheme. The next section discusses the mathematical foundation of the new encryption method.

2.3 Mathematical Foundation

This section discusses the mathematical foundation which was used to architect

RVTHE.

Algebra is the base for most homomorphic encryption methods. It uses positive numbers, real numbers, complex numbers, linear algebra, geometric algebra, and function spaces (e.g., Hilbert Spaces and Clifford Algebra) for number fields. If a Geometric Algebra uses vector spaces with a quadratic form and is associative, then it is called a Clifford Algebra. I chose to use Clifford Algebra for RVTHE because it calculates a geometric product of vectors and the generated results are not traceable; this is ideal for the level of security that we want to achieve. So, it is important to understand these Clifford Geometric Algebra terms [59]:


• Vector: “a quantity having direction as well as magnitude, especially as

determining the position of one point in space relative to another.”

• Vector Dimension: "Let V be a finite dimensional vector space over the field 픽.

The dimension of V, denoted dim V, is the number of vectors in any basis of V.

If V is an infinite dimensional vector space over 픽, then we write dim V = ∞."

We can represent an n-dimensional vector as "nD", i.e., if n = 2 then "2D" is used to represent a 2-dimensional vector.

• Vector Space or "Bivectors": "a space consisting of vectors, together with the

associative and commutative operation of addition of vectors, and the associative

and distributive operation of multiplication of vectors by scalars.”

• Multi-vector: “a mathematical structure comprising a linear combination of

elements of different grade, such as scalars, vectors, bivectors, tri-vector, etc.”

• Geometric Algebra Axioms: To understand combinations of scalars, vectors,

and bivectors, we first need to know the axioms behind the geometric algebra.

These are the proven axioms in geometric algebra. Vectors are represented by (a, b, c), scalars by (λ, ε), and bivectors by (ab, ba, ac, etc.).

Axiom 1: associative rule: a(bc) = (ab)c (4.1.1.1)

Axiom 2: distributive rules: a(b + c) = ab + ac; (b + c)a = ba + ca (4.1.1.2)


Axiom 3: (λa)b = λ(ab) (4.1.1.3)

Axiom 4: λ(ab) = λab [λ ∈ ℝ] (4.1.1.4)

Axiom 5: λ(εa) = (λε)a [λ, ε ∈ ℝ] (4.1.1.5)

Axiom 6: λ(a + b) = λa + λb [λ ∈ ℝ] (4.1.1.6)

Axiom 7: (λ + ε)a = λa + εa [λ, ε ∈ ℝ] (4.1.1.7)

Axiom 8: a · b = |a||b| cos θ (4.1.1.8)

Axiom 9: |a ∧ b| = |a||b| sin θ (4.1.1.9)

Axiom 10: ab = a · b + a ∧ b (4.1.1.10)

Axiom 11: a ∧ b = −b ∧ a (4.1.1.11)

• Product of Vectors: The result of multiplying the vectors with scalar and cross products. These two products are the foundation for geometric algebra's inner, outer,

and geometric products of vectors.

o Scalar Product: (Also known as dot product) The magnitude of production

of vector quotients.

o Cross Product: (Also known as vector product) A binary operation on

two vectors in three-dimensional space.

o Outer Product: (Also known as wedge product) The tensor product of

two coordinate vectors.

o Inner Product: The dot product of the Cartesian coordinates of

two vectors.

o Geometric Product: The sum of the inner and outer products

44

• Vector Inverse: When performing geometric product between vector A and

another vector B; if the result is “1” then vector B is called the inverse of vector A

and vice versa.

• Blade: The outer product of k vectors is called a k-blade; a 1-blade is a vector, a 2-blade is a bivector, a 3-blade is a tri-vector, and so on, where k indicates the grade of the blade.

• Versors: Versors are multi-vectors formed as geometric products of vectors, following Clifford Geometric Algebra.


To show how Clifford Geometric Algebra is represented in math, I will use two dimensional (2D) vectors for inner product, outer product, and geometric product representations [21] [59].

2.3.1 Geometric Algebra Overview

Geometric Algebra combines the work of Hamilton (Quaternions) and Grassmann (Non-Commutative Algebra) into a field that generalizes the product of two vectors, extending the 3-dimensionally restricted "Cross Product" to an n-dimensional subspace of the vector space (V) over number fields (ℤ, ℝ, ℂ, ℕ, etc.) such that the subspace is a product space that allows two vectors to have a "geometric product" as [59]:

V̄₁V̄₂ = V̄₁ · V̄₂ + V̄₁ ∧ V̄₂

where V̄₁ and V̄₂ are vectors or multi-vectors (i.e., a collection of "blades"). The operation V̄₁ ∧ V̄₂ is known as the "wedge product" or "exterior product." The operation V̄₁ · V̄₂ is the "dot product" or "interior product" (a.k.a. "inner product").

For a simple pair of two-dimensional vectors:

V̄₁ = a₁ē₁ + a₂ē₂

V̄₂ = b₁ē₁ + b₂ē₂

where the set {ē₁, ē₂} are unit vectors and {aᵢ}, {bᵢ}, i = 1, 2, are scalars, the geometric product follows the rules of Geometric Algebra, as described below:

ēᵢ ∧ ēᵢ = 0

ēᵢ ∧ ēⱼ = −ēⱼ ∧ ēᵢ

ēᵢ ∧ ēⱼ = ēᵢⱼ (compact notation)

ēᵢ · ēᵢ = 1

ēᵢ · ēⱼ = 0 (i ≠ j)

Thus, by performing the geometric product of V̄₁ and V̄₂ we have

V̄₁V̄₂ = [(a₁b₁) ē₁ · ē₁ + (a₁b₂) ē₁ · ē₂ + (a₂b₁) ē₂ · ē₁ + (a₂b₂) ē₂ · ē₂]   (dot product)

       + [(a₁b₁) ē₁ ∧ ē₁ + (a₁b₂) ē₁ ∧ ē₂ + (a₂b₁) ē₂ ∧ ē₁ + (a₂b₂) ē₂ ∧ ē₂]   (wedge product)

Applying ēᵢ · ēᵢ = 1, ēᵢ · ēⱼ = 0, ēᵢ ∧ ēᵢ = 0, and ēⱼ ∧ ēᵢ = −ēᵢ ∧ ēⱼ, this results in

V̄₁V̄₂ = (a₁b₁ + a₂b₂) + (a₁b₂ − a₂b₁) ē₁ ∧ ē₂

The product V̄₁V̄₂ produces a scalar and an object ē₁ ∧ ē₂, which in compact notation is written as ē₁₂ and represents an area created by rotation: ē₁ ∧ ē₂ in one (clockwise) orientation or ē₂ ∧ ē₁ = −ē₁ ∧ ē₂ in the opposite (anti-clockwise) orientation. The orientation is given by the sign of the term in front of the ē₁ ∧ ē₂ component.


A versor is a product of vectors in the geometric product space which has simpler inverse characteristics:

V̄ = V̄₁V̄₂V̄₃ … V̄ₙ

2.3.2 Inner Product

Inner product (also called dot product or scalar product) is synonymous with transforming

vectors into scalars. The inner product of vectors a and b is represented by "a · b".

If a and b are vectors defined as a = (a₁e₁ + a₂e₂) and b = (b₁e₁ + b₂e₂), then:

a · b = (a₁e₁ + a₂e₂) · (b₁e₁ + b₂e₂)

a · b = (a₁b₁ e₁ · e₁ + a₁b₂ e₁ · e₂ + a₂b₁ e₂ · e₁ + a₂b₂ e₂ · e₂)

a · b = a₁b₁ + a₂b₂

The inner product is the scalar magnitude produced from the product of the vectors. If we reverse the order of the vectors in the inner product, the resulting value is always the same:

a · b = b · a

Example:

When a = (2e₁ + 3e₂) and b = (4e₁ + 5e₂), the inner product a · b is:

a · b = (2e₁ + 3e₂) · (4e₁ + 5e₂)

a · b = (8 e₁ · e₁ + 10 e₁ · e₂ + 12 e₂ · e₁ + 15 e₂ · e₂)

a · b = 8 + 15

a · b = 23


Reversing the order of the vectors, the inner product b · a is:

b · a = (4e₁ + 5e₂) · (2e₁ + 3e₂)

b · a = (8 e₁ · e₁ + 12 e₁ · e₂ + 10 e₂ · e₁ + 15 e₂ · e₂)

b · a = 8 + 15

b · a = 23 = a · b
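A quick numerical check of this example (plain Python, illustrative only):

```python
# Inner (dot) product of the 2D vectors a = 2e1 + 3e2 and b = 4e1 + 5e2.
def inner(u, v):
    return u[0] * v[0] + u[1] * v[1]     # a1*b1 + a2*b2, since e_i . e_i = 1, e_i . e_j = 0

a, b = (2, 3), (4, 5)
assert inner(a, b) == inner(b, a) == 23  # symmetric, as derived above
```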

2.3.3 Outer Product

The outer product of vectors a and b (also called the wedge product) is represented by "a ⋀ b". If a and b are vectors defined as a = (a₁e₁ + a₂e₂) and b = (b₁e₁ + b₂e₂), then:

a ⋀ b = (a₁e₁ + a₂e₂) ⋀ (b₁e₁ + b₂e₂)

a ⋀ b = (a₁b₁ e₁ ⋀ e₁ + a₁b₂ e₁ ⋀ e₂ + a₂b₁ e₂ ⋀ e₁ + a₂b₂ e₂ ⋀ e₂)

a ⋀ b = (a₁b₂ e₁ ⋀ e₂ − a₂b₁ e₁ ⋀ e₂)

a ⋀ b = (a₁b₂ − a₂b₁) e₁ ⋀ e₂

a ⋀ b = (a₁b₂ − a₂b₁) e₁₂

In the above formula, "(a₁b₂ − a₂b₁)" is the scalar coefficient of the area of a parallelogram associated with the plane containing the two basis vectors e₁ and e₂.

Figure 11 - Outer Product [59]

Figure 11 shows the outer product. The outer product of two vectors is antisymmetric,


such that a ⋀ b = −b ⋀ a.

Example:

a and b are vectors; when a = (2e₁ + 3e₂) and b = (4e₁ + 5e₂):

a ⋀ b = (2e₁ + 3e₂) ⋀ (4e₁ + 5e₂)

a ⋀ b = (8 e₁ ⋀ e₁ + 10 e₁ ⋀ e₂ + 12 e₂ ⋀ e₁ + 15 e₂ ⋀ e₂)

a ⋀ b = 10 e₁ ⋀ e₂ − 12 e₁ ⋀ e₂

a ⋀ b = −2 e₁ ⋀ e₂

a ⋀ b = −2 e₁₂

If we reverse the order of the vectors, then the outer product b ⋀ a is:

b ⋀ a = (4e₁ + 5e₂) ⋀ (2e₁ + 3e₂)

b ⋀ a = (8 e₁ ⋀ e₁ + 12 e₁ ⋀ e₂ + 10 e₂ ⋀ e₁ + 15 e₂ ⋀ e₂)

b ⋀ a = 12 e₁ ⋀ e₂ − 10 e₁ ⋀ e₂

b ⋀ a = 2 e₁ ⋀ e₂

b ⋀ a = 2 e₁₂

−b ⋀ a = −2 e₁₂

The math confirms that the outer product is antisymmetric: a ⋀ b = −b ⋀ a.
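The antisymmetry can be checked the same way (illustrative sketch):

```python
# Outer (wedge) product coefficient of 2D vectors: a ^ b = (a1*b2 - a2*b1) e12.
def outer(u, v):
    return u[0] * v[1] - u[1] * v[0]

a, b = (2, 3), (4, 5)
assert outer(a, b) == -2 and outer(b, a) == 2   # a ^ b = -(b ^ a)
```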

2.3.4 Geometric Product

The geometric product of vectors a and b is represented by "ab". If a and b are vectors defined as a = (a₁e₁ + a₂e₂) and b = (b₁e₁ + b₂e₂), then [59]:

As per


V̄₁V̄₂ = [(a₁b₁) ē₁ · ē₁ + (a₁b₂) ē₁ · ē₂ + (a₂b₁) ē₂ · ē₁ + (a₂b₂) ē₂ · ē₂]   (dot product)

       + [(a₁b₁) ē₁ ∧ ē₁ + (a₁b₂) ē₁ ∧ ē₂ + (a₂b₁) ē₂ ∧ ē₁ + (a₂b₂) ē₂ ∧ ē₂]   (wedge product)

ab = (a₁e₁ + a₂e₂)(b₁e₁ + b₂e₂)

ab = (a₁e₁ + a₂e₂) · (b₁e₁ + b₂e₂) + (a₁e₁ + a₂e₂) ⋀ (b₁e₁ + b₂e₂)

ab = (a₁b₁ e₁ · e₁ + a₁b₂ e₁ · e₂ + a₂b₁ e₂ · e₁ + a₂b₂ e₂ · e₂) + (a₁b₁ e₁ ⋀ e₁ + a₁b₂ e₁ ⋀ e₂ + a₂b₁ e₂ ⋀ e₁ + a₂b₂ e₂ ⋀ e₂)

ab = (a₁b₁ + a₂b₂) + (a₁b₂ e₁ ⋀ e₂ − a₂b₁ e₁ ⋀ e₂)

ab = (a₁b₁ + a₂b₂) + (a₁b₂ − a₂b₁) e₁ ⋀ e₂

ab = (a₁b₁ + a₂b₂) + (a₁b₂ − a₂b₁) e₁₂

The output of the geometric product contains two terms. The first term, (a₁b₁ + a₂b₂), is a scalar. The second term is a bivector e₁₂ with a coefficient of (a₁b₂ − a₂b₁).

( 푎1푏2 − 푎2푏1 ) Geometric product of two vectors is not equal when we change the order of vectors.

Such that the exception would be if the vectors are parallel then .

푎푏 ≠ 푏푎 푎푏 = 푏푎 Example:

When a = (2e₁ + 3e₂) and b = (4e₁ + 5e₂), as per

V̄₁V̄₂ = [(a₁b₁) ē₁ · ē₁ + (a₁b₂) ē₁ · ē₂ + (a₂b₁) ē₂ · ē₁ + (a₂b₂) ē₂ · ē₂]   (dot product)

       + [(a₁b₁) ē₁ ∧ ē₁ + (a₁b₂) ē₁ ∧ ē₂ + (a₂b₁) ē₂ ∧ ē₁ + (a₂b₂) ē₂ ∧ ē₂]   (wedge product)

From the above formula:

ab = (2e₁ + 3e₂) · (4e₁ + 5e₂) + (2e₁ + 3e₂) ⋀ (4e₁ + 5e₂)

ab = (8 e₁ · e₁ + 10 e₁ · e₂ + 12 e₂ · e₁ + 15 e₂ · e₂) + (8 e₁ ⋀ e₁ + 10 e₁ ⋀ e₂ + 12 e₂ ⋀ e₁ + 15 e₂ ⋀ e₂)

ab = (8 + 15) + (10 e₁ ⋀ e₂ − 12 e₁ ⋀ e₂)

ab = 23 − 2 e₁ ⋀ e₂

ab = 23 − 2 e₁₂

Reversing the order of the vectors, the geometric product ba is:

ba = (4e₁ + 5e₂) · (2e₁ + 3e₂) + (4e₁ + 5e₂) ⋀ (2e₁ + 3e₂)

ba = (8 e₁ · e₁ + 12 e₁ · e₂ + 10 e₂ · e₁ + 15 e₂ · e₂) + (8 e₁ ⋀ e₁ + 12 e₁ ⋀ e₂ + 10 e₂ ⋀ e₁ + 15 e₂ ⋀ e₂)

ba = 8 + 15 + 2 e₁ ⋀ e₂

ba = 23 + 2 e₁₂

The math confirms that ab ≠ ba.
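The same 2D example can be verified with a small sketch that returns the scalar and bivector parts of the geometric product (illustrative code, not the RVTHE implementation):

```python
# Geometric product of two 2D vectors, as (scalar, e12-coefficient) parts.
def geometric(u, v):
    scalar = u[0] * v[0] + u[1] * v[1]     # inner part: a1*b1 + a2*b2
    bivector = u[0] * v[1] - u[1] * v[0]   # outer part: (a1*b2 - a2*b1) e12
    return scalar, bivector

a, b = (2, 3), (4, 5)
assert geometric(a, b) == (23, -2)   # ab = 23 - 2e12
assert geometric(b, a) == (23, 2)    # ba = 23 + 2e12, so ab != ba
```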


2.3.5 Inverse of Vector

If a vector geometric product A⁻¹ᴸA = 1, then A⁻¹ᴸ is called the left inverse of vector A, and if AA⁻¹ᴿ = 1, then A⁻¹ᴿ is called the right inverse of vector A. The geometric product is not commutative; therefore the left inverse and right inverse may or may not be equal.

2.3.6 Versors

"One type of multi-vector that lends itself to inversion has the form A = a₁a₂a₃ … aₙ, where a₁, a₂, a₃, …, aₙ are vectors and A is their collective geometric product. Such multi-vectors are called versors."

Versor: A = a₁a₂a₃ … aₙ, a geometric product of vectors.

The reverse of a versor is A† = aₙ … a₃a₂a₁.

Multiplying A† with A:

A†A = (aₙ … a₃a₂a₁)(a₁a₂a₃ … aₙ)

A†A = (aₙ … (a₃(a₂(a₁a₁)a₂)a₃) … aₙ)

A†A = |a₁|² |a₂|² |a₃|² … |aₙ|²

Furthermore, multiplying A with A†:

AA† = (a₁a₂a₃ … aₙ)(aₙ … a₃a₂a₁)

AA† = (a₁(a₂(a₃ … (aₙaₙ) … a₃)a₂)a₁)

AA† = |a₁|² |a₂|² |a₃|² … |aₙ|², and it is a scalar.

A†A = AA†

We can say A⁻¹A = 1, so

A⁻¹ = A† / (A†A) = A† / (AA†)

For versors this implies that A⁻¹ᴸ and A⁻¹ᴿ are the same.

Suppose A = a is a single-vector versor; writing it in reverse order, A† = a.

AA† = |a|²

a⁻¹ = A⁻¹ = a / |a|²

Therefore, given ab we can derive b by multiplying with a⁻¹:

a⁻¹a = 1

a⁻¹(ab) = b, i.e., b = (a / |a|²)(ab)

Similarly, we can obtain a = (ab)b⁻¹ = (ab)(b / |b|²).

Example: using versors and inverses we derive the component b of a geometric product.

Assume secret key s₁ = 5 is defined as a vector a = (2e₁ + 3e₂),

data value d₁ = 9 is defined as a vector b = (4e₁ + 5e₂), and

secret key s₂ = 7 is defined as a vector c = (3e₁ + 4e₂).

When a = (2e₁ + 3e₂), b = (4e₁ + 5e₂), and c = (3e₁ + 4e₂):


ab = (2e₁ + 3e₂) · (4e₁ + 5e₂) + (2e₁ + 3e₂) ⋀ (4e₁ + 5e₂)

ab = 23 − 2e₁₂

abc = (23 − 2e₁₂)(3e₁ + 4e₂)

abc = 61e₁ + 98e₂

To derive the value of b, compute b = a⁻¹(abc)c⁻¹:

b = a⁻¹ (61e₁ + 98e₂) ((3e₁ + 4e₂)/25)

b = a⁻¹ (23 − 2e₁₂) = ((2e₁ + 3e₂)/13)(23 − 2e₁₂)

b = ((46 + 6) e₁ + (69 − 4) e₂) / 13

b = 4e₁ + 5e₂

This is the foundation for the new encryption cipher: the geometric product and its inverse play a big role in the development of the new cipher using versors. Versors give the choice of having multiple vectors in the geometric product, which produces two types of output: the intermediate result contains a scalar and a multi-vector, while the result of the full geometric product of the vectors is a vector.
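The recovery of b above can be reproduced with a minimal 2D Clifford-algebra sketch (illustrative only; it is not the RVTHE implementation, and the tuple representation and function names are assumptions made for this example):

```python
# A multivector in Cl(2,0) is stored as (s, x, y, w) = s + x*e1 + y*e2 + w*e12,
# with e1*e1 = e2*e2 = 1 and e12*e12 = -1. Exact arithmetic via Fraction.
from fractions import Fraction

def geo(A, B):
    """Full geometric product of two multivectors."""
    a0, a1, a2, a3 = A
    b0, b1, b2, b3 = B
    return (a0*b0 + a1*b1 + a2*b2 - a3*b3,   # scalar part
            a0*b1 + a1*b0 - a2*b3 + a3*b2,   # e1 part
            a0*b2 + a2*b0 + a1*b3 - a3*b1,   # e2 part
            a0*b3 + a3*b0 + a1*b2 - a2*b1)   # e12 part

def vec(x, y):
    return (0, x, y, 0)

def vec_inverse(V):
    """Inverse of a pure vector: v^-1 = v / |v|^2."""
    _, x, y, _ = V
    n2 = x * x + y * y
    return (0, Fraction(x, n2), Fraction(y, n2), 0)

a, b, c = vec(2, 3), vec(4, 5), vec(3, 4)          # s1, d1, s2 from the example
abc = geo(geo(a, b), c)                            # versor-style product
assert abc == (0, 61, 98, 0)                       # abc = 61e1 + 98e2

recovered = geo(geo(vec_inverse(a), abc), vec_inverse(c))   # b = a^-1 (abc) c^-1
assert recovered == (0, 4, 5, 0)                   # b = 4e1 + 5e2 recovered exactly
```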


CHAPTER 3

PROBLEMS AND LIMITATIONS

This chapter and the next mainly discuss how I arrived at the problem statement. In this chapter, I present various security problems with Cloud and SSD storage. I present various types of cyberattacks and discuss the importance of randomness in encryption methods and its limitations. I evaluate existing encryption methods and their performance on SSD in the Cloud and the performance penalties in terms of IOPS. This section will show that encryption methods/techniques affect workload performance. I used Amazon Web Services (AWS) for this performance benchmarking. First, I studied the storage (SSD) performance impact between various storage options provided by AWS without encryption. Next, I benchmarked workloads with various block sizes, read/write ratios, and encryption methods on VMs with regular SSDs, encrypted SSDs, and software-encrypted containers. This chapter also discusses existing encryption methods, including homomorphic encryption methods.

3.1 Defining the Problem

Problem Statement: “Current homomorphic encryption schemes are still not efficient enough for real time applications.”

In the cloud computing environment, there are several security threats. Cloud Storage SSDs bring their own strengths and weaknesses. Here I consider the causes, conditions, and limitations of enterprise cloud storage that can generate security concerns, to see if there are practical solutions to all stages of SSD security. I will also explain how these weaknesses are exploited using cyber-attacks. Currently various encryption methods are used to handle this problem, but each has its limitations. I will discuss the limitations and problems of existing and proposed encryption methods, including FHE.

3.1.1 Encryption Security Limitations and Problem

Practicality of Homomorphic Encryption: The Practical Homomorphic Encryption Survey [26] says, "A significant amount of research on homomorphic cryptography appeared in the literature over the last few years; yet the performance of existing implementations of encryption schemes remains unsuitable for real time applications". Homomorphic encryption speed is one of the main reasons for this conclusion: key generation takes from 2.5 seconds to 2.2 hours, the implementation is complex, noise creation can exceed thresholds, and large key sizes (17MB to 2.25GB) require high memory resources; all this becomes impractical in real systems [3]. Fully Homomorphic Encryption (FHE) is on the "bleeding edge" of encryption technology, but currently there is no FHE available for real-time applications [26]. There is still a lot of work that needs to be done to have a "production ready" version of FHE.

Execution of Encryption method in the Cloud: The conventional encryption methods have a couple of issues.

• A large amount of data needs to be transferred between the client and the cloud.

• If the client is willing to keep the encryption key in the cloud, the very item used to decrypt the file will be readily available should an attacker get into the cloud system, which is clearly a security concern.


• If the client chooses not to store the key in the cloud, then to update a file they must download the entire encrypted file, decrypt it, modify it, encrypt it again, and upload the encrypted file back to the cloud. As the file grows, this increases the overhead on the resources.

This research focuses on deriving a production-ready, secure, efficient, scalable, and portable homomorphic encryption method, addressing the limitations described in the following section.

3.1.2 Encryption Limitations:

Key Strength: If the data is encrypted, customers must use a key to manage the data storage process. If the key was generated with low randomness, that will create weaker security.

Encryption Algorithm: The degree of the system’s security depends on the strength of the cryptography method and its implementation. Increased computing power allows hackers to break encryption algorithms that were once considered state of the art.

Encryption vs Performance: There is very little research on how various encryption software methodologies impact performance of various workloads on SSD in the cloud.

The problem with these methods is that enterprises use the same encryption software for all types of workloads and different storage systems. Encrypting and doing regular application workload functions simultaneously will adversely impact the read write performance of

SSD drives.

3.2 Other problems contributed for research motivation

All of the following problems also motivated this work, but the main focus is solving the problem described in Section 3.1.1.

58

SSD Physics: Some SSD vendors implemented their FTL (Flash Translation Layer) with errors, those errors may prevent full sanitization or may delete all the data by overwriting the entire visible address space. Overwriting SSD address space is not always enough to sanitize the drive because the data persists, and this is a time-consuming process

[30]. When a file is deleted, from the OS’s perspective it is deleted, but on the SSD it may remain until garbage collection happens with the TRIM process [11].

Persistence of Data: When an SSD write occurs, data writes to new cells, but the data still exists in the old cells until a TRIM is executed [30]. If the key and encrypted file are stored on the same system, there is a possibility to read the encryption key from the SSD key storage area [31]. The SSD’s internal design and the way IO(Input/Output) operations happen are different than HDD’s. Yet, most encryption software for SSDs was developed using the same cryptographic algorithms that were used for HDDs. However, this does not account for SSD’s ghost data.

Data Exposure: If the data is not encrypted, then there is a risk of exposing personal data; this can pose a security threat while data is at rest or in transit. The data can be accessed from different devices like PCs, phones, and public networks, each of which can pose a security threat due to malware, adware, and non-secured public networks if they can be accessed by a hacker. The public cloud poses its own security issues due to other cloud security threats like account hijacking, human error, etc.

Account Hijacking: One of the major security issues for the cloud is account hijacking, where someone gains access to account credentials and uses them for nefarious purposes.

Human Error: Human error and negligence can pose a security threat, for example not removing the key or plaintext file from the cloud system. In cloud computing, users must move the key between their system and the cloud. Security issues can be caused if users do not follow proper security procedures and practices, such as writing passwords on sticky notes, forgetting passwords, sharing passwords, and sharing keys in non-secure ways.

3.2.1 Cyber Attacks

There are various attacks that can be performed by attackers. One must remember, while designing an encryption cipher, that it should be able to protect the data from these attacks.

Ciphertext-Only: When an attacker has access to ciphertext and nothing else, such as the key or plaintext, they can use statistical methods to guess the distribution of characters and use it to reveal the plaintext or secret key. This is called a Ciphertext-Only attack. It is the most difficult type of attack for the attacker, since the attacker has the smallest amount of information [27].

Known-Plaintext: In this case an attacker has some plaintext/ciphertext pairs and uses them to derive the key. This is called a Known-Plaintext attack. Later I will show, using statistical methods and manipulation of the mathematical operations, how the keys can be derived.

Chosen-Plaintext: This is similar to a Known-Plaintext attack, but the attacker can choose and manipulate the plaintext input to the encryption algorithm, then evaluate the resulting ciphertext to obtain the key.

Distinguishing-Attack: The goal of a distinguishing attack is to distinguish the keystream of the cipher from a truly random sequence. If an attacker can distinguish the cipher output from random data faster than a brute force search, this sort of information can be very valuable to the attacker in revealing the plaintext.

Birthday Attack: A Birthday Attack is based on the statistical concept of the Birthday Paradox, where the probability of a match between two random items increases as the number of elements increases. For example, if there are 23 people in a room, the probability of two people having the same birthday rises to 50.7%. The same concept applies to determining the encryption key: while the numbers involved are much larger, the probability of matching the encryption key is statistically much higher than the true randomness of the key would suggest.
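The 50.7% figure can be reproduced with a short calculation (illustrative sketch):

```python
# Probability that at least two of n people share a birthday (Birthday Paradox).
def birthday_collision_probability(n: int, days: int = 365) -> float:
    p_no_collision = 1.0
    for k in range(n):
        p_no_collision *= (days - k) / days
    return 1.0 - p_no_collision

print(round(birthday_collision_probability(23), 3))   # 0.507, i.e. about 50.7%
```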

Meet-in-the-Middle Attack: In this method the attacker builds a table of keys and MACs (Message Authentication Codes). A MAC is computed on the same plaintext using 50% of the possible keys of the key length. The attacker then eavesdrops on each transaction, compares the cipher with the MAC table, and reveals the key.

There are several more methods of attack and cyber threats, like Spectre and Meltdown. The impact of an attacker finding a key could be devastating; this would give attackers access to personal, financial, and medical information and could prevent authorized users from accessing it. All of these are justification to constantly increase the strength and complexity of ciphers, which are an important part of security [6].

3.2.2 Real Randomness

To generate an encryption key, real randomness is critical but extremely hard to achieve on a computer system. Pseudorandom numbers can be generated from the system's entropy resources: timing of keystrokes, exact movements of a mouse, and fluctuations of hard-disk access time [60]. The key generated from the randomness of these sources may become suspect if an attacker is able to measure those sources and apply them to simulate the same randomness; but this is difficult, due to the amount of entropy generated from these resources.

The timing of a single keystroke will generate 1 to 2 bytes of random data, and cryptographers think that is not enough entropy to thwart the threat of an attacker determining the key. Better typists have a consistent typing pace, where the timing between each keystroke is within milliseconds, limiting the frequency at which keystroke timing can be sampled, so typing-timing data may not be random. In this example, the attacker may have access to resources such as the computer's microphone to hear the keystrokes and determine the timings (pace). Even generating randomness using quantum physics forces specific patterns that may be prone to attacks, because an attacker can use an RF (Radio Frequency) field to influence these patterns [54]. Suppose I have a key with 128 bits of random data; this can still be vulnerable because an attacker can try 2^128 computations. This brute force attack is of growing concern as computation speeds increase.

3.2.3 Storage Security Limitations

This thesis first evaluates SSD storage security and modern encryption software for securing the SSD. First, I discuss the importance of reliability and integration of the SSD, and then I address security. Cloud storage primarily uses SSD as storage to achieve performance guarantees. Second, I studied SSD characteristics to understand SSD strengths and performance metrics when using various storage-specific encryption methods.


By using performance benchmarking, I want to prove that encryption will impact the performance of read and write operation of storage.

3.2.4 SSD System Level Induced Limitations

The SSD physical structure poses reliability and scalability limitations. This can result in system-level limitations such as wear leveling (endurance), bad block management, and performance. Understanding the SSD limitations can help to determine or derive better security techniques for the device.

3.2.4.1 Physical Limitations Contribute to Logical (Software) Limitations

This section describes the SSD physical limitations and how they impact logical SSD functions. The following four major components of SSD functions will detail the physical and logical limitations.

3.2.4.2 Physical Level Address Map

In SSD, the address map is applied the same as traditional hard disk drives. The SSD

FTL maintains all the address table information. In figure 12, the top row is the logical address space and the bottom row is the physical address space. From the host’s perspective the writes and edits happen in plain sight.


Figure 12 - Address Mapping between physical and logical [11]

Due to the limitations of SSD, pages in a block cannot be overwritten in place; the drive instead writes to a new page in a new block, which is assigned in the physical block (in the physical world it is the string). The old pages are not erased, but they are marked as invalid pages. Writing and rewriting to a cell exposes the cell to multiple voltage impacts, which deteriorates the cell walls and reduces its life span. To avoid deterioration of an individual SSD block or a set of blocks, each rewrite follows a wear leveling algorithm to make sure all the cells deteriorate consistently. Also, when the current physical block is full, another free one is assigned to the logical block. These changes add mapping addresses to the translation table (address mapping table), which is also stored on the SSD. The data for this table may be stored on the SSD itself, which could decrease the storage capacity of the device [11] [61].

Even with the best wear leveling algorithm, bad blocks will be created due to the inherent limitations of SSD writes and erases. When the blocks are not reliable, they are called bad blocks; information about these addresses are maintained by the BBM (Bad


Block Management) map. The limitation is keeping the BBM up to date, which is important for reliability. If the BBM is not maintained with correct information about bad blocks, then the system will try to write to those blocks. The data which is written to bad blocks will not be reliable. Monitoring the BER (Bit Error Rate) is also important to achieve a reliable system. ECC (Error Correction Code) is used to maintain the BER, but the ECC engine may cause performance issues, if it is not designed to perform in parallel for multiple channels. Correcting too many errors though, will negatively impact the efficiency of the drive [62].

3.2.4.3 Physical Wear Leveling Limitation

TOX (ZrO2) is a dielectric material and its thickness is a limiting factor in SSD. Floating gate cells will lose their charge over time through TOX, due to the thinness of the TOX layer. Floating gate cells also experience wear and tear due to additional stresses caused by voltage fluctuations. Electric charge for “program” (writes) operations are transferred through the TOX in the form of oxide traps. The concentration of the traps increases along with each write and erase operation, this called oxide stress. When electrons leak from a floating gate, these traps are used as a path for these electrons to travel toward the cell channel region [63]. The number of electrons leaking through the border of TOX is lower than the electrons traveling through SILC (stress-induced leakage current). If you have a close distance for SILC between each tunneling step, it increases the leakage. The TOX thickness scalability limitation is defined by important factors: the number of traps, SILC, and oxide voltage of the floating gate cell during retention. It’s been determined that the

TOX thickness must be 8.0-7.5 nm [64].


The floating gate cells should be able to hold a charge for minimum of 10 years. This was determined based on how much leakage is acceptable in a 10-year time span. The TOX thickness requirement plays an important role in defining the acceptable leakage. The number of cycles of program/erase operations applied to that cell also depends on TOX thickness. After about 10 thousand program/erase cycles the cell voltage threshold shifts upwards which would then require more voltage to do the operations of the cell. Physically neighboring cells share the same sensing amplifier. Because of this, a voltage shift in one cell will be used by neighboring cells. But this could damage cells which do not require more voltage. The effects of cells going bad will change the over-provisioned cell amount

(each SSD is manufactured with more storage, at least 25% more than the stated amount).

Over-provisioned cells play a main role in endurance; as they decrease, the SSD life span also decreases [64].

3.2.4.4 Physical Limitation of Parallelism

When I discuss parallelism in terms of SSD, we are discussing parallelism of the read, write, and erase operations. The performance of these operations in parallel will be faster because multiple operations are processed at the same time. There are a couple of ways to increase the parallelism, one would be increasing the dies per channel, another would be increasing the number of channels. In increasing the dies per channel method, this may cause channel overloading and it may not be helpful for write performance. In increasing the number channels method, this can pose different Error Correction Codes for each channel, for this it needs dedicated SRAM (Static RAM). This option is scalable and can increase performance for the read and write operations. Hence, memory components must


be coordinated to operate in parallel. The serial ‘interface’ is over flash packages which can cause a bottle neck for the performance.

Other techniques to consider that may improve performance with parallelism: page size, page spanning process, queueing methods, ganging multiple flash, interleaving between flash, and the background cleaning process. With the page size technique, if the page size is smaller this will make look up times faster and take less space than if the page size table were larger. But this may not be good for performance if the data blocks are not consistently accessed. With the page spanning process technique, different flash packages can distribute the information to a single or multiple package. If the data stays on the same package, the results will have faster performance; otherwise it goes through different packages which will lower the performance. With the separate queue technique, each package handles parallel requests simultaneously, this means there is access to all the flash packages at the same time. This process is scalable and flexible and wear-leveling is maintained equally.

The drawback in this is each queue needs to maintain its own ECC, SRAM, and it also complicates the FTL. Handling too many ECCs may decrease performance. Ganging multiple flash packages technique is when SSD algorithms combines multiple flash packages together, then maintains for that group packages the same queues, ECCs, and

FTL. It handles multi-page requests with a reduced number of queues than the separate queue technique uses. This processing helps with less overhead for the ECC, but too few queues to work with, can cause a bottle neck for a busy system. With interleaving in flash packages all processes occur within a single die to speed up the read and write operations.

To avoid the latency in this process, it can access all related blocks in one place, which is faster than crossing between flash packages through a serial connection. The drawback of

this process is that it may write to the same blocks over and over. When we focus on interleaving, the benefits of wear-leveling are lost. The background cleaning process of SSD happens on packages when the system is not busy. When the cleaning process occurs, crossing between different packages means moving the erase blocks from one package to another through the serial connection. This generally is slower than cleaning the same die, but it maintains wear leveling. Each technique has its own pros and cons, so we need to carefully analyze which technique is better depending on each workload situation [11] [65].

Figure 13 - Flashes and their parallel architecture[11]


Figure 13 shows the standard form of parallelism in Flash. There is another form of parallelism which may improve performance: placing continuously allocated data from one domain over a set of N domains (a set of flash memories that share a specific set of resources like channels, queues, and ECCs, and that can be divided into sub-domains as packages) like a stripe, using a mapping policy. Most flash memory packages support two-plane operations to read multiple pages from two planes in parallel, and the operation across the dies can be interleaved. Since logical pages are normally striped over the flash memory array, reading multiple logically continuous pages in parallel for read-ahead can be performed efficiently [11].

Most of the SSD operations store two bits per MLC cell. It was theorized that storing more (3 to 4) bits in each cell would increase the performance. But research showed, the

Vth voltage threshold required for the read, write, and erase operations took longer for 3 and 4 bits than it took for 2 bits per cell. Strategy wise, running NAND chips in parallel

(Figure13) would give the best performance, but it has its own limitations. More chips require more current flow, and that may not be possible due to the limiting factor of the maximum allowed current. Also, you need to read these strings using thousands of reading circuits with lots of sensors, which can make the process too complex and is more error prone [11].

3.2.4.5 Physical Limitation of Workload Management

In the current market, the consumer and enterprise versions of SSDs are different. Vendors build them according to the anticipated workloads. Depending on the workload requirements, they are built and programmed with different designs. The consumer version does not need algorithms as complicated as those of the enterprise version.

Figure 14 - Consumer Vs Enterprise SSD [11]

In the real world, the consumer version of SSD falls short of the needs of the enterprise version (Figure 14), in that it does not have algorithms for zero tolerance of data loss, the uptime reliability, the endurance, the performance, and the error correction code handling; plus, it does not need to work with multiple I/O operations. Usually enterprise SSD systems come as pure flash (SSD) storage or hybrid (combined HDD and SSD) storage. Enterprise

SSDs must be able to simultaneously handle workloads like file, database, email, etc.; that are generated by multiple users with various traffic patterns. These different traffic patterns are multi-threaded random workloads, they are handled independently using multiple initiators. Additionally, for enterprise usage it must maintain consistent I/O throughput

(IOPS), integrity, and availability. The SSD controller needs to be tested thoroughly before it can be placed into enterprise usage to handle workloads 24/7/365.


In the case of power failures or other disruptions in a data center the workloads must be protected, so enterprise SSD systems are designed to handle those situations with the help of ECCs and CRCs (Cyclic Redundancy Check). Reliability of the workloads is very important, and SSD systems are built using redundancy techniques (RAID) to cover any hardware failures. If an enterprise wanted to have the higher performance, they can replace

HDD storage with SSD, but it can become expensive. The details will be discussed in the existing research section [11].

3.2.5 Existing research to mitigate the software limitations

Some of the main limitations in SSD are address mapping, parallelism (performance), wear leveling, and workload management. The user will not have the option to change the physical structure of the SSD. They will be limited to software approaches to mitigate the physical limitations. This section explains the research that has been done to mitigate these limitations. Most approaches have been focused on improving processes within the FTL.

The FTL is a core part of the SSD controller that maintains sophisticated address mappings (indirect mappings between physical block addresses and logical block addresses), a log-like write mechanism, GC (Garbage Collection), wear leveling, ECC, and over-provisioning [66].

3.2.5.1 Address Mapping

One of the FTL's main functions is to maintain a mapping table of virtual addresses to physical addresses. Write operations can only happen when the block is in a special state called "Erased". The erase operations happen at a much coarser spatial granularity than write operations, since page-level erases are extremely time consuming [67]. Page-level FTL mapping can provide compact and efficient utilization of each block, but the issue is that it takes a large amount of page-table space (a 32MB SRAM page table for 16GB of flash) and in some situations the lookup time will also be higher than calculating the offset in block-level mapping. Block-level FTL mapping uses an offset to calculate the page number; to maintain page information it requires just a fraction of the page-table space. However, looking up page information in this mapping is more time-consuming than it is in page-level mapping. It also forces the logical page to be mapped to a physical page within each block. As a result, garbage collection overhead grows. Still, block-level address mapping is the better option to use because it uses a lot less space [68].

Both schemes are opposite extremes in their weaknesses. This means page level mapping uses more space for the mapping table while block level mapping generates more garbage collection [11].
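A back-of-the-envelope calculation shows where the 32MB figure comes from, assuming 4KB pages and 8-byte mapping entries (illustrative numbers; actual FTL entry sizes vary by design):

```python
# Rough size of a page-level mapping table for a 16 GB flash device.
flash_capacity = 16 * 1024**3      # 16 GB of flash
page_size = 4 * 1024               # 4 KB pages (assumed)
entry_size = 8                     # bytes per page-mapping entry (assumed)

num_pages = flash_capacity // page_size
table_bytes = num_pages * entry_size
print(num_pages, table_bytes // 1024**2, "MB")   # 4194304 pages -> 32 MB of SRAM
```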

To address this issue, researchers implemented hybrid FTLs, which combine page-level and block-level address mapping in the SRAM. In this method, some of the address table is stored in SRAM while the rest is stored on flash. This leads to a problem with the hybrid FTL approach, because random writes (which need to look in both areas for addresses) induce costly garbage collection, which impacts the performance of subsequent operations.

Demand-based page-mapped FTL-DFTL (Demand-based Flash Translation Layer) addresses this problem in their approach. DFTL stores only the most recently used address translations on SRAM, while the rest are stored on flash [68]. The reason for this storage strategy is that most enterprise-scale workloads exhibit significant temporal locality.

However, the DFTL does not support spatial locality of workloads, which means frequent

“evict out” operations will cause extra erase operations and page mapping lookup overhead for workloads with less temporal locality. DFTL limits the space to store the page table and

it suffers from frequent updates to the page mapping table in the SSD flash for write-intensive workloads and garbage collection [68]. The CFTL (Convertible Flash Translation

Layer) approach tries not to depend on the space of SRAM. CFTL is a hybrid FTL with efficient caching strategies and can dynamically change according to data access patterns.

CFTL’s concept is to use read-intensive data managed by block level mapping and write- intensive data managed by page level mapping. CFTL uses a hot data (data that is accessed the most by users) identification method to change the page mapping table. The CFTL uses a bloom-filters-based scheme which can capture recent and frequently accessed information at a fine-grained level. CFTL considers temporal and spatial locality of workloads for page level cache. If the page size is large, this means the chance that a file is spanning to multiple pages is lower; hence, the consecutive field of CFTL will be less effective [69]. SCFTL (Strategy Caching Flash Translation Layer) deals with the large page size and the spanning issue of pages. SCFTL stores a page-mapping table in several

TPs (translation pages) containing thousands of physical page numbers and mapped to consecutive logical addresses. SCFTL’s PMT (page-mapping table) contains TPD

(translation page directory) and CMT (cache mapping table). TPD is in RAM and indexes

CMT by the most significant bits of logical addresses. The performance degradation from offloading the mapping table is reduced by caching several mapping entries in the CMT.

CMT integrates two spatial locality exploitation techniques and a customized cache replacement policy to enhance its efficiency of SCFTL. SCFTL performs multilevel page table lookups for address maps. If there were a cache miss then the request goes to TPs, if a cache miss occurs there too, then the requested block must get it from flash [70]. CA-

SSD (Content Aware SSD) is a modified FTL that adds minimal support in the form of

additional hardware for hash functions. It uses hashes as values in the mapping table instead of page information. It also requires battery-backed RAM to store the hashes. The drawback of the CA-SSD approach is that it depends on battery power and extra hardware [71].

Implementing encryption on top of the above approaches will become cumbersome. When researchers studied address mapping enhancements, they may not have considered encryption. The existing research results may not hold with encryption, and that needs to be studied further.

3.2.5.2 Wear Leveling

Due to the locality in most workloads, writes are often performed over a subset of blocks (e.g., file system metadata blocks). Some flash memory blocks may be frequently overwritten and tend to wear out earlier than other blocks [11] [65]. FTLs usually employ some wear-leveling mechanism to 'shuffle' cold blocks with hot blocks to even out writes over flash memory blocks. There has been some research, with some variations, on how to approach wear-leveling in the form of managing workloads. Researchers have implemented CAFTL (content-aware FTL) to remove unnecessary duplicate writes, improving the efficiency of garbage collection and wear-leveling and reducing the write traffic to flash [72]. One previous approach, SR-FTL (Smart Retirement FTL), addresses the wear-leveling issue by reusing flash blocks that have been cycled to a specified wear-out threshold [73]. Another approach is to use a dual-pool algorithm to store cold data in the blocks that have been identified as more worn and smartly leave them alone until wear leveling takes effect [74].

With all the bodies of research on wear leveling approaches, it is a complex (full of unknown variables) process and there may never be a perfect solution. That’s because there

are no consistent workflows nor predictable usage of storage. So, the researchers weigh the pros and cons of various approaches to evaluate performance versus endurance versus reliability with different workloads. But the inherent nature of SSD is to move data around to maintain wear leveling. In doing so, it leaves valuable data in the invisible address space; even though it is not retrievable by normal operations, it is still there. Ideally, purging or overwriting the address space is most desired, but it may create a lot of wear on an SSD.

Encrypting the data allows us to retain existing wear-leveling algorithms without exposing this valuable data.

3.2.5.3 Parallelism

The bandwidth and operation rate of any given flash chip is not enough to achieve optimal performance. SSD has multiple flash arrays so we can run multiple I/O jobs concurrently and this will improve the performance of the SSD. A single flash memory package can only provide limited bandwidth (e.g. 32-40MB/sec). Writes are slower than reads, other necessary background jobs like garbage collection, wear-leveling, can incur latencies as high as milliseconds [65]. These limitations can be addressed by SSD’s clever structure that is built with an array of flash memory packages connected through multiple channels to flash memory controllers to provide internal parallelism. The logical block addresses as the logical interface to the host system, and it can stripe over multiple flash memory packages. This way the data accesses can be conducted independently in parallel, it will provide high bandwidth in aggregate and hide high latency operations, that combination can result in high performance [72]. One way is to improve the sequential writes is by dividing the flash array into banks; each bank will be able to read/write/erase independently. The performance gains from internal parallelism are highly dependent on

how the SSD internal mapping and resource management compete for critical hardware

Most applications are designed for HDD storage; when we execute them on an SSD, this may not be optimal. The critical issues in SSD parallelism include the thin interface between the storage device and the host, workload access patterns, asynchronous background operations generated by reads and writes, the effect on read-ahead, ill-mapped data layout, and application designs [75]. There are different levels of parallelism in SSD:

Channel, Package, Die, and Plane. The previous research [75] concluded that read ahead is not affected by access patterns in MLC-SSD, writes though are strongly correlated to access patterns. Small size random writes suffer from high latencies and high interference between reads and writes [75]. Adding a disk cache helped improve the performance for read and write operations. But background operations like the erase operation can cause interference with reads and writes and internal fragmentation is too high for excessive random writes.

Studies on the four levels of parallelism (channel, chip, die, and plane) have shown a direct impact on SSD performance, but they provide limited information because the SSD's internal structure is a black box. The advanced commands utilize only the die and plane levels of parallelism; these studies explore how allocation schemes can set the priority order among the levels of parallelism for different types of application loads. Channel-level parallelism should be given the highest priority among the four levels, and it was observed that chip-level parallelism keeps the chips very busy; a service request can only be handled when chips are idle [75].


Parallelism has the biggest impact on SSD performance, and the advantages of existing parallelism remain viable even with the addition of encryption methodologies for storage.

3.2.5.4 Workload Management Integrated with SSD

Performance is highly workload dependent; well-designed systems, databases, and applications improve performance. The following are some classic examples of integrating SSDs into systems to achieve better performance [11]. Integrating an SSD into an existing system is a complex process. Simple scaling (replacing 1GB of HDD with 1GB of SSD) is limited by cost effectiveness, because the gains in performance do not justify the added expense. HybridDyn (an integration of HDD and SSD storage) is an innovative storage design that is cost-effective and improves performance and endurance; it handles incoming workloads by dynamically partitioning and distributing them between SSD and HDD, and this design showed better performance than HDD alone [76]. Another research approach is an LSM-tree-based store on an open-channel SSD that utilizes channel-level parallelism. LevelDB (a fast key-value storage library built on an LSM tree) is extended to be multi-threaded to fully utilize channel-level parallelism, with optimal I/O request scheduling and dispatching; evaluating channel-level parallelism's impact on I/O performance showed that it outperforms conventional SSDs [13]. Another system, Libra, tracks the I/O consumption of each tenant; it recognizes an application's dynamic I/O usage profile and provides I/O resources accordingly. Libra-based VOP (virtual I/O operations) captures the non-linear relationship between SSD I/O bandwidth and I/O operation throughput while considering the disk-I/O (disk input/output) cost model [77]. Hadoop workloads showed a performance increase over HDD alone when an SSD was integrated into the underlying storage system.

This research showed that workload performance always improved when an SSD was added to, or used as, the storage. SSDs are faster than HDDs, so adding one to the storage system was expected to improve performance. But in some cases the applications were not able to fully utilize the SSD's performance due to the nature of their write guarantees. This research studies the performance impact of different types of workloads under different encryption methodologies.


CHAPTER 4

STORAGE ENCRYPTION ANALYSIS

In this chapter, I show how SSD storage performance is affected by the storage type (t2.micro versus i2.xlarge) and by software encryption methods, and I show that both aspects impose performance penalties on workloads.

4.1 Measurement Environment

Each Amazon EC2 (Elastic Compute Cloud) instance can access disk storage from disks that are physically attached to the host computer. This disk storage is referred to as an instance store or EBS (Elastic Block Store) volumes. An instance store provides temporary block-level storage for use with an instance. The size of an Amazon instance store ranges from 8GB to 48TB, and varies by instance type (i.e., larger instance types have larger instance stores) for HDD. Using regular SATA SSD, the storage ranges from 8GB to 6.4TB.

If the storage type is NVMe (Non-Volatile Memory express) SSD, then the storage ranges from 8GB to 16TB.

Amazon EBS provides two volume types: Standard volumes and Provisioned IOPS volumes, which differ in performance characteristics and price. Standard volumes offer storage for applications with moderate or burst I/O requirements. These volumes deliver approximately 100 IOPS on average but can burst up to hundreds of IOPS. Provisioned

IOPS volumes offer storage with consistent and low-latency performance, which allows users to predictably scale to thousands of I/O operations per second per Amazon EC2 instance. These volume-types are designed for applications with I/O-intensive workloads.


Backed by SSDs, Provisioned IOPS volumes support up to 30 IOPS per GB, which enables a volume to be provisioned up to a maximum of 4,000 IOPS. It is possible to stripe multiple volumes together to achieve up to 48,000 IOPS when attached to larger EC2 instances, but in principle such a stripe behaves like a regular SSD disk volume, so we did not evaluate this type of VM. When attached to an EBS-optimized instance, Provisioned IOPS volumes are designed to deliver consistent performance within 10 percent of the guaranteed throughput rate (Provisioned IOPS) 99.9% of the time. In addition, the delivered IOPS rate depends on the block size of the reads and writes: Amazon Provisioned IOPS volumes process reads and writes in I/O block sizes of 16KB or less, and each increase in I/O size above 16KB increases the IOPS consumed linearly. A significant amount of data was produced during the experiments, and it was used to analyze the main aspects of SSD performance variation under different variables, including encryption methods.

The experiments in this study have been conducted on three different 64-bit VM

(Virtual Machine) instances in Amazon EC2; the first one was an Amazon Linux AMI

(HVM) 2014.03.1 and the remaining two VMs were Amazon Ubuntu Server 16.04 LTS

(HVM). The first VM is an instance store (i2.xlarge) of an 800GB SSD, which can provide up to 36,000 IOPS. The second VM (standard t2.micro) is an 8GB instance store with

3,000 IOPS. And the third VM (standard t2.micro) is an 8GB encrypted EBS General

Purpose (SSD) Volume Type with 3,000 IOPS.

The first VM is drastically different from the other two (in memory, vCPUs, and processor model); I chose these VMs to analyze their unique SSD characteristics. The second and third VMs are similar (having the same ECUs, 1GB of memory, 1 vCPU, and a 2.5 GHz Intel Xeon family processor); the only difference between the two is that one is a standard instance-store SSD without encryption and the other has an attached EBS SSD volume with encryption.

4.1.1 Selection of Encryption methods

I selected two software encryption methods, Dm-crypt and BestCrypt, along with self-encrypting (encrypted) SSDs. The following explains each at a high level and notes the type of algorithm I used in these evaluations.

Dm-crypt:

Dm-crypt is a disk encryption method available in Linux kernel version 2.6 and later. It uses kernel API routines, and devices are mapped to encrypted containers using the device mapper [50]. This API uses the AES-256 cryptographic method, among others. Dm-crypt uses the Linux Unified Key Setup (LUKS), developed by Clemens Fruhwirth in 2004, to create encrypted containers that are independent of outside platforms [51]. Using this method, the user can even encrypt the root device. A passphrase is required to create encrypted containers.

There has been some research around the drawbacks of dm-crypt. For example, it has been discovered that hackers can sidestep the passphrase to access encrypted containers by hitting the ‘Enter’ key a couple of times. They can also delete the containers, because deleting containers does not require a passphrase. An intruder can determine critical components of the hidden containers relatively easily by utilizing disk commands on the system [52] [53].


BestCrypt:

BestCrypt is encryption software installed at the OS level that can create encrypted containers or volumes. These volumes are mounted as file systems and used to store data securely, protected by an encryption password. I selected the AES encryption algorithm option to gather performance statistics [54].

Self-Encrypting Drive (SED):

When Full Drive Encryption (FDE) is applied on an SSD, it is called a Self-Encrypting

Drive (SED). FDE was developed in 2009. It is a literal encryption of the entire system, including all partitions, system files, and the operating system. This encryption method offloads the encryption process to the hardware component of the drive, which enhances security by utilizing the Opal Storage Specification (a set of specification features for SEDs). An SED needs a master password for the drive and a user password for each user; these are stored in the BIOS and handled by the hard disk controller. SED uses AES-128 and AES-256.

Researchers have found the following vulnerabilities of this method: Hot Plug Attack, Hot

Unplug Attack, Forced Restart Attack, and Key Capture Attack. They have also shown that attackers can bypass the encryption and access data; this undermines the purpose of securing the data [55].


4.1.2 Experimental Tools and Workloads

To evaluate the internal parallelism of SSDs by producing the necessary workloads, FIO (Flexible I/O) synthetic benchmarks were used.1 FIO is a tool that generates multi-threaded workloads with different configuration variables, such as the read/write ratio, the block size, and the number of concurrent jobs, to fully utilize the hardware. It produces a report that contains the bandwidth, the IOPS, the latency, and many other measurements. I used various SSD storage devices with different I/O workloads to calculate their performance metrics; each workload was run for 60 seconds using FIO.

A sample FIO command is provided below:

fio --filename=/dmcrypt/4krandreadwrite6040j8 --direct=1 --rw=randrw --size=1024m --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=60 --iodepth=8 --numjobs=8 --runtime=60 --group_reporting --name=4krandreadwrite60j8 --output=/home/output/4kdmcryptrandreadwrite60j8

Sample FIO Command

In the sample FIO command, the file size to be written is 1024MB (size=1024m) in block sizes of 4K (bs=4k). The workload is split between 60 percent random reads (rwmixread=60) and 40 percent random writes (the remaining 100 − 60 percent), with 8 jobs (numjobs=8) running in parallel for 60 seconds (runtime=60).

Experiments were executed independently on each virtual machine to fully utilize the

SSD parallelism capability while introducing variations in the block size (4k, 8k, 16k, 32k,

64k, and 128k), the number of parallel jobs (8), and the random read/write ratio (100 percent reads, 100 percent writes, and 60/40 read/write workloads). These factors were tested on an unencrypted SSD, on SSDs with two software-based encryption methods, and on one fully Amazon-encrypted SSD. Each experiment was executed for a total of 60 seconds utilizing the FIO benchmark, version 2.1.7.

1 http://freecode.com/projects/fio

The FIO commands were run in the following order: 100% write, 60/40 read/write, and 100% read. Each was executed for six different block sizes with 8 jobs. To emulate an enterprise workload environment, I used random read/write workloads; the research examines how these workloads are affected by the encryption method and its implementation. A queue depth of eight was selected as sufficient, because only a handful of earlier trials utilized a depth beyond eight.

4.2 SSD performance without Encryption

Having completed lengthy experiments and reviewed the internal structure of SSDs and the background on storage options within Amazon EC2, I am now positioned to evaluate the experimental results and answer several related questions.

I created different types of VM instances using SSDs with different IOPS ranges and considered all of them to understand the internal characteristics of SSDs. Baseline metrics were created from these experiments for performance comparisons with the various encryption implementations.

4.2.1 Performance differences between Amazon EC2 VMs

There were significant differences in the performance between the two Amazon EC2 instances. While this was expected, it was interesting to validate the actual performance characteristics of the two different instances versus the specs that Amazon provided about their VMs.


In Graph 1 and Graph 4, the performance of the i2.xlarge instance consistently outperformed the t2.micro instance in all experimental runs and all block sizes. This difference typically increases as the read/write ratio transitions closer to 100 percent reads, regardless of whether sequential or random read/writes are evaluated. This is likely because the instance store volume is physically attached to the host computer on which the EC2 instance runs. Our experiments focused on random reads and writes. One limitation of this comparison is that the total random reads and writes were limited to 35,000 IOPS on the i2.xlarge instance and only 3,000 IOPS on the t2.micro instance. This prompted a more in-depth comparison between the t2.micro instance store and the EBS storage volume; the results are discussed in Section 4.3.1.

4.2.2 Did various block sizes significantly affect I/O throughput?

In both Amazon EC2 instances I observed that as the block size increases, the number of IOPS decreases, along with the execution time needed to complete the required reading and writing of data by FIO. This is most likely because, as the block size increases, less frequent overhead is required to manage the writing of larger blocks. In addition, as expected with increased block sizes, the reading or writing of data is also completed in increasingly larger chunks. The metrics in Graph 1 plot the ratio of reads and writes versus the number of IOPS completed for various block sizes. IOPS decreased as block size increased; the only exception was that the 16K 100% read outperformed the 8K 100% read.


[Chart omitted: i2.xlarge, read percentage (0–100) versus IOPS (0–70,000); series: 4k, 8k, and 16k random reads and random writes.]

Graph 1 - IOPS Vs Block Size

4.2.3 Did various levels of parallelism affect I/O throughput?

Experiments were performed consisting of 8, 16, and 32 threads, or jobs, operating in parallel on all block sizes. As seen in the Graph 2 (using a block size of 8K), I did not see any significant improvements between 8 threads, 16 threads, or 32 threads; but instead saw a drop in IOPS for the 16 thread and 32 thread simulations. This may indicate the SSD is saturated after 8 threads and cannot provide any increase in performance using parallelism.

The main observation is that 8 threads or jobs saturated the SSD parallelism and increasing the jobs did not help.


[Chart omitted: i2.xlarge, read percentage versus throughput (KB/sec, 0–45,000) for 8, 16, and 32 parallel jobs, random reads and random writes.]

Graph 2 - Parallelism Vs Throughput

4.2.4 Did random and sequential jobs have different IOPS?

In Graph 3, I observed no significant difference between the behavior of sequential reads and writes and that of random reads and writes. The i2.xlarge instance has been optimized by Amazon for random reads and writes; it even performed better than the corresponding sequential reads and writes. This occurs around 55 percent reads and 45 percent writes and continues until about 90 percent reads, where sequential outperforms random reads/writes again. The results showed that 100 percent sequential writes were significantly slower than the equivalent random writes; I hypothesize this is related to garbage collection or to the FTL adapting to the change in write mode. However, there is no such gain for random reads/writes on the t2.micro machine. As can be seen in Graph 3, its total random reads/writes are capped around 3,000 IOPS for 4k or 8k block sizes. This performance is expected per the performance metrics provisioned by Amazon for the EBS volume attached to this instance. Additionally, at no time do random read/write operations outperform sequential read/write operations on that instance. This type of performance is more in line with what is expected from a traditional SSD.

[Chart omitted: i2.xlarge, random versus sequential operations with 8 jobs; read percentage versus throughput (KB/sec, 0–50,000); series: random read, sequential read, random write, sequential write.]

Graph 3 - Random Versus Sequential Operations

4.2.5 SSD Random Workload Analysis on t2.micro VM

In Sections 4.2.1 through 4.2.4, I observed that random and sequential operations are very close in IOPS. Amazon provides different numbers of IOPS for the various types of SSD VMs.

I chose an Amazon t2.micro (Ubuntu 16.04 LTS HVM, SSD-volume-type instance store in EC2) as the VM and used the block sizes (4k, 8k, 16k, 32k, 64k, and 128k) on random reads, writes, and mixed read/writes to establish baseline metrics. These metrics are used for comparison with the encrypted workloads. The experiments were done using random workloads for 100 percent reads, 100 percent writes, and 60/40 read/writes (mixed).


[Chart omitted: IOPS for random workloads without encryption; x-axis: block size (4k–128k), y-axis: IOPS (0–4,000); series: read, write, and mixed IOPS.]

Graph 4 - t2.micro Block Size Versus IOPS

[Chart omitted: block size versus throughput (KB/sec) for the unencrypted SSD; series: read, write, and mixed I/O.]

Graph 5 - t2.micro Block Size Versus KB/Sec

In Graph 4 and Graph 5, it was observed that the workloads for 100 percent reads, 100 percent writes, and

60/40 read/writes showed similar IOPS (the maximum IOPS Amazon provisioned) for the 4k, 8k, and 16k block sizes. At 32k the IOPS decreased by 40%, at 64k by 60%, and at 128k by 85% relative to the 4k block-size IOPS; but, as seen in Graph 5, the overall reading and writing of data to the disk increased because of the increased block size. I hypothesize that this is related to per-block overhead, although the increase is not proportional to the block-size data input. Another important SSD characteristic I observed was that reads were faster than writes, as shown in Graph 5 for the block sizes:

32k, 64k, and 128k (which were less impacted by Amazon maximum provisioned IOPS).

This type of performance is more in line with what is expected from a traditional SSD.

Going back to Graph 4, the evidence of Amazon’s data capping is clear at 4k, 8k, and 16k, plus 32k mixed, because the IOPS hovers around 3,110. I used these metrics as baseline for future comparisons.
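As a rough illustration of the relationship between these two graphs (an arithmetic sanity check, not a measured result), throughput and IOPS are related by

Throughput ≈ IOPS × block size.

At the roughly 3,110 IOPS cap, 4k blocks yield about 3,110 × 4 KB ≈ 12.4 MB/s, while 16k blocks yield about 3,110 × 16 KB ≈ 49.8 MB/s; this is why the KB/sec in Graph 5 keeps rising with block size even while the IOPS in Graph 4 stays flat at the cap.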

4.3 SSD performance with Encryption

In Section 4.2, I established a set of baseline metrics. I then ran the same experiments with various encryption methods, block sizes, and workloads. I chose not to vary the number of jobs, based on the data described in Section 4.2.3, which showed little difference between 8 jobs and 16 or 32 jobs; I therefore set the number of jobs/threads to 8 for all block sizes and all workloads. These experiments were conducted with two different software encryption methods (BestCrypt and Dm-crypt) and one SSD encrypted by Amazon.

Amazon EBS volumes were encrypted with a unique 256-bit key using the AES-256 algorithm. Snapshots (a way of cloning storage volumes) of these volumes share the same key.2 Customers maintain these keys using their own key management infrastructure.

To execute the experiments, I created a working environment by creating a VM in

Amazon EC2 and installing the encryption software and the FIO benchmarking software.

2 http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html

I used the same process for both software-based encryption methods. For the encrypted SSD, I created a VM and attached an encrypted EBS SSD volume to it. The following graphs (Graphs 6 through 16) show the different encryption methods and their performance patterns across block sizes, in terms of IOPS and KB/sec.

4.3.1 Did various block sizes significantly affect IOPS?

For all of the encryption software methods, I observed that as the block size increases, the number of IOPS decreases, along with the execution time to complete the required reading and writing of data by FIO. The performance metrics in Graphs 6, 7, and 8 show a similar decrease of IOPS for all types of encryption methods.

[Chart omitted: encrypted SSD, block size (4k–128k) versus IOPS (0–3,500); series: read, write, and mixed IOPS.]

Graph 6 - Encrypted SSD Block Size Versus IOPS


[Chart omitted: BestCrypt, block size (4k–128k) versus IOPS (0–500); series: read, write, and mixed IOPS.]

Graph 7 - Best Crypt Block Size Vs IOPS

[Chart omitted: Dm-crypt, block size (4k–128k) versus IOPS (0–1,400); series: read, write, and mixed IOPS.]

Graph 8 - Dm-crypt Block Size Vs IOPS

One of the main characteristics of an SSD is that reads outperform writes, but with the software encryption methods the results were the opposite: writes performed better than reads (Graph 6 versus Graphs 7 and 8). This is a very significant observation about encryption on SSDs. It may indicate that, when using software-based encryption on an SSD, the decryption (read) path takes more time than the encryption (write) path.

4.3.2 Did various block sizes affect performance throughput?

In the previous section, I observed that IOPS decreased as block size increased in all encryption methods.

[Chart omitted: encrypted EBS SSD volume, block size versus throughput (KB/sec, up to about 8,000,000); series: read, write, and mixed I/O.]

Graph 9 - Encrypted EBS SSD Volume Block Size Versus throughput

[Chart omitted: BestCrypt, block size versus throughput (KB/sec, up to about 300,000); series: read, write, and mixed I/O.]

Graph 10 - BestCrypt Block Size Versus Throughput


[Chart omitted: Dm-Crypt, block size versus throughput (KB/sec, up to about 2,000,000); series: read, write, and mixed I/O.]

Graph 11 - Dm-Crypt Block Size Versus Throughput

In Graph 9, I observed no significant difference between the read and write throughput and that of the unencrypted SSD. I also observed that at 32k and above there is no significant throughput increase. In Graph 10, using the software encryption method

BestCrypt, I observed that the 32k block size had the lowest performance of all block sizes. In Graph 11, using the Dm-crypt software encryption, I observed that throughput increases roughly linearly as the block size increases.

4.3.3 Did various encryption methods affect performance throughput?

The experiments showed that there is a significant difference between SSDs using software encryption methods and the encrypted SSD.


[Chart omitted: encryption method (without encryption, encrypted EBS SSD, Dm-crypt, BestCrypt) versus IOPS (0–3,500); series: read, write, and mixed IOPS.]

Graph 12 - Encryption Methods versus IOPS

[Chart omitted: encryption method (without encryption, encrypted EBS SSD, Dm-crypt, BestCrypt) versus throughput (KB/sec); series: read, write, and mixed I/O.]

Graph 13 - Encryption Methods versus Throughput

I ran workload performance experiments for all block sizes (4k, 8k, 16k, 32k, 64k, and 128k). In Graph 12 and Graph 13, I observed that the encrypted SSD outperformed the software-based encryption methods (the graphs show only the 4k block size). In Graph 12, the encrypted volume showed performance very similar to the regular unencrypted SSD.


4.3.4 Reads, Writes and Mixed workloads Versus Block Sizes.

[Chart omitted: read workloads, block size (4k–128k) versus IOPS (0–3,500); series: without encryption, encrypted EBS SSD, Dm-crypt, BestCrypt.]

Graph 14 - Read workloads for various Block Sizes

[Chart omitted: write workloads, block size (4k–128k) versus IOPS (0–3,500); series: without encryption, encrypted EBS SSD, Dm-crypt, BestCrypt.]

Graph 15 – Write workloads IOPS for various Block Sizes


[Chart omitted: mixed workloads, block size (4k–128k) versus IOPS (0–3,500); series: without encryption, encrypted EBS SSD, Dm-crypt, BestCrypt.]

Graph 16 - Mixed Workloads IOPS for Various Block Sizes

Graph 14, Graph 15, and Graph 16 indicate that as the block size increased, the IOPS decreased. For the software encryption methods, the 64k and 128k block sizes had lower performance than the 4k, 8k, 16k, and 32k block sizes. For the 128k block size with BestCrypt, 100% reads showed such low performance that it measured in a single digit (only 9 IOPS). This was by far the lowest performance of all the encryption methods.

4.4 Fully Homomorphic Encryption Limitations

4.4.1 FHE with Vector Space

The first simple FHE cipher using multi-vectors, called EDCHE (Enhanced Data-Centric Homomorphic Encryption), was presented by DaSilva. It uses geometric algebra and multi-vectors in an n-dimensional vector space ℝⁿ, where n is 2 or 3; these multi-vectors represent the 2D and 3D vector spaces, respectively. When using a 3D vector space, it generates an encrypted file that is 8 to 10 times the size of the original plaintext file [78]. This makes it hard for users to justify this method for their applications. When creating the most robust secure algorithms, the cryptographer needs to keep in mind that the algorithms should be simple, efficient, secure, practical, and able to accommodate computer resources. This provides an opportunity to develop a new FHE cipher that fulfills these requirements.

4.4.2 Previous homomorphic encryption using multi-vector technique.

[Chart omitted: key size (64–1024 bits) versus time in seconds on a regular SSD; series: AES-Crypt encryption, xlg-Crypt encryption, AES-Crypt decryption, xlg-Crypt decryption.]

Graph 17 – Multi-vector Based Homomorphic Encryption

Graph 17 shows runs using key sizes ranging from 64 bits to 1024 bits for encryption and decryption. Comparing performance in terms of time, xlg-crypt underperformed AES-Crypt for full-file encryption and decryption.

Xlg separates itself from AES because it is fully homomorphic and does not need to encrypt or decrypt an entire file on every update. Due to this unique characteristic, xlg-crypt will outperform AES-Crypt on smaller updates.

Even though xlg-crypt takes more time to encrypt than AES-Crypt, it offers additional security features. For example, the nature of xlg allows a client to work with all, some, or even none of the encrypted files from the server; this allows the system to expose only the necessary parts of the encrypted files to the client, keeping the rest of the files encrypted and secured on the server. With conventional (non-homomorphic) symmetric encryption methods, a decrypted (plaintext) version of the file exists during any update until it is deleted. Because xlg-crypt is homomorphic, no plaintext file needs to exist on the VM.

[Chart omitted: encrypted file size in MB on a regular SSD VM for input sizes 32–512 MB; series: AES-Crypt and xlg-Crypt.]

Graph 18 – Multi-vector based encrypted file sizes

I also observed that when encrypting a 100 MB file, AES-Crypt created a 101 MB encrypted file while xlg-crypt created an 801 MB encrypted file. In general, xlg-crypt

generated encrypted files about 8 times the size of the original plaintext files. This growth comes from the xlg-crypt mathematics: it uses a three-dimensional multi-vector space (a 3D geometric-algebra multi-vector has 2³ = 8 components), so the output file is roughly 8 times larger, and the larger ciphertext in turn takes longer to decrypt. Even though it takes more space, each update does not require a rewrite of cells on the SSD, which aligns better with endurance concerns on SSD storage devices.


CHAPTER 5

RVTHE

In this chapter I present a new symmetric homomorphic encryption method, Reduced Vector Technique Homomorphic Encryption (RVTHE), and discuss its design, mathematical implementation, and homomorphic properties.

5.1 Design of RVTHE

The design of RVTHE depends on versors. Mathematically, a versor can be represented as

$A = a_1 a_2 a_3 \cdots a_n$

In the design of RVTHE there are $n-1$ vectors used as keys and one vector used as the data. For example, if $n = 5$ then there are 4 key vectors and 1 data vector.

Table 2 - Key and data location in versors

Vectors    Example 1    Example 2
a1         Key1         Key1
a2         Data         Key4
a3         Key2         Key2
a4         Key4         Key3
a5         Key3         Data

The location of each key and of the data is flexible; their locations are determined by the designer.


To reduce the size of the generated ciphertext, we must choose two-term vectors (vectors with only two coefficients).

5.1.1 RVTHE Encryption and Decryption

Each key is a randomly generated number that is converted into base 10. We divide each key and the data into two parts and use them as the two terms (coefficients) of each vector.

5.1.2 Encryption of RVTHE

Once the key and data locations have been assigned, we perform a geometric product of the first two vectors, which generates an intermediate ciphertext. Next, we perform a geometric product between the intermediate ciphertext and the next vector, and repeat this calculation for each remaining vector. This generates the ciphertext.

$E(d_1) = s_1 s_2 s_3 d_1 \cdots s_n$   (1)

where $s_1, s_2, s_3, \ldots, s_n$ are the keys, $d_1$ is the data, and $E$ denotes encryption.

푠1푠2푠3 … 푠푛 푑1 5.1.3 Decryption of RVTHE

For the decryption process, finding the inverse of the key vectors is critical. First, we perform a geometric product between the ciphertext and the inverse of the first key, which generates an intermediate ciphertext. Next, we perform a geometric product between the intermediate ciphertext and the inverse of the next key vector, and repeat this calculation for each key vector. This recovers the plaintext.


$D(c_1) = s_3^{-1} s_2^{-1} s_1^{-1}\, s_1 s_2 s_3 d_1 \cdots s_n\, s_n^{-1} = d_1$   (2)

$D(c_1) = s_3^{-1} s_2^{-1} s_1^{-1}\, c_1 \cdots s_n^{-1}$   (3)

where $s_3^{-1} s_2^{-1} s_1^{-1} \cdots s_n^{-1}$ are the inverses of the key vectors $s_1 s_2 s_3 \cdots s_n$, $c_1$ is the ciphertext, and $D$ denotes decryption.

In our implementation we chose to use three vectors: two vectors for the keys ($s_1$, $s_2$) and one vector for the data ($d_1$).

(푑1) 5.2 Mathematical Implementation of RVTHE Using Versors

In the versor example from Section 3.3, we derived the value of the vector b by using the vector inverse. Presenting the same example in terms of the encryption scheme, a, b, and c from the math become $s_1$, $d_1$, and $s_2$ in RVTHE; they represent the first secret key, the data value, and the second secret key, respectively.

RVTHE's mathematical representation with two secret keys and one data value takes the form $s_1 d_1 s_2$; in other words, we chose only three vectors $a_1, a_2, a_3$ in our implementation.

The encryption method is represented as E and the decryption method as D. Assume

secret key $s_1 = a$
data $d_1 = b$
secret key $s_2 = c$

and assign the following values, where $e_1$ and $e_2$ are the basis vectors:

$a = (2e_1 + 3e_2)$

$b = (4e_1 + 5e_2)$

$c = (3e_1 + 4e_2)$

Then

for the encryption of $d_1$, for which $E(d_1) = abc$:

$ab = (2e_1 + 3e_2) \cdot (4e_1 + 5e_2) + (2e_1 + 3e_2) \wedge (4e_1 + 5e_2)$

$ab = 23 - 2e_{12}$

$abc = (23 - 2e_{12})(3e_1 + 4e_2)$

$abc = 61e_1 + 98e_2$

For decryption, to derive $d_1 = b$:

$D(E(d_1)) = D(abc) = a^{-1}\, abc\, c^{-1} = d_1 = b$

$abc\, c^{-1} = (61e_1 + 98e_2)\left(\frac{3e_1 + 4e_2}{25}\right) = 23 - 2e_{12}$

$b = a^{-1}(abc\, c^{-1}) = \frac{2e_1 + 3e_2}{13}\,(23 - 2e_{12})$

$b = \frac{1}{13}\big((46 + 6)e_1 + (69 - 4)e_2\big)$

$b = 4e_1 + 5e_2$

This encryption implementation is based on versors, providing a new way to utilize the geometric product of geometric algebra.
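To make the worked example concrete, the following is a minimal C sketch of RVTHE-style encryption and decryption in the 2D geometric algebra Cl(2,0). It is illustrative only and is not the dissertation's xlg implementation; the type and function names (mv2, gp, vec, vec_inverse) are invented here. Running it reproduces the numbers above: the ciphertext 61e1 + 98e2 and the recovered plaintext 4e1 + 5e2.

/*
 * Minimal sketch of RVTHE-style encryption with versors in Cl(2,0).
 * Names (mv2, gp, vec, vec_inverse) are illustrative, not the actual
 * xlg implementation described in this dissertation.
 */
#include <stdio.h>

typedef struct {            /* general multivector: s + x e1 + y e2 + b e12 */
    double s, e1, e2, e12;
} mv2;

static mv2 vec(double x, double y)          /* pure vector x e1 + y e2 */
{
    mv2 v = { 0.0, x, y, 0.0 };
    return v;
}

/* Geometric product in Cl(2,0): e1*e1 = e2*e2 = 1, e12 = e1 e2, e12*e12 = -1. */
static mv2 gp(mv2 a, mv2 b)
{
    mv2 r;
    r.s   = a.s*b.s   + a.e1*b.e1 + a.e2*b.e2 - a.e12*b.e12;
    r.e1  = a.s*b.e1  + a.e1*b.s  - a.e2*b.e12 + a.e12*b.e2;
    r.e2  = a.s*b.e2  + a.e2*b.s  + a.e1*b.e12 - a.e12*b.e1;
    r.e12 = a.s*b.e12 + a.e12*b.s + a.e1*b.e2  - a.e2*b.e1;
    return r;
}

/* Inverse of a (non-null) vector: v^-1 = v / (v . v). */
static mv2 vec_inverse(mv2 v)
{
    double n = v.e1*v.e1 + v.e2*v.e2;
    return vec(v.e1 / n, v.e2 / n);
}

int main(void)
{
    mv2 s1 = vec(2, 3);             /* secret key 1 (a in Section 5.2) */
    mv2 d1 = vec(4, 5);             /* data value   (b in Section 5.2) */
    mv2 s2 = vec(3, 4);             /* secret key 2 (c in Section 5.2) */

    /* Encryption: C = s1 d1 s2 (two geometric products). */
    mv2 c = gp(gp(s1, d1), s2);
    printf("cipher : %ge1 + %ge2\n", c.e1, c.e2);    /* expect 61e1 + 98e2 */

    /* Decryption: d1 = s1^-1 C s2^-1. */
    mv2 p = gp(gp(vec_inverse(s1), c), vec_inverse(s2));
    printf("plain  : %ge1 + %ge2\n", p.e1, p.e2);    /* expect 4e1 + 5e2   */
    return 0;
}

The gp() function implements the full product table of Cl(2,0) ($e_1^2 = e_2^2 = 1$, $e_{12}^2 = -1$), which covers both the dot and wedge products used in the hand calculation above.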

5.3 Homomorphism of RVTHE

In this section I will show the properties of Homomorphism of RVTHE.

Homomorphism will have addition, subtraction, multiplication and division properties

[79].


5.3.1 Addition

We represent data 1 ($d_1$), data 2 ($d_2$), secret key 1 ($s_1$), and secret key 2 ($s_2$), and prove the following:

$E(d_1 + d_2) = E(d_1) + E(d_2)$

Example:

When $d_1 = 8$ and $d_2 = 6$, then $d_1 + d_2 = 14$, and

$s_1 = (2e_1 + 3e_2)$, $d_1 = (4e_1 + 4e_2)$, $d_2 = (3e_1 + 3e_2)$, $s_2 = (3e_1 + 4e_2)$

Applying the regular geometric product

$\bar V_1 \bar V_2 = \underbrace{[(a_1 b_1)\,\bar e_1 \cdot \bar e_1 + (a_1 b_2)\,\bar e_1 \cdot \bar e_2 + (a_2 b_1)\,\bar e_2 \cdot \bar e_1 + (a_2 b_2)\,\bar e_2 \cdot \bar e_2]}_{\text{dot product}} + \underbrace{[(a_1 b_1)\,\bar e_1 \wedge \bar e_1 + (a_1 b_2)\,\bar e_1 \wedge \bar e_2 + (a_2 b_1)\,\bar e_2 \wedge \bar e_1 + (a_2 b_2)\,\bar e_2 \wedge \bar e_2]}_{\text{wedge product}}$

with $\bar e_i \cdot \bar e_i = 1$, $\bar e_i \cdot \bar e_j = 0$, $\bar e_i \wedge \bar e_i = 0$, and $\bar e_j \wedge \bar e_i = -\bar e_i \wedge \bar e_j$.

Then the encryption of $d_1$:

$E(d_1) = s_1 d_1 s_2$

$E(d_1) = (2e_1 + 3e_2)(4e_1 + 4e_2)(3e_1 + 4e_2)$

$E(d_1) = 44e_1 + 92e_2$

Then the encryption of $d_2$:

$E(d_2) = s_1 d_2 s_2$

$E(d_2) = (2e_1 + 3e_2)(3e_1 + 3e_2)(3e_1 + 4e_2)$

$E(d_2) = 33e_1 + 69e_2$

$E(d_1 + d_2) = (2e_1 + 3e_2)(7e_1 + 7e_2)(3e_1 + 4e_2)$

$E(d_1 + d_2) = 77e_1 + 161e_2$

$E(d_1) + E(d_2) = 77e_1 + 161e_2$

This proves $E(d_1 + d_2) = E(d_1) + E(d_2)$.
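Assuming the mv2, gp, and vec helpers from the sketch in Section 5.2 are in scope, the addition property can also be checked numerically with the values above; the following illustrative fragment prints 77e1 + 161e2 for both E(d1) + E(d2) and E(d1 + d2).

/* Illustrative check of E(d1 + d2) == E(d1) + E(d2), reusing the mv2, gp,
 * and vec helpers sketched in Section 5.2 (assumed to be in scope). */
static void check_addition(void)
{
    mv2 s1 = vec(2, 3), s2 = vec(3, 4);
    mv2 ca = gp(gp(s1, vec(4, 4)), s2);       /* E(d1) = 44e1 + 92e2 */
    mv2 cb = gp(gp(s1, vec(3, 3)), s2);       /* E(d2) = 33e1 + 69e2 */
    mv2 cs = gp(gp(s1, vec(7, 7)), s2);       /* E(d1 + d2)          */

    /* Both lines should print 77e1 + 161e2. */
    printf("E(d1)+E(d2): %ge1 + %ge2\n", ca.e1 + cb.e1, ca.e2 + cb.e2);
    printf("E(d1+d2)   : %ge1 + %ge2\n", cs.e1, cs.e2);
}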

5.3.2 Subtraction

$E(d_1 - d_2) = E(d_1) - E(d_2)$

Example:

When $d_1 = 8$ and $d_2 = 6$, then $d_1 - d_2 = 2$, and

$s_1 = (2e_1 + 3e_2)$

$d_1 = (4e_1 + 4e_2)$

$s_2 = (3e_1 + 4e_2)$

Then the encryptions of $d_1$ and $d_2$ are

$E(d_1) = 44e_1 + 92e_2$

$E(d_2) = 33e_1 + 69e_2$

$E(d_1 - d_2) = (2e_1 + 3e_2)(e_1 + e_2)(3e_1 + 4e_2)$

$E(d_1 - d_2) = 11e_1 + 23e_2$

$E(d_1) - E(d_2) = 11e_1 + 23e_2$

This proves $E(d_1 - d_2) = E(d_1) - E(d_2)$.


5.3.3 Multiplication

For vectors we have scalar multiplication: with $d_1 = 8$ and scalar $r_1 = 2$, we show that $E(r_1 d_1) = r_1 E(d_1)$, using

$s_1 = (2e_1 + 3e_2)$, $d_1 = (4e_1 + 4e_2)$, $s_2 = (3e_1 + 4e_2)$

Then the encryption of $r_1 d_1$:

$E(r_1 d_1) = (2e_1 + 3e_2)(8e_1 + 8e_2)(3e_1 + 4e_2)$

$E(r_1 d_1) = 88e_1 + 184e_2$

$E(d_1) = 44e_1 + 92e_2$

$r_1 E(d_1) = 88e_1 + 184e_2$

This proves $E(r_1 d_1) = r_1 E(d_1)$ for scalar multiplication; since scalars commute with the geometric product, scaling the data vector by $r_1$ scales the ciphertext by the same factor.

5.3.4 Division

For vectors we also have scalar division: with $d_1 = 8$ and scalar $r_1 = \frac{1}{2}$, we show that $E(r_1 d_1) = r_1 E(d_1)$, using

$s_1 = (2e_1 + 3e_2)$, $d_1 = (4e_1 + 4e_2)$, $s_2 = (3e_1 + 4e_2)$

Then the encryption of $r_1 d_1$:

$E(r_1 d_1) = (2e_1 + 3e_2)(2e_1 + 2e_2)(3e_1 + 4e_2)$

$E(r_1 d_1) = 22e_1 + 46e_2$

$E(d_1) = 44e_1 + 92e_2$

$r_1 E(d_1) = 22e_1 + 46e_2$

This proves $E(r_1 d_1) = r_1 E(d_1)$ for scalar division. Recall that the design of RVTHE depends on versors; mathematically, a versor can be represented as $A = a_1 a_2 a_3 \cdots a_n$.

5.4 Security of RVTHE

There is a need to make sure this encryption method is good enough in terms of security. The design of RVTHE depends on versors (two-dimensional vectors), Geometric

Product, and inverse.

5.4.1 RVTHE Key Strength

A versor represents the geometric product of multiple vectors and retains the properties of vectors in the vector space. Selecting multiple vectors of small dimension and performing a geometric product on them results in a vector in the two-dimensional product space that is intractable to decompose.

Suppose we have a key with 128 bits of random data; this can still be vulnerable. Ideally, true randomness would require an attacker to try each of the $2^{128}$ combinations, though probabilistically it may take fewer iterations to match the original key.

We designed the keys as the two coefficients of two vectors, which is very important for security. With a b-bit key in RVTHE, the number of attacks A until the key is found is

$A = 2^{b/2} \cdot 2^{b/2}$   – the key strength of RVTHE.

For example, with $b = 256$ this gives $A = 2^{128} \cdot 2^{128} = 2^{256}$ trials. The dimensions of the vectors contribute an extra layer of security. An attack can be attempted by using simple mathematical manipulations on known information; the security of RVTHE is therefore evaluated by applying mathematical manipulations to known plaintext and known ciphertext and attempting to derive the keys.

5.4.2 High Level Evaluation

There are various attacks that can be performed by attackers. I evaluate RVTHE against two major types of attack to show that the RVTHE cipher design is secure.

Ciphertext-Only:

Assume that an attacker has access to the ciphertext produced by RVTHE and nothing else. In such cases, it is not possible to find the plaintext or the secret keys by using mathematical and statistical operations. A high-level evaluation of this follows.

I represent data 1 ($d_1$), data 2 ($d_2$), secret key 1 ($s_1$), and secret key 2 ($s_2$).

Example:

When $d_1 = 8$ and $d_2 = 6$, then $d_1 + d_2 = 14$, with $s_1 = (2e_1 + 3e_2)$, $d_1 = (4e_1 + 4e_2)$, $d_2 = (3e_1 + 3e_2)$, and $s_2 = (3e_1 + 4e_2)$.

Applying the regular geometric product (as defined in Section 5.3.1), the encryptions are as follows.


Then the encryption of $d_1$:

$E(d_1) = s_1 d_1 s_2$

$E(d_1) = (2e_1 + 3e_2)(4e_1 + 4e_2)(3e_1 + 4e_2)$

$E(d_1) = 44e_1 + 92e_2 = C_1$

Then the encryption of $d_2$:

$E(d_2) = s_1 d_2 s_2$

$E(d_2) = (2e_1 + 3e_2)(3e_1 + 3e_2)(3e_1 + 4e_2)$

$E(d_2) = 33e_1 + 69e_2 = C_2$

Ciphertext $C_1$ is produced from data 1 ($d_1$) and $C_2$ from data 2 ($d_2$). They are very hard to exploit with statistical methods because the ciphertexts are stored as two-dimensional vectors. Even applying statistical and mathematical operations such as additions and subtractions, I do not see a way to derive the keys:

$C_1 + C_2 = 77e_1 + 161e_2$

$C_1 - C_2 = 11e_1 + 23e_2$

Known-Plaintext:

In this case, the attacker has some plaintext/ciphertext pairs and uses them to try to derive the key; this is called a known-plaintext attack. I will demonstrate, using statistical methods and mathematical manipulations, whether the keys can be derived.

I represent data 1 ($d_1$), data 2 ($d_2$), secret key 1 ($s_1$), and secret key 2 ($s_2$).

Example:

When $d_1 = 8$ and $d_2 = 6$, then $d_1 + d_2 = 14$, with $s_1 = (2e_1 + 3e_2)$, $d_1 = (4e_1 + 4e_2)$, $d_2 = (3e_1 + 3e_2)$, and $s_2 = (3e_1 + 4e_2)$.

Applying the regular geometric product (as defined in Section 5.3.1), the encryptions are as follows.

E(

1 1 1 2 E(푑 ) = 푠 푑 푠

1 1 2 1 2 1 2 E(푑 ) = ((2푒 + 3푒 )( =4푒 + 4푒 )(3푒 + 4푒 ))

Then푑1) Encryption = 44푒1 + of92 푒2 퐶1

E(

2 1 2 2 E(푑 ) = 푠 푑 푠

2 1 2 1 2 1 2 E(푑 ) = ((2푒 + 3푒 )( =3푒 + 3푒 )(3푒 + 4푒 ))

Cipher-text푑2) = 33푒1 + 69푒2 퐶2

Here $C_1$ comes from data 1 ($d_1$) and $C_2$ from data 2 ($d_2$). Performing the following operations on the ciphertexts:

$C_1 + C_2 = 77e_1 + 161e_2$

$C_1 - C_2 = 11e_1 + 23e_2$

Applying statistical methods is very hard because two keys are used in the RVTHE design. Even applying statistical methods and mathematical operations such as additions and subtractions, I am not able to derive the secret keys, because the math is implemented with two keys and two-dimensional vectors. There is a pattern (11, 33, 44, and 77), but no way to guess the plaintext or the keys from it.


CHAPTER 6

IMPLEMENTATION AND EVALUATION OF

RVTHE

In this chapter I discuss how I converted RVTHE into an executable program. It covers the implementation of RVTHE in depth, applies it to various files, and compares it with the AES-Crypt encryption method in terms of encryption and decryption speed. I evaluated the security of RVTHE at a high level by analyzing mathematical operations and performing statistical evaluations on ciphertexts.

I evaluated encryption, decryption, and the ability to update/append real files without decrypting and re-encrypting under the RVTHE scheme. I ran these evaluations on a cloud system provided by one of the leading cloud providers, Amazon (AWS EC2).

6.1 Implementation of RVTHE

AES-Crypt is one of the most widely known tools for encrypting individual files, and it offers high speed and security. I developed a single executable package that performs encryption, decryption, and append: an executable crypto program written in the C language, based on the RVTHE scheme and modeled after the AES-Crypt program. I executed it on real files for encryption, decryption, and appending new data to the end of an encrypted file without decrypting the original ciphertext. The cipher was designed with the following design principles and key recommendations:


• Chose three vectors. The first vector was secret key 1, the second vector was the data value, and the third vector was secret key 2.

• Converted the key and data to base 10 for computations.

• Divided the key and data by two and used the halves as the vector coefficients. If the data or key was an odd number, then we subtracted 1, divided the result into two equal coefficients, and added 1 to the second coefficient (see the sketch after the command list below).

• Designed the program to accept various key sizes and to process files 4k bytes at a time when computing the ciphertext.

• With these design options we successfully implemented the RVTHE technique.

• We converted the RVTHE crypto technique into the same executable model as AES-Crypt (a program that encrypts files using the AES method) using the C language. Finally, we executed this program on real files in a Cloud environment (AWS EC2) that uses SSD storage. I used the following commands to run encryption and decryption:

AES-Crypt:

• time aescrypt -e -p key plaintext_file_name

• time aescrypt -d -p key plaintext_file_name.aes

RVTHE:

• time xlg -e -x key1 -y key2 plaintext_file_name

• time xlg -d -x key1 -y key2 plaintext_file_name.xlg

• time xlg -a -x key1 -y key2 “data” plaintext_file_name.xlg

In the above commands, '-e' indicates encryption, '-d' indicates decryption, and '-a' indicates append.
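To make the coefficient-splitting rule from the third design bullet concrete, here is a small, hypothetical C sketch (the function name split_value is invented for illustration and is not the xlg program's actual code). It reproduces, for example, the mapping of the data value 8 to the coefficients (4, 4) used in Section 5.3.

/*
 * Hypothetical sketch (names invented for illustration) of the coefficient-
 * splitting rule described in the design bullets: a base-10 value is split
 * into two halves that become the two coefficients of its vector; for an odd
 * value the extra 1 goes to the second coefficient.
 */
#include <stdio.h>

static void split_value(long value, long *c1, long *c2)
{
    *c1 = value / 2;              /* first coefficient              */
    *c2 = value / 2 + value % 2;  /* second coefficient gets the +1 */
}

int main(void)
{
    long c1, c2;
    split_value(8, &c1, &c2);     /* even: 8 -> (4, 4), as in Section 5.3 */
    printf("8 -> (%ld, %ld)\n", c1, c2);
    split_value(9, &c1, &c2);     /* odd:  9 -> (4, 5)                    */
    printf("9 -> (%ld, %ld)\n", c1, c2);
    return 0;
}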


When we used a 512-bit key for the AES-Crypt program, we split that key into two 256-bit keys for the RVTHE (xlg) program. I did this for all key sizes from 64-bit to 1024-bit. For the evaluation below, I chose a 256-bit key size.

6.2 Experimental Systems

Our evaluations have been conducted on a 64-bit Amazon EC2 virtual machine SSD instance. I chose an instance type of t2-micro, which has 1 vCPU, 1GB Memory, and 8GB maximum storage. I specifically selected a VM with SSD storage, because SSD has high performance and has become an industry standard for cloud computing.

AES-Crypt is one of the most widely known methods for encrypting individual files, and it offers high speed and security. I chose it to develop baseline statistics on speed and on output file size after encryption. It uses the AES algorithm and is a symmetric method like RVTHE. I compared the two in terms of encryption speed, decryption speed, and encrypted file size (disk storage used by the ciphertext).

In addition, I will also explain the additional security and efficiency benefits that are unique to a homomorphic encryption method.

6.3 Experimental Evaluations

I ran both executables (AES-Crypt and RVTHE) on various key sizes and file sizes.

The key sizes were 64, 128, 256, 512, and 1024-bit and file sizes were 1MB, 10MB, and

100MB. From that we measured the speed for encryption and decryption plus the storage size of the encrypted file on the cloud server.


6.3.1 Time measurements on various key sizes

[Chart omitted: key size (64–1024 bits) versus time in seconds on a regular SSD for a 100MB file; series: AES-Crypt encryption, RVTHE encryption, AES-Crypt decryption, RVTHE decryption.]

Graph 19 - Key size Vs Encryption/Decryption time in Sec

Graph 19 shows key size versus the time to encrypt and decrypt using

RVTHE and AES-Crypt on a regular SSD. For encryption, I did not observe a sizeable increase in time for any key size with either encryption method. Across the board, RVTHE required less time for encryption than AES-Crypt. For decryption, the fastest method was RVTHE at 64 bits; however, the RVTHE decryption process took longer as the ciphertext size grew. The most commonly used key size is 256-bit; at that size the speeds of both encryption methods are almost the same.

Note: Having homomorphic features as in RVTHE means that full file decryption should be rare. In other words, file updates do not require the full file to be decrypted.


6.3.2 Time measurements on various file sizes

[Chart omitted: file size (1, 10, 100 MB) versus time in seconds on a regular SSD; series: AES-Crypt encryption, RVTHE encryption, AES-Crypt decryption, RVTHE decryption.]

Graph 20 - File size and Encryption/Decryption times

I chose a 256-bit key size for all tests. Encryption and decryption for both RVTHE and

AES-Crypt methods took more time with larger file sizes.

[Chart omitted: file size (1, 10, 100 MB) versus rate on a regular SSD; series: AES-Crypt encryption rate, RVTHE encryption rate, AES-Crypt decryption rate, RVTHE decryption rate.]

Graph 21 - Key Size and Time on Regular SSD

Graph 20 and Graph 21 show that, across the board, RVTHE required less time for encryption than AES-Crypt. RVTHE also encrypted at about the same rate regardless of file size and, as with encryption, the RVTHE decryption process performed at about the same rate regardless of file size.

6.3.3 Size measurements on Encrypted Files

[Chart omitted: encrypted file size in MB on an encrypted SSD volume for key sizes 64–1024 bits; series: AES-Crypt and xlg-Crypt.]

Graph 22 - Encrypted file sizes in MB

Graph 22 shows that the output file generated by the encryption process is always double the size of the original file for RVTHE, whereas AES-Crypt adds only about a 10% size penalty.

When you decrypt a 1GB file using AES-Crypt, you need an extra 1.2GB of space. For a full-file decryption with RVTHE, 2.2GB of extra space is needed, but full decryption will be rare because RVTHE allows computation on the ciphertext.


CHAPTER 7

LESSONS LEARNED AND FUTURE WORK

7.1 Challenges and Lessons Learned

This section will present the research flow, some of the challenges and issues that were faced during the study of encryption methods and how I overcame those challenges.

1. The basis of new research expands on previous work and improves it by eliminating

shortcomings and risks in the same domain. When I began working on my research,

I first educated myself about SSDs using surveys or previous literature. To further

understand these SSDs, I did various benchmarks to calculate its performance on

AWS. After these tests, I was able to prove that modern SSD’s perform better under

random workloads than previous SSDs. Because there weren’t any deficiencies

found to investigate, it was difficult to find a problem for my research. I felt lost,

so I took a step back and started to think about what area of research I wanted to

pursue, and I started researching existing literature related to SSD security.

2. I deleted all the data from local SSD drive and was able to recreate the file using

freely available recovery software. That itself gave me the first step to investigate

the sanitization of SSD. I demonstrated that SSD has limitations about sanitization.

This gave me some hope to start finding a problem case which is very critical for

any research. I read previous research papers related to Cloud security and Storage

security. Most of the previous work in the domain of security on Storage and Cloud

was using encryption as primary method to protect the data. All this proved that we


can’t leave plain text on the SSD. Still I was not sure whether my findings were

enough to pursue this as a research problem for further study.

3. I thought about finding the various encryption methods for Cloud storage. I did

research and read about various encryption methods and their weaknesses. While

reading about these methods, I learned that there is some overhead for

encrypting the data in the Cloud. I conveyed this to my advisor, and he provided

needed confidence and motivation to do research in this area. With that support, I

started my research vigorously.

4. I started investigating encryption methods and running them in the Cloud to see

how much performance overhead and security issues existed. It was harder to

choose which encryption software should be used for performance benchmarking. Over a few

months, I chose some encryption methods and found performance decreases of 20 to 50

percent when we use encryption software. Also, I found that hidden

folders exist for encrypted containers. Now I knew that there are issues related to

encryption methods. My next step was expanding my knowledge in encryption

methods. I met my Professor and he suggested that I should consider homomorphic

encryption. I started reading about homomorphic encryption method from

previously written literature. Homomorphic encryption allows computation on

encrypted files without decrypting and my thought was that this method would

incur less overhead. At this time, I felt that I was one step closer and confident

about my direction. I had a conversation with my advisor, and he explained how

multi-vectors can be used to achieve encryption. With that, I knew my next step,

which was to learn mathematics in the area of geometric algebra.

5. Mathematical knowledge is needed to create a secure and efficient cipher. My last

work in math was about 25 years ago. Trying to understand what my advisor was

telling me about this new primitive of multi-vectors was not easy, to say the least.

For many months, I started studying geometric algebra to gain knowledge in that

area. Finding an implementation of this math as an encryption cipher was

challenging. By this time, the implementation of cipher using multi-vectors was

completed by another researcher for his master’s thesis. I read his thesis paper and

converted his design into an executable program. This multi-vector

approach created a ciphertext file roughly eight times the size of the original

plaintext file. This large ciphertext increased the time of encryption and

decryption, and the method was going to be hard to use on bigger files. I had a

conversation with my advisor about the results and he asked me to consider using

compression techniques. I ran the program on storage with deduplication

enabled, thinking that this might reduce the output file size, but the speeds of

encryption and decryption were very low. I learned that compression or any type

of deduplication adds overhead on top of the encryption. My main challenge

was to improve the multi-vector-based encryption. I was thinking of this challenge

all the time. It took many months and I was getting discouraged, but I decided to

keep on learning. One night around 3:00AM, I woke up with an idea on how to

decrease the size of the encrypted file using versors. I proved the math by hand and

made sure it was feasible. Immediately, I put these thoughts on paper with all the


homomorphic properties and sent an email to my advisor. I learned to keep pushing

myself.

6. Once I found out that I can decrease the encrypted file size, it was a challenge to

design the RVTHE because it implements a scalar and multi-vector as intermediate

products before generating the final output vector. I developed RVTHE into an

executable program in C language. It was challenging to find a comparable

encryption method. I found AES-Crypt, which is symmetric and uses AES

encryption, and which was a perfect match for comparison with RVTHE.

I enjoyed the process, but at times it was very stressful. I experienced how joyful it is to find a solution to a problem. This process helped me grow into a better person.

7.2 Future Work

In cloud computing, homomorphic encryption provides secure computing on encrypted data. It is an encryption method that allows users to compute in the Cloud without converting the ciphertext into plaintext. In recent years there has been a lot of research and interest in the domain of homomorphic encryption; most of it focuses on asymmetric homomorphic encryption, and very little research has been done on symmetric homomorphic encryption, even though some applications can use it very well. We proposed a very simple cryptographic primitive with low encryption and decryption times.

The RVTHE encryption method is a symmetric homomorphic encryption that supports addition, subtraction, scalar multiplication, and scalar division. RVTHE is developed using

Clifford Geometric Algebra as a foundation. It uses Vectors, Versors, and the Inverse of a


Vector. RVTHE showed promising preliminary results indicating that it is feasible to apply it to real files. The RVTHE algorithm was implemented as a program and executed on various file types and sizes. I also added an appending feature to the program. A comparison was conducted between AES-Crypt and RVTHE in terms of time to encrypt, time to decrypt, and generated output file size.

RVTHE was designed with three vectors in the current work. In future research it can be extended to various designs including more vectors as secret keys in the algorithm. Then the performance of the algorithm can be calculated. Experimenting with new designs for encryption and decryption of various file sizes and file types is a great way to explore the

RVTHE. At this time, I added the code for addition, but it can be expanded to deletion, scalar multiplication, and scalar division. These additions to the program can be tested to see how they enhance the user experience of computing on encrypted data. RVTHE can be extended to various applications, operating systems, and hardware. We can also explore using RVTHE in applications and databases such as password stores. Also, leveraging multithreading for the encryption and decryption computations will improve performance.

There is also a possibility of exploring the scalability and availability of the algorithm. RVTHE has been implemented and applied to various types of files, such as .txt, .doc, .pdf, .xlsx, and .jpeg. It worked as expected, without noise and while holding integrity, but it would be helpful to expand this program to databases and to other application updates such as add, update, and delete operations. It would be worthwhile to see whether this approach can be used in various device-level encryption systems and whether it can be extended to all types of devices, including mobile and IoT.

When using Cloud computing, we can segregate the application and computation part.

The RVTHE encryption method can be used to perform heavy computation by outsourcing the computation to the Cloud while maintaining security. I performed a high-level security analysis of RVTHE; an in-depth security analysis would further establish its security strength. Any encryption adds overhead to the performance of reads and writes on an SSD.

The theory of geometric algebra is vast, and there is always room for improvement in finding new encryption methods through deeper study of geometric algebra. Geometric algebra can be studied in more depth along with new encryption algorithms and new ciphers like RVTHE.

Overall, the future work can be described as the use of the RVTHE encryption method with different types of technologies and systems, because RVTHE is a symmetric cipher like AES; this can include hardware systems and other features. In cloud computing, homomorphic encryption provides secure computing by allowing users to compute in the Cloud without converting the ciphertext into plaintext, and RVTHE (an implementation of homomorphic encryption) satisfies that requirement efficiently. We can explore using RVTHE in various applications and databases such as password stores.

Geometric algebra can be studied in more depth and new encryption algorithms/ciphers can still be created.


CHAPTER 8

CONTRIBUTIONS

This chapter summarizes the challenges of developing a new encryption method and then discusses how I designed, implemented, and measured the performance and security strength of the new encryption cipher RVTHE.

The goal and purpose of this study is to explain what encryption can do for devices in terms of the tradeoffs between performance and security. The main thought behind this thesis is that there is a way to have high security and performance without having to compromise either. This challenge prompted me to study how different types of encryption, like BestCrypt, Dm-crypt, and SED, can affect SSD security and performance. The above-mentioned encryption software packages have their own drawbacks. The study showed that these encryption methods have performance differences for sequential and random reads and writes. Most enterprise workloads are random, yet little research has been done on workloads such as random read/writes. With my experiments, I showed that random workloads perform better on newer SSD storage systems. I then evaluated how modern SSDs handle random workloads when using encryption. Evaluating different workloads with many types of functions for different encryption methods produces various performance metrics. I went through the selected methods (BestCrypt, Dm-crypt, and the encrypted Elastic Block Store volume from Amazon) to analyze their strengths and weaknesses for SSDs, evaluating both security and performance. This helped me come to the conclusion that homomorphic encryption is best suited for these workloads in the Cloud.


By applying fully homomorphic encryption, it is possible to strengthen cyber security, because it allows a whole range of computations to be carried out on encrypted data. Technically, one can start from a zero-byte file, encrypt it, and perform all subsequent homomorphic operations on the encrypted data without ever exposing or leaving a plaintext footprint on the disk. Rivest et al. [20] first mentioned this idea, and Gentry [2] first proposed fully homomorphic encryption, using binary circuits over encrypted data to perform basic mathematical operations. The scholars who followed, inspired by Gentry's approach, either improved his scheme or contributed new approaches. His theoretical approach to homomorphic encryption provided a new way to address secure computation, but his solution was not easy to apply and was therefore impractical.
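For reference, the requirement that this line of work targets, and that RVTHE aims to meet for a restricted set of operations, can be written compactly. This is the standard homomorphic-encryption property, stated here only to make precise what "computing on encrypted data" means:

\[
\mathrm{Dec}_k\!\big(\mathrm{Enc}_k(m_1) \oplus \mathrm{Enc}_k(m_2)\big) = m_1 + m_2,
\qquad
\mathrm{Dec}_k\!\big(\alpha \odot \mathrm{Enc}_k(m)\big) = \alpha \cdot m,
\]

where \(\oplus\) and \(\odot\) denote the corresponding operations carried out directly on ciphertexts, without decryption.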

RVTHE is a new homomorphic symmetric encryption scheme based on Clifford Geometric Algebra. Its foundation is the mathematics of versors, the geometric product, and the inverse. Geometric Algebra is therefore central to its design and framework. The design of the new cipher is simple, but combining versors, the geometric product, and the inverse yields a strong cipher. It is a powerful and substantial cipher that fulfils the requirements for building a new cipher, including defending the system against various attacks while requiring only small updates to the ciphertext. In this work, I showed how to realize these design principles, demonstrated their application in the real world in terms of performance, and gave a mathematical treatment of the cipher's defense against attacks. I also created a measurable benchmark for encryption speed: first I ran experiments to understand the performance penalties of encryption on SSDs in the Cloud, and then I created an experimental environment to measure RVTHE performance.
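To make the role of the geometric product and the inverse concrete, consider, purely as an illustration (this is not the exact RVTHE key structure, which uses additional secret vectors), a generic sandwich-style construction with a secret invertible versor \(V\): a message multivector \(m\) is encrypted as \(c = V\,m\,V^{-1}\) and decrypted as \(m = V^{-1} c\,V\). Because the geometric product is linear in each argument, any construction of this shape is automatically homomorphic for addition and scalar multiplication:

\[
V(m_1 + m_2)V^{-1} = V m_1 V^{-1} + V m_2 V^{-1} = c_1 + c_2,
\qquad
V(\alpha\, m)V^{-1} = \alpha\, V m V^{-1} = \alpha\, c .
\]

In such a construction the security rests on keeping the versor and the other key vectors secret, which is why the combination of versors, the geometric product, and the inverse forms the core of the design.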


CONCLUSION

An SSD provides the following data functions: reading, writing, erasing, purging, and securing. These functions are processed differently than the same functions on an HDD. This study began by showing that SSD storage has performance differences for sequential reads, sequential writes, and random reads and writes. Most enterprise workloads are a mix of random reads and random writes, and this research showed that SSDs have evolved over the years to handle random workloads better. On an SSD, the writing, purging, and securing functions are drastically different from their HDD counterparts. Previous research has shown that some data is nearly impossible to delete completely on some SSDs, and that deleted data can be restored via block-level recovery software. This ultimately prompted me to study different types of encryption, including TrueCrypt, DiskCryptor, BestCrypt, Dm-crypt, VeraCrypt, tomb, BitLocker, and SED. Each was studied to identify its strengths and weaknesses. I then selected two software-based encryption methods (BestCrypt and Dm-crypt) and an encrypted volume (an Elastic Block Store volume from Amazon) for further study of their strengths and weaknesses with respect to SSD security and performance. I chose sequential read, sequential write, sequential mixed, random read, random write, and random mixed workloads for different block sizes (4k, 8k, 16k, 32k, 64k, 128k). Evaluating these workloads with different block sizes and read/write ratios for the BestCrypt and Dm-crypt encryptions and the encrypted Elastic Block Store volume produced a range of performance metrics. This research presented IOPS (Input/output Operations Per Second) metrics to show how each encryption method impacted the different workloads. The results showed that encryption can cause a 20-50% performance decrease on SSDs for the TrueCrypt and BestCrypt software, demonstrated how modern encryption software impacts storage devices such as SSDs in the Cloud, and confirmed that traditional symmetric encryption imposes high performance penalties on these workloads.

Any existing symmetric encryption software utilizes system resources such as memory and CPU when encrypting and decrypting, which causes delays in operations.

Securing data involves two stages: data at rest and data in transit. "Data at rest" refers to data before or after it is sent to the Cloud; "data in transit" refers to data while it travels between the client and the Cloud. Further study of encryption methods leads naturally to homomorphic approaches, so the possibilities and potential of homomorphic cryptography in cloud computing environments cannot be ignored. Simple homomorphic encryption methods can be made feasible in cloud computing without sacrificing security, enhancing the user experience by allowing operations to be performed as needed on encrypted data. I conducted a study of the existing homomorphic encryption literature and found that most schemes are asymmetric and very slow; there was no homomorphic encryption that is both fast and easily implemented on real systems. RVTHE is a solution to this problem.

Using properties from Clifford Geometric Algebra, including versors, vectors, and the inverse of a vector, it is possible to design a homomorphic cipher that has a simple structure, versatility, flexibility of key assignments, and a speed that rivals previous approaches.

In conclusion, homomorphic encryption provides secure cloud computing by allowing users to compute in the cloud without converting ciphertext into plaintext.


RVTHE (an implementation of homomorphic encryption) satisfies this requirement efficiently.

In this work, I developed Reduced Vector Technique Homomorphic Encryption (RVTHE), a symmetric, somewhat homomorphic encryption scheme built on versors and other properties of Clifford Geometric Algebra. The evaluation of my implementation shows that a file can be edited/appended in 0.001 seconds. For full file encryption, RVTHE is 75% faster on encryption and 25% slower on decryption compared with the AES-Crypt encryption software. The ciphertext generated by RVTHE is on average 25% smaller than that of previous approaches based on multi-vectors and Clifford Geometric Algebra, so RVTHE has the potential to be used on real workloads. It is a success in that it is faster and more efficient, and it requires only about twice the size for its ciphertext.


REFERENCES

[1] E. Aïmeur and D. Schőnfeld, "The ultimate invasion of privacy: Identity theft," in 2011 Ninth Annual International Conference on Privacy, Security and Trust, 2011.

[2] C. Gentry, "Fully Homomorphic Encryption Using Ideal Lattices," in Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, New York, NY, USA, 2009.

[3] C. Gentry and S. Halevi, "Implementing Gentry's Fully-Homomorphic Encryption Scheme," in Proceedings of the 30th Annual International Conference on Theory and Applications of Cryptographic Techniques: Advances in Cryptology, Berlin, 2011.

[4] Oxford Dictionaries, "Definition of security," [Online]. Available: https://en.oxforddictionaries.com/definition/security.

[5] R. Kissel, Ed., "Glossary of Key Information Security Terms," NIST Interagency Report NIST IR 7298 Revision 1, National Institute of Standards and Technology, 2011.

[6] N. Ferguson, B. Schneier and T. Kohno, Cryptography Engineering: Design Principles and Practical Applications, Wiley Publishing, 2010.

[7] S. Mauw and M. Oostdijk, "Foundations of Attack Trees," in Proceedings of the 8th International Conference on Information Security and Cryptology, Berlin, 2006.

[8] C. E. Shannon, "Communication theory of secrecy systems," The Bell System Technical Journal, vol. 28, pp. 656-715, Oct 1949.

[9] "Intro-Samsung Elec. Datasheet (K9LBG08U0M).," 2007.

[10] J.-U. Kang, J.-S. Kim, C. Park, H. Park and J. Lee, "A Multi-channel Architecture for High-performance NAND Flash-based Storage System," J. Syst. Archit., vol. 53, pp. 644-658, sep 2007.

[11] R. Micheloni, A. Marelli and K. Eshghi, Inside Solid State Drives (SSDs), Springer Publishing Company, Incorporated, 2012.

[12] B. Bosen, "Full Drive Encryption with Samsung Solid State Drives," nov 2010.


[13] P. Wang, G. Sun, S. Jiang, J. Ouyang, S. Lin, C. Zhang and J. Cong, "An Efficient Design and Implementation of LSM-tree Based Key-value Store on Open-channel SSD," in Proceedings of the Ninth European Conference on Computer Systems, New York, NY, USA,, 2014.

[14] D. E. Denning and P. J. Denning, "Data Security," ACM Comput. Surv., vol. 11, pp. 227-249, 9 1979.

[15] M. Tebaa, S. E. Hajji and A. E. Ghazi, "Homomorphic encryption method applied to Cloud Computing," in 2012 National Days of Network Security and Systems, 2012.

[16] S. Singh, The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography, New York: Anchor Books, 2000.

[17] J. Nechvatal, E. Barker, L. Bassham, W. Burr and M. Dworkin, "Report on the development of the Advanced Encryption Standard (AES)," 2000.

[18] National Institute of Standards and Technology (NIST), "FIPS Publication 46-2: Data Encryption Standard," 1993.

[19] J. Nechvatal, E. Barker, D. Dodson, M. Dworkin, J. Foti and E. Roback, "Status report on the first round of the development of the Advanced Encryption Standard," Journal of Research of the National Institute of Standards and Technology, vol. 104, 1999.

[20] R. L. Rivest, L. Adleman and M. L. Dertouzos, "On Data Banks and Privacy Homomorphisms," Foundations of Secure Computation, Academia Press, pp. 169-179, 1978.

[21] L. N. Childs, A Concrete Introduction to Higher Algebra, Volume1, Springer, 1979.

[22] S. Burris and H. P. Sankappanavar, A Course in Universal Algebra-With 36 Illustrations, 2006.

[23] A. Acar, H. Aksu, A. S. Uluagac and M. Conti, "A Survey on Homomorphic Encryption Schemes: Theory and Implementation," CoRR, vol. abs/1704.03578, 2017.

[24] A. López-Alt, E. Tromer and V. Vaikuntanathan, "On-the-fly Multiparty Computation on the Cloud via Multikey Fully Homomorphic Encryption," in Proceedings of the Forty-fourth Annual ACM Symposium on Theory of Computing, New York, NY, USA, 2012.


[25] M. Tebaa and S. E. Hajji, "Secure Cloud Computing through Homomorphic Encryption," CoRR, vol. abs/1409.0829, 2014.

[26] C. Moore, M. O'Neill, E. O'Sullivan, Y. Doröz and B. Sunar, "Practical homomorphic encryption: A survey," in 2014 IEEE International Symposium on Circuits and Systems (ISCAS), 2014.

[27] B. Schneier, Applied Cryptography (2Nd Ed.): Protocols, Algorithms, and Source Code in C, New York, NY, USA,: John Wiley & Sons, Inc., 1995.

[28] K. Zhao, W. Zhao, H. Sun, X. Zhang, N. Zheng and T. Zhang, "LDPC-in-SSD: Making Advanced Error Correction Codes Work Effectively in Solid State Drives," in Presented as part of the 11th USENIX Conference on File and Storage Technologies (FAST 13), San Jose, CA, 2013.

[29] P. Huang, P. Subedi, X. He, S. He and K. Zhou, "FlexECC: Partially Relaxing ECC of MLC SSD for Better Cache Performance," in Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, Berkeley, 2014.

[30] M. Wei, L. M. Grupp, F. E. Spada and S. Swanson, "Reliably Erasing Data from Flash-based Solid State Drives," in Proceedings of the 9th USENIX Conference on File and Storage Technologies, Berkeley, 2011.

[31] J. Reardon, S. Capkun and D. Basin, "Data Node Encrypted File System: Efficient Secure Deletion for Flash Memory," in Proceedings of the 21st USENIX Conference on Security Symposium, Berkeley, 2012.

[32] Y. Choi, D. Lee, W. Jeon and D. Won, "Password-based Single-file Encryption and Secure Data Deletion for Solid-state Drive," in Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication, New York, NY, USA,, 2014.

[33] National Institute of Standards and Technology, FIPS PUB 46-3: Data Encryption Standard (DES), NIST, 1999.

[34] K. Bhargavan and G. Leurent, "On the Practical (In-)Security of 64-bit Block Ciphers: Collision Attacks on HTTP over TLS and OpenVPN," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA,, 2016.

[35] M. A. Wright, "Feature: The Advanced Encryption Standard," Netw. Secur., vol. 2001, pp. 11-13, oct 2001.


[36] N. Ferguson, J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner and D. Whiting, "Improved Cryptanalysis of Rijndael," in Proceedings of the 7th International Workshop on Fast Software Encryption, London, 2001.

[37] A. Biryukov, O. Dunkelman, N. Keller, D. Khovratovich and A. Shamir, Key Recovery Attacks of Practical Complexity on AES Variants With Up To 10 Rounds, 2009.

[38] B. Schneier, "Description of a New Variable-Length Key, 64-bit Block Cipher (Blowfish)," in Fast Software Encryption, Cambridge Security Workshop, London, 1994.

[39] A. Biryukov and D. Wagner, Slide Attacks, L. Knudsen, Ed., Berlin, Heidelberg: Springer Berlin Heidelberg, 1999, pp. 245-259.

[40] B. Schneier, J. Kelsey, D. Whiting, D. Wagner and C. Hall, "On the Twofish Key Schedule," in Proceedings of the Selected Areas in Cryptography, London, 1999.

[41] N. Ferguson, J. Kelsey, B. Schneier and D. Whiting, "A Twofish Retreat: Related- Key Attacks Against Reduced-Round Twofish," 2000.

[42] J. J. G. Ortiz and K. J. Compton, "A Simple Power Analysis Attack on the Twofish Key Schedule," CoRR, vol. abs/1611.07109, 2016.

[43] R. Anderson, E. Biham and L. Knudsen, Serpent: A Proposal for the Advanced Encryption Standard, 1998.

[44] User:Dake commonswiki, "File:Serpent-linearfunction.png," 2005. [Online]. Available: https://commons.wikimedia.org/wiki/File:Serpent-linearfunction.png.

[45] M. Hermelin, J. Y. Cho and K. Nyberg, "Multidimensional Linear Cryptanalysis of Reduced Round Serpent," in Proceedings of the 13th Australasian Conference on Information Security and Privacy, Berlin, 2008.

[46] J. Rizzo and T. Duong, "Practical Padding Oracle Attacks," in Proceedings of the 4th USENIX Conference on Offensive Technologies, Berkeley, 2010.

[47] M. Liskov, R. L. Rivest and D. Wagner, "Tweakable Block Ciphers," in Proceedings of the 22Nd Annual International Cryptology Conference on Advances in Cryptology, London, 2002.

[48] L. Martin, "XTS: A Mode of AES for Encrypting Hard Disks," IEEE Security and Privacy, vol. 8, pp. 68-69, may 2010.


[49] D. A. McGrew and J. Viega, "The Security and Performance of the Galois/Counter Mode (GCM) of Operation," in Proceedings of the 5th International Conference on Cryptology in India, Berlin, 2004.

[50] Dm-crypt, "Dm-crypt," [Online]. Available: https://wiki.archlinux.org/index.php/dm-crypt/Device_encryption. [Accessed 10 12 2016].

[51] C. Fruhwirth, "LUKS- Wikipedia," [Online]. Available: https://en.wikipedia.org/wiki/Linux_Unified_Key_Setup. [Accessed 2018].

[52] "LUKS security weakness," The Hacker News, 2016. [Online]. Available: https://thehackernews.com/2016/11/hacking-linux-system.html.

[53] "Plausible deniability with LUKS," [Online]. Available: https://blog.linuxbrujo.net/posts/plausible-deniability-with-luks/.

[54] M. Bauer, "Paranoid Penguin: BestCrypt: Cross-platform Filesystem Encryption," Linux J., vol. 2002, pp. 9--, jun 2002.

[55] D. Boteanu and K. Fowler, "Bypassing Self-Encrypting Drives (SED) in Enterprise Environments," Black Hat Europe, 2015.

[56] packetizer, "AES Crypt or AES-Crypt," 2018. [Online]. Available: https://www.aescrypt.com.

[57] C. Gentry, "A fully homomorphic encryption scheme," 2009.

[58] W. Wang, Y. Hu, L. Chen, X. Huang and B. Sunar, "Accelerating fully homomorphic encryption using GPU," in 2012 IEEE Conference on High Performance Extreme Computing, 2012.

[59] J. Vince, Geometric Algebra: An Algebraic System for Computer Games and Animation, 1st ed., Springer Publishing Company, Incorporated, 2009.

[60] D. Davis, R. Ihaka and P. Fenstermacher, Cryptographic Randomness from Air Turbulence in Disk Drives, Y. G. Desmedt, Ed., Berlin, Heidelberg: Springer Berlin Heidelberg, 1994, pp. 114-120.

[61] J. Kim, J. M. Kim, S. H. Noh, S. L. Min and Y. Cho, "A Space-efficient Flash Translation Layer for CompactFlash Systems," IEEE Trans. on Consum. Electron., vol. 48, pp. 366-375, may 2002.


[62] R. Micheloni, A. Marelli and R. Ravasio, Error Correction Codes for Non-Volatile Memories, 1st ed., Springer Publishing Company, Incorporated, 2010.

[63] J. H. Stathis, "Reliability Limits for the Gate Insulator in CMOS Technology," IBM J. Res. Dev., vol. 46, pp. 265-286, mar 2002.

[64] P. Olivo, T. N. Nguyen and B. Ricco, "High-field-induced degradation in ultra- thin SiO2 films," IEEE Transactions on Electron Devices, vol. 35, pp. 2259-2267, dec 1988.

[65] N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. Manasse and R. Panigrahy, "Design Tradeoffs for SSD Performance," in USENIX 2008 Annual Technical Conference, Berkeley, 2008.

[66] A. Birrell, M. Isard, C. Thacker and T. Wobber, "A Design for High-performance Flash Disks," New York, NY, USA,, 2007.

[67] F. Chen, D. A. Koufaty and X. Zhang, "Understanding Intrinsic Characteristics and System Implications of Flash Memory Based Solid State Drives," in Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems, New York, NY, USA,, 2009.

[68] A. Gupta, Y. Kim and B. Urgaonkar, "DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings," in Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, New York, NY, USA,, 2009.

[69] D. Park, B. Debnath and D. H. C. Du, "A Workload-Aware Adaptive Hybrid Flash Translation Layer with an Efficient Caching Strategy," in 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems, 2011.

[70] P. Thontirawong, M. Ekpanyapong and P. Chongstitvatana, "SCFTL: An efficient caching strategy for page-level flash translation layer," in 2014 International Computer Science and Engineering Conference (ICSEC), 2014.

[71] A. Gupta, R. Pisolkar, B. Urgaonkar and A. Sivasubramaniam, "Leveraging Value Locality in Optimizing NAND Flash-based SSDs," in Proceedings of the 9th USENIX Conference on File and Storage Technologies, Berkeley, 2011.

[72] F. Chen, T. Luo and X. Zhang, "CAFTL: A Content-aware Flash Translation Layer Enhancing the Lifespan of Flash Memory Based Solid State Drives," in Proceedings of the 9th USENIX Conference on File and Storage Technologies, Berkeley, 2011.


[73] P. Huang, G. Wu, X. He and W. Xiao, "An Aggressive Worn-out Flash Block Management Scheme to Alleviate SSD Performance Degradation," in Proceedings of the Ninth European Conference on Computer Systems, New York, NY, USA,, 2014.

[74] L.-P. Chang, "On Efficient Wear Leveling for Large-scale Flash-memory Storage Systems," in Proceedings of the 2007 ACM Symposium on Applied Computing, New York, NY, USA,, 2007.

[75] Y. Hu, H. Jiang, D. Feng, L. Tian, H. Luo and C. Ren, "Exploring and Exploiting the Multilevel Parallelism Inside SSDs for Improved Performance and Endurance," IEEE Transactions on Computers, vol. 62, pp. 1141-1155, jun 2013.

[76] Y. Kim, A. Gupta, B. Urgaonkar, P. Berman and A. Sivasubramaniam, "HybridStore: A Cost-Efficient, High-Performance Storage System Combining SSDs and HDDs," in 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems, 2011.

[77] D. Shue and M. J. Freedman, "From Application Requests to Virtual IOPs: Provisioned Key-value Storage with Libra," in Proceedings of the Ninth European Conference on Computer Systems, New York, NY, USA,, 2014.

[78] D. W. Honorio Araujo da Silva, "Fully Homomorphic Encryption over Exterior Product Spaces," 2017.

[79] F. Armknecht, C. Boyd, C. Carr, K. Gjøsteen, A. Jäschke, C. A. Reuter and M. Strand, A Guide to Fully Homomorphic Encryption, 2015.

[80] S. S. Wagstaff Jr., Cryptanalysis of Number Theoretic Ciphers, CRC Press, 2002.

[81] C. Swenson, Modern cryptanalysis: techniques for advanced code breaking., John Wiley & Sons, 2008.

[82] J. Yi-ming and L. Sheng-li, "The Analysis of Security Weakness in BitLocker Technology," in Proceedings of the 2010 Second International Conference on Networks Security, Wireless Communications and Trusted Computing - Volume 01, Washington, 2010.

[83] J. Suter, Geometric Algebra Primer, 2013.

[84] K.-D. Suh, B.-H. Suh, Y.-H. Lim, J.-K. Kim, Y.-J. Choi, Y.-N. Koh, S.-S. Lee, S.-C. Kwon, B.-S. Choi, J.-S. Yum and others, "A 3.3 V 32 Mb NAND flash memory with incremental step pulse programming scheme," IEEE Journal of Solid-State Circuits, vol. 30, pp. 1149-1156, 1995.


[85] D. Stehlé, Floating-Point LLL: Theoretical and Practical Aspects, Springer, 2010, pp. 179-213.

[86] D. Stehlé and R. Steinfeld, "Faster Fully Homomorphic Encryption," IACR Cryptology ePrint Archive, vol. 2010, p. 299, 2010.

[87] D. Stehlé and R. Steinfeld, "Faster Fully Homomorphic Encryption," in ASIACRYPT, 2010.

[88] R. Snyder, "Some Security Alternatives for Encrypting Information on Storage Devices," in Proceedings of the 3rd Annual Conference on Information Security Curriculum Development, New York, NY, USA,, 2006.

[89] B. Schneier, Secrets & Lies: Digital Security in a Networked World, 1st ed., New York, NY, USA: John Wiley & Sons, Inc., 2000.

[90] V. Rijmen and B. Preneel, "Improved Characteristics for Differential Cryptanalysis of Hash Functions Based on Block Ciphers," in Fast Software Encryption: Second International Workshop. Leuven, Belgium, 14-16 December 1994, Proceedings, 1994.

[91] N. Palaniswamy, D. M. Dipesh, J. N. D. Kumar and S. G. Raaja, "Notice of Violation of IEEE Publication Principles Enhanced Blowfish algorithm using bitmap image pixel plotting for security improvisation," in 2010 2nd International Conference on Education Technology and Computer, 2010.

[92] E. O'Sullivan and F. Regazzoni, "Efficient Arithmetic for Lattice-based Cryptography: Special Session Paper," in Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion, New York, NY, USA, 2017.

[93] R. Olsson, Performance differences in encryption software versus storage devices, 2012, p. 36.

[94] D. Mittal, D. Kaur and A. Aggarwal, "Secure Data Mining in Cloud Using Homomorphic Encryption," in 2014 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), 2014.

[95] K. Minematsu, "Improved Security Analysis of XEX and LRW Modes," in Proceedings of the 13th International Conference on Selected Areas in Cryptography, Berlin, 2007.

[96] G. Campardo, R. Micheloni and D. Novosel, VLSI-Design of Non-Volatile Memories, New York: Springer, 2005.


[97] D. Micciancio, The Geometry of Lattice Cryptography, A. Aldini and R. Gorrieri, Eds., Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 185-210.

[98] L. Martin, "XTS: A Mode of AES for Encrypting Hard Disks," IEEE Security Privacy, vol. 8, pp. 68-69, may 2010.

[99] J.-D. Lee, S.-H. Hur and J.-D. Choi, "Effects of floating-gate interference on NAND flash memory cell operation," IEEE Electron Device Letters, vol. 23, pp. 264-266, may 2002.

[100] S. K. Lai, J. Lee and V. K. Dham, "Electrical properties of nitrided-oxide systems for use in gate dielectrics and EEPROM," in 1983 International Electron Devices Meeting, 1983.

[101] D. Kahng and S. M. Sze, "A floating gate and its application to memory devices," The Bell System Technical Journal, vol. 46, pp. 1288-1295, jul 1967.

[102] C. Gentry, S. Halevi and N. P. Smart, "Homomorphic Evaluation of the AES Circuit," in Proceedings of the 32Nd Annual Cryptology Conference on Advances in Cryptology --- CRYPTO 2012 - Volume 7417, New York, NY, USA, 2012.

[103] N. Ferguson, J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner and D. Whiting, "Improved Cryptanalysis of Rijndael," in Proceedings of the 7th International Workshop on Fast Software Encryption, London, 2001.

[104] Y. Doröz, J. Hoffstein, J. Pipher, J. H. Silverman, B. Sunar, W. Whyte and Z. Zhang, "Fully Homomorphic Encryption from the Finite Field Isomorphism Problem," IACR Cryptology ePrint Archive, vol. 2017, p. 548, 2017.

[105] Diskcryptor, "Diskcryptor," 2011. [Online]. Available: https://diskcryptor.net/wiki/Main_Page.

[106] W. Dai, Y. Doröz, Y. Polyakov, K. Rohloff, H. Sajjadpour, E. Savas and B. Sunar, "Implementation and Evaluation of a Lattice-Based Key-Policy ABE Scheme," IEEE Trans. Information Forensics and Security, vol. 13, pp. 1169-1184, 2018.

[107] A. Czeskis, D. J. S. Hilaire, K. Koscher, S. D. Gribble, T. Kohno and B. Schneier, "Defeating Encrypted and Deniable File Systems: TrueCrypt V5.1a and the Case of the Tattling OS and Applications," Berkeley, 2008.

[108] J. H. Cheon and D. Stehlé, "Fully Homomorphic Encryption over the Integers Revisited," IACR Cryptology ePrint Archive, vol. 2016, p. 837, 2016.

[109] J. H. Cheon and D. Stehlé, "Fully Homomorphic Encryption over the Integers Revisited," in EUROCRYPT (1), 2015.


[110] K. K. Chauhan, A. K. S. Sanger and A. Verma, "Homomorphic Encryption for Data Security in Cloud Computing," in 2015 International Conference on Information Technology (ICIT), 2015.

[111] N. Chan, M. F. Beug, R. Knoefler, T. Mueller, T. Melde, M. Ackermann, S. Riedel, M. Specht, C. Ludwig and A. T. Tilke, "Metal control gate for sub-30nm floating gate NAND memory," in 2008 9th Annual Non-Volatile Memory Technology Symposium (NVMTS), 2008.

[112] A. Chakraborti, C. Chen and R. Sion, "POSTER: DataLair: A Storage Block Device with Plausible Deniability," in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA,, 2016.

[113] Z. Brakerski, C. Gentry and V. Vaikuntanathan, "(Leveled) Fully Homomorphic Encryption Without Bootstrapping," in Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, New York, NY, USA, 2012.

[114] E. Biham, O. Dunkelman and N. Keller, "The Rectangle Attack - Rectangling the Serpent," in Advances in Cryptology – Proceedings of EUROCRYPT 2001, LNCS 2045, 2001.

[115] E. Biham, "New Types of Cryptanalytic Attacks Using Related Keys," in Advances in Cryptology --- Eurocrypt'93, Berlin, 1994.

[116] D. Benarroch, Z. Brakerski and T. Lepoint, "FHE over the Integers: Decomposed and Batched in the Post-Quantum Regime," in Proceedings, Part II, of the 20th IACR International Conference on Public-Key Cryptography --- PKC 2017 - Volume 10175, New York, NY, USA, 2017.


Appendix A – Cloud Storage SSD

In this Appendix, I walk through the steps to create various types of VMs with various types of SSD storage in the Amazon Cloud. Log into the Amazon Cloud console and select EC2, then launch an instance and choose the instance types i2.xlarge and t2.micro; both are SSD-backed VMs.

1. Visit Amazon Cloud Services EC2 website at

https://us-west-2.console.aws.amazon.com/console.

2. Create a VM following the instructions from Amazon.

In the first evaluation, I compared these two types of VMs and showed that storage-optimized VMs have better performance. First, I created the two types of VMs in AWS.

Instance type i2.xlarge follows:

Instance type t2.micro follows:
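For repeatability, the same two instance types can also be launched from the AWS CLI instead of the console. The sketch below is illustrative only; the AMI ID, key pair, and security group are placeholders that must be replaced with real values.

# Storage-optimized instance with instance-store SSD (placeholder IDs)
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type i2.xlarge \
    --key-name my-key --security-group-ids sg-xxxxxxxx --count 1

# General-purpose, EBS-backed instance for comparison (placeholder IDs)
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.micro \
    --key-name my-key --security-group-ids sg-xxxxxxxx --count 1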


3. Install the FIO benchmark tool.

root@ip-172-31-17-80:/home/ubuntu/Desktop# wget http://brick.kernel.dk/snaps/fio-2.1.10.tar.gz
root@ip-172-31-17-80:/home/ubuntu/Desktop# gunzip fio-2.1.10.tar.gz
root@ip-172-31-17-80:/home/ubuntu/Desktop# tar -xf fio-2.1.10.tar

Run the following command to collect the performance benchmarks:

fio --filename=/dmcrypt/4krandreadwrite6040j8 --direct=1 --rw=randrw --size=1024m --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=60 --iodepth=8 --numjobs=8 --runtime=60 --group_reporting --name=4krandreadwrite60j8 --output=/home/output/4kdmcryptrandreadwrite60j8

Sample Generated output:

From this output, I gathered IOPS information for block sizes from 4 KB to 1024 KB on 1 GB files, and also measured the times for sequential and random reads and writes. I used these performance metrics to understand the performance characteristics of SSDs in the Cloud.
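The block-size sweep itself can be scripted. The loop below is a sketch that reuses the job parameters from the fio command above; the test-file and output paths are illustrative.

for bs in 4k 8k 16k 32k 64k 128k 256k 512k 1024k; do
    fio --filename=/ssd/randrw_${bs} --direct=1 --rw=randrw --size=1024m \
        --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio \
        --bs=${bs} --rwmixread=60 --iodepth=8 --numjobs=8 --runtime=60 \
        --group_reporting --name=randrw_${bs} \
        --output=/home/output/ssd_randrw_${bs}      # one output file per block size
done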


Appendix B – Cloud Storage and Encryptions

In this Appendix, I evaluate a regular SSD, a hardware-encrypted SSD, and software-encryption containers, running the FIO benchmarks to understand the performance penalties of encryption software in a Cloud environment.

1. Visit Amazon Cloud Services EC2 website at

https://us-west-2.console.aws.amazon.com/console.

All the VM types are t2.micro (Variable ECUs, 1 vCPU, 2.5 GHz, Intel Xeon Family, 1 GiB memory, EBS only), Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-efd0428f.

2. Create two VMs following the instructions from Amazon.

Instance type t2.micro with regular SSD

3. Create an instance of type t2.micro with an encrypted SSD.
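One way to script this step with the AWS CLI is to attach an additional encrypted EBS (SSD) volume at launch through a block-device mapping; the IDs below are placeholders and this is only a sketch of the idea, not the exact configuration used in the study.

aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type t2.micro \
    --key-name my-key --count 1 \
    --block-device-mappings '[{"DeviceName":"/dev/sdf","Ebs":{"VolumeSize":8,"VolumeType":"gp2","Encrypted":true,"DeleteOnTermination":true}}]'   # encrypted gp2 (SSD) data volume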


4. Install the BestCrypt encryption software on a 3 GB volume and the Dm-Crypt software on a 3 GB volume on one of the t2.micro regular-SSD VMs.

root@ip-172-31-17-80: yum install gcc kernel-devel kernel-headers dkms
root@ip-172-31-17-80: wget -O /etc/yum.repos.d/bestcrypt.repo https://www.jetico.com/packages/el/bestcrypt.repo
root@ip-172-31-17-80: yum install bestcrypt bestcrypt-panel
root@ip-172-31-17-80: bctool new /root/BestCrypt -a Rijndael -s 3gb -d password
root@ip-172-31-17-80: bctool format /root/BestCrypt -t ext3
Enter password:
root@ip-172-31-17-80:/sys/block/xvda/queue# apt-get install cryptsetup
Reading package lists... Done
Building dependency tree
Reading state information... Done
cryptsetup is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 50 not upgraded.
root@ip-172-31-17-80:/sys/block/xvda/queue# fallocate -l 2048M /root/dmcrypt
root@ip-172-31-17-80:/sys/block/xvda/queue# cryptsetup -y luksFormat /root/dmcrypt

WARNING! ======This will overwrite data on /root/dmcrypt irrevocably.

Are you sure? (Type uppercase yes): y

root@ip-172-31-17-80:/sys/block/xvda/queue# cryptsetup -y luksFormat /root/dmcrypt

WARNING! ======This will overwrite data on /root/dmcrypt irrevocably.

Are you sure? (Type uppercase yes): yes
root@ip-172-31-17-80:/sys/block/xvda/queue# cryptsetup -y luksFormat /root/dmcrypt

WARNING! ======This will overwrite data on /root/dmcrypt irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase:
Verify passphrase:
Passphrases do not match.
root@ip-172-31-17-80:/sys/block/xvda/queue# cryptsetup -y luksFormat /root/dmcrypt

WARNING! ======This will overwrite data on /root/dmcrypt irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase:
Verify passphrase:
root@ip-172-31-17-80:/sys/block/xvda/queue# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            492M   12K  492M   1% /dev
tmpfs           100M  384K   99M   1% /run
/dev/xvda1      7.8G  3.7G  3.7G  50% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            497M   68K  497M   1% /run/shm
none            100M  8.0K  100M   1% /run/user
root@ip-172-31-17-80:/sys/block/xvda/queue# cd /root
root@ip-172-31-17-80:~# ls -las
total 2097208
4 drwx------  8 root root 4096 Apr  9 20:40 .
4 drwxr-xr-x 22 root root 4096 Apr  9 09:06 ..
8 -rw-------  1 root root 6914 Apr  9 13:17 .bash_history
4 -rw-r--r--  1 root root 3106 Feb 20  2014 .bashrc
4 drwxr-xr-x  3 root root 4096 Apr  9 10:11 BestCrypt


4 drwx------  2 root root       4096 Apr  9 09:07 .cache
4 drwxr-xr-x  3 root root       4096 Apr  9 10:02 .config
4 drwx------  3 root root       4096 Apr  9 10:02 .dbus
2097156 -rw-r--r--  1 root root 2147483648 Apr  9 20:41 dmcrypt
4 drwxr-xr-x  2 ubuntu ubuntu   4096 Apr  9 13:09 plain
4 -rw-r--r--  1 root root        140 Feb 20  2014 .profile
4 drwx------  2 root root       4096 Apr  9 09:06 .ssh
4 -rw-------  1 root root        648 Apr  9 10:41 .viminfo
root@ip-172-31-17-80:~# file /root/dmcrypt
/root/dmcrypt: LUKS encrypted file, ver 1 [aes, xts-plain64, sha1] UUID: 5302390d-a47a-47cc-99a7-a846d164197c
root@ip-172-31-17-80:~# cryptsetup luksOpen /root/dmcrypt dmcrypt
Enter passphrase for /root/dmcrypt:
root@ip-172-31-17-80:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            492M   12K  492M   1% /dev
tmpfs           100M  388K   99M   1% /run
/dev/xvda1      7.8G  3.7G  3.7G  50% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            497M   68K  497M   1% /run/shm
none            100M  8.0K  100M   1% /run/user
root@ip-172-31-17-80:~# mkfs.ext4 -j /dev/mapper/dmcrypt
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
131072 inodes, 523776 blocks
26188 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=536870912
16 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
root@ip-172-31-17-80:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            492M   12K  492M   1% /dev
tmpfs           100M  388K   99M   1% /run
/dev/xvda1      7.8G  3.7G  3.7G  50% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            497M   68K  497M   1% /run/shm
none            100M  8.0K  100M   1% /run/user
root@ip-172-31-17-80:~# mkdir dmcrypt
mkdir: cannot create directory ‘dmcrypt’: File exists
root@ip-172-31-17-80:~# pwd
/root
root@ip-172-31-17-80:~# cd /
root@ip-172-31-17-80:/# mkdir dmcrypt
root@ip-172-31-17-80:/# mount /dev/mapper/dmcrypt /dmcrypt
root@ip-172-31-17-80:/# df -h
Filesystem           Size  Used Avail Use% Mounted on
udev                 492M   12K  492M   1% /dev
tmpfs                100M  388K   99M   1% /run
/dev/xvda1           7.8G  3.7G  3.7G  50% /
none                 4.0K     0  4.0K   0% /sys/fs/cgroup
none                 5.0M     0  5.0M   0% /run/lock
none                 497M   68K  497M   1% /run/shm
none                 100M  8.0K  100M   1% /run/user
/dev/mapper/dmcrypt  2.0G  3.0M  1.9G   1% /dmcrypt
root@ip-172-31-17-80:/#

5. Run the FIO benchmarks on these four configurations: regular SSD, encrypted SSD, Dm-Crypt container, and BestCrypt container.

fio --filename=/dmcrypt/4krandreadwrite6040j8 --direct=1 --rw=randrw --size=1024m --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=60 --iodepth=8 --numjobs=8 --runtime=60 --group_reporting --name=4krandreadwrite60j8 --output=/home/output/4kdmcryptrandreadwrite60j8

6. Sample Generated output:


From similar output, I gathered IOPS information for block sizes from 4 KB to 1024 KB on 1 GB files, and also measured the times for sequential and random reads and writes for all the VMs. I used these performance metrics to compare SSD characteristics against the performance penalties of encryption software in the Cloud. The results showed that there is a performance overhead for software-based encryption compared with regular or hardware-encrypted SSDs.
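To tabulate the results, the IOPS figures can be extracted from the saved fio output files. The snippet below assumes the classic fio text report, in which each read and write summary line carries an iops= field; the output directory is the one used in the commands above.

for f in /home/output/*; do
    echo "== $f =="
    grep -o 'iops=[0-9]*' "$f"      # prints one iops value for reads and one for writes
done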

7. Hidden encrypted containers information.


In the above output, the df -h command can be run by anyone and will reveal the encrypted container information, which may be a security concern.


Appendix C – Multi-Vector Based Encryption

This Appendix demonstrates the implementation of multi-vector-based encryption. After surveying homomorphic encryption techniques, I took the multi-vector-based homomorphic encryption proposed by David Williams Honorio Araujo Da Silva in his Master's thesis, converted it into an executable program similar in style to AES-Crypt, and ran it on an AWS VM for file-level encryption. Because it is a symmetric encryption scheme, I chose the symmetric encryption software AES-Crypt for comparison, performed file-level encryption with both programs, and compared the results.

Compilation:

1. Source Code is at

http://cs.uccs.edu/~gsc/pub/phd/stedla//src/MultiVector.zip

2. Download math libraries and install them in the VM.

sudo apt-get install libmath-mpfr-perl

3. Install AES-Crypt executable

wget https://www.aescrypt.com/download/v3/linux/AESCrypt-GUI-3.11-Linux-x86_64-Install.gz
gunzip AESCrypt-GUI-3.11-Linux-x86_64-Install.gz
chmod +x AESCrypt-GUI-3.11-Linux-x86_64-Install
./AESCrypt-GUI-3.11-Linux-x86_64-Install

4. Compile the code with the following command.

gcc main.c xlg_compression.c xlm.c xlg.c xlg_massive_encryption.c -o xlg -lgmp -w

5. Use the following commands to compare against AES-Crypt; a batch timing sketch follows the command lists below.


AES-Crypt:

• time aescrypt -e -p key plaintext_file_name

• time aescrypt -d -p key plaintext_file_name.aes


Multi-Vector program (xlg):

• time xlg -e -x key1 -y key2 plaintext_file_name

• time xlg -d -x key1 -y key2 plaintext_file_name.xlg

• time xlg -a -x key1 -y key2 “data” plaintext_file_name.xlg
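For batch comparisons over several files, the same commands can be wrapped in a small shell loop. The file names below are illustrative, and the loop simply times each operation in turn; decrypted output may overwrite the original file, so run it on copies.

for f in sample1.txt sample2.pdf sample3.jpg; do     # illustrative test files
    echo "== $f =="
    time aescrypt -e -p key "$f"                     # AES-Crypt encrypt
    time aescrypt -d -p key "$f.aes"                 # AES-Crypt decrypt
    time xlg -e -x key1 -y key2 "$f"                 # xlg encrypt
    time xlg -d -x key1 -y key2 "$f.xlg"             # xlg decrypt
done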


Appendix D – Demonstrate RVTHE

In this Appendix, I demonstrate the design, implementation, performance, and cipher size of RVTHE. I first designed RVTHE, then converted it into an executable program similar to AES-Crypt and ran it on AWS VMs for file-level encryption. RVTHE and AES-Crypt are both symmetric encryption schemes and are therefore directly comparable; I performed file-level encryption with each and compared the results.

Compilation:

1. Source Code is at

http://cs.uccs.edu/~gsc/pub/phd/stedla//src/RVTHE2.zip

2. Download math libraries and install them in the VM.

sudo apt-get install libmath-mpfr-perl

3. Install AES-Crypt executable

wget https://www.aescrypt.com/download/v3/linux/AESCrypt-GUI-3.11-Linux-x86_64-Install.gz
gunzip AESCrypt-GUI-3.11-Linux-x86_64-Install.gz
chmod +x AESCrypt-GUI-3.11-Linux-x86_64-Install
./AESCrypt-GUI-3.11-Linux-x86_64-Install

4. Compile the code with the following command.

gcc main.c xlg_compression.c xlm.c xlg.c xlg_massive_encryption.c -o xlg -lgmp -w

5. Use the following commands to compare RVTHE against AES-Crypt.

AES-Crypt:

• time aescrypt -e -p key plaintext_file_name

• time aescrypt -d -p key plaintext_file_name.aes


RVTHE:

• time xlg -e -x key1 -y key2 plaintext_file_name

• time xlg -d -x key1 -y key2 plaintext_file_name.xlg

• time xlg -a -x key1 -y key2 “data” plaintext_file_name.xlg

6. Sample Output of created encrypted file:

7. Cipher text size and contents:


8. Sample Performance Metrics:

9. Various file types and their encryption and decryption outputs:

.TXT: The screenshot below includes the append option.

.JPEG:


.PDF:

.DOCX:


.XLSX:

.PPTX:

All of the above file types worked with the RVTHE encryption method, producing ciphertext 25% smaller than the multi-vector-based approach.


Appendix E – Acronym List

Abbreviation   Term
HDD            Hard Disk Drive
SATA           Serial AT Attachment
SSD            Solid State Drive
FDE            Full-Disk Encryption
AES            Advanced Encryption Standard
DES            Data Encryption Standard
TDEA           Triple Data Encryption Algorithm
RSA            Rivest–Shamir–Adleman
MD5            Message-Digest Algorithm
SHA            Secure Hash Algorithm
CBC            Cipher Block Chaining
CTR            Counter
GCM            Galois/Counter Mode
OCB            Offset Codebook Mode
ECB            Electronic Codebook
OFB            Output Feedback
AWS            Amazon Web Services
NIST           National Institute of Standards and Technology
ESD            Every Stage of Data
FHE            Fully Homomorphic Encryption
RVTHE          Reduced Vector Technique Homomorphic Encryption
SSL            Secure Sockets Layer
UCCS           University of Colorado, Colorado Springs