Master of Science in Engineering: Computer Security May 2019

Performance Evaluation and Comparison of Standard Cryptographic Algorithms and Chinese Cryptographic Algorithms

Louise Bergman Martinkauppi Qiuping He

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Engineering: Computer Security. The thesis is equivalent to 20 weeks of full time studies.

The authors declare that they are the sole authors of this thesis and that they have not used any sources other than those listed in the bibliography and identified as references. They further declare that they have not submitted this thesis at any other institution to obtain a degree.

Contact Information: Author(s): Louise Bergman Martinkauppi E-mail: [email protected]

Qiuping He E-mail: [email protected]

University advisor: Senior Lecturer Dragos Ilie Department of Computer Science and Engineering

Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

Abstract

Background. China is regulating the import, export, sale, and use of encryption technology in China. If any foreign company wants to develop or release a product in China, it needs to report its use of any encryption technology to the Office of State Commercial Cryptography Administration (OSCCA) to gain approval. SM2, SM3, and SM4 are cryptographic standards published by OSCCA and are authorized to be used in China. To comply with Chinese cryptography laws, organizations and companies may have to replace standard cryptographic algorithms in their systems with Chinese cryptographic algorithms, such as SM2, SM3, and SM4. It is important to know beforehand how the replacement of algorithms will impact performance to determine future system costs.

Objectives. Perform a theoretical study and performance comparison of the standard cryptographic algorithms and Chinese cryptographic algorithms. The standard cryptographic algorithms studied are RSA, ECDSA, SHA-256, and AES-128, and the Chinese cryptographic algorithms studied are SM2, SM3, and SM4.

Methods. A literature analysis was conducted to gain knowledge and collect information about the selected cryptographic algorithms in order to make a theoretical comparison of the algorithms. An experiment was conducted to measure how the algorithms perform and to be able to rank them.

Results. The literature analysis provides a comparison that identifies design similarities and differences between the algorithms. The controlled experiment provides measurements of the metrics for the algorithms mentioned in the objectives.

Conclusions. The conclusions are that the digital signature algorithms SM2 and ECDSA have similar designs and also similar performance. SM2 and RSA have fundamentally different designs, and SM2 performs better than RSA when generating keys and signatures. When verifying signatures, RSA shows comparable performance in some cases and worse performance in other cases. The hash algorithms SM3 and SHA-256 have many design similarities, but SHA-256 performs slightly better than SM3. AES-128 and SM4 have many similarities but also a few differences. In the controlled experiment, AES-128 outperforms SM4 by a significant margin.

Keywords: cryptography, performance, SM2, SM3, SM4


Sammanfattning (Summary)

Background. China regulates the import, export, sale, and use of encryption technology in China. If a foreign company wants to develop or release a product in China, it must report its use of encryption technology to the Office of State Commercial Cryptography Administration (OSCCA) for approval. SM2, SM3, and SM4 are cryptographic standards that may legally be used in China. Organizations and companies may need to replace the encryption algorithms in their systems with Chinese encryption algorithms to meet the requirements of the Chinese laws. It is therefore important to know in advance how the replacement of algorithms will affect performance in order to evaluate future costs for the system.

Objectives. Carry out a theoretical study and performance comparison of standard encryption algorithms and Chinese encryption algorithms. The standard encryption algorithms are RSA, ECDSA, SHA-256, and AES-128. The Chinese encryption algorithms are SM2, SM3, and SM4.

Methods. A literature analysis was carried out to gain a better understanding of the selected algorithms. An experiment was carried out to collect measurements of the chosen parameters and then to rank the measurements.

Results. The literature analysis gave a comparison that identifies similarities and differences between the algorithms. The controlled experiment gave measurements of the parameters for the algorithms mentioned in the objectives.

Conclusions. The conclusions are that the digital signature algorithms SM2 and ECDSA have similar designs and also similar performance. SM2 and RSA have fundamental differences in their design, and SM2 performs better in key generation and signature generation. When verifying signatures, RSA shows comparable performance in some cases and worse performance in other cases. The hash functions SM3 and SHA-256 also have many similarities in their design, but SHA-256 performs slightly better than SM3. AES-128 and SM4 have many design similarities but also some differences. In the controlled experiment, AES-128 outperforms SM4 by a large margin.

Keywords: encryption, performance, SM2, SM3, SM4


Acknowledgments

Firstly, we would like to thank our supervisor Dragos Ilie for the support and guidance throughout our master thesis project. This thesis was supported by the Ericsson M-commerce department, which we thank for giving us the opportunity to do our master thesis with them. Here, we would like to thank our supervisor at Ericsson, Mattias Liljeson, for the interesting thesis subject and continuous feedback. We also give thanks to Alexander Mohlin for his guidance and assistance. Lastly, we would like to thank our manager, Ulf Santesson, for all his support and for providing the resources.


Nomenclature

AES Advanced Encryption Standard
CBC Cipher Block Chaining Mode
CPU Central Processing Unit
CRT Chinese Remainder Theorem
CTR Counter Mode
DES Data Encryption Standard
ECB Electronic Codebook Mode
ECC Elliptic-Curve Cryptography
ECDLP Elliptic Curve Discrete Logarithm Problem
ECDSA Elliptic Curve Digital Signature Algorithm
FIPS Federal Information Processing Standard
IEEE Institute of Electrical and Electronics Engineers
IFP Integer Factorization Problem
ISO International Organization for Standardization
NIST National Institute of Standards and Technology
OFAT One-Factor-at-a-Time
OSCCA Office of State Commercial Cryptography Administration
PSS Probabilistic Signature Scheme
RSA Rivest–Shamir–Adleman
RSS Resident Set Size
SCA State Cryptography Administration
SHA Secure Hash Algorithm
SPN Substitution-Permutation Network
UFN Unbalanced Feistel Network


Contents

Abstract

Sammanfattning

Acknowledgments

Nomenclature

1 Introduction
  1.1 Motivation
  1.2 Aim, Objectives, and Research Questions
  1.3 Decisions
  1.4 Scope and Limitations
  1.5 Thesis Outline

2 Related Work
  2.1 SM2
  2.2 SM3
  2.3 SM4
  2.4 Cryptographic Algorithm Comparison
  2.5 Knowledge Gap

3 Background
  3.1 Cryptography Law in China
  3.2 Symmetric and Asymmetric
  3.3 Confusion and Diffusion
  3.4 Elliptic Curve Cryptography
  3.5 Mode of Operation
  3.6 Algorithm Design

4 Method
  4.1 Literature Analysis
    4.1.1 Databases and Search Engines
    4.1.2 Procedures and Approaches
    4.1.3 Used Approach
  4.2 Controlled Experiment
    4.2.1 Libraries and Tools
    4.2.2 System Specification
    4.2.3 Experiment Design
    4.2.4 Used Approach
    4.2.5 Distribution Analysis
    4.2.6 Mann-Whitney U test
  4.3 Validity
    4.3.1 Internal
    4.3.2 External
    4.3.3 Algorithm Implementations Verification

5 Results
  5.1 Literature Analysis
    5.1.1 Design Comparison of SM2, RSA, and ECDSA
    5.1.2 Design Comparison of SM3 and SHA-256
    5.1.3 Design Comparison of SM4 and AES-128
  5.2 Algorithm Results
    5.2.1 Digital Signature Results
    5.2.2 Hash Results
    5.2.3 Block Cipher results
    5.2.4 Relative Differences Between the Algorithms
  5.3 Distribution Analysis Results
  5.4 Mann-Whitney U Test Results

6 Analysis and Discussion
  6.1 Overall Performance Impact
  6.2 File size
  6.3 Distribution Analysis
  6.4 Performance
    6.4.1 Digital Signature Algorithms
    6.4.2 Hash Algorithms
    6.4.3 Block Cipher Algorithms
  6.5 Memory

7 Conclusions and Future Work
  7.1 Conclusion
  7.2 Future work

References

A Algorithm Design
  A.1 AES
  A.2 ECDSA
  A.3 RSA
  A.4 SHA-256
  A.5 SM2
  A.6 SM3
  A.7 SM4

B Mann Whitney U Test
  B.1 Digital Signature
  B.2
  B.3 Block Cipher


List of Figures

3.1 The ECB encryption and decryption. Figure adapted from figure 1 in [1].
3.2 The CBC encryption and decryption. Figure adapted from figure 2 in [1].
3.3 The CTR encryption and decryption. Figure adapted from figure 5 in [1].

5.1 Digital signature real-time in Botan and GmSSL.
5.2 Digital signature CPU time in Botan and GmSSL.
5.3 Digital signature CPU cycles in Botan and GmSSL.
5.4 Digital signature RSS in Botan and GmSSL.
5.5 Hash algorithms real-time in Botan and OpenSSL.
5.6 Hash algorithms CPU time in Botan and OpenSSL.
5.7 Hash algorithms CPU cycles in Botan and OpenSSL.
5.8 Hash algorithms RSS in Botan and OpenSSL.
5.9 Block Ciphers real-time graphs in Botan and OpenSSL.
5.10 Block Ciphers CPU time graphs in Botan and OpenSSL.
5.11 Block Ciphers CPU cycles graphs in Botan and OpenSSL.
5.12 Block Ciphers RSS graphs in Botan and OpenSSL.
5.13 Distribution type 1.
5.14 Distribution type 2.
5.15 Distribution type 3.
5.16 Distribution type 4.

A.1 The state array input and output. Figure adapted from figure 3 in [2].
A.2 The AES encryption algorithm.
A.3 The ShiftRow transformation. Figure adapted from figure 8 in [2].
A.4 The MixColumns transformation. Figure adapted from figure 3.6 in [3].
A.5 The AddRoundKey transformation. Figure adapted from figure 3.8 in [3].
A.6 The ECDSA signature generation.
A.7 The ECDSA signature verification.
A.8 Encoding with PSS.
A.9 The overall structure of SHA-256.
A.10 SM2 Digital Signature Generation Algorithm. Figure adapted from figure 1 in [4].
A.11 SM2 Digital Signature Verification Algorithm. Figure adapted from figure 2 in [4].
A.12 SM3 algorithm. Figure adapted from figure 1 in [5].
A.13 SM4 encryption round i.

List of Tables

4.1 Used library and library version
4.2 Specifications of experiment machine.
4.3 Details of the OFAT experiment design.

5.1 ECDSA, SM2, and RSA properties.
5.2 Security level and required lengths [6].
5.3 Ranking of digital signature algorithms based on expected time and memory performance.
5.4 SM3 and SHA-256 properties.
5.5 Number of logical operations in SM3 and SHA-256's compression function.
5.6 Number of assignments in SM3 and SHA-256's compression function.
5.7 Ranking of hash functions based on expected time and memory performance.
5.8 SM4 and AES-128 properties.
5.9 Ranking of block ciphers based on expected time and memory performance.
5.10 Digital Signature results.
5.11 Hash Functions result.
5.12 Block cipher result.
5.13 Table of the percentage change of each algorithm.
5.14 Distribution types of data samples.
5.15 A summarization of the Mann-Whitney U test results.

A.1 Key-Round Combinations.
A.2 AES S-box.
A.3 AES inverse S-box.
A.4 SM4 S-box.

B.1 Explanation of Mann-Whitney U test table headers.
B.2 Result of Digital Signature Mann-Whitney U test.
B.3 Results of Hash Functions Mann-Whitney U Tests.
B.4 Results of Block Cipher Real-time Mann-Whitney U Tests.
B.5 Results of Block Cipher CPU time Mann-Whitney U Tests.
B.6 Results of Block Cipher CPU cycles Mann-Whitney U Tests.
B.7 Results of Block Cipher RSS Mann-Whitney U Tests.


Chapter 1 Introduction

This master thesis investigates the performance impact when replacing standard cryptographic algorithms with the Chinese cryptographic algorithms SM2, SM3, and SM4. This thesis project was done in collaboration with Ericsson. The following section will motivate the thesis project by describing the problem to be solved.

1.1 Motivation

The Chinese IT industry has grown dramatically over the past two decades [7] and the value of e-commerce in China accounts for more than 40% of all worldwide e-commerce today [8]. Because of this, many IT companies around the world are interested in the Chinese market. If a company wants to establish itself in China, it needs to follow Chinese law. If any foreign company wants to develop or release a product in China, it needs to report its use of any encryption technology to the Office of State Commercial Cryptography Administration (OSCCA) to obtain approval. Only OSCCA-approved products are sanctioned for use in China [9].

SM2, SM3, and SM4 are cryptographic standards published by OSCCA and are authorized to be used within China. SM2 is a set of public key algorithms based on elliptic curve cryptography, which includes a digital signature algorithm, a key exchange protocol, and a public key encryption algorithm. SM3 is a hash function used in commercial cryptography as an alternative to SHA-2. SM4 is a symmetric encryption algorithm used in the Chinese National Standard for Wireless LAN [10].

A problem arises in situations where an organization or a company must use Chinese encryption algorithms to fulfil the requirements of Chinese encryption technology laws. The problem is that the stakeholders do not know how the replacement of algorithms will affect the performance of their product. The algorithm replacement can impact time performance, energy consumption, and memory utilization. In performance-critical products, it is essential to know how performance will be impacted by algorithm replacement, since low performance could render the product unusable, or more hardware may be required to ensure good performance. It is necessary to know beforehand how the replacement of algorithms will impact performance to determine future system costs.

This problem will be solved by investigating the performance impact, in terms of execution time, CPU cycles, and resident set size, when replacing standard cryptographic algorithms with a corresponding Chinese cryptographic algorithm. The solution to this problem is valuable to anyone who wants to establish a product in China or integrate their product with a Chinese system, since it can be necessary to replace an existing cryptographic algorithm with a Chinese counterpart.

Solving the problem mentioned above is good from an ethical aspect since it improves the understanding and knowledge about different cryptographic algorithms which are used to keep sensitive information confidential. A potential ethical issue may be that the SM algorithms and/or any of the standard cryptographic algorithms, similar to the Dual_EC_DRBG algorithm [11], have been deliberately weakened in order to enable eavesdropping or tampering of data. However, a detailed security analysis is needed to determine if this is the case, which is not within the scope of this study.

1.2 Aim, Objectives, and Research Questions

The thesis aim is to investigate the performance impact when replacing standard cryptographic algorithms with corresponding Chinese cryptographic algorithms.

The overall aim of the project is broken down into the following objectives:

O1 Perform a theoretical study of the design of SM2, SM3, and SM4 and perform a comparison with the corresponding standard cryptographic algorithms: ECDSA and RSA, SHA-256, and AES-128.

O2 Evaluate the performance of RSA, ECDSA, SHA-256, AES-128, SM2, SM3, and SM4 by conducting an experiment.

The research questions to be answered in this master thesis are listed below. The motivation behind the choice of algorithms, metrics, and modes is explained in section 1.3.

RQ1 What are the design similarities and differences between RSA, ECDSA, SHA-256, and AES-128 and the corresponding Chinese cryptographic algorithms SM2, SM3, and SM4?

RQ1.1 What conclusions can be drawn about the expected performance of the algorithms based on the outcome of RQ1?

RQ2 What is the performance impact, if any, in terms of key generation, signing, and verification time, CPU clock cycles, and memory utilization, when replacing RSA or ECDSA with SM2?

RQ3 What is the performance impact, if any, in terms of the speed of the hash calculation, CPU clock cycles, and memory utilization, when replacing SHA-256 with SM3?

RQ4 What is the performance impact, if any, in terms of encryption and decryption time, CPU clock cycles, and memory utilization, when replacing AES-128 with SM4 using the block cipher modes ECB, CBC, and CTR?

1.3 Decisions

This thesis project compares three types of standard cryptographic algorithms with a corresponding type of Chinese cryptographic algorithm. The types of algorithms and the algorithms of each type are listed below.

• Block Cipher: AES-NI-128 - AES-128 - SM4

• Public Key : RSA - ECDSA - SM2

• Hash Function: SHA-256 - SM3

The algorithms listed above were chosen with consideration of the following conditions:

• The cryptography libraries OpenSSL/GmSSL and Botan support the algorithms mentioned above. The cryptography libraries are described under Library below.

• The cryptography libraries OpenSSL/GmSSL and Botan support the Block Cipher modes: ECB, CBC, and CTR.

Metrics

The metrics measured in this thesis are real-time, CPU time, CPU cycles, and resident set size (RSS). Time is selected as a metric since it is essential to know how fast an algorithm can process data. In performance-critical systems, it is important to know the algorithm run time, since low performance could render a product unusable or require more hardware. This thesis measures both real-time and CPU time.

CPU cycles measure how many CPU clock cycles are required to execute an algorithm. CPU cycles were chosen as a metric since they indicate the energy consumption of a process. The CPU cycle measurement is also convenient for estimating the CPU time on a similar processor with a different clock rate than the one used in the experiments.

Resident set size (RSS) shows how much main memory (RAM) a process is currently using. RSS can be a critical metric for smaller computing devices, such as cellular phones, smart cards, and embedded systems, that have constrained computational power, memory, and battery life.
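The four metrics can be illustrated with a small measurement harness. The following is a minimal sketch and not the measurement setup used in this thesis: it assumes a Unix-like system, uses only the Python standard library, and takes SHA-256 from `hashlib` as a stand-in workload. The helper names `measure` and `estimate_cpu_time` are hypothetical. CPU cycles are not read from hardware counters here; the second helper only shows the cycles-to-time conversion described above. Note also that `ru_maxrss` reports the peak RSS of the process, not its current RSS.

```python
import hashlib
import resource
import time


def measure(workload, data):
    """Run workload(data) once and return (real_time_s, cpu_time_s, peak_rss)."""
    real_start = time.perf_counter()   # wall-clock ("real") time
    cpu_start = time.process_time()    # user + system CPU time of this process
    workload(data)
    cpu_time = time.process_time() - cpu_start
    real_time = time.perf_counter() - real_start
    # Peak resident set size of the process so far
    # (reported in kilobytes on Linux and in bytes on macOS).
    peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return real_time, cpu_time, peak_rss


def estimate_cpu_time(cpu_cycles, clock_rate_hz):
    """Estimate CPU time in seconds on a processor running at clock_rate_hz."""
    return cpu_cycles / clock_rate_hz


if __name__ == "__main__":
    payload = b"\x00" * (1 << 20)  # 1 MiB of input, an arbitrary choice
    real_s, cpu_s, rss = measure(lambda d: hashlib.sha256(d).digest(), payload)
    print(f"real={real_s:.6f}s cpu={cpu_s:.6f}s peak_rss={rss}")
    # Convert a cycle count into seconds for a hypothetical 3 GHz processor.
    print(estimate_cpu_time(3_000_000_000, 3.0e9))
```

A single run such as this one is noisy; repeating the measurement many times and comparing the resulting samples is what makes the numbers usable.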

Library

The libraries used for this thesis are OpenSSL, GmSSL, and Botan. OpenSSL is an open-source software library that serves as a toolkit for TLS/SSL. OpenSSL supports many different cryptographic algorithms, such as AES, Blowfish, MD5, SHA-1, SHA-2, RSA, ECDSA, SM2, SM3, SM4, and more.

GmSSL is an open-source cryptographic library that supports SM2, SM3, SM4, and other Chinese cryptographic algorithms. GmSSL is a fork of OpenSSL. The reason for using GmSSL, in addition to OpenSSL, is that GmSSL has better support for digital signatures and is easier to configure using the command line interface compared to OpenSSL.

Botan is a cryptographic library written in C++. The library provides a wide variety of cryptographic algorithms. Botan was chosen as a second library to measure algorithm performance since it supports SM2, SM3, and SM4. Additionally, examining more than one library gives more information for reasoning about whether potential differences in performance are due to the algorithms themselves or their implementations.

Block Cipher Algorithms

SM4 is a block cipher with a block length of 128 bits. The algorithm is a Chinese National Standard and is used in the industry for Wireless LAN WAPI. AES-128 is a block cipher algorithm. AES-128 is comparable to SM4 since both AES-128 and SM4 are block ciphers and have a block length of 128 bits. AES-128 is widely used in many applications, such as financial transactions, e-commerce, and social media apps [12].

Most modern Intel processors support Intel Advanced Encryption Standard (AES) New Instructions (AES-NI). Intel® AES-NI are instructions designed to accelerate the performance of the AES algorithm by a factor of 3 to 10 over a complete software implementation [13]. It is likely that a system runs on a device where AES-NI instructions are available. If SM4 is used to replace AES in such a system, it is relevant to know how performance will be impacted. Therefore, this thesis covers a performance comparison between SM4 and AES-128, both with AES-NI instructions enabled and disabled.

Modes

Block ciphers use different modes to provide information security, such as confidentiality or authenticity. The mode describes how the block cipher should process the message in encryption or decryption, and different modes have different features. Five modes that can provide confidentiality according to NIST are Electronic Code Book (ECB), Cipher Block Chaining (CBC), Cipher Feedback (CFB), Output Feedback (OFB), and Counter (CTR) [1].

The thesis focuses on the ECB, CBC, and CTR modes. ECB was chosen because it is the base of all block cipher modes; even though ECB is considered unsafe, it remains a good baseline for comparison with other modes. Modes such as OFB and CFB are relatively similar to CBC, although they are improvements over it. However, block ciphers using CBC mode may still exist in legacy systems, so it is interesting to investigate the performance using CBC mode. Of the modes mentioned above, CTR is the most efficient and secure. Therefore, it will be interesting to compare the performance of these three modes.

However, for memory utilization, only the ECB mode of AES-128, SM4, and AES-NI-128 will be measured. The reason for choosing ECB to measure memory utilization is that ECB is the base of all block cipher modes. Memory utilization of the CBC and CTR modes is not measured due to time constraints within the master thesis project.
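How the three selected modes differ in the way they chain blocks can be sketched in a few lines. The example below is illustrative only: a toy XOR function stands in for the block cipher (it is NOT secure and is not AES-128 or SM4), the input is assumed to be already padded to whole 16-byte blocks, and the CTR counter block layout (an 8-byte nonce followed by an 8-byte counter) is one common choice assumed here. All function names are hypothetical.

```python
BLOCK = 16  # bytes; AES-128 and SM4 both use 128-bit blocks


def xor(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt_block(key, block):
    """Toy stand-in for a block cipher: XOR with the key. NOT secure."""
    return xor(block, key)


toy_decrypt_block = toy_encrypt_block  # XOR is its own inverse


def split_blocks(data):
    assert len(data) % BLOCK == 0, "sketch assumes already-padded input"
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key, plaintext):
    # ECB: every block is encrypted independently of the others.
    return b"".join(toy_encrypt_block(key, b) for b in split_blocks(plaintext))


def cbc_encrypt(key, iv, plaintext):
    # CBC: each plaintext block is XORed with the previous ciphertext block
    # (the IV for the first block) before it is encrypted.
    out, prev = [], iv
    for b in split_blocks(plaintext):
        prev = toy_encrypt_block(key, xor(b, prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key, iv, ciphertext):
    out, prev = [], iv
    for c in split_blocks(ciphertext):
        out.append(xor(toy_decrypt_block(key, c), prev))
        prev = c
    return b"".join(out)


def ctr_crypt(key, nonce, data):
    # CTR: a counter block is encrypted to produce a keystream block, which is
    # XORed with the data. The same call encrypts and decrypts.
    out = []
    for i, b in enumerate(split_blocks(data)):
        counter_block = nonce + i.to_bytes(BLOCK - len(nonce), "big")
        out.append(xor(b, toy_encrypt_block(key, counter_block)))
    return b"".join(out)


if __name__ == "__main__":
    key = bytes(range(16))
    iv = bytes([0xA5]) * 16
    nonce = bytes(8)
    msg = b"SIXTEEN BYTE BLK" * 2  # two identical plaintext blocks

    ecb = ecb_encrypt(key, msg)
    cbc = cbc_encrypt(key, iv, msg)
    ctr = ctr_crypt(key, nonce, msg)

    # Compare the two ciphertext blocks produced by each mode.
    print("ECB blocks equal:", ecb[:16] == ecb[16:])
    print("CBC blocks equal:", cbc[:16] == cbc[16:])
    print("CTR blocks equal:", ctr[:16] == ctr[16:])
```

The sketch also hints at why the modes can perform differently: ECB and CTR process each block independently, whereas CBC encryption of a block has to wait for the previous ciphertext block.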

Public Key Cryptosystem

Public key cryptosystems are typically not used to encrypt data. Instead, their main use is in hybrid systems, where they are used to generate and transfer a session key, at which point the system switches to symmetric encryption. Another main use of public key cryptosystems is digital signatures. Due to the size and scope of this master thesis project, it was decided to focus only on the digital signature algorithm.

SM2 is a public key cryptosystem approved by China, and the algorithm is based on elliptic curves. SM2 can be used for digital signatures, key exchange, encryption, and decryption. RSA and ECDSA were chosen to be compared with SM2 since they are commonly used for digital signatures. Both SM2 and ECDSA are based on elliptic curves, while RSA is based on the integer factorization problem. To ensure a fair comparison between the algorithms, the curve chosen for ECDSA is NIST P-256, since it has the same prime length as the SM2 curve and thus provides the same security level. For RSA, a key length of 3072 bits was chosen since it provides the same security level as SM2 [6].
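The key-size reasoning above can be made concrete with a small lookup. This is a sketch under an explicit assumption: the table holds the comparable key sizes that are commonly tabulated (for example in NIST SP 800-57 Part 1) and is not copied from reference [6]. The names `COMPARABLE_SIZES`, `ecc_security_level`, and `rsa_modulus_for_level` are hypothetical.

```python
# Comparable key sizes in bits for each security level (also in bits).
# Assumption: commonly tabulated figures (e.g. NIST SP 800-57 Part 1),
# not taken from reference [6] of the thesis.
COMPARABLE_SIZES = {
    80: {"rsa_modulus": 1024, "ecc_size": 160},
    112: {"rsa_modulus": 2048, "ecc_size": 224},
    128: {"rsa_modulus": 3072, "ecc_size": 256},
    192: {"rsa_modulus": 7680, "ecc_size": 384},
    256: {"rsa_modulus": 15360, "ecc_size": 512},
}


def ecc_security_level(curve_bits):
    """Return the highest tabulated level whose ECC size is <= curve_bits."""
    reached = [level for level, row in COMPARABLE_SIZES.items()
               if row["ecc_size"] <= curve_bits]
    return max(reached) if reached else None


def rsa_modulus_for_level(level):
    """Return the RSA modulus length in bits tabulated for a security level."""
    return COMPARABLE_SIZES[level]["rsa_modulus"]


if __name__ == "__main__":
    level = ecc_security_level(256)  # NIST P-256 and the 256-bit SM2 curve
    print(level, rsa_modulus_for_level(level))
```

Under these figures a 256-bit curve and a 3072-bit RSA modulus fall in the same row, which matches the choice of RSA key length made above.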

Hash Functions

For the hash functions, this thesis covers a comparison of SM3 and SHA-256. The reason for choosing SHA-256 is that SHA-256 is a commonly used hash algorithm [14] that outputs a 256-bit message digest, just like SM3.
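The matching digest length can be checked directly. The sketch below uses Python's `hashlib`, which is not one of the libraries measured in this thesis; SM3 is only reachable through `hashlib.new("sm3")` when the OpenSSL build behind the interpreter provides it, so the hypothetical helper `digest_bits` returns `None` when an algorithm is unavailable.

```python
import hashlib


def digest_bits(name, data=b"abc"):
    """Return the digest length in bits, or None if the hash is unavailable."""
    try:
        return hashlib.new(name, data).digest_size * 8
    except ValueError:
        # hashlib raises ValueError for hash names it cannot provide.
        return None


if __name__ == "__main__":
    for name in ("sha256", "sm3"):
        bits = digest_bits(name)
        if bits is None:
            print(f"{name}: not available in this Python/OpenSSL build")
        else:
            print(f"{name}: {bits}-bit digest")
```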

1.4 Scope and Limitations

The limitations of the thesis project are:

• The databases that are used to find relevant papers are limited to the databases connected to BTH Summon and Google Scholar.

• Only papers written in English are reviewed.

• The performance measurements of cryptographic algorithms are limited to real-time, CPU time, CPU clock cycles, and memory utilization in terms of resident set size (RSS).

• For the block cipher algorithms, the RSS is only measured when using AES-128, AES-NI-128, and SM4 in ECB mode.

• The algorithm implementations tested are limited to those in OpenSSL/GmSSL and Botan. To measure performance in OpenSSL/GmSSL, the command line interface (CLI) is used. To measure performance in Botan, the Botan cryptography library is linked to C++ programs.

• The performance testing is limited to the operating system Ubuntu 16.04 LTS and one set of hardware. The hardware is described in section 4.2.2.

1.5 Thesis Outline

The remainder of this thesis report is structured as follows. Chapter 2 describes related works and states the research gap. Chapter 3 contains background information. Chapter 4 describes the research methods used. Chapter 5 presents the results and chapter 6 contains an analysis and discussion of the results. Finally, chapter 7 summarizes the conclusions drawn and answers the research questions, as well as describing possible future work.

Chapter 2 Related Work

This chapter presents work related to the area of this thesis. The articles were selected based on their titles and keywords. The related research presented here covers either the design, the performance evaluation, or the performance comparison of one or more algorithms.

2.1 SM2

Bai et al. [15] did a theoretical comparative study of SM2 and the international standard ECC algorithm with regard to efficiency and security. They concluded that the SM2 algorithm is better than the international standard ECC algorithm. The reasons for the conclusion are that SM2 increases the complexity of the data to be signed, and SM2 can verify the correctness of the plaintext. However, no practical experiments were done.

Feng et al. [16] designed and implemented a testing and evaluation system for trusted computing (TC) platforms. Part of this system tested the correctness and performance of cryptographic algorithms used inside the secure chips and trusted software stacks. Their test cases included RSA, SM2, SM3, SM4, and HMAC. When comparing the performance, they concluded that SM2 is faster than RSA. However, no data is presented, and the reproducibility is limited.

Bai and Zhao [17] investigated the speed limit of SM2 point multiplication in serial and parallel architectures without considering any resource constraints. The authors also compared the performance difference between non-adjacent form (NAF) and w-NAF encoding to get more detailed information about the SM2 speed limit. However, Bai and Zhao only investigated a part of the SM2 algorithm; a complete performance evaluation would, therefore, be interesting.

Zhao and Bai [18] presented a high-performance point multiplication scheme for SM2. The execution time of ECC mainly consists of point multiplication, which is a fundamental operation in ECC. The authors optimized this operation by using a one-cycle full-precision multiplier. During hardware evaluation, their SM2 architecture could perform over 49000 point multiplications per second, which was the highest known single-core performance, according to the authors.
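The non-adjacent form mentioned above can be illustrated briefly. In a double-and-add point multiplication kP, a point addition (or subtraction) is needed for every non-zero digit of the scalar k, so an encoding with fewer non-zero digits saves work. The following is a minimal sketch of the standard NAF recoding of a non-negative integer; it is not taken from [17] or [18], and the function name `naf` is hypothetical.

```python
def naf(k):
    """Return the non-adjacent form of k >= 0, least significant digit first.

    Every digit is -1, 0, or 1, and no two adjacent digits are non-zero.
    """
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # 1 if k mod 4 == 1, -1 if k mod 4 == 3
            k -= d           # makes k divisible by 4, forcing a 0 next
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def nonzero_digits(digits):
    """Count the digits that would trigger a point addition or subtraction."""
    return sum(1 for d in digits if d != 0)


if __name__ == "__main__":
    k = 255
    binary = [int(b) for b in reversed(bin(k)[2:])]
    print("binary non-zero digits:", nonzero_digits(binary))
    print("NAF non-zero digits:   ", nonzero_digits(naf(k)))
```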


2.2 SM3

The previous research related to SM3 focuses mostly on either the security or hardware performance of SM3.

Mendel et al. [19] concluded that the design of SM3 is very similar to SHA-256, and extended the methods for collision attacks on SHA-256 and applied them to SM3. Their results were two collision attacks on round-reduced SM3 with practical complexity. The authors managed to construct collisions for up to 20 rounds of SM3. Free-start collisions (collisions where the attacker decides the initialisation vector IV) were constructed for up to 24 rounds of SM3.

Ao et al. [20] presented a compact hardware implementation of SM3 using SRAM for the message expansion function instead of shift registers. The authors compared the throughput of their SM3 hardware implementation with a SHA-256 and SM3 hardware implementation from other studies. The compact architecture which they implemented can be used in a resource-constrained system due to its low-cost and low-power.

Ma et al. [21] evaluated and optimized the hardware performance of the hash algorithm SM3 on FPGA (Field-Programmable Gate Array). In this paper, Ma et al. proposed new optimization techniques for SM3 since existing optimization techniques are not applicable to SM3. Their study focuses on the hardware performance of FPGA and the optimization techniques proposed by Ma et al. can be used for the implementation of other hash algorithms, such as SHA-2.

2.3 SM4

In the paper [22], Liu et al. analyzed the SM4 algorithm and investigated the origin of the SM4 S-box. The authors showed that the design of SM4 is influenced by Rijndael (the original name of AES), the major reason being that both algorithms use an inversion-based mapping. The paper's major focus is the origin of the SM4 S-box, but indirectly the authors are comparing SM4 with AES.

Cheng and Ding [23] described the block ciphers DES, AES, and SM4, as well as the design theory and structure of block ciphers. The authors also reviewed common cryptanalysis methods related to block ciphers. However, no outright comparisons of the block cipher algorithms were described in the paper.

Several papers examine the security of SM4. In one paper, Ji et al. [24] examine algebraic attacks against SM4 and compare the resistance of SM4 with that of AES. The authors found that SM4 seems stronger than AES when it comes to algebraic attacks.

Li et al. [25] implemented a new encryption and authentication scheme, SM4-GCM, on a Field-Programmable Gate Array (FPGA). By conducting a hardware performance evaluation of SM4-GCM and AES-GCM, Li et al. showed that the new design of the encryption and authentication scheme has a higher throughput and lower resource consumption.

2.4 Cryptographic Algorithm Comparison

Many articles present comparisons of standard cryptographic algorithms. These papers served as inspiration for the authors in their master thesis project. Some of these papers are described below.

Mandal et al. [26] did a performance comparison between two encryption algorithms, DES and AES. In this study, the authors implemented the DES and AES algorithms in MATLAB and evaluated the performance based on the evaluation parameters avalanche effect, memory required, and simulation time.

In 2017, Bharathi et al. [27] presented a performance evaluation of three symmetric cryptographic algorithms: Blowfish, SF block, and IDEA with different key lengths. The performance was evaluated based on several metrics: encryption and decryption time, the throughput of encryption and decryption, CPU process time, CPU clock, power consumption, and memory utilization.

Raigoza and Jituri [28] have done a similar study where they evaluated the performance of AES-128 and Blowfish-128. However, the article focused on how changes in the input affect the performance.

2.5 Knowledge Gap

Published papers and articles comparing ECDSA, RSA, SHA-256, and AES with SM2, SM3, and SM4 are limited, and none of the papers and articles found compares the algorithms' design and/or performance to any great extent. To the best of the authors' knowledge, there is no research that compares the software performance of Chinese cryptographic algorithms and standard cryptographic algorithms in a comprehensive way and presents numbers for the relative performance differences between them. Therefore, the authors believe that the results of this master thesis will be a contribution to the computer science community.

Chapter 3 Background

This chapter describes Chinese cryptography laws in more detail to ensure a better understanding of the thesis aim. Some important cryptography concepts are also explained, such as symmetric and asymmetric cryptography, confusion, diffusion, elliptic curve cryptography, and the block cipher modes ECB, CBC, and CTR. These cryptography concepts are described so that the reader may understand the details of the algorithms’ designs located in appendix A.

3.1 Cryptography Law in China

China is regulating the import, export, sale, use, and research of commercial encryption in accordance with the State Council Directive No. 273 "Regulations on the Administration of Commercial Cryptography", which was released in 1999 [29]. "Commercial encryption" is defined as encryption protecting data in the commercial area, such as enterprises, banks, and telecommunications [30]. The regulations prohibit foreign encryption products and require all companies or individuals selling or producing commercial encryption products to gain approval from the Office of the State Commercial Cryptography Administration (OSCCA). The regulations apply to products with encryption as their core function [9, 31].

In 2007, the "Regulations on Classified Protection of Information Security", also called the Multi-Level Protection Scheme (MLPS), was released by the State Encryption Administration. The regulations classify information systems into five grades, where the fifth grade is deemed to have a serious impact on national interests if compromised. Information systems in areas such as finance, industry, commerce, banking, tax, education, and communications are classified as grade three or above and are required to rely mainly on technology that has domestic intellectual property rights [31].

The OSCCA-published algorithms SM2, SM3, SM4, SM9, and ZUC are commercial cryptographic algorithms mandated by the State Cryptographic Administration to be used within China [30]. These algorithms are widely used in China because of the legal requirements on the use of cryptography [32].

The State Cryptography Administration (SCA) manages the OSCCA [33]. In April 2017, OSCCA published a draft Encryption Law for public comment and review. If the law passes, it will be the most authoritative law on cryptography in China, addressing import, export, production, management, authentication, testing, use, and research of encryption [34].
The draft Encryption Law does not address whether companies and individuals are obligated to use only pre-approved domestic commercial encryption products, or whether they can still apply to OSCCA to use foreign-produced commercial encryption products and technologies [29]. As of January 2019, the draft Encryption Law remains in development [35].

During September and October of 2017, China's State Council and SCA released documents revealing changes in the regulations on commercial encryption products in China. Some approval requirements for the manufacturing, sale, and use of commercial encryption products were removed. The remaining approval requirements target the commercial encryption products themselves, instead of the entities in the supply chain [36].

3.2 Symmetric and Asymmetric Cryptosystems

In a symmetric cryptosystem, the same key is used for the operations in the cryptosystem, e.g. encryption and decryption. The key is shared between the sender and the receiver and must be kept secret. Stream ciphers, such as RC4, and block ciphers, such as AES, are examples of symmetric cryptosystems [37].

In an asymmetric cryptosystem, also known as a public key cryptosystem, different keys are used for the operations in the cryptosystem, e.g. encryption and decryption or signing and verifying. The keys consist of a private and a public key. The public key can be made public without compromising the secrecy of the private key. RSA and elliptic curve cryptography algorithms are examples of asymmetric cryptosystems [38]. For asymmetric cryptosystems to work, a trapdoor function is needed. A trapdoor function is a function that is easy to compute one way but hard to reverse without some given trapdoor information. With the trapdoor information, the inverse becomes easy to calculate.

Systems commonly employ both symmetric and asymmetric cryptosystems. Asymmetric cryptosystems allow the establishment of a secret key, and symmetric cryptosystems can then use the secret key to protect messages and information.
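The trapdoor idea can be made concrete with a toy textbook-RSA example. The following is a minimal sketch, not a secure implementation: the primes, the exponent, and the function names `forward` and `inverse` are assumptions chosen for readability, and real systems use much larger parameters together with padding.

```python
# Toy illustration of a trapdoor function using textbook RSA.
# The tiny primes are illustrative assumptions and offer no security.
p, q = 61, 53                # secret primes (they yield the trapdoor information)
N = p * q                    # public modulus
phi = (p - 1) * (q - 1)      # Euler's totient of N, easy to compute when p and q are known
e = 17                       # public exponent, relatively prime to phi
d = pow(e, -1, phi)          # private exponent: modular inverse of e (Python 3.8+)


def forward(m: int) -> int:
    """Easy direction: anyone who knows the public values (N, e) can compute this."""
    return pow(m, e, N)


def inverse(c: int) -> int:
    """Inverse direction: easy only with the trapdoor information d."""
    return pow(c, d, N)


message = 65
ciphertext = forward(message)
recovered = inverse(ciphertext)
```

Without `d` (or the factors of `N`), inverting `forward` amounts to factoring `N` or computing an e-th root modulo `N`, which is believed to be hard for large parameters.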

3.3 Confusion and Diffusion

Confusion and diffusion are two important design principles for a secure cipher defined by Shannon [39]. Confusion is achieved by making the bits of the ciphertext depend on several parts of the key and plaintext in a complicated manner, thereby obscuring the relationship between the plaintext and ciphertext. Diffusion means that a change of one bit in the plaintext block should have a chance to impact half of the bits in the ciphertext. Confusion can be achieved by the use of substitutions, and diffusion can be achieved by using permutations. When confusion and diffusion are used together, they create resistance against differential and linear cryptanalysis.

3.4 Elliptic Curve Cryptography

Elliptic Curve Cryptography (ECC) was introduced in the 1980s by Neal Koblitz and Victor S. Miller. ECC is an approach to public-key cryptography that is used for key agreement, digital signatures, random number generators, encryption, and decryption. It is based on elliptic curves over finite fields (Galois Fields) and on the Elliptic Curve Discrete Logarithm Problem (ECDLP). A Galois Field, denoted GF, is a field with a finite number of elements; the number of elements is called the field order and is denoted p. The order of a finite field is a prime or a power of a prime. ECDLP is defined as the following computational problem:

Theorem 1. Given $P, Q \in E(F_q)$, where $P$ is a point of order $n$, find the integer $a$, $0 \le a \le n - 1$, such that $Q = aP$.

The difficulty of ECDLP has a major role in the security of ECC, because the best known methods for solving the problem on a non-quantum computer require exponential time [40]. Elliptic curves can be defined over a prime field $F_p$ or a binary field $F_{2^m}$. Prime curves are more efficient in software implementations, while binary curves are better suited for hardware implementations [41]. The elliptic curves over the prime field $F_p$ and over the binary field $F_{2^m}$ are described below.

Elliptic curves over $F_p$

An elliptic curve over $F_p$ is defined by the equation

\[ y^2 = x^3 + ax + b \pmod{p} \tag{3.1} \]

where $a, b \in F_p$ and $(4a^3 + 27b^2) \bmod p \neq 0$.

The elliptic curve $E(F_p)$ is defined as

\[ E(F_p) = \{(x, y) : x, y \in F_p,\ y^2 = x^3 + ax + b\} \cup \{O\} \tag{3.2} \]

where $O$ denotes the point at infinity.
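To make the definitions concrete, the following minimal sketch implements point addition and scalar multiplication on a toy curve over a prime field and solves a small ECDLP instance by exhaustive search. The curve parameters (p = 17, a = 2, b = 2), the base point (5, 1), and all function names are assumptions chosen for illustration; the curve is far too small for real use, and the sketch is not how production libraries implement the arithmetic.

```python
# Toy elliptic curve y^2 = x^3 + ax + b over a prime field (assumed parameters).
P_MOD, A, B = 17, 2, 2
G = (5, 1)      # a point on the curve, used as base point
O = None        # the point at infinity

# Non-singularity condition from equation (3.1).
assert (4 * A**3 + 27 * B**2) % P_MOD != 0


def on_curve(pt):
    """Check whether pt satisfies the curve equation."""
    if pt is O:
        return True
    x, y = pt
    return (y * y - (x**3 + A * x + B)) % P_MOD == 0


def add(p1, p2):
    """Add two points using the chord-and-tangent rule."""
    if p1 is O:
        return p2
    if p2 is O:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O                                   # p2 is the negative of p1
    if p1 == p2:
        s = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    y3 = (s * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def mul(k, pt):
    """Compute k * pt with the double-and-add method."""
    result, addend = O, pt
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


def brute_force_dlog(q_pt, base, order):
    """Solve Q = a * base by trying every a; feasible only for tiny orders."""
    for a in range(order):
        if mul(a, base) == q_pt:
            return a
    return None
```

Computing `mul(a, G)` takes a number of steps proportional to the bit length of `a`, whereas `brute_force_dlog` has to try up to `order` candidates. This asymmetry is what ECC relies on when the order is a 256-bit number.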

Elliptic curves over $F_{2^m}$

An elliptic curve over $F_{2^m}$ is defined by the equation

\[ y^2 + xy = x^3 + ax^2 + b \tag{3.3} \]

where $a, b \in F_{2^m}$ and $b \neq 0$.

The elliptic curve $E(F_{2^m})$ is defined as

\[ E(F_{2^m}) = \{(x, y) : x, y \in F_{2^m},\ y^2 + xy = x^3 + ax^2 + b\} \cup \{O\} \tag{3.4} \]

where $O$ denotes the point at infinity, and $\#E(F_{2^m})$ denotes the number of points on the elliptic curve, which is also called the order of $E(F_{2^m})$.

3.5 Block Cipher Mode of Operation

A block cipher is a symmetric encryption algorithm that takes an input block of n bits and produces a ciphertext block of the same length. If the input is larger than n bits, it is divided into blocks of n bits. A block cipher mode specifies how to handle data that spans multiple blocks. A block cipher such as AES can be used with five different confidentiality modes of operation: Electronic Codebook (ECB), Cipher Block Chaining (CBC), Output Feedback (OFB), Cipher Feedback (CFB), and Counter (CTR) mode. These modes provide data confidentiality and are used to improve the security of a block cipher algorithm. NIST has defined the block cipher modes in their "Recommendation for Block Cipher Modes of Operation: Methods and Techniques", which is available at [1]. The ECB, CBC, and CTR modes are described below because they are used in this master thesis.

Electronic Codebook, ECB

The ECB mode is the simplest block cipher mode. In ECB encryption, the input is divided into blocks, and a cipher function is then applied to each block of the plaintext. In the ECB decryption process, the inverse cipher function is applied to each block of ciphertext. A visual representation of the ECB mode is shown in figure 3.1.

Figure 3.1: The ECB encryption and decryption. Figure adapted from figure 1 in [1].

An advantage of ECB mode is that the encryption and decryption of blocks can be parallelized, which makes it faster than modes that must process blocks sequentially. A disadvantage of ECB mode is its lack of diffusion: ECB always encrypts a given plaintext block to the same ciphertext block. For a visualisation of the insecurity of ECB, see [42]. Due to this property, NIST does not recommend encrypting data in ECB mode [1].

Cipher Block Chaining, CBC

The CBC mode requires an initialization vector (IV), which is sent along with the message. In CBC encryption, each plaintext block is XORed with the previous ciphertext block before the cipher function is applied; the first plaintext block, which has no predecessor, is XORed with the IV instead. The IV does not need to be secret, but it must be unpredictable. CBC decryption reverses these steps. The inverse cipher function is applied to the first ciphertext block, and the result is XORed with the IV to retrieve the first plaintext block. This process is repeated for the remaining ciphertext blocks, the only difference being that the previous ciphertext block is used instead of the IV. The operation of CBC mode is shown in figure 3.2.

Figure 3.2: The CBC encryption and decryption. Figure adapted from figure 2 in [1].

In CBC mode, it is not possible to parallelize the encryption process, because the input to each block encryption (except the first) depends on the previous ciphertext block. However, decrypting a block requires only that block and the previous ciphertext block, both of which are already available, so parallel decryption is possible in CBC mode.
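The difference between ECB and CBC described above can be observed with a small sketch. Since the Python standard library contains no block cipher, the sketch uses a toy four-round Feistel construction built on SHA-256 as a stand-in for the cipher function; this stand-in, the 16-byte block size, and all function names are assumptions made for illustration and say nothing about AES or SM4 themselves. Padding is not handled, so inputs must be a multiple of the block size.

```python
import hashlib
import os

BLOCK = 16  # block size in bytes (assumed for this sketch)


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher (illustrative stand-in only).
    return hashlib.sha256(key + bytes([i]) + half).digest()[: BLOCK // 2]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[: BLOCK // 2], block[BLOCK // 2:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def split(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Every block is processed independently of the others.
    return b"".join(encrypt_block(key, b) for b in split(plaintext))


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    out, prev = [], iv
    for b in split(plaintext):
        prev = encrypt_block(key, _xor(b, prev))  # chain with previous ciphertext block
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    out, prev = [], iv
    for c in split(ciphertext):
        out.append(_xor(decrypt_block(key, c), prev))
        prev = c
    return b"".join(out)


key = b"k" * 16
iv = os.urandom(BLOCK)
plaintext = b"A" * (2 * BLOCK)          # two identical plaintext blocks
ecb = ecb_encrypt(key, plaintext)
cbc = cbc_encrypt(key, iv, plaintext)
```

In `ecb`, the two ciphertext blocks are identical because the plaintext blocks are identical, while in `cbc` the chaining makes them differ. Note also that each iteration of `cbc_decrypt` uses only ciphertext blocks, which is why CBC decryption can be parallelized.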

Counter Mode, CTR

The Counter (CTR) mode turns a block cipher into a stream cipher by generating a keystream: the cipher function is applied to a sequence of input values, each of which is a combination of a nonce and a counter. The nonce is a random integer used for every block of an encryption operation, and the counter is a value that is initialized to zero and increases for every block. CTR encryption starts by applying the cipher function to the initialized value. The result is then XORed with the corresponding plaintext block to produce the ciphertext block. The decryption process works the same as the encryption process, with the ciphertext taking the place of the plaintext. The CTR encryption and decryption are shown in figure 3.3.

Figure 3.3: The CTR encryption and decryption. Figure adapted from figure 5 in [1].

The CTR mode uses a combination of a nonce and a counter. This combination ensures that the same input value is not used more than once within an encryption session, which results in higher security. In CTR mode, every block is computed independently, which means that both encryption and decryption can be parallelized.
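The CTR construction can be sketched in a few lines. Because CTR only uses the forward direction of the cipher function, the sketch substitutes a keyed SHA-256 call for the block cipher; this substitution, the 8-byte nonce and counter layout, and the function names are assumptions for illustration only.

```python
import hashlib
import os

BLOCK = 16  # block size in bytes (assumed for this sketch)


def cipher_function(key: bytes, block: bytes) -> bytes:
    # Stand-in for the forward block cipher operation (assumption, not AES or SM4).
    return hashlib.sha256(key + block).digest()[:BLOCK]


def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data; both directions are the same operation."""
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), BLOCK)):
        counter_block = nonce + counter.to_bytes(8, "big")   # nonce || counter
        keystream = cipher_function(key, counter_block)
        chunk = data[offset:offset + BLOCK]
        out.extend(x ^ y for x, y in zip(chunk, keystream))   # XOR with keystream
    return bytes(out)


key = b"k" * 16
nonce = os.urandom(8)
message = b"counter mode needs no padding"
ciphertext = ctr_crypt(key, nonce, message)
recovered = ctr_crypt(key, nonce, ciphertext)
```

Each keystream block depends only on the key, the nonce, and the block index, so the loop iterations are independent of each other and could be computed in parallel.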

3.6 Algorithm Design

Readers who want a detailed description of the algorithms can find more information in appendix A, which contains the details of the standard cryptographic algorithms AES, ECDSA, RSA, and SHA-256, and the Chinese cryptographic algorithms SM2, SM3, and SM4. Other readers can continue to chapter 4.

Chapter 4 Method

This chapter describes the research methods used during the master thesis project. Two research methods were selected and used to answer the research questions. To answer research questions RQ1 and RQ1.1, a literature analysis was conducted. To answer RQ2, RQ3, and RQ4, a controlled experiment was conducted.

4.1 Literature Analysis

A literature analysis was conducted to answer research questions RQ1 and RQ1.1. The motivation for choosing this method was to ensure a thorough understanding, gain knowledge, and collect information about the selected cryptographic algorithms. The literature analysis was also chosen to acquire a good overview of the research field and to gain knowledge of experiment designs, processes, and tools that could be adopted in the master thesis project. The literature analysis also helped to identify the knowledge gap described in section 2.5.

4.1.1 Databases and Search Engines

Search engines such as BTH Summon and Google Scholar were used to find relevant articles, journals, and conference proceedings. BTH Summon is a search engine provided by Blekinge Institute of Technology (BTH). Google Scholar is a search engine provided by Google. Both BTH Summon and Google Scholar offer a discovery service to find academic papers in several digital libraries, such as IEEE Xplore and ACM Digital Library.

4.1.2 Procedures and Approaches

This section describes the procedures and approaches used during the literature analysis. The snowballing and three-pass approaches, described below, allow one to perform a thorough collection of academic articles. The snowballing procedure ensures that more relevant articles can be found than if only database searches are made. The database search results rely on the keywords in the search string. If the search string does not cover essential keywords, some articles might not be discovered.


Snowballing

Snowballing is a search approach that is used to find relevant articles on a topic [43]. Before the snowballing procedure can start, a start set of papers has to be identified. The snowballing procedure is divided into two parts, backward snowballing and forward snowballing. In backward snowballing, the reference list is used to discover new papers to include. In forward snowballing, new papers are discovered by identifying papers that cite the paper currently being examined. The snowballing procedure is applied in iterations. In the first iteration, the start set of papers is processed using the snowballing procedure. In the second iteration, any newly discovered papers are processed. The iterations continue until no new papers are discovered.
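The iterative procedure can be expressed as a short sketch. The citation data, the paper identifiers, and the `is_relevant` predicate below are hypothetical placeholders; in the actual literature analysis these steps were carried out manually using search engines and reading passes.

```python
def snowball(start_set, references, cited_by, is_relevant):
    """Iterate backward and forward snowballing until no new papers appear.

    references[p] lists the papers that p cites (backward snowballing),
    cited_by[p] lists the papers that cite p (forward snowballing).
    """
    included = set(start_set)
    frontier = set(start_set)
    while frontier:
        candidates = set()
        for paper in frontier:
            candidates |= set(references.get(paper, ()))   # backward
            candidates |= set(cited_by.get(paper, ()))     # forward
        # Keep only papers that are new and pass the inclusion criteria.
        frontier = {p for p in candidates - included if is_relevant(p)}
        included |= frontier
    return included


# Hypothetical citation data for illustration.
references = {"A": ["B", "C"], "B": ["D"]}
cited_by = {"A": ["E"], "D": ["F"]}
found = snowball({"A"}, references, cited_by, is_relevant=lambda p: p != "C")
```

The loop terminates because each iteration only processes papers that have not been included before, which mirrors the stopping rule of the procedure.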

Three-pass approach

The three-pass approach [44] consists of reading the paper in up to three passes, described below.

• First pass: In the first pass the title, abstract, introduction, headers, and conclusion are read to get a general idea about the paper. Mathematical contents and references are also skimmed through.

• Second pass: In the second pass the paper is read more thoroughly. The point of this pass is to grasp the content of the paper, but not the details.

• Third pass: In the third pass the paper is practically re-implemented by making the same assumptions as the authors and re-creating their work. This pass is made to fully understand the paper, which is needed for reviewers.

4.1.3 Used Approach

The snowballing procedure was repeated for each algorithm (AES, ECDSA, RSA, SHA-2, SM2, SM3, and SM4) to find relevant papers. The steps that were taken to find the papers are:

1. Identify a start set of papers using one of the following keywords and search strings in the search engine and databases mentioned in section 4.1.1:

• (AES) AND ((encryption) OR (design) OR (algorithm))
• (ECDSA) AND ((signature) OR (verification))
• (RSA) AND (crypto* OR "public key")
• ((SHA-2) OR (SHA-256)) AND (hash*)
• (SM2) AND (algorithm)
• ((SM4) OR (SMS4)) AND ((crypto*) OR (encrypt*) OR ("block cipher"))
• (SM3) AND (hash*)

The search strings above were used as a base. Additional keywords were appended to the base search string, such as AND (performance), AND (design), and AND (algorithm-name), where algorithm-name was the name of a corresponding algorithm to be compared with.

2. Perform backward and forward snowballing on each paper in the start set. Exclude identified papers based on the following criteria:

• The language in the paper is not English.
• The title of the paper is irrelevant.
• The paper is not published.
• The paper is not available online.

Use the three-pass method on the remaining papers to determine whether to include or exclude each paper for the next iteration of snowballing. Note that the third pass in the three-pass method was not used, since the first and second passes were sufficient to decide on including or excluding a paper.

3. Continue with iterations of snowballing until no new papers are found.

The snowballing procedure was complemented with database searches to potentially find more papers to include. For the database searches, the same keywords and search strings as in Step 1 were used. The papers found through snowballing and database searches were used to make a theoretical comparison of the algorithms, which allowed us to answer RQ1 and RQ1.1.

4.2 Controlled Experiment

A controlled experiment was conducted to answer research questions RQ2, RQ3, and RQ4. An experiment was chosen because it can support or disprove the theories about the algorithms' performance that are based on the theoretical study of the algorithms. An experiment was also chosen because it provides actual performance measurements of the algorithms in practice, which can be used to compare and rate the algorithms.

4.2.1 Libraries and Tools

The libraries and library versions used are shown in table 4.1.

Table 4.1: Libraries and library versions used.

Library    Version
OpenSSL    OpenSSL 1.1.1b
GmSSL      2.5.0 (fork of OpenSSL 1.1.0d)
Botan      2.11.0

Three tools were used to measure real-time, CPU time, CPU cycles, and resident set size (RSS).

Perf is a performance analysis tool for Linux [45]. The subcommand perf stat [46] runs a program or command and counts software and hardware events related to it, gathering performance counter statistics. CPU time and CPU cycles were measured using perf stat to track the events cpu-clock and cpu-cycles. Perf stat also measures the elapsed execution time (real-time).

The file /proc/<PID>/status is part of the /proc file system, which provides access to data structures, parameters, and statistics in the running Linux kernel. The file contents are generated by the Linux kernel when the file is accessed; therefore, the file contents reflect the kernel data at the moment of access [47]. The file /proc/<PID>/status contains status and statistical information about the process with the specified PID. Among this information, the fields VmHWM and VmRSS represent the peak RSS and the current RSS, respectively.

The tool /usr/bin/time runs a specified program and summarizes its system resource usage. The command reports the maximum resident set size of the process during its lifetime in Kbytes [48]. /usr/bin/time collects most of its information from the wait3 system call [49], which returns the resource usage information about the child process using the getrusage system call. Getrusage retrieves process statistics from the kernel [47].
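As an illustration of the RSS fields mentioned above, the following sketch extracts VmHWM and VmRSS from the text of a process status file. It is not the measurement script used in the thesis; the function names are assumptions, and `read_rss` only works on Linux systems where the /proc file system is available.

```python
def parse_rss_fields(status_text: str) -> dict:
    """Extract the peak RSS (VmHWM) and current RSS (VmRSS) in kB."""
    fields = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in ("VmHWM", "VmRSS"):
            fields[name] = int(value.split()[0])   # lines look like "VmRSS:  1200 kB"
    return fields


def read_rss(pid="self") -> dict:
    """Read the RSS fields of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_rss_fields(status_file.read())


# Example input in the format used by the status file.
sample = "Name:\tcat\nVmHWM:\t    1234 kB\nVmRSS:\t    1200 kB\nThreads:\t1\n"
rss = parse_rss_fields(sample)
```

Because the kernel generates the file contents on access, `read_rss` has to be called while the measured process is still running.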

4.2.2 System Specification

The experiment was performed on an HP EliteBook 850 G5 laptop. The specifications are shown in table 4.2.

Table 4.2: Specifications of experiment machine.

Processor       Intel Core i7-8650U CPU @ 1.90GHz x 8
RAM             32 GB DDR4, Speed: 2400MHz
OS              Ubuntu 16.04 LTS 64-bit
Architecture    x86

4.2.3 Experiment Design

When designing the experiment, a one-factor-at-a-time (OFAT) design was used. The OFAT method involves testing the factors one at a time, instead of testing multiple factors simultaneously. This type of experiment design was chosen since there was only one factor to be tested, which is the cryptographic algorithm. The goal was to find out how the different cryptographic algorithms impacted the response variables real-time, CPU time, CPU cycles, and RSS.

To estimate an average value of the response variables, a sample of measurements was needed. The sample represents the result of x experiment replications. A sample size of around 100 was chosen since it is both a relatively large sample and allows the tests to execute in a reasonable amount of time. All experiments were replicated 120 times to allow removal of the first 20-30 samples, to ensure fair cache contents. The details of the experiment design can be seen in table 4.3.

Table 4.3: Details of the OFAT experiment design.

Response Variables
    Experiment 1: Key generation time, signature generation time, signature verification time, CPU cycles, RSS
    Experiment 2: Hash time (real-time and CPU time), CPU cycles, RSS
    Experiment 3: Encryption/decryption time (real-time & CPU time), CPU cycles, RSS

Primary Factor
    Experiment 1: Digital signature algorithm
    Experiment 2: Hash algorithm
    Experiment 3: Encryption algorithm

Levels
    Experiment 1: ECDSA with curve prime256v1; SM2 with curve sm2p256v1; RSA with key length 3072
    Experiment 2: SHA-256; SM3
    Experiment 3: AES-128 (ECB, CBC, CTR); SM4 (ECB, CBC, CTR)

Replication
    Experiment 1: 120 (for each algorithm, operation, and measurement)
    Experiment 2: 120 (for each algorithm and measurement)
    Experiment 3: 120 (for each algorithm, mode, operation, and measurement)
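The replication scheme in section 4.2.3, where the first measurements are discarded before averaging, can be sketched as follows. The function name and the choice of discarding exactly 20 measurements are assumptions for illustration; the text only states that the first 20-30 samples were removed.

```python
from statistics import mean


def steady_state_mean(measurements, warmup=20):
    """Average the measurements after discarding the first `warmup` replications."""
    kept = measurements[warmup:]
    if not kept:
        raise ValueError("no measurements left after discarding the warm-up runs")
    return mean(kept)


# Hypothetical data: 20 slow warm-up runs followed by 100 steady runs.
measurements = [100.0] * 20 + [10.0] * 100
average = steady_state_mean(measurements)
```

With 120 replications and 20 discarded measurements, the remaining sample has the size of 100 that the experiment design aims for.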

4.2.4 Used Approach

The experiment was carried out by creating bash scripts for each algorithm type (digital signature algorithms, hash functions, block ciphers). The bash scripts accepted arguments specifying different input parameters, such as input/output file, library, mode, operation, and the number of iterations. The bash script would then use either the OpenSSL/GmSSL CLI or executable files to encrypt, decrypt, hash, generate keys, sign, or verify. The executable files were created by writing C++ programs using the Botan library to implement the algorithms, which were then compiled with g++. The bash scripts would alternate between the algorithms while executing, as well as give the new process the highest priority and lowest nice value. The source code of the used scripts and C++ programs can be found in [50]. The tools described in section 4.2.1 were used to collect the measurements and write them to files.

The file to be encrypted, decrypted, and hashed was a 1GB file with random content created using the following command:

head -c 1000000000 file1G

A file with random content was used to simulate an arbitrary file in a real scenario.

Before starting the script for experiment 1, files containing ECDSA, SM2, and RSA keys were placed on a ramdisk. A ramdisk was used to exclude as much as possible of the disk I/O time from the performance measurements. The file to be signed, containing a 256-bit hash value, was also placed on the ramdisk, as well as three signed files which were signed with ECDSA, SM2, and RSA. Since key generation, signing, and verification are very fast operations, the operations were carried out several times to get a longer run-time. The run-time was then divided by the number of key generations, signatures, or verifications made.

In experiment 2, the generated 1GB file was placed on a ramdisk. The script for experiment 2 was then started.

In experiment 3, before starting the script, the 1GB file as well as three encrypted versions of the 1GB file were placed on a ramdisk. The three encrypted versions were encrypted with AES-128, AES-128-NI, and SM4.
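The averaging used for the fast public-key operations in experiment 1 can be illustrated with a small sketch. The thesis measurements were taken with perf stat on external processes; the in-process timer, the function name, and the SHA-256 call used as a stand-in operation below are assumptions for illustration only.

```python
import hashlib
import time


def time_per_operation(operation, repetitions):
    """Run a fast operation many times and return the average time per call in seconds."""
    start = time.perf_counter()
    for _ in range(repetitions):
        operation()
    elapsed = time.perf_counter() - start
    return elapsed / repetitions


# Stand-in for a fast operation such as signing a 256-bit hash value.
digest_input = bytes(32)
average_seconds = time_per_operation(lambda: hashlib.sha256(digest_input).digest(), 10000)
```

Repeating the operation makes the total run-time long enough to measure reliably, and the division recovers the cost of a single operation.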

4.2.5 Distribution Analysis

To determine if the differences in the collected data samples are statistically significant, a hypothesis test was planned. The t-test, which is the typical parametric test, assumes that the data comes from a normally distributed population. The purpose of the distribution analysis is, therefore, to test whether this assumption holds. To examine if the data samples from the experiment had a normal distribution, histograms and QQ plots of all samples were created. The histograms divided the sample data into 50 ranges of values, also known as bins.

4.2.6 Mann-Whitney U test

The data samples from the experiment were compared using the Mann-Whitney U test to see if there are any statistical differences between the samples. The Mann-Whitney U test is a nonparametric test that is used to compare the difference between two independent samples. The test starts by combining two data samples of size n1 and size n2 into a single sample. The combined sample is then rank ordered to determine if the ordered ranks of the two samples are randomly mixed or clustered at opposite ends. If the rank orders are randomly mixed, it would mean that the two samples are not different, while a clustered rank order indicates a difference between the samples. The rank sums R1 and R2 are then calculated by adding the ranks together for each sample. The U-values are then calculated using the following formula:

\[ U_i = n_1 n_2 + \frac{n_i(n_i + 1)}{2} - \sum R_i \]

where $n_i$ is the sample size of the sample of interest, $n_1$ is the sample size of the first sample, $n_2$ is the sample size of the second sample, and $\sum R_i$ is the sum of ranks from the sample of interest, i.e. $R_1$ or $R_2$. The U-values will be numbers between 0 and $n_1 \cdot n_2$. The maximum possible U-value is equal to the product of both sample sizes, $n_1 \cdot n_2$, since the rank sum $\sum R_i$ will be equal to $\frac{n_i(n_i + 1)}{2}$ when all ranks of the sample of interest are lower than the ranks of the other sample, i.e. when the ordered ranks are not mixed at all. In this case the U-value of the other sample will be equal to 0.

The next step of the Mann-Whitney U test is to find the two-tailed probability estimate, the p-value, based on the U-values. The significance level α is set to 0.05, which means that the probability of rejecting a null hypothesis that is actually true is at most 5%. The calculated p-value is compared with α. If p < α, the null hypothesis will be rejected, which means that there is a significant difference between the compared algorithms. If instead p > α, the null hypothesis will not be rejected.

The null hypothesis in the thesis is defined as

H0: There is no significant difference between the algorithm samples.

and the research hypothesis is

Ha: There is a significant difference between the algorithm samples.
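The U-value calculation from section 4.2.6 can be sketched as follows. The sketch assigns average ranks to tied values, which is a common convention but an assumption here since the text does not discuss ties, and it stops at the U-values without computing the p-value. The function names are illustrative.

```python
def rank_with_ties(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1       # mean of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Compute U1 and U2 with U_i = n1*n2 + n_i*(n_i + 1)/2 - sum(R_i)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = rank_with_ties(list(sample1) + list(sample2))   # rank the combined sample
    r1 = sum(ranks[:n1])
    r2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


# Completely separated samples: the ordered ranks are not mixed at all.
u1, u2 = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

For the completely separated samples in the example, the sample with the lower ranks obtains the maximum U-value $n_1 \cdot n_2$ and the other sample obtains 0, and in general the two U-values sum to $n_1 \cdot n_2$.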

4.3 Validity

The following sections describe identified validity threats and how they have been mitigated.

4.3.1 Internal

Literature Analysis

A validity threat when conducting the literature analysis is that not all relevant articles and sources are found. When using the snowballing search approach, there is a possibility that a cluster of papers is found, since many of the same references are found in several papers. If a cluster of papers is encountered, no new papers will be discovered. This threat was mitigated by combining the snowballing approach with normal database searches. This approach increases the probability that a relevant paper that was not discovered using snowballing is found.

Controlled Experiment

A source of error that can impact the experiment results is process scheduling done by the operating system. If the operating system schedules another process, the CPU will do a context switch, which will impact the performance measurements. This error is reduced by giving the measured process the highest priority together with the lowest nice value using the command sudo chrt -f 99 nice -n -20. This approach will minimize the number of involuntary context switches. However, some context switches will still happen. To avoid one process causing context switches for only one algorithm, the execution of algorithms was alternated, e.g. the execution of hash functions looks like SM3, SHA-256, SM3, SHA-256... This approach will make the results more fair and comparable.

Another source of error impacting time measurements is that the CPU has to wait for I/O operations to complete. When running the algorithms, there will be several I/O operations reading and writing files to the disk, e.g. reading the file to be encrypted or hashed. The relative performance difference between the algorithms is of interest, since the relative difference can be generalized to other systems which have other specifications than the experiment machine. The I/O time is not dependent on the cryptographic algorithms and is, therefore, minimized using a ramdisk. A ramdisk is a portion of the computer's RAM which can be used as a disk drive. This storage method is much faster than a hard disk drive and therefore minimizes the time of I/O operations.

A third error source is the possibility that some algorithms are implemented in a more optimized way than other algorithms in the programming library. This threat was mitigated by testing the algorithms using two different programming libraries.
If a similar relative performance difference is apparent using both programming libraries, the chances are higher that the performance difference depends on the algorithms and not only on the algorithm implementations. The use of two programming libraries makes the results more reliable than if only one programming library were used.

Other actions taken to improve the validity and reliability of the experiment results were to use the same programming language and the same approach (within each programming library), the same hardware, and the same software during the experiment.

4.3.2 External

To improve the external validity, the algorithm implementations tested are those present in the established open-source cryptography libraries OpenSSL, GmSSL, and Botan. This improves the validity since the experiment outcome can be translated into practical situations in the real world where OpenSSL, GmSSL, Botan, and other open-source cryptography libraries are used. This is a better approach than, for example, implementing a new cryptography library from scratch.

The exact numerical values that are measured in the experiment are not very generalizable. The exact system specifications and conditions would be required to retrieve the same exact results. However, the relative difference between algorithms will be generalizable to other systems with x86 architecture, regardless of operating system or system specifications.

4.3.3 Algorithm Implementations Verification

The algorithm implementations were verified by reviewing the source code in the libraries OpenSSL, GmSSL, and Botan. The code was checked against the algorithms' documentation to verify that the correct algorithm was implemented. For the ECC-based algorithms (ECDSA and SM2), the curve parameters were also verified against the documentation.

Additionally, the block ciphers SM4 and AES-128 were verified by encrypting a file with one library and then decrypting the output file with the other library, e.g. encrypting with Botan and decrypting with OpenSSL and vice versa. These verification tests gave the correct output for both SM4 and AES-128.

The hash algorithms SM3 and SHA-256 were verified by hashing a file with the libraries OpenSSL, GmSSL, and Botan, and verifying that all libraries output identical hash values when using the same hash algorithm. The implementations of SM3 and SHA-256 passed the verification tests.

The public key algorithms ECDSA, SM2, and RSA did not go through the same verification tests between libraries as the block ciphers and hash algorithms. This was due to inconsistencies in the key formats between the libraries, e.g. Botan could not load the GmSSL-generated keys and GmSSL could not load the Botan-generated keys. As a result, the planned verification tests (signing with one library and verifying with another) could not be performed. Therefore, the public key algorithms were only verified by reviewing the source code.

Chapter 5 Results

In this chapter, the results are presented. Section 5.1 presents the results of the literature analysis, and section 5.2 presents the results of the experiment. Section 5.3 presents the results of the distribution analysis, and section 5.4 presents the results of the Mann-Whitney U test.

5.1 Literature Analysis

This section presents a comparison of the algorithms included in this study, based on the information obtained during the literature analysis. All properties and details about the algorithms are described in appendix A. Each subsection ends with a table ranking the algorithms based on expected performance. A rank of 1 indicates that the algorithm has better performance than the following ranks.

5.1.1 Design Comparison of SM2, RSA, and ECDSA

The design similarities and differences of ECDSA, SM2, and RSA are shown in table 5.1. ECDSA, SM2, and RSA are all asymmetric, i.e. they have a public key and a corresponding private key. The ECDSA and SM2 public keys, Q and P respectively, are points on the defined elliptic curve, while RSA's public key is a pair of positive integers ⟨N, e⟩. In ECDSA and SM2, the private key is a random integer which satisfies either d ∈ [1, n − 1] (ECDSA) or d ∈ [1, n − 2] (SM2). The RSA private key is a pair of positive integers ⟨N, d⟩.

The ECC key generation consists of choosing a point G on the elliptic curve, generating a random integer d, and calculating P = [d]G. The RSA key generation consists of generating two large prime integers p and q, calculating N = pq, choosing an encryption exponent e relatively prime to φ(N), and then finding the multiplicative inverse d of e modulo φ(N). The operations in ECC key generation are fast and straightforward to calculate compared to RSA key generation, which works with large integers and has to generate large prime integers. Therefore, the ECC key generation should be faster than that of RSA. The SM2 and ECDSA key generations differ only in the defined curve, which means that the base point G will differ. However, the same calculations are required to generate the SM2 and ECDSA keys. Therefore, the SM2 and ECDSA key generation will be equally fast, provided that the curves used are equivalent, e.g. the SM2 curve and P-256.
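The RSA key generation steps described above can be sketched as follows. This is a minimal illustration with toy 16-bit primes and a trial-division primality test; the function names, the seed, and the choice e = 65537 are assumptions made for the example and are not taken from OpenSSL, GmSSL, or Botan.

```python
import math
import random


def is_prime(n):
    """Trial division; only suitable for the toy sizes used here."""
    if n < 2:
        return False
    for factor in range(2, math.isqrt(n) + 1):
        if n % factor == 0:
            return False
    return True


def random_prime(bits, rng):
    """Draw random odd candidates of the given bit length until one is prime."""
    while True:
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(candidate):
            return candidate


def rsa_keygen(bits=16, e=65537, seed=1):
    """Return ((N, e), (N, d)) for two toy primes of `bits` bits each."""
    rng = random.Random(seed)  # fixed seed: reproducible, NOT secure
    while True:
        p, q = random_prime(bits, rng), random_prime(bits, rng)
        phi = (p - 1) * (q - 1)
        if p != q and math.gcd(e, phi) == 1:
            break
    n = p * q
    d = pow(e, -1, phi)  # multiplicative inverse of e modulo phi(N)
    return (n, e), (n, d)


public_key, private_key = rsa_keygen()
```

The repeated search for primes is the part that has no counterpart in ECC key generation, where a random integer d and one point multiplication are sufficient.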


Table 5.1: ECDSA, SM2, and RSA properties.

| Property | ECDSA | SM2 | RSA |
|---|---|---|---|
| Type | Asymmetric key algorithm | Asymmetric cryptosystem | Asymmetric cryptosystem |
| Based on | Elliptic curve discrete logarithm problem | Elliptic curve discrete logarithm problem | Integer factorization problem |
| Used for | Digital signatures | Digital signatures; Encryption & decryption; Key exchange | Digital signatures; Encryption & decryption; Key exchange |
| Public key | Q = [d]G | P = [d]G | ⟨N, e⟩, N = p · q |
| Private key | d (random integer) | d (random integer) | ⟨N, d⟩, N = p · q |
| Recommended key length (bits) | P-256: private key 256, public key 512; P-384: private key 384, public key 768 | SM2 curve: private key 256, public key 512 | 2048 |
| Digital signature auxiliary functions | Hash function (SHA-1 or SHA-2); Random number generator | Hash function (SM3); Random number generator | PSS; Prime number generator |

The security of ECDSA and SM2 is based on the elliptic curve discrete logarithm problem (ECDLP), while the security of RSA is based on the integer factorization problem (IFP). There are several known algorithms that solve the IFP in subexponential running time, such as the general number field sieve (GNFS). On the other hand, the fastest known algorithm to solve the ECDLP is Pollard's rho method, which has a fully exponential running time [51]. As a consequence, RSA needs larger keys than ECDSA and SM2 to provide the same security level. The required key lengths for ECC and RSA for different security levels can be seen in table 5.2.

Table 5.2: Security level and required key lengths [6].

| Security level (bits) | ECC key length (bits) | RSA key length (bits) |
|---|---|---|
| 80 | 160 | 1024 |
| 112 | 224 | 2048 |
| 128 | 256 | 3072 |
| 192 | 384 | 7680 |
| 256 | 512 | 15360 |
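The pattern in table 5.2 can be checked with a few lines of arithmetic. Pollard's rho method needs on the order of the square root of the group order in operations, so an ECC key of k bits gives roughly k/2 bits of security; the sketch below only restates the table values, and the variable names are chosen for this example.

```python
# Rows of table 5.2: security level -> (ECC key length, RSA key length), in bits.
KEY_LENGTHS = {
    80: (160, 1024),
    112: (224, 2048),
    128: (256, 3072),
    192: (384, 7680),
    256: (512, 15360),
}

# For every row the ECC key is exactly twice the security level.
ecc_is_double = all(ecc == 2 * level for level, (ecc, _) in KEY_LENGTHS.items())

# Ratio of RSA to ECC key length per security level; it grows with the level.
rsa_to_ecc_ratio = {level: rsa / ecc for level, (ecc, rsa) in KEY_LENGTHS.items()}

for level, ratio in rsa_to_ecc_ratio.items():
    print(f"{level:3d}-bit security: RSA key is {ratio:.1f}x the ECC key")
```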

In RSA, a small public exponent e is commonly chosen in order to reduce the time of encryption and signature verification. The RSA decryption and signature generation operations are relatively computationally heavy, since both d and N are large integers. The workload between encryption/signature verification and decryption/signature generation is therefore often unbalanced in RSA.

In ECC, the point multiplication operation has higher time complexity than the other operations [52], and it requires the most execution time [18]. Since the signature verification for both ECDSA and SM2 performs two point multiplications, compared with one point multiplication in the signature generation, the verification is likely slower than the generation.

Studies comparing ECDSA with RSA have shown that ECDSA's key generation is significantly faster than RSA's key generation for all key lengths. The signature generation speeds of ECDSA and RSA are comparable at smaller key sizes, while at larger key sizes ECDSA signing is much faster than RSA signing. The signature verification is faster for RSA than for ECDSA [53] [54]. Other studies indicate that ECDSA signature generation is faster than RSA signature generation at smaller key sizes as well [55]. Since both ECDSA and SM2 are based on ECC, it is expected that SM2 will show similar performance differences to RSA as those between ECDSA and RSA. The algorithms for generating and verifying signatures are similar for ECDSA and SM2, but not identical. This will likely lead to small differences in performance between the two.

ECDSA and SM2 will most likely have comparable memory consumption as well, since they are both based on ECC and have a similar design. The memory consumption of RSA will consist of a few large integers, such as key values, while SM2 will have several smaller values in memory, such as the key, curve parameters, and intermediate values used when calculating the signature. SM2 has more steps to perform in its signature generation and signature verification algorithms, which can lead to longer program code than for RSA. RSA only has one step to generate and verify a signature; however, RSA implementations tend to contain extra code to speed up the calculation and to encode the message, such as PSS. Therefore, it is difficult to draw a conclusion about which algorithm will be the most memory efficient. A summary of the hypotheses about the digital signature algorithms' expected performance, based on the theoretical algorithm comparison, can be seen in table 5.3.
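The difference in the number of point multiplications between signing and verifying can be illustrated with a toy ECDSA sketch. The curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1) of order 19 is a small textbook curve chosen for this example; it is not P-256 or the SM2 curve, the message hash e and the nonce k are fixed by hand, and the SM2 signature equations (which differ from ECDSA) are not shown.

```python
# Toy parameters (assumption for illustration): far too small for real use.
P_MOD, A_COEFF, ORDER = 17, 2, 19
G = (5, 1)
STATS = {"mults": 0}  # running count of point multiplications


def add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p == q:
        lam = (3 * x1 * x1 + A_COEFF) * pow(2 * y1 % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD


def multiply(k, point):
    """Double-and-add point multiplication [k]point."""
    STATS["mults"] += 1
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def sign(d, e, k):
    """ECDSA signing: one point multiplication (k must give non-zero r and s)."""
    r = multiply(k, G)[0] % ORDER
    s = pow(k, -1, ORDER) * (e + d * r) % ORDER
    return r, s


def verify(q, e, r, s):
    """ECDSA verification: two point multiplications."""
    w = pow(s, -1, ORDER)
    x = add(multiply(e * w % ORDER, G), multiply(r * w % ORDER, q))
    return x is not None and x[0] % ORDER == r
```

Comparing `STATS["mults"]` before and after a call to `sign` and `verify` shows one multiplication for signing and two for verification, which is the basis for the expectation that verification is slower than generation.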

Table 5.3: Ranking of digital signature algorithms based on expected time and memory performance.

| Metric | Operation | Ranks |
|---|---|---|
| Time | Key generation | 1. ECDSA/SM2; 2. RSA |
| Time | Sign | 1. ECDSA/SM2; 2. RSA |
| Time | Verify | 1. RSA; 2. ECDSA/SM2 |
| Memory | Key generation | - |
| Memory | Sign | - |
| Memory | Verify | - |

5.1.2 Design Comparison of SM3 and SHA-256

The properties of the SM3 and SHA-256 algorithms can be seen in table 5.4. SM3 and SHA-256 have many similarities. Both hash algorithms take an input M with length l, where 0 ≤ l ≤ 2^64. The message is padded using the Merkle-Damgård construction. It is then divided into blocks of the same block length. These blocks are processed one at a time by a compression function, where the output serves as input to the compression of the next block. The compression function runs 64 rounds for each message block and outputs a 256-bit message digest.
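The padding and block-by-block processing described above can be sketched as follows. The padding shown (a single 1 bit, zero bits, and the 64-bit big-endian message length, giving a multiple of 512 bits) is the one used by both SHA-256 and SM3; the `compress` argument is a placeholder for the real compression function, which is not implemented here, and all function names are chosen for this example.

```python
BLOCK_BYTES = 64  # 512-bit message blocks


def md_pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 64-bit big-endian bit length."""
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK_BYTES)
    return padded + bit_length.to_bytes(8, "big")


def md_blocks(message: bytes) -> list:
    """Split the padded message into 64-byte blocks."""
    padded = md_pad(message)
    return [padded[i:i + BLOCK_BYTES] for i in range(0, len(padded), BLOCK_BYTES)]


def md_hash(message: bytes, initial_value, compress):
    """Chain the compression function over all blocks.

    `compress(state, block)` is a caller-supplied stand-in for the 64-round
    compression function of SM3 or SHA-256.
    """
    state = initial_value
    for block in md_blocks(message):
        state = compress(state, block)
    return state
```

A 3-byte message fits in a single block, while a 56-byte message needs two blocks because the length field no longer fits in the first one.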

Table 5.4: SM3 and SHA-256 properties.

| Property | SM3 | SHA-256 |
|---|---|---|
| Structure | Merkle-Damgård | Merkle-Damgård |
| Compression function | Davies-Meyer | Davies-Meyer (based on) |
| Input (bits) | 0 ≤ l ≤ 2^64 | 0 ≤ l ≤ 2^64 |
| Output (bits) | 256 | 256 |
| Rounds | 64 | 64 |
| Operations | AND, XOR, NOT, OR, ADD (mod 2^32), Concatenation, ROTL | AND, XOR, NOT, ADD (mod 2^32), SHR, Concatenation, ROTR |
| Constants (words) | 2 | 64 |

SM3 uses a Davies-Meyer compression function, while SHA-256 uses a compression function similar to Davies-Meyer. The difference lies in how the chaining value is combined with the block cipher output to create the next chaining value. In SM3 an XOR operation is used, and in SHA-256 an ADD operation is used. Du and Li [56] stated that the compression function of SM3 is more complicated than the compression functions in other common hash algorithms. The SM3 compression function limits the algorithm's throughput because its rotational shift, ADD, and XOR operations lead to a long circuit delay. The total number of logical operations performed during the 64 rounds of the compression function is the same in SM3 and SHA-256, see table 5.5.
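The two ways of combining the chaining value with the block cipher output can be sketched as below. Both values are represented as lists of 32-bit words; the function names are chosen for this example, and the block-cipher-like transformation that produces `cipher_out` is not modelled.

```python
MASK32 = 0xFFFFFFFF


def feed_forward_xor(chaining, cipher_out):
    """SM3 style: next chaining value is the word-wise XOR."""
    return [c ^ o for c, o in zip(chaining, cipher_out)]


def feed_forward_add(chaining, cipher_out):
    """SHA-256 style: next chaining value is the word-wise sum modulo 2^32."""
    return [(c + o) & MASK32 for c, o in zip(chaining, cipher_out)]
```

Both variants cost one operation per 32-bit word, so this design difference alone is not expected to cause a measurable performance difference.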

Table 5.5: Number of logical operations in SM3 and SHA-256’s compression function.

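| Algorithm | XOR | ADD | ROTL | ROTR | AND | NOT | OR | Total |
|---|---|---|---|---|---|---|---|---|
| SM3 | 256 | 512 | 512 | 0 | 192 | 48 | 144 | 1664 |
| SHA-256 | 448 | 448 | 0 | 384 | 320 | 64 | 0 | 1664 |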

The Intel Optimization Reference Manual [57] states that all logical operations in table 5.5 require one clock cycle to complete the execution of the instruction on Intel processors using the Skylake microarchitecture. Other processor microarchitectures, such as Broadwell, Haswell, and Ivy Bridge, also require one clock cycle to execute the logical instructions, with the exception of NOT, which requires two clock cycles. The throughput, i.e. the number of clock cycles to wait before the same instruction can be executed again, is slightly higher for ROTL and ROTR (0.5 clock cycles) than for the other instructions (0.25 clock cycles). In conclusion, all the logical operations are fast on Intel processors, and the small performance differences will not make a large difference in the hash algorithm speed.

SM3 uses four temporary variables, SS1, SS2, TT1, and TT2, in its compression function, while SHA-256 uses two temporary variables, T1 and T2. This results in more assignment operations, using move instructions, for SM3 than for SHA-256, as seen in table 5.6.

Table 5.6: Number of assignments in SM3 and SHA-256's compression function.

| Algorithm | Assignments |
|---|---|
| SM3 | 12 · 64 = 768 |
| SHA-256 | 10 · 64 = 640 |

During the 64 rounds of the compression function, SM3 therefore performs 128 more move instructions than SHA-256 for each message block. Although the move instruction is simple and quick, many blocks need to be processed for large files, and this could make the hash calculation of SM3 slightly slower than that of SHA-256.

SM3 uses the constant T_j during the rounds in the compression function. T_j has one predefined 32-bit value for the first 16 rounds, while another predefined 32-bit value is used for the remaining 48 rounds, see definition A.31. SHA-256 uses similar constants K_t in each round of its compression function. However, the K_t constants change every round, which means that there are 64 constants in SHA-256. SM3 uses 8 bytes to store constants, while SHA-256 uses 256 bytes.

The message schedule of SHA-256 and the message expansion step of SM3 are very similar. In these steps, the 512-bit message blocks are expanded into 32-bit words. SHA-256 expands the block to 64 words W_t, while SM3 expands the block to 132 words W_j and W'_j. Overall, SM3 uses 272 bytes (68 words) more to store the values of W_j and W'_j.

Since the design and properties of SM3 and SHA-256 are very similar, it is likely that the time and memory performance differences between SM3 and SHA-256 will be minimal. A summary of the hypotheses about the hash functions' expected performance, based on the theoretical algorithm comparison, can be seen in table 5.7.
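The arithmetic behind these figures can be restated in a short sketch. The word and assignment counts are taken from tables 5.4 and 5.6 and from the text above; the helper `extra_moves` and its assumption that a file is processed in full 64-byte blocks (ignoring the final padding block) are simplifications made for this example.

```python
WORD_BYTES = 4    # both algorithms work on 32-bit words
BLOCK_BYTES = 64  # 512-bit message block

SM3_CONSTANT_WORDS, SHA256_CONSTANT_WORDS = 2, 64
SM3_SCHEDULE_WORDS, SHA256_SCHEDULE_WORDS = 132, 64
SM3_MOVES_PER_BLOCK, SHA256_MOVES_PER_BLOCK = 12 * 64, 10 * 64

constant_bytes = (SM3_CONSTANT_WORDS * WORD_BYTES,
                  SHA256_CONSTANT_WORDS * WORD_BYTES)
extra_schedule_bytes = (SM3_SCHEDULE_WORDS - SHA256_SCHEDULE_WORDS) * WORD_BYTES
extra_moves_per_block = SM3_MOVES_PER_BLOCK - SHA256_MOVES_PER_BLOCK


def extra_moves(file_bytes: int) -> int:
    """Additional move instructions SM3 performs for a file of this size."""
    blocks = -(-file_bytes // BLOCK_BYTES)  # ceiling division
    return blocks * extra_moves_per_block
```

Under these assumptions, a file of 10^9 bytes corresponds to 15 625 000 blocks and therefore to 2 · 10^9 additional move instructions for SM3.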

Table 5.7: Ranking of hash functions based on expected time and memory performance.

| Metric | Operation | Ranks |
|---|---|---|
| Time | Hash calculation | 1. SHA-256/SM3 |
| Memory | Hash calculation | 1. SHA-256/SM3 |

5.1.3 Design Comparison of SM4 and AES-128

The designs of SM4 and AES have many similarities, but also a few differences. The summarized properties of SM4 and AES can be seen in table 5.8.

Table 5.8: SM4 and AES-128 properties.

| Property | SM4 | AES-128 |
|---|---|---|
| Type | Block cipher | Block cipher |
| Structure | Unbalanced Feistel Network (UFN) | Substitution-permutation network (SPN) |
| Field(s) | GF(2^8) and GF(2) | GF(2^8) and GF(2) |
| Block size (bits) | 128 | 128 |
| Key length (bits) | 128 | 128 |
| Round keys | 32 keys of 32 bits each | 11 keys of 128 bits each |
| Number of rounds | 32 | 10 |
| S-box | Inversion-based mapping | Inversion-based mapping |
| Number of S-box lookups | 128 | 160 |
| Operations | XOR, S-box, cyclic bit shifts | XOR, S-box, cyclic bit shifts, modular multiplication |

Liu et al. [22] studied the S-box design of SM4 and concluded that the design is similar to the design of the AES S-box. Both S-boxes use inversion-based mapping, can be represented as an affine transformation, and are fixed 16×16 tables where the input and output are 8 bits. SM4 has one S-box, while AES has two S-boxes, one for encryption and one for decryption.

SM4 is based on an Unbalanced Feistel Network (UFN), while AES is based on a Substitution-permutation network (SPN). SPNs have more built-in parallelism in general [58], which means that operations within the current transformation can be executed at the same time. The AES operations AddRoundKey, SubBytes, ShiftRows, and MixColumns operate on the state either column-by-column, row-by-row, or byte-by-byte. Since each column, row, or byte can be operated upon independently, there are many parallelization capabilities in AES. In SM4, four S-box lookups and four cyclic bit shifts can be parallelized each round. SM4 has more rounds than AES, and fewer operations can be parallelized in each round. Since AES is based on an SPN, which comes with greater parallelization capabilities, it is likely that an AES implementation is faster than an SM4 implementation.

AES and SM4 have the operations XOR, S-box lookup, and cyclic bit shift in common. AES also performs modular multiplications. XOR and cyclic bit shifts can be implemented using a single machine instruction, while S-box lookups and modular multiplications require several instructions. XOR and cyclic bit shifts are therefore fast compared to S-box lookups and modular multiplications. AES performs 160 S-box lookups in total during encryption, while SM4 performs 128 S-box lookups. However, the S-box lookups are not dependent on each other and could, therefore, be parallelized. In theory, all 16 S-box lookups in each round of AES could be parallelized. In SM4 there are 4 S-box lookups in each round that can be parallelized. Since AES has 10 rounds and SM4 has 32 rounds, the total time spent on S-box lookups can be lower for AES because of the parallelization possibilities.

SM4 has identical encryption and decryption algorithms. Only the order of the round keys has to be reversed for decryption. For AES, the decryption process has to apply the inverse operations InvSubBytes, InvShiftRows, InvMixColumns, and InvAddRoundKey in the reverse order compared to the encryption process. These differences require the AES encryption and decryption algorithms to be implemented separately. Since AES needs separate encryption and decryption algorithms and has two S-boxes, while SM4 has one algorithm for both encryption and decryption and one S-box, it is probable that an AES implementation uses more memory than an SM4 implementation. A summary of the hypotheses about the block ciphers' expected performance, based on the theoretical algorithm comparison, can be seen in table 5.9.
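The property that SM4 decryption is the encryption algorithm with reversed round keys follows from its unbalanced Feistel structure and can be demonstrated with a structural sketch. The round function below is a made-up stand-in and is not SM4's real S-box and linear transformation; the round keys are arbitrary example values, so the output is not SM4 ciphertext.

```python
MASK32 = 0xFFFFFFFF


def toy_round_function(word):
    """Hypothetical stand-in for SM4's round transformation (not the real one)."""
    word = (word * 0x9E3779B1 + 0x7F4A7C15) & MASK32
    return ((word << 13) | (word >> 19)) & MASK32


def crypt(block, round_keys):
    """Run the 4-word unbalanced Feistel structure over all round keys.

    Each round computes X[i+4] = X[i] XOR F(X[i+1] XOR X[i+2] XOR X[i+3] XOR rk[i]);
    the output is the last four words in reverse order.
    """
    x = list(block)
    for rk in round_keys:
        x.append(x[-4] ^ toy_round_function(x[-3] ^ x[-2] ^ x[-1] ^ rk))
    return tuple(reversed(x[-4:]))


round_keys = [(i * 0x01010101) & MASK32 for i in range(32)]  # example values
plaintext = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)

ciphertext = crypt(plaintext, round_keys)
recovered = crypt(ciphertext, round_keys[::-1])  # same routine, keys reversed

# S-box lookups per block, as counted in table 5.8.
sm4_lookups = 4 * 32
aes_lookups = 16 * 10
```

Because the same routine serves both directions, an implementation of this structure needs only one code path, which is the basis for the expectation that SM4 uses less memory than AES.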

Table 5.9: Ranking of block ciphers based on expected time and memory performance.

| Metric | Operation | Ranks |
|---|---|---|
| Time | Encryption | 1. AES-128; 2. SM4 |
| Time | Decryption | 1. AES-128; 2. SM4 |
| Memory | Encryption | 1. SM4; 2. AES-128 |
| Memory | Decryption | 1. SM4; 2. AES-128 |

5.2 Algorithm Results

The following sections present the results from the experiment described in section 4.2. The bar graphs in every section present the mean of the metrics of each algorithm. The error bars in each graph represent the standard deviation, which is a statistical measure of how spread out the values in a data set are. Note that the performance results in Botan should not be compared with the performance results in OpenSSL/GmSSL, since this would be an unfair comparison. The comparison should be made between algorithms within a cryptography library, and not across the libraries.
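The summary statistics reported in the result tables can be computed as in the following sketch. It uses Python's standard `statistics` module; whether the experiment used the sample or the population standard deviation is not stated, so the sample form (`statistics.stdev`) is an assumption, and the measurement values are hypothetical.

```python
import statistics


def summarise(samples):
    """Return the five values reported per table row."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "std_dev": statistics.stdev(samples),  # sample standard deviation (assumed)
        "min": min(samples),
        "max": max(samples),
    }


# Hypothetical timings in milliseconds, not taken from the experiment.
example = summarise([5.26, 5.37, 5.37, 5.38, 5.87])
```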

5.2.1 Digital Signature Results

The results of key generation, signing, and verification using ECDSA, SM2, and RSA in Botan and GmSSL are shown in table 5.10. This table includes the mean, median, standard deviation, and minimum and maximum values. Figures 5.1, 5.2, 5.3, and 5.4 visually represent the results in bar graphs.

Figure 5.1 shows the real-time required to generate a key, sign a 256-bit message, and verify the signature for ECDSA, SM2, and RSA. RSA key generation takes substantially longer than that of SM2 in both Botan and GmSSL. Signing with RSA also takes longer than signing with SM2. However, the libraries differ in how much longer it takes. In GmSSL, signing with RSA takes approximately double the time compared to signing with SM2, while in Botan it takes about 40 times longer. Verification with RSA takes almost double the time compared to verification with SM2 in Botan, while in GmSSL the RSA verification is slightly faster than that of SM2.

The time taken for key generation, signing, and verification shows small differences between SM2 and ECDSA in both Botan and GmSSL. SM2 key generation is slightly slower than ECDSA key generation in Botan, while in GmSSL the reverse is seen. Signing with SM2 is slightly faster than signing with ECDSA in Botan, while in GmSSL the times are roughly the same. Verification times for SM2 and ECDSA are very similar in both Botan and GmSSL.
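The ratios mentioned above can be reproduced from the mean real-time values in table 5.10. The sketch below only restates those table values; the dictionary layout and names are chosen for this example.

```python
# Mean real-time in milliseconds, copied from table 5.10.
MEAN_REAL_TIME_MS = {
    ("GmSSL", "sign"): {"SM2": 2.786, "RSA": 5.795},
    ("Botan", "sign"): {"SM2": 0.186, "RSA": 7.296},
    ("GmSSL", "verify"): {"SM2": 2.389, "RSA": 2.321},
    ("Botan", "verify"): {"SM2": 0.482, "RSA": 0.931},
}

# How many times longer RSA takes than SM2 for each library and operation.
rsa_to_sm2 = {
    key: times["RSA"] / times["SM2"] for key, times in MEAN_REAL_TIME_MS.items()
}

for (library, operation), ratio in rsa_to_sm2.items():
    print(f"{library} {operation}: RSA takes {ratio:.2f}x the time of SM2")
```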

Figure 5.1: Digital signature real-time in Botan and GmSSL.

Figure 5.2 shows the CPU times for key generation, signing, and verification using ECDSA, SM2, and RSA. Similar relationships between the algorithms can be seen in this figure as in figure 5.1, which shows the real-time. The biggest difference between the figures is that the CPU time is higher than the real-time when signing with RSA in Botan (9.897 ms compared to 7.296 ms, see table 5.10). This is due to more than one CPU core being utilized during the RSA signing operation.
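The distinction between real-time and CPU time can be illustrated with Python's standard clocks. This is a sketch of the concept only and not the measurement tooling used in the experiment; `measure` is a name chosen for this example.

```python
import time


def measure(operation):
    """Return (real_seconds, cpu_seconds) for one call of `operation`.

    time.perf_counter() measures elapsed wall-clock time, while
    time.process_time() sums the user and system CPU time of the whole
    process, so it can exceed the wall-clock time when several cores work
    in parallel.
    """
    real_start = time.perf_counter()
    cpu_start = time.process_time()
    operation()
    real_elapsed = time.perf_counter() - real_start
    cpu_elapsed = time.process_time() - cpu_start
    return real_elapsed, cpu_elapsed


real_seconds, cpu_seconds = measure(lambda: sum(range(100_000)))
```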

Figure 5.2: Digital signature CPU time in Botan and GmSSL.

Figure 5.3 shows the number of CPU cycles when generating a key, signing a message, and verifying a signature using ECDSA, SM2, and RSA. This graph looks similar to figure 5.2 since the measurements of CPU time and CPU cycles are directly related to each other.

Figure 5.3: Digital signature CPU cycles in Botan and GmSSL.

Figure 5.4 shows the RSS when using ECDSA, SM2, and RSA to generate a key, sign a message, and verify a signature. It can be seen that RSA has a smaller RSS than SM2 for all operations in both Botan and GmSSL. The difference in RSS between SM2 and ECDSA, on the other hand, is smaller.

Figure 5.4: Digital signature RSS in Botan and GmSSL.

Table 5.10: Digital Signature results.

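Real-time (ms)

| Library | Operation | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|---|
| GmSSL | Key Generation | ECDSA | 5.372 | 5.373 | 0.0706 | 5.259 | 5.869 |
| GmSSL | Key Generation | SM2 | 5.157 | 5.146 | 0.0901 | 5.050 | 5.668 |
| GmSSL | Key Generation | RSA | 226.628 | 219.105 | 48.692 | 88.070 | 351.619 |
| GmSSL | Sign | ECDSA | 2.782 | 2.779 | 0.0397 | 2.748 | 3.116 |
| GmSSL | Sign | SM2 | 2.786 | 2.786 | 0.0169 | 2.751 | 2.856 |
| GmSSL | Sign | RSA | 5.795 | 5.798 | 0.0359 | 5.716 | 5.881 |
| GmSSL | Verify | ECDSA | 2.400 | 2.302 | 0.157 | 2.252 | 3.232 |
| GmSSL | Verify | SM2 | 2.389 | 2.297 | 0.139 | 2.246 | 2.900 |
| GmSSL | Verify | RSA | 2.321 | 2.225 | 0.134 | 2.184 | 2.771 |
| Botan | Key Generation | ECDSA | 0.124 | 0.121 | 0.00684 | 0.121 | 0.145 |
| Botan | Key Generation | SM2 | 0.174 | 0.171 | 0.00966 | 0.170 | 0.207 |
| Botan | Key Generation | RSA | 365.972 | 375.066 | 68.839 | 209.654 | 509.734 |
| Botan | Sign | ECDSA | 0.247 | 0.235 | 0.0167 | 0.234 | 0.307 |
| Botan | Sign | SM2 | 0.186 | 0.178 | 0.0113 | 0.176 | 0.206 |
| Botan | Sign | RSA | 7.296 | 6.903 | 0.538 | 6.837 | 8.596 |
| Botan | Verify | ECDSA | 0.457 | 0.456 | 0.00383 | 0.451 | 0.479 |
| Botan | Verify | SM2 | 0.482 | 0.480 | 0.0108 | 0.476 | 0.579 |
| Botan | Verify | RSA | 0.931 | 0.929 | 0.0108 | 0.919 | 0.989 |

CPU time (ms)

| Library | Operation | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|---|
| GmSSL | Key Generation | ECDSA | 5.428 | 5.428 | 0.0711 | 5.312 | 5.929 |
| GmSSL | Key Generation | SM2 | 5.213 | 5.202 | 0.0911 | 5.104 | 5.728 |
| GmSSL | Key Generation | RSA | 226.472 | 218.569 | 48.644 | 88.092 | 351.641 |
| GmSSL | Sign | ECDSA | 2.808 | 2.804 | 0.0401 | 2.774 | 3.144 |
| GmSSL | Sign | SM2 | 2.812 | 2.812 | 0.0170 | 2.777 | 2.884 |
| GmSSL | Sign | RSA | 5.820 | 5.823 | 0.0362 | 5.741 | 5.907 |
| GmSSL | Verify | ECDSA | 2.423 | 2.323 | 0.159 | 2.272 | 3.261 |
| GmSSL | Verify | SM2 | 2.412 | 2.319 | 0.140 | 2.269 | 2.928 |
| GmSSL | Verify | RSA | 2.344 | 2.245 | 0.135 | 2.204 | 2.798 |
| Botan | Key Generation | ECDSA | 0.124 | 0.121 | 0.00683 | 0.121 | 0.145 |
| Botan | Key Generation | SM2 | 0.174 | 0.171 | 0.00964 | 0.170 | 0.207 |
| Botan | Key Generation | RSA | 365.953 | 375.050 | 68.836 | 209.640 | 509.719 |
| Botan | Sign | ECDSA | 0.247 | 0.235 | 0.0166 | 0.234 | 0.307 |
| Botan | Sign | SM2 | 0.186 | 0.178 | 0.0113 | 0.176 | 0.206 |
| Botan | Sign | RSA | 9.897 | 9.362 | 0.722 | 9.295 | 11.665 |
| Botan | Verify | ECDSA | 0.457 | 0.456 | 0.00383 | 0.451 | 0.479 |
| Botan | Verify | SM2 | 0.482 | 0.480 | 0.0108 | 0.476 | 0.579 |
| Botan | Verify | RSA | 0.931 | 0.929 | 0.0108 | 0.919 | 0.989 |

CPU cycles

| Library | Operation | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|---|
| GmSSL | Key Generation | ECDSA | 1.619e7 | 1.618e7 | 0.0105e7 | 1.599e7 | 1.664e7 |
| GmSSL | Key Generation | SM2 | 1.568e7 | 1.567e7 | 0.0107e7 | 1.547e7 | 1.606e7 |
| GmSSL | Key Generation | RSA | 7.689e8 | 7.358e8 | 1.753e8 | 2.793e8 | 12.258e8 |
| GmSSL | Sign | ECDSA | 8.095e6 | 8.083e6 | 0.0513e6 | 8.015e6 | 8.343e6 |
| GmSSL | Sign | SM2 | 8.151e6 | 8.143e6 | 0.0495e6 | 8.068e6 | 8.307e6 |
| GmSSL | Sign | RSA | 1.705e7 | 1.704e7 | 0.00890e7 | 1.688e7 | 1.740e7 |
| GmSSL | Verify | ECDSA | 8.119e6 | 8.037e6 | 0.134e6 | 7.973e6 | 8.694e6 |
| GmSSL | Verify | SM2 | 8.089e6 | 8.008e6 | 0.122e6 | 7.937e6 | 8.488e6 |
| GmSSL | Verify | RSA | 7.847e6 | 7.775e6 | 0.111e6 | 7.717e6 | 8.159e6 |
| Botan | Key Generation | ECDSA | 4.733e5 | 4.716e5 | 0.0488e5 | 4.700e5 | 5.027e5 |
| Botan | Key Generation | SM2 | 6.657e5 | 6.645e5 | 0.0398e5 | 6.621e5 | 6.864e5 |
| Botan | Key Generation | RSA | 1.396e9 | 1.417e9 | 0.257e9 | 0.816e9 | 1.964e9 |
| Botan | Sign | ECDSA | 9.137e5 | 9.129e5 | 0.0358e5 | 9.079e5 | 9.325e5 |
| Botan | Sign | SM2 | 6.891e5 | 6.884e5 | 0.0266e5 | 6.851e5 | 7.002e5 |
| Botan | Sign | RSA | 3.636e7 | 3.633e7 | 0.0276e7 | 3.616e7 | 3.867e7 |
| Botan | Verify | ECDSA | 1.531e6 | 1.530e6 | 0.00415e6 | 1.525e6 | 1.545e6 |
| Botan | Verify | SM2 | 1.639e6 | 1.637e6 | 0.00604e6 | 1.633e6 | 1.666e6 |
| Botan | Verify | RSA | 3.197e6 | 3.184e6 | 0.0281e6 | 3.173e6 | 3.289e6 |

RSS (kB)

| Library | Operation | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|---|
| GmSSL | Key Generation | ECDSA | 4871 | 4878 | 60 | 4728 | 4972 |
| GmSSL | Key Generation | SM2 | 4861 | 4857 | 52 | 4760 | 4988 |
| GmSSL | Key Generation | RSA | 4670 | 4660 | 83 | 4522 | 4880 |
| GmSSL | Sign | ECDSA | 4942 | 4940 | 61 | 4836 | 5064 |
| GmSSL | Sign | SM2 | 4916 | 4912 | 70 | 4804 | 5064 |
| GmSSL | Sign | RSA | 4646 | 4648 | 106 | 4476 | 4828 |
| GmSSL | Verify | ECDSA | 4954 | 4932 | 69 | 4808 | 5076 |
| GmSSL | Verify | SM2 | 4946 | 4940 | 61 | 4844 | 5076 |
| GmSSL | Verify | RSA | 4709 | 4704 | 101 | 4554 | 4884 |
| Botan | Key Generation | ECDSA | 6279 | 6292 | 137 | 6064 | 6572 |
| Botan | Key Generation | SM2 | 6128 | 6148 | 113 | 5936 | 6352 |
| Botan | Key Generation | RSA | 5702 | 5676 | 78 | 5596 | 5900 |
| Botan | Sign | ECDSA | 7216 | 7202 | 100 | 7068 | 7444 |
| Botan | Sign | SM2 | 7296 | 7338 | 91 | 7116 | 7504 |
| Botan | Sign | RSA | 6645 | 6620 | 92 | 6452 | 6824 |
| Botan | Verify | ECDSA | 7201 | 7200 | 85 | 7052 | 7368 |
| Botan | Verify | SM2 | 7180 | 7164 | 64 | 7064 | 7400 |
| Botan | Verify | RSA | 6521 | 6532 | 64 | 6408 | 6640 |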

5.2.2 Hash Results

The results of the hash algorithm tests, including the mean, median, standard deviation, and minimum and maximum values, are shown in table 5.11. Figures 5.5, 5.6, 5.7, and 5.8 visually represent the results in bar graphs. In figure 5.5 it can be seen that the real-time required to hash a 1 GB file is greater for SM3 than for SHA-256 in both Botan and OpenSSL. The difference between the algorithms is larger in OpenSSL than in Botan. In OpenSSL, the real-time taken to hash the file is almost double for SM3 compared to SHA-256, while Botan shows a smaller difference between the algorithms.

Figure 5.5: Hash algorithms real-time in Botan and OpenSSL.

Figure 5.6 shows similar differences between the algorithms as figure 5.5. This is because the real-time and CPU time values for mean and standard deviation differ by less than 1 ms, as seen in table 5.11. Real-time and CPU time measurements are very similar because the algorithm implementations only use one CPU when executing, and because the I/O wait time is minimal, since the file being hashed resides on a ramdisk and is read efficiently.

Figure 5.6: Hash algorithms CPU time in Botan and OpenSSL.

Figure 5.7 displays the number of CPU cycles when hashing the file using SM3 and SHA-256. The same differences between the algorithms as seen in figure 5.6 are apparent in this figure, since the CPU time and CPU cycles are directly related to each other. If the number of CPU cycles increases, the CPU time will increase proportionally with it and vice versa.
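The proportionality between CPU cycles and CPU time can be checked against the mean values in table 5.11 by dividing cycles by time, which gives the average clock rate implied by each measurement. The sketch below only restates table values; the names are chosen for this example, and the clock rate of the test system is not given in this section.

```python
# (mean CPU time in ms, mean CPU cycles), copied from table 5.11.
HASH_MEANS = {
    ("OpenSSL", "SM3"): (4212.057, 1.467e10),
    ("OpenSSL", "SHA-256"): (2331.040, 8.127e9),
    ("Botan", "SM3"): (4689.613, 1.512e10),
    ("Botan", "SHA-256"): (4047.732, 1.304e10),
}

# cycles / seconds = average cycles per second, expressed here in GHz.
implied_ghz = {
    key: cycles / (cpu_ms / 1000.0) / 1e9
    for key, (cpu_ms, cycles) in HASH_MEANS.items()
}

for (library, algorithm), ghz in implied_ghz.items():
    print(f"{library} {algorithm}: about {ghz:.2f} GHz")
```

Within each library the implied rate is nearly the same for both algorithms, which is what a direct proportionality between cycles and CPU time would produce.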

Figure 5.7: Hash algorithms CPU cycles in Botan and OpenSSL.

Figure 5.8 shows the RSS for SM3 and SHA-256. There are minimal differences in the RSS for the algorithms in both libraries. In Botan, SM3 has a slightly higher RSS, while in OpenSSL SHA-256 has a slightly higher RSS.

Figure 5.8: Hash algorithms RSS in Botan and OpenSSL.

Table 5.11: Hash Functions result.

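Real-time (ms)

| Library | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|
| OpenSSL | SM3 | 4212.255 | 4203.665 | 38.509 | 4150.957 | 4380.519 |
| OpenSSL | SHA-256 | 2331.208 | 2326.336 | 22.379 | 2302.920 | 2429.163 |
| Botan | SM3 | 4689.731 | 4656.395 | 102.521 | 4386.370 | 5207.648 |
| Botan | SHA-256 | 4047.862 | 4016.524 | 115.354 | 3985.705 | 4920.531 |

CPU time (ms)

| Library | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|
| OpenSSL | SM3 | 4212.057 | 4203.683 | 38.537 | 4150.824 | 4380.372 |
| OpenSSL | SHA-256 | 2331.040 | 2326.137 | 22.387 | 2302.760 | 2429.022 |
| Botan | SM3 | 4689.613 | 4656.212 | 102.532 | 4386.356 | 5207.596 |
| Botan | SHA-256 | 4047.732 | 4016.440 | 115.363 | 3985.680 | 4920.464 |

CPU cycles

| Library | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|
| OpenSSL | SM3 | 1.467e10 | 1.465e10 | 0.00342e10 | 1.463e10 | 1.483e10 |
| OpenSSL | SHA-256 | 8.127e9 | 8.119e9 | 0.240e9 | 8.113e9 | 8.274e9 |
| Botan | SM3 | 1.512e10 | 1.501e10 | 0.0276e10 | 1.495e10 | 1.633e10 |
| Botan | SHA-256 | 1.304e10 | 1.294e10 | 0.0286e10 | 1.289e10 | 1.433e10 |

RSS (kB)

| Library | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|
| OpenSSL | SM3 | 4183 | 4196 | 97 | 4028 | 4392 |
| OpenSSL | SHA-256 | 4194 | 4202 | 105 | 4028 | 4392 |
| Botan | SM3 | 5137 | 5092 | 111 | 4996 | 5352 |
| Botan | SHA-256 | 5075 | 5076 | 105 | 4936 | 5352 |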

5.2.3 Block Cipher Results

The results from all block cipher tests showed that SM4 is much slower than AES-128 and AES-128-NI regardless of operation or mode. Figures 5.9, 5.10, 5.11, and 5.12 present the real-time, CPU time, CPU cycles, and RSS, respectively, when encrypting and decrypting with different block cipher modes using Botan and OpenSSL. More detailed results, including the mean, median, standard deviation, minimum, and maximum value of every block cipher test, can be found in table 5.12. All block cipher tests run on one CPU.

Figure 5.9 shows that AES-128-NI is the fastest algorithm in Botan, because the AES-NI instructions speed up the algorithm. AES-128 is the second fastest algorithm, and SM4 is the slowest algorithm. ECB mode and CTR mode show similar behaviour in both encryption and decryption using Botan, while CBC mode is slightly slower than ECB and CTR. Similar behaviour is seen in the library OpenSSL: AES-128-NI is the fastest, then AES-128, and lastly SM4. However, in OpenSSL the ECB mode is the fastest mode in encryption, while CBC is the fastest in decryption. The results of ECB and CTR decryption do not differ much, and both modes are slower than CBC decryption.

Figure 5.9: Block Ciphers real-time graphs in Botan and OpenSSL.

The results of the CPU time measurements are shown in figure 5.10. The CPU time graph is nearly identical to the real-time graph, and the real-time and CPU time results differ only minimally. More detailed information can be found in table 5.12.

Figure 5.10: Block Ciphers CPU time graphs in Botan and OpenSSL.

Figure 5.11 presents the number of CPU cycles using Botan and OpenSSL. The graphs for Botan and OpenSSL are similar to the real-time and CPU time graphs.

Figure 5.11: Block Ciphers CPU cycles graphs in Botan and OpenSSL.

Figure 5.12 shows the memory usage when encrypting and decrypting a 1 GB file in ECB mode using Botan and OpenSSL. The differences in RSS between AES-128, AES-128-NI, and SM4 are minimal in both libraries.

Figure 5.12: Block Ciphers RSS graphs in Botan and OpenSSL.

Table 5.12: Block cipher result.

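Real-time (ms)

| Library | Operation | Mode | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|---|---|
| OpenSSL | Encryption | ECB | AES-128 | 3840.293 | 3815.107 | 98.836 | 3757.551 | 4425.497 |
| OpenSSL | Encryption | ECB | SM4 | 9189.681 | 9129.536 | 226.699 | 8992.929 | 10442.132 |
| OpenSSL | Encryption | ECB | AES-128-NI | 1001.395 | 985.698 | 52.661 | 873.248 | 1192.678 |
| OpenSSL | Encryption | CBC | AES-128 | 3884.786 | 3842.651 | 145.228 | 3753.418 | 4598.503 |
| OpenSSL | Encryption | CBC | SM4 | 9707.531 | 9584.626 | 361.774 | 9410.227 | 11833.949 |
| OpenSSL | Encryption | CBC | AES-128-NI | 1665.069 | 1636.893 | 97.396 | 1597.773 | 2393.721 |
| OpenSSL | Encryption | CTR | AES-128 | 3000.521 | 2969.095 | 172.695 | 2853.775 | 4294.005 |
| OpenSSL | Encryption | CTR | SM4 | 10435.955 | 10307.054 | 523.666 | 9994.136 | 13188.772 |
| OpenSSL | Encryption | CTR | AES-128-NI | 1095.423 | 1082.315 | 67.757 | 991.301 | 1502.523 |
| OpenSSL | Decryption | ECB | AES-128 | 5575.290 | 5556.570 | 324.155 | 4790.649 | 6747.196 |
| OpenSSL | Decryption | ECB | SM4 | 11599.877 | 11536.973 | 490.970 | 10391.768 | 13943.292 |
| OpenSSL | Decryption | ECB | AES-128-NI | 1281.634 | 1293.193 | 124.098 | 963.663 | 1524.880 |
| OpenSSL | Decryption | CBC | AES-128 | 3180.383 | 3299.707 | 659.942 | 2330.051 | 4524.501 |
| OpenSSL | Decryption | CBC | SM4 | 9759.177 | 9832.040 | 1536.648 | 7816.454 | 12053.873 |
| OpenSSL | Decryption | CBC | AES-128-NI | 1027.312 | 1100.179 | 272.714 | 659.023 | 1600.337 |
| OpenSSL | Decryption | CTR | AES-128 | 3979.311 | 4116.269 | 444.362 | 3058.383 | 5422.020 |
| OpenSSL | Decryption | CTR | SM4 | 14022.738 | 1445.823 | 1060.775 | 11822.927 | 16403.879 |
| OpenSSL | Decryption | CTR | AES-128-NI | 1380.728 | 1435.111 | 234.734 | 941.576 | 1839.394 |
| Botan | Encryption | ECB | AES-128 | 5840.372 | 5940.882 | 330.747 | 4944.393 | 6293.555 |
| Botan | Encryption | ECB | SM4 | 10821.179 | 10597.984 | 648.532 | 9845.301 | 13363.696 |
| Botan | Encryption | ECB | AES-128-NI | 2350.346 | 2319.378 | 200.292 | 1295.993 | 2811.426 |
| Botan | Encryption | CBC | AES-128 | 6976.174 | 6945.075 | 162.215 | 6646.897 | 7597.640 |
| Botan | Encryption | CBC | SM4 | 15208.109 | 15184.840 | 146.546 | 14800.377 | 15864.744 |
| Botan | Encryption | CBC | AES-128-NI | 3944.324 | 3916.202 | 212.743 | 2831.929 | 4482.209 |
| Botan | Encryption | CTR | AES-128 | 6331.770 | 6278.439 | 179.226 | 6134.961 | 7276.994 |
| Botan | Encryption | CTR | SM4 | 11640.988 | 11602.212 | 462.085 | 10286.824 | 12958.696 |
| Botan | Encryption | CTR | AES-128-NI | 2792.004 | 2780.198 | 227.575 | 1610.168 | 3315.071 |
| Botan | Decryption | ECB | AES-128 | 6838.400 | 6798.056 | 156.158 | 6636.745 | 7773.173 |
| Botan | Decryption | ECB | SM4 | 10654.944 | 10549.198 | 370.213 | 10124.219 | 12258.366 |
| Botan | Decryption | ECB | AES-128-NI | 2358.010 | 2312.992 | 147.955 | 2133.701 | 2780.059 |
| Botan | Decryption | CBC | AES-128 | 7317.342 | 7338.778 | 266.994 | 5035.849 | 7713.271 |
| Botan | Decryption | CBC | SM4 | 11078.039 | 11067.491 | 195.546 | 10655.217 | 11737.221 |
| Botan | Decryption | CBC | AES-128-NI | 2604.981 | 2527.851 | 230.848 | 2334.115 | 3178.142 |
| Botan | Decryption | CTR | AES-128 | 6757.957 | 6723.039 | 168.679 | 6376.157 | 7652.903 |
| Botan | Decryption | CTR | SM4 | 11964.657 | 11879.217 | 630.478 | 10725.548 | 14301.022 |
| Botan | Decryption | CTR | AES-128-NI | 2958.301 | 2923.417 | 299.849 | 2438.849 | 3838.463 |

CPU time (ms)

| Library | Operation | Mode | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|---|---|
| OpenSSL | Encryption | ECB | AES-128 | 3839.941 | 3814.863 | 98.835 | 98.835 | 4425.066 |
| OpenSSL | Encryption | ECB | SM4 | 9179.317 | 9120.746 | 223.258 | 8992.416 | 10441.441 |
| OpenSSL | Encryption | ECB | AES-128-NI | 1001.159 | 985.467 | 52.657 | 873.070 | 1192.549 |
| OpenSSL | Encryption | CBC | AES-128 | 3884.872 | 3842.533 | 145.353 | 3753.091 | 4598.869 |
| OpenSSL | Encryption | CBC | SM4 | 9708.060 | 9584.944 | 361.806 | 9409.711 | 11833.451 |
| OpenSSL | Encryption | CBC | AES-128-NI | 1664.991 | 1636.945 | 97.399 | 1597.511 | 2393.306 |
| OpenSSL | Encryption | CTR | AES-128 | 3000.232 | 2963.811 | 172.680 | 2853.523 | 4293.619 |
| OpenSSL | Encryption | CTR | SM4 | 10435.645 | 10306.760 | 523.659 | 9993.886 | 13188.466 |
| OpenSSL | Encryption | CTR | AES-128-NI | 1095.195 | 1082.097 | 67.747 | 991.103 | 1502.228 |
| OpenSSL | Decryption | ECB | AES-128 | 5564.724 | 5553.310 | 315.730 | 4790.262 | 6747.645 |
| OpenSSL | Decryption | ECB | SM4 | 11594.1940 | 11536.430 | 468.497 | 10391.181 | 23500.306 |
| OpenSSL | Decryption | ECB | AES-128-NI | 1281.360 | 1292.914 | 124.088 | 963.427 | 1524.640 |
| OpenSSL | Decryption | CBC | AES-128 | 3180.202 | 3299.492 | 659.908 | 2329.934 | 4524.934 |
| OpenSSL | Decryption | CBC | SM4 | 9759.062 | 9831.925 | 1536.658 | 7816.372 | 12053.770 |
| OpenSSL | Decryption | CBC | AES-128-NI | 1027.105 | 1099.983 | 272.695 | 658.844 | 1600.113 |
| OpenSSL | Decryption | CTR | AES-128 | 3978.997 | 4115.961 | 444.340 | 3058.089 | 5421.570 |
| OpenSSL | Decryption | CTR | SM4 | 14022.326 | 14458.427 | 1060.757 | 11822.537 | 16403.352 |
| OpenSSL | Decryption | CTR | AES-128-NI | 1380.441 | 1434.822 | 234.718 | 941.336 | 1839.083 |
| Botan | Encryption | ECB | AES-128 | 5840.031 | 5940.546 | 330.732 | 4944.114 | 6293.236 |
| Botan | Encryption | ECB | SM4 | 10820.797 | 10597.626 | 648.508 | 9845.017 | 13363.208 |
| Botan | Encryption | ECB | AES-128-NI | 2350.050 | 2319.092 | 200.285 | 1295.748 | 2811.153 |
| Botan | Encryption | CBC | AES-128 | 6970.878 | 6944.896 | 149.318 | 6646.809 | 7449.920 |
| Botan | Encryption | CBC | SM4 | 15207.685 | 15184.435 | 146.517 | 14800.037 | 15864.298 |
| Botan | Encryption | CBC | AES-128-NI | 3944.042 | 3915.895 | 212.741 | 2831.659 | 4481.918 |
| Botan | Encryption | CTR | AES-128 | 6326.447 | 6277.579 | 177.523 | 6134.690 | 7276.700 |
| Botan | Encryption | CTR | SM4 | 11640.569 | 11601.558 | 462.041 | 10286.503 | 12957.861 |
| Botan | Encryption | CTR | AES-128-NI | 2791.704 | 2779.979 | 227.569 | 1609.906 | 3314.773 |
| Botan | Decryption | ECB | AES-128 | 6823.193 | 6794.652 | 144.519 | 6636.489 | 7779.417 |
| Botan | Decryption | ECB | SM4 | 10645.554 | 10548.814 | 370.205 | 10123.836 | 12257.936 |
| Botan | Decryption | ECB | AES-128-NI | 2357.719 | 2312.720 | 147.944 | 2133.444 | 2779.784 |
| Botan | Decryption | CBC | AES-128 | 7307.035 | 7332.389 | 261.922 | 5035.619 | 7712.933 |
| Botan | Decryption | CBC | SM4 | 11078.039 | 11067.491 | 195.546 | 10655.217 | 11737.221 |
| Botan | Decryption | CBC | AES-128-NI | 2604.981 | 2527.851 | 230.848 | 2334.115 | 3178.142 |
| Botan | Decryption | CTR | AES-128 | 6752.571 | 6721.168 | 164.722 | 6375.813 | 7652.570 |
| Botan | Decryption | CTR | SM4 | 11958.612 | 11878.832 | 615.219 | 10725.184 | 14300.620 |
| Botan | Decryption | CTR | AES-128-NI | 2958.008 | 2923.094 | 299.834 | 2438.680 | 3838.192 |

CPU cycles

| Library | Operation | Mode | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|---|---|
| OpenSSL | Encryption | ECB | AES-128 | 1.155e10 | 1.150e10 | 0.0182e10 | 1.138e10 | 1.261e10 |
| OpenSSL | Encryption | ECB | SM4 | 3.069e10 | 3.065e10 | 0.0129e10 | 3.052e10 | 3.108e10 |
| OpenSSL | Encryption | ECB | AES-128-NI | 2.871e9 | 2.871e9 | 0.131e9 | 2.716e9 | 3.482e9 |
| OpenSSL | Encryption | CBC | AES-128 | 1.189e10 | 1.183e10 | 0.0145e10 | 1.174e10 | 1.246e10 |
| OpenSSL | Encryption | CBC | SM4 | 3.268e10 | 3.264e10 | 0.0137e10 | 3.257e10 | 3.347e10 |
| OpenSSL | Encryption | CBC | AES-128-NI | 4.922e9 | 4.891e9 | 0.0782e9 | 4.840e9 | 5.199e9 |
| OpenSSL | Encryption | CTR | AES-128 | 8.585e9 | 8.555e9 | 0.147e9 | 0.147e9 | 0.147e9 |
| OpenSSL | Encryption | CTR | SM4 | 3.338e10 | 3.334e10 | 0.0188e10 | 3.3160e10 | 3.435e10 |
| OpenSSL | Encryption | CTR | AES-128-NI | 2.954e9 | 2.933e9 | 0.898e8 | 2.815e9 | 3.344e9 |
| OpenSSL | Decryption | ECB | AES-128 | 1.432e10 | 1.427e10 | 0.464e10 | 1.353e10 | 1.658e10 |
| OpenSSL | Decryption | ECB | SM4 | 3.127e10 | 3.125e10 | 0.3125e10 | 3.061e10 | 3.219e10 |
| OpenSSL | Decryption | ECB | AES-128-NI | 3.229e9 | 3.249e9 | 0.244e9 | 2.715e9 | 4.020e9 |
| OpenSSL | Decryption | CBC | AES-128 | 0.985e10 | 0.100e10 | 0.0570e10 | 0.907e10 | 1.187e10 |
| OpenSSL | Decryption | CBC | SM4 | 3.126e10 | 3.143e10 | 0.0521e10 | 3.041e10 | 3.239e10 |
| OpenSSL | Decryption | CBC | AES-128-NI | 3.102e9 | 3.264e9 | 0.403e9 | 2.562e9 | 4.331e9 |
| OpenSSL | Decryption | CTR | AES-128 | 0.934e10 | 0.936e10 | 0.0554e10 | 0.811e10 | 1.073e10 |
| OpenSSL | Decryption | CTR | SM4 | 3.405e10 | 3.411e10 | 0.0410e10 | 3.308e10 | 3.528e10 |
| OpenSSL | Decryption | CTR | AES-128-NI | 3.196e9 | 3.199e9 | 0.38e9 | 2.508e9 | 4.016e9 |
| Botan | Encryption | ECB | AES-128 | 1.511e10 | 1.505e10 | 0.0238e10 | 1.476e10 | 1.641e10 |
| Botan | Encryption | ECB | SM4 | 2.443e10 | 2.418e10 | 0.0900e10 | 2.303e10 | 2.781e10 |
| Botan | Encryption | ECB | AES-128-NI | 5.121e9 | 5.066e9 | 0.423e9 | 3.457e9 | 6.410e9 |
| Botan | Encryption | CBC | AES-128 | 1.738e10 | 1.729e10 | 0.046e10 | 1.665e10 | 1.919e10 |
| Botan | Encryption | CBC | SM4 | 3.550e10 | 3.551e10 | 0.025e10 | 3.492e10 | 3.634e10 |
| Botan | Encryption | CBC | AES-128-NI | 0.875e10 | 0.869e10 | 0.0505e10 | 0.727e10 | 1.0140e10 |
| Botan | Encryption | CTR | AES-128 | 1.610e10 | 1.597e10 | 0.049e10 | 1.563e10 | 1.912e10 |
| Botan | Encryption | CTR | SM4 | 2.573e10 | 2.574e10 | 0.069e10 | 2.395e10 | 2.774e10 |
| Botan | Encryption | CTR | AES-128-NI | 6.024e9 | 6.009e9 | 0.512e9 | 4.049e9 | 7.604e9 |
| Botan | Decryption | ECB | AES-128 | 1.760e10 | 1.749e10 | 0.035e10 | 1.727e10 | 1.948e10 |
| Botan | Decryption | ECB | SM4 | 2.420e10 | 2.412e10 | 0.067e10 | 2.286e10 | 2.604e10 |
| Botan | Decryption | ECB | AES-128-NI | 5.184e9 | 5.067e9 | 0.399e9 | 4.629e9 | 6.418e9 |
| Botan | Decryption | CBC | AES-128 | 1.892e10 | 1.887e10 | 0.035e10 | 1.716e10 | 2.056e10 |
| Botan | Decryption | CBC | SM4 | 2.598e10 | 2.592e10 | 0.056e10 | 2.479e10 | 2.793e10 |
| Botan | Decryption | CBC | AES-128-NI | 5.916e9 | 5.688e9 | 0.648e9 | 5.185e9 | 7.553e9 |
| Botan | Decryption | CTR | AES-128 | 1.655e10 | 1.649e10 | 0.032e10 | 1.600e10 | 1.751e10 |
| Botan | Decryption | CTR | SM4 | 2.604e10 | 2.597e10 | 0.073e10 | 2.436e10 | 2.929e10 |
| Botan | Decryption | CTR | AES-128-NI | 6.306e9 | 6.150e9 | 0.630e9 | 5.316e9 | 7.910e9 |

RSS (kB)

| Library | Operation | Mode | Algorithm | Mean | Median | Std. Deviation | Min | Max |
|---|---|---|---|---|---|---|---|---|
| OpenSSL | Encryption | ECB | AES-128 | 4169 | 4172 | 88 | 4028 | 4336 |
| OpenSSL | Encryption | ECB | SM4 | 4910 | 4202 | 105 | 4032 | 4392 |
| OpenSSL | Encryption | ECB | AES-128-NI | 4194 | 4210 | 105 | 4032 | 4392 |
| OpenSSL | Decryption | ECB | AES-128 | 4175 | 4172 | 89 | 4028 | 4336 |
| OpenSSL | Decryption | ECB | SM4 | 4221 | 4234 | 97 | 4032 | 4392 |
| OpenSSL | Decryption | ECB | AES-128-NI | 4199 | 4200 | 103 | 4028 | 4392 |
| Botan | Encryption | ECB | AES-128 | 5198 | 5196 | 112 | 4996 | 5440 |
| Botan | Encryption | ECB | SM4 | 5191 | 5210 | 77 | 4980 | 5320 |
| Botan | Encryption | ECB | AES-128-NI | 5159 | 5178 | 88 | 4976 | 5380 |
| Botan | Decryption | ECB | AES-128 | 5225 | 5192 | 124 | 5016 | 5444 |
| Botan | Decryption | ECB | SM4 | 5180 | 5192 | 85 | 4980 | 5324 |
| Botan | Decryption | ECB | AES-128-NI | 5161 | 5144 | 84 | 4976 | 5328 |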

5.2.4 Relative Differences Between the Algorithms

Table 5.13 shows the real-time percentage change when replacing each standard algorithm with the corresponding Chinese cryptographic algorithm. Percentage changes for CPU time and CPU cycles are not calculated, because those metrics show relative differences between the algorithms similar to those of real-time and would therefore yield similar percentage changes. The percentage changes are grouped into three ranges, marked with the colours red, yellow, and green. The following list describes each colour and its meaning:

• Red: A percentage change higher than +20%. Fields marked red mean that replacing the standard algorithm will impact performance negatively.

• Yellow: A percentage change between -20% and +20%. Fields marked yellow mean that replacement with the SM algorithm will cause a relatively small performance impact.

• Green: A percentage change lower than -20%. Fields marked green mean that replacing the standard algorithm will improve performance.
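The colour scale above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce Table 5.13: it assumes that the percentage change is computed relative to the standard algorithm's measurement, and the function names and the example run times are hypothetical.

```python
def percentage_change(standard: float, chinese: float) -> float:
    """Relative change in percent when the standard algorithm is replaced.

    Assumption: the change is expressed relative to the standard algorithm.
    """
    return (chinese - standard) / standard * 100.0


def colour(change: float) -> str:
    """Map a percentage change to the three-colour scale described above."""
    if change > 20.0:
        return "red"      # replacement makes the operation noticeably slower
    if change < -20.0:
        return "green"    # replacement makes the operation noticeably faster
    return "yellow"       # between -20% and +20%


# Hypothetical mean run times in milliseconds, for illustration only.
example = percentage_change(standard=4000.0, chinese=10000.0)
print(f"{example:+.1f}% -> {colour(example)}")
```

With this definition a positive value means that the Chinese algorithm takes longer than the standard one, which matches the reading of red fields as a negative performance impact.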

Table 5.13: Table of the percentage change of each algorithm.

5.3 Distribution Analysis Results

The distribution analysis resulted in 196 histograms and 196 QQ plots. All graphs were studied, and four distinct types of distribution were found. Figures 5.13, 5.14, 5.15, and 5.16 show the histograms and QQ plots of these types. Type 1 is normally distributed: the histogram has the shape of a normal distribution, and the points in the QQ plot lie very close to the diagonal line, which represents the ideal normal distribution. This can be seen in figure 5.13.

(a) Histogram of type 1. (b) QQ plot of type 1. Figure 5.13: Distribution type 1.
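The QQ-plot criterion for type 1 can be illustrated numerically. The sketch below pairs each sorted sample with the matching quantile of a fitted normal distribution and uses the correlation of those pairs as a rough measure of how closely the points follow the diagonal. The data is synthetic, and both the plotting positions (i + 0.5)/n and the correlation summary are assumptions made for this example; the thesis itself judged the plots visually.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def qq_points(samples):
    """Pair each sorted sample with the matching quantile of a fitted normal."""
    ordered = sorted(samples)
    n = len(ordered)
    fitted = NormalDist(fmean(ordered), stdev(ordered))
    theoretical = [fitted.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def qq_correlation(samples):
    """Pearson correlation of the QQ points; close to 1.0 means near the diagonal."""
    pts = qq_points(samples)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pts)
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


random.seed(3)  # fixed seed so the sketch is repeatable
normal_like = [random.gauss(1000.0, 25.0) for _ in range(2000)]
skewed = [1000.0 + random.expovariate(1 / 25.0) for _ in range(2000)]

print("normal-like:", round(qq_correlation(normal_like), 4))
print("skewed:     ", round(qq_correlation(skewed), 4))
```

The normally distributed sample gives a correlation very close to 1, while the skewed sample bends away from the diagonal and scores visibly lower.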

The distribution in figure 5.14 resembles a right-skewed distribution, since it has a large number of data samples on the left side of the graph and fewer on the right side. For data with a right-skewed distribution, the mean value is generally greater than the median. This behaviour can be seen for the hash algorithms (real-time, CPU time, and CPU cycles) in table 5.11, which all had a distribution belonging to type 2.

(a) Histogram of type 2. (b) QQ plot of type 2. Figure 5.14: Distribution type 2.
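The relation between mean and median for right-skewed data can be checked on synthetic values. The sketch below is only an illustration: the "timings" are generated as a fixed base cost plus an exponential tail, which is an assumption about what a right-skewed timing sample might look like and not data from the experiment.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Synthetic right-skewed "timings": a base cost plus an exponential tail.
samples = [100.0 + random.expovariate(1 / 5.0) for _ in range(10_000)]

mean = statistics.mean(samples)
median = statistics.median(samples)

print(f"mean={mean:.2f} median={median:.2f}")
print("mean > median:", mean > median)
```

The occasional slow runs in the tail pull the mean upwards, while the median stays close to the bulk of the samples.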

Distributions belonging to type 3 all have two or three peaks, as seen in figure 5.15. In each cluster of data values, the distribution resembles that of type 2, with a quite flat line in the QQ plot. This is typically evidence of a multimodal distribution.

(a) Histogram of type 3. (b) QQ plot of type 3. Figure 5.15: Distribution type 3.

Type 4 plots make it difficult to determine the underlying distribution because they have several peaks and no apparent pattern, as seen in figure 5.16. The QQ plot contains many flat lines, and its points are not continuous. The majority of the RSS data samples belong to type 4, which can be seen in table 5.14.

(a) Histogram of type 4. (b) QQ plot of type 4. Figure 5.16: Distribution type 4.

Table 5.14 presents the results from the distribution analysis. The table shows which type each data sample belongs to. The metric column indicates which metric the data sample measured, where rt is an abbreviation for real-time, ct for CPU time, and cc for CPU cycles.

Table 5.14: Distribution types of data samples.

Algorithm | Library | Operation | Mode | Metric | Type (1 2 3 4)
Digital Signature Algorithms
ECDSA | Botan | Key | - | rt,ct,cc | X
RSA | Botan | Key | - | rt,ct,cc | X
ECDSA, SM2, RSA | Botan | Sign | - | cc | X
ECDSA, SM2, RSA | Botan | Sign | - | rt,ct | X
ECDSA, SM2, RSA | Botan | Verify | - | rt,ct,cc | X
ECDSA, SM2, RSA | Botan | Key, Sign, Verify | - | rss | X
SM2 | GmSSL, Botan | Key | - | rt,ct,cc | X
ECDSA, SM2, RSA | GmSSL | Key | - | rss | X
RSA, ECDSA | GmSSL | Key | - | rt,ct,cc | X
SM2, RSA | GmSSL | Sign | - | rt,ct,cc | X
ECDSA | GmSSL | Sign | - | cc | X
ECDSA, SM2, RSA | GmSSL | Verify | - | rt,ct,cc | X
ECDSA, SM2, RSA | GmSSL | Sign, Verify | - | rss | X
ECDSA | GmSSL | Key, Sign, Verify | - | rt,ct | X
Hash Algorithms
SM3, SHA-256 | Botan, OpenSSL | - | - | rt,ct,cc | X
SM3, SHA-256 | Botan, OpenSSL | - | - | rss | X
Block Cipher Algorithms
SM4 | Botan | Encryption | CBC | rt,ct | X
AES-128-NI, SM4 | Botan | Encryption | CTR | rt,ct,cc | X
AES-128 | Botan | Encryption | CTR | rt,ct,cc | X
SM4 | Botan | Encryption | CBC | cc | X
AES-128, AES-128-NI | Botan | Decryption | CBC | rt,ct,cc | X
SM4 | Botan | Decryption | CBC | rt,ct,cc | X
AES-128, AES-128-NI, SM4 | Botan | Decryption | CTR | rt,ct,cc | X
SM4 | OpenSSL | Encryption | CBC | rt,ct,cc | X
AES-128, AES-128-NI, SM4 | OpenSSL | Encryption | CTR | rt,ct,cc | X
AES-128, AES-128-NI, SM4 | OpenSSL | Decryption | CTR | cc | X
AES-128, AES-128-NI, SM4 | OpenSSL | Decryption | CBC | rt,ct,cc | X
AES-128, AES-128-NI, SM4 | OpenSSL | Decryption | CTR | rt,ct | X
AES-128, AES-128-NI | Botan, OpenSSL | Encryption | CBC | rt,ct,cc | X
AES-128, AES-128-NI, SM4 | Botan, OpenSSL | Encryption, Decryption | ECB | rt,ct,cc | X
AES-128, AES-128-NI, SM4 | Botan, OpenSSL | Encryption, Decryption | ECB | rss | X

5.4 Mann-Whitney U Test Results

The distribution analysis presented in section 5.3 showed that the data samples follow several different distributions. Because not all of the data is normally distributed and some distributions deviate significantly from normality, the nonparametric Mann-Whitney U test was used to compare the independent data samples and test the null hypothesis. The results of the Mann-Whitney U tests are summarized in table 5.15. When comparing real-time, CPU time, and CPU cycles, the null hypothesis was rejected for all samples. The null hypothesis was also rejected for the RSS samples when comparing RSA with SM2. When comparing RSS samples for ECDSA and SM2, hash functions, and block ciphers, the null hypothesis was rejected in only half of the cases.
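For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Mann-Whitney U test in plain Python. It is not the tooling used in the thesis: it assumes the normal approximation with a tie correction and no continuity correction, the significance level of 0.05 is an assumed example value, and the two samples are made-up timings.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic of x, two-sided p-value). Tied values receive
    average ranks and the variance is corrected for ties.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0   # average of the ranks i+1 .. j
        t = j - i                      # size of this group of ties
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return u_x, p


ALPHA = 0.05  # assumed significance level, for illustration only

# Made-up timings in milliseconds for two algorithms.
sample_a = [9708.1, 9584.9, 9650.3, 9409.7, 9833.5, 9712.0]
sample_b = [3884.9, 3842.5, 3901.2, 3753.1, 3998.9, 3860.4]

u, p = mann_whitney_u(sample_a, sample_b)
print("U =", u, "p =", round(p, 4), "reject H0:", p < ALPHA)
```

Because the test only uses ranks, it does not require the samples to be normally distributed, which is why it suits data with the mixed distribution types described in section 5.3.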

Table 5.15: A summarization of the Mann-Whitney U test results.

Algorithm 1 | Algorithm 2 | Library | Operations | Modes | Result
Real-time, CPU time, and CPU cycles samples
SM2 | RSA | Botan, GmSSL | Key, Sign, Verify | - | H0 rejected
SM2 | ECDSA | Botan, GmSSL | Key, Sign, Verify | - | H0 rejected
SM3 | SHA-256 | Botan, OpenSSL | - | - | H0 rejected
SM4 | AES-128 | Botan, OpenSSL | Encryption, Decryption | ECB, CBC, CTR | H0 rejected
SM4 | AES-128-NI | Botan, OpenSSL | Encryption, Decryption | ECB, CBC, CTR | H0 rejected
RSS samples
SM2 | RSA | Botan, GmSSL | Key, Sign, Verify | - | Reject H0
SM2 | ECDSA | GmSSL | Key | - | Fail to reject H0
SM2 | ECDSA | Botan | Key | - | Reject H0
SM2 | ECDSA | Botan, GmSSL | Sign | - | Reject H0
SM2 | ECDSA | Botan, GmSSL | Verify | - | Fail to reject H0
SM3 | SHA-256 | OpenSSL | - | - | Fail to reject H0
SM3 | SHA-256 | Botan | - | - | Reject H0
SM4 | AES-128 | Botan, OpenSSL | Encryption | ECB | Fail to reject H0
SM4 | AES-128 | Botan, OpenSSL | Decryption | ECB | Reject H0
SM4 | AES-128-NI | OpenSSL | Encryption, Decryption | ECB | Fail to reject H0
SM4 | AES-128-NI | Botan | Encryption, Decryption | ECB | Reject H0

The details of the results from the Mann-Whitney U tests when comparing all algorithms can be seen in appendix B.

Chapter 6 Analysis and Discussion

In this chapter, the outcome of the theoretical analysis and the experiment is analyzed and discussed. The first and second sections discuss how reliable the results of the thesis are. The third section discusses the types found in the distribution analysis. The fourth section presents an analysis and discussion of the performance of digital signature algorithms, hash functions, and block ciphers. Lastly, the fifth section discusses the memory utilization of the algorithms.

6.1 Overall Performance Impact

The overall performance impact of replacing the standard cryptographic algorithms on a system with SM2, SM3, and SM4 will depend on the number of cryptographic operations performed on the system in question. The total amount of time that the system spends hashing, encrypting, decrypting, generating keys, signing, and verifying can be measured, and the relative performance increase or decrease can then be calculated using the results from this thesis. It is, however, important to note that the performance of an algorithm always depends on its implementation. A poor implementation could shift the performance relationships between the algorithms drastically. In this thesis two sets of implementations were tested, those in Botan and those in OpenSSL/GmSSL. Similar results were obtained from both, which indicates that other well-established cryptographic libraries would likely show differences between the algorithms similar to those shown in this thesis report.

The results are to some degree generalizable. Performing the tests on a different system with x86 architecture will probably not reproduce the exact numbers presented in this thesis. However, the relative differences between the algorithms will most of the time be similar to the results shown here, regardless of operating system or hardware specifications.

6.2 File size

The experiment in this thesis measured the time it takes to encrypt, decrypt, and hash a 1GB file. The studies [59] and [60] measured the time it takes to encrypt and decrypt files of different sizes using several encryption algorithms, such as AES, DES, Blowfish, and RSA. The results from both studies show that similar relative time differences between the algorithms are apparent for all tested file sizes larger than 2MB. For files smaller than 2MB the differences between the algorithms are minimal, and the relative differences diverge from those seen for files larger than 2MB. A possible reason is that it may take some time for an algorithm to reach its average throughput, i.e. bits processed per second.

Therefore, it is likely that our results regarding relative time performance differences when encrypting, decrypting, and hashing files are generalizable to other file sizes as well, except for very small files.
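One way to picture why relative differences shrink for very small files is a toy cost model with a fixed start-up overhead plus a size-dependent part. The sketch below is purely illustrative: the model itself, the throughput rates, and the overhead are assumptions and are not taken from [59], [60], or the experiment.

```python
def run_time_ms(size_bytes: float, rate_bytes_per_ms: float, overhead_ms: float) -> float:
    """Toy cost model: fixed start-up overhead plus time proportional to size."""
    return overhead_ms + size_bytes / rate_bytes_per_ms


def relative_difference(size_bytes: float, fast_rate: float, slow_rate: float,
                        overhead_ms: float) -> float:
    """Percentage by which the slower algorithm exceeds the faster one."""
    fast = run_time_ms(size_bytes, fast_rate, overhead_ms)
    slow = run_time_ms(size_bytes, slow_rate, overhead_ms)
    return (slow - fast) / fast * 100.0


# Hypothetical figures, chosen only for illustration.
FAST_RATE = 250_000.0   # bytes per millisecond
SLOW_RATE = 100_000.0   # bytes per millisecond
OVERHEAD_MS = 5.0       # start-up cost shared by both algorithms

for label, size in (("1 kB", 1e3), ("2 MB", 2e6), ("1 GB", 1e9)):
    diff = relative_difference(size, FAST_RATE, SLOW_RATE, OVERHEAD_MS)
    print(f"{label}: slower algorithm takes {diff:.1f}% longer")
```

Under these assumptions the shared overhead dominates for the 1 kB file, so the two algorithms look almost identical, while for the 1 GB file the relative difference approaches the ratio of the throughputs.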

6.3 Distribution Analysis

The distribution analysis resulted in four different types of distributions. Distributions belonging to type 3 all had two or three peaks visible in the histogram. By studying the sample data and the corresponding CPU frequency, a possible reason for the peaks was found: the CPU was running at different frequencies during the tests. For example, the CPU ran at a lower frequency for the first half of a test and at a higher frequency for the second half, which results in two peaks in the histogram. However, apart from checking some of the data samples, this explanation has not been verified.

Most of the RSS data samples had a distribution of type 4. Type 4 displays several peaks, and the QQ plot points are not continuous. This distribution may stem from the fact that the RSS samples are not continuous: they are always a multiple of the page size, which is 4096 bytes.

Since the purpose of the distribution analysis was to determine whether the sample data was normally distributed and, based on this, to choose a statistical test, no further investigation of the distributions has been done.

6.4 Performance

6.4.1 Digital Signature Algorithms

RSA & SM2

The experiment results showed that RSA key generation and signing are slower than SM2 key generation and signing. This is consistent with the conclusions drawn during the theoretical comparison of the algorithms in section 5.1.1.

RSA key generation is substantially slower than SM2 key generation because the operations required to generate a key are more complex. RSA key generation requires two large prime integers to be generated. This results in very dispersed time values, since several integers may have to go through the primality tests before an integer passes and is considered a prime. Working with large integers also adds to the time taken for RSA key generation. SM2 key generation is much faster, since the only operations required are one random integer generation and one point multiplication.

Signing with RSA was shown to be slower than signing with SM2. These results correspond to the results of other studies [53, 54] comparing RSA with ECDSA. These studies also found that RSA verification is faster than ECDSA verification. The results from the experiment in this report showed that RSA verification is slower than SM2 verification in Botan and faster in GmSSL.
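The dispersion in RSA key generation time can be illustrated by counting how many random candidates have to be tested before a prime is found. The sketch below is not the key generation code of Botan or GmSSL: it uses a textbook Miller-Rabin test, and it searches for 256-bit primes only to keep the run time short, whereas a 2048-bit RSA key would need two primes of roughly 1024 bits each. The function names are hypothetical.

```python
import random


def is_probable_prime(n: int, rounds: int = 20) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def candidates_until_prime(bits: int) -> int:
    """Count random odd candidates tested before one passes as prime."""
    tested = 0
    while True:
        tested += 1
        # Force the top bit (full length) and the low bit (odd number).
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate):
            return tested


random.seed(7)  # fixed seed so the sketch is repeatable
counts = [candidates_until_prime(256) for _ in range(20)]
print("candidates tested per prime:", counts)
```

The number of candidates varies considerably from one prime to the next, which is consistent with the dispersed key generation times described above; an elliptic-curve key pair, by contrast, needs no such search.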

ECDSA & SM2

According to the theoretical comparison of SM2 and ECDSA, the steps required to generate a key pair are identical. Therefore, in theory, there should be no difference between the key generation times of the algorithms. However, the experiment results showed that SM2 key generation took around 40% longer than ECDSA key generation in Botan. A study of the Botan source code showed that this time difference is likely because another value, which is used later during signing, is calculated during key generation for SM2. This is the only difference from the ECDSA key generation implementation. This implementation detail may also be the reason why SM2 signing is faster than ECDSA signing in Botan. The difference in key generation time in GmSSL is much smaller; SM2 is around 4% faster than ECDSA. This difference is likely due to implementation differences in the console commands used to invoke key generation for ECDSA and SM2.

During signing and verification, the time differences between SM2 and ECDSA are relatively small in both libraries. Since the experiment results showed small differences between the speed of ECDSA and SM2 signing and verification, it is reasonable to assume that the performance differences mostly depend on the implementation. No conclusion can be drawn as to whether the SM2 sign and verify operations are generally faster or slower than those of ECDSA, since the two libraries show different results.

Signing and verifying large files

As part of the algorithm for signing and verifying a message, the message is hashed in both ECDSA and SM2. RSA can only sign integers in the range [0, n-1], where n is the RSA modulus. To improve security and allow larger messages to be signed, a message representative, usually a hash, is signed instead [61]. For large files, the time required to hash the file will be substantially larger than the time it takes to do the actual signing or verifying. The relative difference between ECDSA and SM2, or between RSA and SM2, will then approach the difference seen between SHA-256 and SM3 (provided that those are the hash algorithms used).
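The hash-then-sign idea can be sketched with Python's standard hashlib module. The example only computes the message representative; the signing step is left out, and the helper name message_representative is hypothetical. SHA-256 is used because it is always available in hashlib, whereas SM3 support depends on the underlying OpenSSL build.

```python
import hashlib


def message_representative(chunks) -> bytes:
    """Hash a message supplied as an iterable of byte chunks with SHA-256."""
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)       # cost grows linearly with the message size
    return h.digest()         # always 32 bytes, whatever the input size


small = message_representative([b"short message"])
# 256 chunks of 64 KiB, i.e. a 16 MiB message streamed chunk by chunk.
large = message_representative(b"\x00" * 65536 for _ in range(256))

print(len(small), len(large))
```

Because the signature operation only ever sees the fixed-size digest, its cost does not grow with the file, and the hashing step determines the total time for large inputs.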

Validity of results in GmSSL

Due to limitations of the GmSSL CLI, the signing and verification tests in GmSSL include reading the key files and "restarting" GmSSL several times to collect one data value. This makes the collected data for signing and verification in GmSSL less accurate, since the measured time includes the time spent reading key files and restarting GmSSL. Because the signing and verification operations are relatively quick compared to, for example, hashing a 1GB file, this overhead makes up a larger percentage of the total time measured and therefore affects the sampled data more.

6.4.2 Hash Algorithms

The theoretical analysis of SHA-256 and SM3 showed that the design structures of the two algorithms are almost identical, except for the number of assignments in the compression function. The experiment results showed that SM3 is 81% slower than SHA-256 in OpenSSL, while SM3 is 16% slower than SHA-256 in Botan. As mentioned in section 5.1.2, Design Comparison of SM3 and SHA-256, SM3 has 128 more assignment operations than SHA-256, which means that SM3 has 128 more move instructions than SHA-256. This may not have a noticeable effect when hashing small files, but for larger files it may have an impact on performance, since there are 128 more move instructions per 64 bytes. In addition, SHA-256 is a more well-known hashing algorithm, and OpenSSL has supported SHA-256 since version 0.9.8o, released in 2010 [62]. OpenSSL only recently added support for SM3, in version 1.1.1b, released in 2018 [63]. Therefore, OpenSSL may have optimized SHA-256 more than SM3.
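The effect of 128 extra assignments per 64-byte block can be put into numbers with a short calculation. The sketch assumes decimal units (1 GB = 10^9 bytes), ignores the padding block, and treats each extra assignment as one additional operation; these are simplifying assumptions for illustration.

```python
BLOCK_BYTES = 64                    # block size processed per compression call
EXTRA_ASSIGNMENTS_PER_BLOCK = 128   # SM3 compared to SHA-256, as stated above


def extra_assignments(file_bytes: int) -> int:
    """Additional assignments SM3 performs compared to SHA-256 for one file."""
    blocks = -(-file_bytes // BLOCK_BYTES)   # ceiling division; padding ignored
    return blocks * EXTRA_ASSIGNMENTS_PER_BLOCK


for label, size in (("1 kB", 10**3), ("1 MB", 10**6), ("1 GB", 10**9)):
    print(f"{label}: {extra_assignments(size):,} extra assignments")
```

For a 1 GB file this amounts to two billion additional assignments, compared with about two thousand for a 1 kB file, which is why the difference is more likely to be noticeable for large inputs.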

6.4.3 Block Cipher Algorithms

AES is, in general, faster than SM4. The literature analysis showed that AES has better parallelization ability, since each column, row, or byte in the operations shiftRow, mixColumns, and addRoundkey can be processed independently. Algorithms based on an SPN structure have a built-in parallelism capability [58], which suggests that AES will be faster than SM4 in both encryption and decryption. The experiment results agreed with the conclusion from the literature analysis: AES-128 and AES-128-NI were much faster than SM4 regardless of operation, mode, or library. However, the observed performance differences do not depend on the parallelism ability, because all block cipher tests ran on only one core. Also, the fact that the block cipher tests were executed on a processor with x86 architecture may have worked to the advantage of AES. AES may have been able to use registers in a more optimized way, since the round keys of AES are 128 bits while the round keys of SM4 are 32 bits.

6.5 Memory

For all memory samples, it can be seen that the RSS is always a multiple of 4096 bytes. This is because computer memory is divided into pages, which are the smallest unit of memory management. On the system where the experiment took place, the page size is 4096 bytes; therefore the RSS will always be a multiple of 4096 bytes.

The experiment showed that for most algorithms the memory difference was small and often varied between libraries, e.g. in Botan SM3 has the higher RSS, and in OpenSSL SHA-256 has the higher RSS. One exception is the RSS when running RSA: RSA consistently has a lower RSS than ECDSA and SM2 in both Botan and GmSSL.

The Mann-Whitney U test results display a low p-value (p < 9e-31) when comparing SM2 with RSA. When comparing RSS samples between the hash algorithms, between the block cipher algorithms, and between SM2 and ECDSA, the p-value is higher, ranging from 1.15e-12 to 0.977. Half of these Mann-Whitney U tests between RSS samples reject the null hypothesis, while the other half fail to reject it. The conclusion drawn from this is that almost all algorithms, except RSA, have an RSS very similar to that of their counterpart. Since the p-value is relatively high and the null hypothesis is often rejected in one library but not the other, no conclusion can be drawn as to whether one algorithm generally has a lower RSS. It is possible that the differences observed are not statistically significant.
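The page-size observation can be checked against the minimum and maximum RSS values reported for the block ciphers in Table 5.12. The sketch assumes that the reported kB are units of 1024 bytes, as is usual for RSS figures on Linux; the helper names are hypothetical.

```python
PAGE_SIZE_BYTES = 4096
BYTES_PER_KB = 1024  # assumption: the reported kB are units of 1024 bytes

# Min and Max RSS values (kB) reported for the block ciphers in Table 5.12.
RSS_MIN_MAX_KB = [4028, 4336, 4032, 4392, 4996, 5440, 4980, 5320,
                  4976, 5380, 5016, 5444, 5324, 5328]


def pages(rss_kb: int) -> float:
    """Number of pages covered by an RSS value given in kB."""
    return rss_kb * BYTES_PER_KB / PAGE_SIZE_BYTES


def is_whole_pages(rss_kb: int) -> bool:
    """True if the RSS value corresponds to a whole number of pages."""
    return (rss_kb * BYTES_PER_KB) % PAGE_SIZE_BYTES == 0


print(all(is_whole_pages(v) for v in RSS_MIN_MAX_KB))
print(is_whole_pages(4169), pages(4169))
```

Individual samples, such as the minimum and maximum values, are whole numbers of pages, whereas an averaged value such as a mean of 4169 kB need not be.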

Chapter 7 Conclusions and Future Work

The following chapter presents the conclusions of this thesis and future work. The conclusions described below are grouped by research question.

7.1 Conclusion

RQ 1: What are the design similarities and differences between RSA, ECDSA, SHA-256, and AES-128 and the corresponding Chinese cryptographic algorithms SM2, SM3, and SM4?

ECDSA and SM2 have many design similarities. Both are based on the Elliptic Curve Discrete Logarithm Problem (ECDLP). Both algorithms have a private key, which is a random integer, and a corresponding public key, which is a point on an elliptic curve over a defined Galois field. The differences between SM2 and ECDSA are the curve the algorithms use and minor differences in the way the algorithms generate and verify signatures.

The designs of RSA and SM2 have fundamental differences. RSA is based on the integer factorization problem, while SM2 is based on the ECDLP. Both the public and the private key of RSA consist of a pair of positive integers. The recommended key size of an RSA key pair is 2048 bits, while the key size of SM2 is 256 bits for the private key and 512 bits for the public key.

SHA-256 and SM3 have many similarities. They both have a Merkle-Damgård structure and use a compression function based on Davies-Meyer. Both algorithms take an input of between 0 and 2^64 bits and output a 256-bit hash value. However, the algorithms are not identical. The difference lies in the compression function, e.g. SM3 uses a ROTL operation in one step where SHA-256 uses an ADD operation. The number of assignments also differs: SM3 has 768 assignments while SHA-256 has 640. Another difference is that SHA-256 uses 64 constants in the hash computation stage, whereas SM3 only has 2 constants to work with.

Both AES-128 and SM4 are block ciphers with a block size and key length of 128 bits. AES has a Substitution-Permutation Network (SPN) structure, and SM4 has an Unbalanced Feistel Network (UFN) structure. AES-128 and SM4 have the operations XOR, S-box lookup, and cyclic shift in common. In addition, AES-128 uses modular multiplications, which SM4 does not. The number of S-box lookups also differs between the algorithms: AES-128 has 160 S-box lookups, and SM4 has 128. Decryption in AES-128 has to apply the inverse of every operation, resulting in different algorithms for encryption and decryption, while SM4 has identical encryption and decryption algorithms.
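The S-box lookup counts quoted above follow from the round structure of the two ciphers. The sketch below reproduces them under the assumption that only the data path of one block encryption is counted and that key schedule lookups are excluded; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CipherShape:
    """Round structure of a block cipher, data path only."""
    name: str
    rounds: int
    sbox_lookups_per_round: int

    @property
    def sbox_lookups_per_block(self) -> int:
        return self.rounds * self.sbox_lookups_per_round


# AES-128: 10 rounds, each substituting all 16 bytes of the state.
AES_128 = CipherShape("AES-128", rounds=10, sbox_lookups_per_round=16)
# SM4: 32 rounds, each substituting the 4 bytes of one 32-bit word.
SM4 = CipherShape("SM4", rounds=32, sbox_lookups_per_round=4)

for cipher in (AES_128, SM4):
    print(cipher.name, cipher.sbox_lookups_per_block)
```

SM4 therefore performs fewer lookups per block but spreads them over more than three times as many rounds, each of which depends on the result of the previous one.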

RQ 1.1: What conclusions can be drawn about the expected performance of the algorithms based on the outcome from RQ1?

ECC key generation is most likely faster than RSA key generation. The operations of ECC key generation are simple and fast to calculate, whereas RSA key generation will be slower since RSA works with larger integers and has to generate large prime integers. It is difficult to draw a conclusion about which algorithm will be the most memory efficient: RSA's key size is larger than that of ECDSA and SM2, which requires more memory to store the keys. On the other hand, SM2 has more steps to complete when generating and verifying a signature, which leads to longer program code than RSA.

Since the hashing algorithms SM3 and SHA-256 are almost identical, it is difficult to draw any conclusion about which algorithm is the most time efficient. For example, the logic operations used in the hashing algorithms, such as ROTL and ROTR, differ by approximately 0.25 clock cycles per instruction. Modern Intel processors process instructions so fast [64] that even if there are differences between the logic operations, they will not have a big overall impact on the hash algorithm speed. However, as mentioned before, SM3 has 128 more assignments than SHA-256, which results in 128 more move instructions. When hashing small files, 128 additional instructions will probably not have any significant impact on the speed. However, when hashing larger files, such as 1GB, the impact may be noticeable. Because the design and properties of SHA-256 and SM3 are very similar, it is likely that the differences in speed and memory utilization between the algorithms will be minimal.

For block ciphers, AES-128 is probably faster than SM4 because AES-128 is based on SPN and SM4 is based on UFN. SPN has more built-in parallelism, which means that operations during transformations can be executed at the same time. Both algorithms have S-box lookups whose operations do not depend on other values and can therefore be parallelized. AES-128 encrypts a full message block in ten rounds operating on 128 bits, and SM4 encrypts a message block in 32 rounds working on 32-bit chunks of the block. As a result, SM4 is likely slower than AES-128, since more operations are required to encrypt a block. AES-128 has two S-boxes in total, one for encryption and one for decryption. SM4 uses the same S-box for both encryption and decryption. Because AES-128 has separate encryption and decryption algorithms and two S-boxes, AES-128 is likely to consume more memory than SM4.

RQ 2: What is the performance impact, if any, in terms of key generation, signing, and verification time, CPU clock cycles, and memory utilization, when replacing RSA or ECDSA with SM2?

As seen in table 5.13, when replacing RSA with SM2 in GmSSL and Botan, a 97% – 99% increase in speed during key generation can be expected. For signing, the speed increases by 50% – 97% when replacing RSA with SM2. For signature verification, the performance impact of replacing RSA with SM2 varies depending on the library: in GmSSL the performance decreases by approximately 3%, while in Botan it increases by 48%. Generally, the memory consumption decreases by approximately 4% when replacing RSA with SM2.

The percentage change for the replacement of ECDSA with SM2 differs between the libraries, as seen in table 5.13. In GmSSL, the replacement does not have a big impact; the performance increases by approximately 4% or decreases by 0.10%, depending on the operation. In Botan, the speed decreases by 40% during key generation, increases by 25% when signing, and increases by 5% when verifying a signature. The difference in memory consumption between ECDSA and SM2 is negligible.

RQ 3: What is the performance impact, if any, in terms of the speed of the hash calculation, CPU clock cycles, and memory utilization, when replacing SHA-256 with SM3?

When replacing SHA-256 with SM3, a 16% – 80% performance loss in hashing time can be expected, as seen in table 5.13. The memory consumption in terms of RSS will not be significantly impacted.

RQ 4: What is the performance impact, if any, in terms of encryption and decryption time, CPU clock cycles, and memory utilization, when replacing AES-128 with SM4 using block cipher modes ECB, CBC, and CTR?

Replacing either AES-128 or AES-128-NI with SM4 results in a significant performance loss in encryption and decryption time, regardless of mode or library, as seen in table 5.13. Depending on the mode and library, the speed will decrease by 51% – 252% when replacing AES-128 with SM4 and by 286% – 916% when replacing AES-128-NI with SM4. The differences in memory utilization between AES-128, AES-128-NI, and SM4 are minimal and therefore negligible.

7.2 Future work

This thesis provides an overview of the impact of replacing a standard cryptographic algorithm with the corresponding Chinese cryptographic algorithm. However, more studies are necessary to get a more detailed view of the replacement impact.

This thesis measures the performance of OpenSSL/GmSSL only through the command line, whereas the Botan cryptography library is linked to a C++ program to measure the performance in Botan. It would be interesting to investigate the performance using the same method for OpenSSL/GmSSL and Botan. In addition, to get a more comprehensive and reliable result, comparing more than two cryptography libraries may be of interest in the future.

One way of continuing this work could be to study whether the performance change is the same when replacing the algorithms in an established system as in the constructed experiment environment. Observing the same performance change would increase the validity of the results in this master thesis.

Another interesting area for future work is to study the performance of SM4 on another architecture, such as a mobile device, and compare it with AES. Since SM4 divides the 128-bit block into four 32-bit parts and then makes most calculations using 32-bit values, it is possible that SM4 would be more competitive compared to AES on an architecture with smaller registers.

References

[1] Morris Dworkin. Recommendation for block cipher modes of operation: Methods and techniques. Technical report, National Institute of Standards and Technology, December 2001. https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-38a.pdf, [Accessed April 29, 2019].

[2] National Institute of Standards and Technology. Advanced encryption standard (AES). https://www.nist.gov/publications/advanced-encryption-standard-aes, November 2001. [Accessed February 12, 2019].

[3] Joan Daemen and Vincent Rijmen. The Design of Rijndael. Springer, 2001.

[4] Cryptography Standardization Technical Committee. Public key cryptographic algorithm SM2 based on elliptic curves Part 2: Digital signature algorithm. http://www.gmbz.org.cn/upload/2018-07-24/1532401673138056311.pdf, 2012. [Accessed May 20, 2019].

[5] Ye Hu, Liji Wu, An Wang, and Beibei Wang. Hardware design and implementation of SM3 hash algorithm for financial IC card. In 2014 Tenth International Conference on Computational Intelligence and Security. IEEE, November 2014.

[6] Elaine Barker. Recommendation for key management part 1: General. Technical report, January 2016.

[7] FocusEconomics. China economy overview. https://www.focus-economics.com/countries/china, Retrieved: 8 February 2019.

[8] Kevin Wei Wang, Jonathan Woetzel, Jeongmin Seong, James Manyika, Michael Chui, and Wendy Wong. Digital China: Powering the economy to global competitiveness. https://www.mckinsey.com/featured-insights/china/digital-china-powering-the-economy-to-global-competitiveness, McKinsey Global Institute, December 2017. [Accessed February 8, 2019].

[9] Freshfields Bruckhaus Deringer. China’s rules on encryption: what foreign companies need to know. https://www.freshfields.com/en-gb/our-thinking/campaigns/digital/data/china-rules-on-encryption/, Retrieved: 8 May 2019.

[10] Ronald Tse and Wai Wong. OSCCA extensions for OpenPGP. https://tools.ietf.org/html/draft-openpgp-oscca-00, August 2017.

[11] Kim Zetter. How a Crypto ’Backdoor’ Pitted the Tech World Against the NSA. https://www.wired.com/2013/09/nsa-backdoor/, Septermber 2013. [Accessed May 22, 2019].

[12] NIST. Nist’s Encryption Standard Has Minimum $250 Billion Eco- nomic Benefit, According to New Study. https://www.nist.gov/news- events/news/2018/09/nists-encryption-standard-has-minimum-250- billion-economic-benefit, April 2018. [Accessed April 29, 2019].

[13] Jeffrey Rott. Intel® Advanced Encryption Standard Instructions (AES- NI). https://software.intel.com/en-us/articles/intel-advanced- encryption-standard-instructions-aes-ni, February 2012. [Accessed April 29, 2019].

[14] Jscrambler. Hashing algorithms. https://blog.jscrambler.com/hashing- algorithms/, October 2016. [Accessed June 12, 2019].

[15] Liantao Bai, Yuegong Zhang, and Guoqiang Yang. SM2 cryptographic algo- rithm based on discrete logarithm problem and prospect. In 2012 2nd Inter- national Conference on Consumer Electronics, Communications and Networks (CECNet). IEEE, April 2012.

[16] Dengguo Feng, Yu Qin, Wei Feng, and Jianxiong Shao. The theory and practice in the evolution of trusted computing. Chinese Science Bulletin, 59(32):4173– 4189, August 2014.

[17] Zhenwei Zhao and Guoqiang Bai. Exploring the speed limit of SM2. In 2014 IEEE 3rd International Conference on Cloud Computing and Intelligence Sys- tems. IEEE, November 2014.

[18] Zhenwei Zhao and Guoqiang Bai. Ultra high-speed SM2 ASIC implementation. In 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications. IEEE, September 2014.

[19] Florian Mendel, Tomislav Nad, and Martin Schläffer. Finding collisions for round-reduced SM3. In Topics in Cryptology – CT-RSA 2013, pages 174–188. Springer Berlin Heidelberg, 2013.

[20] Tianyong Ao, Zhangqing He, Jinli Rao, Kui Dai, and Xuecheng Zou. A compact hardware implementation of SM3 hash function. In 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications. IEEE, September 2014.

[21] Yuan Ma, Luning Xia, Jingqiang Lin, Jiwu Jing, Zongbin Liu, and Xingjie Yu. Hardware performance optimization and evaluation of SM3 hash algorithm on FPGA. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7618 LNCS:105–118, 2012.

[22] Fen Liu, Wen Ji, Lei Hu, Jintai Ding, Shuwang Lv, Andrei Pyshkin, and Ralf-Philipp Weinmann. Analysis of the SMS4 block cipher. In Information Security and Privacy, pages 158–170. Springer Berlin Heidelberg.

[23] Hai Cheng and Qun Ding. Overview of the block cipher. In 2012 Second International Conference on Instrumentation, Measurement, Computer, Communication and Control. IEEE, December 2012.

[24] Wen Ji, Lei Hu, and Haiwen Ou. Algebraic attack to SMS4 and the comparison with AES. In 2009 Fifth International Conference on Information Assurance and Security. IEEE, 2009.

[25] Li Li, Feng Yang, Yaoming Pan, Weihua Mao, and Cuijie Liu. An implementation method for SM4-GCM on FPGA. Proceedings of 2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference, IAEAC 2017, pages 1921–1925, 2017.

[26] Akash Kumar Mandal, Chandra Parakash, and Archana Tiwari. Performance evaluation of cryptographic algorithms: DES and AES. 2012 IEEE Students’ Conference on Electrical, Electronics and Computer Science, pages 1–5, March 2012.

[27] Bharathi Balasubramanian, G.Manivasagam, and M.Anand Kumar. Metrics for performance evaluation of encryption algorithms. International Journal of Advance Research in Science and Engineering, 6, 2017.

[28] Jaime Raigoza and Kapil Jituri. Evaluating performance of symmetric encryption algorithms. Proceedings - 2016 International Conference on Computational Science and Computational Intelligence, CSCI 2016, pages 1378–1379, 2016.

[29] Covington. China releases draft encryption law for public comment. https://www.cov.com/en/news-and-insights/insights/2017/05/china-releases-draft-encryption-law-for-public-comment, May 2017. [Accessed February 8, 2019].

[30] Di Li and Yan Liu. Introduction to the commercial cryptography scheme in china. https://icmconference.org/wp-content/uploads/C03b-Li.pptx.pdf, May 2016. [Accessed February 8, 2019].

[31] Adam Segal. China, encryption policy, and international influence. Hoover Working Group on National Security, Technology, and Law, Aegis Series Paper No. 1610, November 2016. Available at https://lawfareblog.com/china-encryption-policy-and-international-influence.

[32] BusinessWire. Ribose contributes implementations of chinese cryptographic algorithms to OpenSSL. https://www.businesswire.com/news/home/20180913005432/en/Ribose-Contributes-Implementations-Chinese-Cryptographic-Algorithms-OpenSSL, September 2018. [Accessed February 8, 2019].

[33] Bert-Jaap Koops. People’s Republic of China, Crypto Law Survey. http://www.cryptolaw.org/cls2.htm, February 2013. [Accessed February 8, 2019].

[34] Hogan Lovells Publications. Decrypting china’s first crack at a cryptography law. https://www.hoganlovells.com/en/publications/decrypting-chinas-first-crack-at-a-cryptography-law, May 2017. [Accessed February 9, 2019].

[35] Mingli Shi. What china’s 2018 internet governance tells us about what’s next. https://www.newamerica.org/cybersecurity-initiative/digichina/blog/what-chinas-2018-internet-governance-tells-us-about-whats-next/, January 2019. [Accessed February 9, 2019].

[36] Yan Luo. China revises proposals on regulation of commercial encryption. https://www.insideprivacy.com/international/china/china-revises-proposals-on-regulation-of-commercial-encryption/, October 2017. [Accessed February 9, 2019].

[37] Burt Kaliski. Symmetric Cryptosystem, pages 1271–1271. Springer US, Boston, MA, 2011.

[38] Burt Kaliski. Asymmetric Cryptosystem, pages 49–50. Springer US, Boston, MA, 2011.

[39] C. E. Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 28(4):656–715, October 1949.

[40] Mohsen Bafandehkar, Sharifah Md Yasin, Ramlan Mahmod, and Zurina Mohd Hanapi. Comparison of ECC and RSA algorithm in resource constrained devices. In 2013 International Conference on IT Convergence and Security (ICITCS). IEEE, December 2013.

[41] Andrew D. Fernandes. Elliptic-curve cryptography. http://www.drdobbs.com/security/elliptic-curve-cryptography/184411133, December 1999. [Accessed 20 May 2019].

[42] Filippo Valsorda. The ecb penguin. https://blog.filippo.io/the-ecb-penguin/, November 2013. [Accessed 20 May 2019].

[43] Claes Wohlin. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering - EASE '14. ACM Press, 2014. https://doi.org/10.1145/2601248.2601268.

[44] S. Keshav. How to read a paper. Singapore Medical Journal, 46:108–115, 2005. DOI: 10.1145/1273445.1273458.

[45] PERF(1) perf Manual. http://man7.org/linux/man-pages/man1/perf.1.html, September 2017. [Accessed April 29, 2019].

[46] PERF-STAT(1) perf Manual. http://man7.org/linux/man-pages/man1/perf-stat.1.html, March 2019. [Accessed April 29, 2019].

[47] Mark Mitchell and Alex Samuel. Advanced Linux Programming. New Riders Publishing, Thousand Oaks, CA, USA, 2001.

[48] TIME(1) Linux User’s Manual. http://man7.org/linux/man-pages/man1/time.1.html, March 2019. [Accessed April 29, 2019].

[49] TIME(1) General Commands Manual. https://manpages.debian.org/jessie/time/time.1.en.html, March 2014. [Accessed April 29, 2019].

[50] Louise Bergman Martinkauppi and Qiuping He. BTH-TE2502-MasterThesis. https://github.com/qiupinghe/BTH-TE2502-MasterThesis, June 2019. [Accessed June 12, 2019].

[51] Masaya Yasuda, Takeshi Shimoyama, Jun Kogure, and Tetsuya Izu. On the Strength Comparison of the ECDLP and the IFP. In Ivan Visconti and Roberto De Prisco, editors, Security and Cryptography for Networks, pages 302–325, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg.

[52] Arumugam Sakthivel and Raju Nedunchezhian. Analyzing the point multiplication operation of elliptic curve cryptosystem over prime field for parallel processing. The International Arab Journal of Information Technology, 11(4), July 2014. http://iajit.org/PDF/vol.11,no.4/5008.pdf.

[53] Nicholas Jansma and Brandon Arrendondo. Performance Comparison of Elliptic Curve and RSA Digital Signatures. 2004. http://www.nicj.net/files/performance_comparison_of_elliptic_curve_and_rsa_digital_signatures.pdf.

[54] Al Imem Ali. Comparison and evaluation of digital signature schemes employed in NDN network. International Journal of Embedded Systems and Applications, 5(2):15–29, June 2015. https://doi.org/10.5121/ijesa.2015.5202.

[55] Vipul Gupta, Sumit Gupta, Sheueling Chang, and Douglas Stebila. Performance analysis of elliptic curve cryptography for SSL. In Proceedings of the ACM workshop on Wireless security - WiSE '02. ACM Press, 2002. https://doi.org/10.1145/570681.570691.

[56] Xiaojing DU and Shuguo LI. The ASIC implementation of SM3 hash algorithm for high throughput. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E99.A(7):1481–1487, 2016. https://doi.org/10.1587/transfun.e99.a.1481.

[57] Anwar Ali. Intel 64 and IA-32 Architectures Optimization Reference Manual. Intel Technology Journal, 09(03), August 2005. https://doi.org/10.1535/itj.0903.05.

[58] Bart Preneel, Vincent Rijmen, and Antoon Bosselaers. Algorithm alley. http://www.drdobbs.com/algorithm-alley/184410756, 1998. [Accessed February 19, 2019].

[59] Priyadarshini Patil, Prashant Narayankar, Narayan D.G., and Meena S.M. A comprehensive evaluation of cryptographic algorithms: DES, 3des, AES, RSA and blowfish. Procedia Computer Science, 78:617–624, 2016.

[60] Diaa Salama Abdul. Elminaam, Hatem M. Abdul Kader, and Mohie M. Hadhoud. Performance evaluation of symmetric encryption algorithms on power consumption for wireless devices. International Journal of Computer Theory and Engineering, pages 343–351, 2009.

[61] Burt Kaliski. RSA Digital Signature Scheme, pages 1061–1064. Springer US, Boston, MA, 2011. https://doi.org/10.1007/978-1-4419-5906-5_432.

[62] OpenSSL. SSL_library_init. https://www.openssl.org/docs/man1.0.2/man3/SSL_library_init.html. [Accessed May 21, 2019].

[63] OpenSSL. Changelog. https://www.openssl.org/news/changelog.html#x44. [Accessed May 21, 2019].

[64] Intel Corporation. Intel core i7 desktop processors comparison chart. https://www.intel.com/content/dam/support/us/en/documents/processors/core/intel-core-i7-comparison-chart.pdf, 2019. [Accessed May 14, 2019].

[65] Alexander Stanoyevitch. Introduction to cryptography with mathematical foundations and computer implementations. Taylor & Francis Group, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton USA, 2013.

[66] Yevgeniy Dodis, Jonathan Katz, John Steinberger, Aishwarya Thiruvengadam, and Zhe Zhang. Provable security of substitution-permutation networks. https://eprint.iacr.org/2017/016.pdf, 2017. [Accessed February 14, 2019].

[67] National Institute of Standards and Technology. Digital Signature Standard (DSS). Fips Pub 186-4, (July):1–119, 2013.

[68] Don Johnson, Alfred Menezes, and Scott Vanstone. The elliptic curve digital signature algorithm (ECDSA). International Journal of Information Security, 1(1):36–63, August 2001.

[69] Elaine B. Barker and Quynh H. Dang. Recommendation for key management part 3: Application-specific key management guidance. Technical report, January 2015.

[70] R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120–126, February 1978.

[71] Dan Boneh and H Shacham. Fast variants of RSA. CryptoBytes, pages 1–10, 2002.

[72] DI Management. Rsa algorithm. https://www.di-mgt.com.au/rsa_alg.html, June 2018. [Accessed 20 May 2019].

[73] Mihir Bellare and Phillip Rogaway. The exact security of digital signatures-how to sign with RSA and rabin. In Advances in Cryptology — EUROCRYPT ’96, pages 399–416. Springer Berlin Heidelberg, 1996.

[74] RSA Laboratories. PKCS #1 v2.2: RSA Cryptography Standard. 2012.

[75] Yusuke Naito, Kazuki Yoneyama, Lei Wang, and Kazuo Ohta. Davies-meyer merkle-damgard revisited: Variants of indifferentiability and random oracles. April 2012.

[76] Quynh H. Dang. Secure Hash Standard. Technical report, July 2015. https://doi.org/10.6028/nist.fips.180-4, [Accessed 28 February 2019].

[77] China Internet Network Information Center. SM2 algorithm. https://cnnic.com.cn/ScientificResearch/LeadingEdge/soea/SM2/201312/t20131204_43349.htm. [Accessed February 19, 2019].

[78] Cryptography Standardization Technical Committee. GM/T 0003-2012 SM2 elliptic curve public key cryptography algorithm standard English text release. http://www.gmbz.org.cn/main/postDetail.html?id=20180724110812, 2012. [Accessed February 19, 2019].

[79] Yaozhao Shen, Dongxia Bai, and Hongbo Yu. Improved of step-reduced . Science China Information Sciences, 61, 2017. https://doi.org/10.1007/s11432-017-9119-6.

[80] Harshvardhan Tiwari. Merkle-Damgård construction method and alternatives: A review. Journal of Information and Organizational Sciences, 41(2):283–304, 2017.

[81] Bart Preneel. Encyclopedia of Cryptography and Security: Davies–Meyer, pages 312–313. Springer US, Boston, MA, 2011. https://doi.org/10.1007/978-1-4419-5906-5_569.

[82] Office of State Commercial Cryptography Administration. GM/T 0004-2012: SM3 Cryptographic Hash Algorithm. http://www.gmbz.org.cn/main/postDetail.html?id=20180724105928, 2012. [Accessed February 28, 2019].

[83] Alex Biryukov. Encyclopedia of Cryptography and Security: , pages 455–455. Springer US, Boston, MA, 2011. https://doi.org/10.1007/978-1-4419-5906-5_577.

[84] Bruce Schneier and John Kelsey. Unbalanced Feistel networks and block cipher design. In Dieter Gollmann, editor, Fast Software Encryption, pages 121–144, Berlin, Heidelberg, 1996. Springer Berlin Heidelberg.

[85] Whitfield Diffie and George Ledin (translators). SMS4 Encryption Algorithm for Wireless Networks. Cryptology ePrint Archive, Report 2008/329, 2008. https://eprint.iacr.org/2008/329.

[86] Office of State Commercial Cryptography Administration. GM/T 0002-2012: SM4 Block Cipher Algorithm. http://www.gmbz.org.cn/main/postDetail.html?id=20180404044052, 2012. [Accessed February 12, 2019].

Appendix A Algorithm Design

This chapter describes the cryptographic algorithms AES, ECDSA, RSA, SHA-256, SM2, SM3, and SM4. The properties and operations of the algorithms are described using text, images, and tables. This chapter aims to give the reader fundamental knowledge about the algorithms to ensure that the resulting theoretical comparison of algorithms in chapter 5 can be understood.

A.1 AES

Advanced Encryption Standard (AES), also known by its original name Rijndael, is an encryption specification in which data is encrypted and decrypted in blocks of 128 bits. The algorithm supports three key lengths: 128, 192, and 256 bits. Other block sizes and key lengths are not supported [2].

The overall structure of AES is a Substitution-Permutation Network (SPN) [65]. An SPN combines nonlinear substitution transformations with linear permutation transformations, which create confusion and diffusion respectively. Block ciphers such as AES and PRESENT are SPN-based [66]. In an SPN, the input goes through several rounds of confusion and diffusion to produce the ciphertext. Common operations used in SPNs are XOR and bitwise rotation.

AES is a key-iterated block cipher and consists of repeated applications of a round transformation to the state. The state is a two-dimensional array of bytes with four rows and Nb columns, where Nb is equal to the block size divided by 32 [3], so Nb = 4 for the 128-bit AES block. Each byte in the state array has two indices: its row number r in the range 0 ≤ r < 4 and its column number c in the range 0 ≤ c < Nb [2]. An individual byte of the state is referred to as $S_{r,c}$ in figure A.1.

Figure A.1: The state array input and output. Figure adapted from figure 3 in [2].


The number of rounds executed in the AES encryption algorithm depends on the key length used in the encryption process. The key length and the corresponding number of rounds are shown in table A.1.

Table A.1: Key-Round Combinations.

Key length (bits)    Number of rounds
128                  10
192                  12
256                  14

AES-128 Encryption

At the beginning of the encryption process, the plaintext is copied to the state using the following scheme:

s[r, c] = in[r + 4c] for 0 ≤ r < 4 and 0 ≤ c < Nb (A.1)

where the state array is denoted by s and the input array by in. The AES encryption algorithm has four main operations: SubBytes, ShiftRows, MixColumns, and AddRoundKey. The encryption process is shown in figure A.2, where n denotes the number of rounds.
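The mapping in equation A.1 can be sketched in a few lines of Python. This is only an illustration: the function names `bytes_to_state` and `state_to_bytes` are hypothetical, and the state is represented as a list of four rows.

```python
def bytes_to_state(block, nb=4):
    """Copy a 4*nb-byte block into a 4 x nb state: s[r][c] = in[r + 4c]."""
    if len(block) != 4 * nb:
        raise ValueError("block must contain 4 * nb bytes")
    return [[block[r + 4 * c] for c in range(nb)] for r in range(4)]


def state_to_bytes(state):
    """Inverse mapping: read the state back column by column."""
    nb = len(state[0])
    return bytes(state[r][c] for c in range(nb) for r in range(4))


block = bytes(range(16))       # illustrative input bytes 00, 01, ..., 0f
state = bytes_to_state(block)
print(state[0])                # row 0 holds input bytes 0, 4, 8, 12
```

The input is therefore written into the state column by column, which is why the column-oriented MixColumns step operates on four consecutive input bytes.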

Figure A.2: The AES encryption algorithm.

SubBytes

The first transformation in each round of the AES algorithm is byte substitution, SubBytes. The SubBytes transformation is the only non-linear transformation of AES. It creates confusion by using a substitution table, the S-box. The SubBytes transformation operates on every byte in the state using the S-box shown in table A.2. For example, if $S_{0,0}$ = 19, the new value is determined by the intersection of the row with index ‘1’ and the column with index ‘9’. The new value of $S_{0,0}$ is d4. This process is repeated for the remaining 15 bytes of the state.
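As a cross-check of the lookup example, the S-box can also be computed rather than tabulated. The sketch below assumes the construction given in FIPS 197, which this appendix does not describe: each byte is replaced by its multiplicative inverse in GF(2^8) and then passed through a fixed affine transformation. The helper names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse computed as a^254; 0 is mapped to 0."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def sbox_entry(x):
    """Inverse in GF(2^8) followed by the affine transformation (assumed from FIPS 197)."""
    inv = gf_inv(x)
    out = 0x63
    for shift in range(5):
        # XOR the inverse with its left rotations by 0, 1, 2, 3 and 4 bits.
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out


SBOX = [sbox_entry(x) for x in range(256)]


def sub_bytes(state):
    """Apply the S-box to every byte of a state given as a list of rows."""
    return [[SBOX[byte] for byte in row] for row in state]


print(hex(SBOX[0x19]))  # entry in row '1', column '9' of the S-box
```

Generating the table this way is slower than a stored lookup table, but it gives an independent check of individual table entries.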

Table A.2: AES S-box.

x\y  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0    63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
1    ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
2    b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
3    04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
4    09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
5    53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
6    d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
7    51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
8    cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
9    60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
a    e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
b    e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
c    ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
d    70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
e    e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
f    8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

ShiftRows

The next transformation is ShiftRows. The bytes in the last three rows of the state are cyclically shifted to the left by a number of bytes that depends on the row number. For example, the bytes in row 2 are shifted two bytes. The transformation proceeds as follows:

\[
S'_{r,c} = S_{r,\,(c + shift(r,Nb)) \bmod Nb} \quad \text{for } 0 < r < 4 \text{ and } 0 \le c < Nb \tag{A.2}
\]

where the shift value shift(r, Nb) depends on the row number r. A visual representation of the ShiftRows transformation is shown in figure A.3.
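Equation A.2 can be illustrated with a short Python sketch. It assumes Nb = 4 and shift(r, Nb) = r, which matches the row 2 example above; the function name `shift_rows` and the sample state values are made up for the illustration.

```python
def shift_rows(state):
    """Cyclically shift row r of the state r positions to the left (row 0 is unchanged)."""
    nb = len(state[0])
    return [[row[(c + r) % nb] for c in range(nb)] for r, row in enumerate(state)]


# Illustrative state whose entries encode their own (row, column) position.
state = [
    [0, 1, 2, 3],
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]
shifted = shift_rows(state)
print(shifted[2])  # row 2 after shifting two positions
```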

Figure A.3: The ShiftRows transformation. Figure adapted from figure 8 in [2].

MixColumns

The next transformation is MixColumns, which operates on the state column by column. Each column is considered as a four-term polynomial over $GF(2^8)$ and multiplied modulo $x^4 + 1$ with the fixed polynomial c(x):

\[
c(x) = 03 \cdot x^3 + 01 \cdot x^2 + 01 \cdot x + 02 \tag{A.3}
\]

The new value of each column is:

\[
s'(x) = c(x) \cdot s(x) \bmod (x^4 + 1) \tag{A.4}
\]

Equation A.4 can be represented in matrix form as follows:

\[
\begin{pmatrix} S'_{0,c} \\ S'_{1,c} \\ S'_{2,c} \\ S'_{3,c} \end{pmatrix}
=
\begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix}
\cdot
\begin{pmatrix} S_{0,c} \\ S_{1,c} \\ S_{2,c} \\ S_{3,c} \end{pmatrix}
\quad \text{for } 0 \le c < Nb \tag{A.5}
\]

A visual representation of the MixColumns transformation is shown in figure A.4.

Figure A.4: The MixColumns transformation. Figure adapted from figure 3.6 in [3].

MixColumns, together with ShiftRows, provides full diffusion over two rounds of AES [3].
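The matrix form of MixColumns can be sketched in Python as below. The sketch assumes that multiplication in GF(2^8) uses the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), which is taken from FIPS 197 and is not stated in this appendix. The function names and the sample column are illustrative.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


# Rows of the fixed MixColumns matrix.
MIX = ((2, 3, 1, 1), (1, 2, 3, 1), (1, 1, 2, 3), (3, 1, 1, 2))


def mix_column(column):
    """Multiply one 4-byte state column by the MixColumns matrix."""
    mixed = []
    for row in MIX:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)   # addition in GF(2^8) is XOR
        mixed.append(acc)
    return mixed


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Each output byte depends on all four input bytes of the column, which is the diffusion property this step contributes.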

AddRoundKey

The final transformation of one round is AddRoundKey. In this transformation, a subkey is added to the state by an XOR operation. The subkey consists of Nb words and is generated by the key expansion. An illustration of the AddRoundKey transformation is shown in figure A.5.

Figure A.5: The AddRoundKey transformation. Figure adapted from figure 3.8 in [3].

Key Expansion

The key expansion expands the cipher key into the subkeys used in every round of the encryption process. The key expansion introduces asymmetry in the round keys. This prevents symmetry in the 10 round transformations that could otherwise lead to weaknesses in the cipher. The key expansion also provides resistance against attacks where the cipher key is partially known [3].

During the key expansion Nb·(Nr+1) words are created, where Nr is the number of rounds and Nb is the number of columns. The key expansion uses a cyclic shift of the bytes of a word in the RotWord operation, followed by an S-box table lookup for each of the four bytes in the SubWord operation. It starts with the encryption key, which is used as the first subkey. The first four bytes of the encryption key constitute the word $w_0$, the next four bytes constitute the word $w_1$, and so on. This results in the subkey $[w_0, w_1, w_2, w_3]$, which is 16 bytes, or 128 bits. The rest of the words are determined as follows:

\[
w_i = w_{i-1} \oplus w_{i-4} \tag{A.6}
\]

if i is not a multiple of 4. For words whose index i is a multiple of 4, the following process is executed:

1. RotWord: Take the first byte of $w_{i-1}$ and place it at the end of the word, so that the bytes $[b_0, b_1, b_2, b_3]$ become $[b_1, b_2, b_3, b_0]$.

2. SubWord: Apply the SubBytes substitution to each of the four bytes to produce a new word, $w'_{i-1}$.

3. The result of RotWord and SubWord is XORed with $w_{i-4}$ and a round constant, using the formula: $w_i = w_{i-4} \oplus w'_{i-1} \oplus Rcon[i/4]$

The next round key consists of $[w_i, w_{i+1}, w_{i+2}, w_{i+3}]$.
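The key expansion for a 128-bit key can be sketched as follows. Several details are assumptions taken from FIPS 197 rather than from this appendix: the S-box is generated from the GF(2^8) inverse and an affine map, the round constant affects only the first byte of the word, and it doubles in GF(2^8) for each new round key. The function names are hypothetical, and the sample key is the example key used in FIPS 197.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox_entry(x):
    """S-box value: inverse in GF(2^8) (computed as x^254), then the affine map."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)
    out = 0x63
    for shift in range(5):
        out ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return out


SBOX = [sbox_entry(x) for x in range(256)]


def expand_key_128(key):
    """Return the 44 four-byte words w_0 .. w_43 for a 16-byte key."""
    if len(key) != 16:
        raise ValueError("AES-128 needs a 16-byte key")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]          # RotWord
            temp = [SBOX[b] for b in temp]      # SubWord
            temp[0] ^= rcon                     # round constant, first byte only
            rcon = gf_mul(rcon, 0x02)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
words = expand_key_128(key)
print(bytes(words[4]).hex())  # first word of the second round key
```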

AES Decryption

The AES decryption algorithm reverses the encryption algorithm by applying the inverse of each transformation. It consists of the following steps:

InvShiftRows

The inverse of the ShiftRows transformation. The bytes in the last three rows are shifted by the same number of bytes as in ShiftRows, depending on the row number, but in the opposite direction.

InvSubBytes

Same as the SubBytes transformation, except that an inverse S-box is used in the InvSubBytes transformation. The inverse S-box is shown in table A.3.

Table A.3: AES inverse S-box.

x\y  0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0    52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb
1    7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb
2    54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e
3    08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25
4    72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92
5    6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84
6    90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06
7    d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b
8    3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73
9    96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e
a    47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b
b    fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4
c    1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f
d    60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef
e    a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61
f    17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d

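InvMixColumns

Same as the MixColumns transformation, except that the fixed polynomial $c^{-1}(x)$ is used, given by:

\[
c^{-1}(x) = 0b \cdot x^3 + 0d \cdot x^2 + 09 \cdot x + 0e \tag{A.7}
\]

In matrix form:

\[
\begin{pmatrix} S'_{0,c} \\ S'_{1,c} \\ S'_{2,c} \\ S'_{3,c} \end{pmatrix}
=
\begin{pmatrix} 0e & 0b & 0d & 09 \\ 09 & 0e & 0b & 0d \\ 0d & 09 & 0e & 0b \\ 0b & 0d & 09 & 0e \end{pmatrix}
\cdot
\begin{pmatrix} S_{0,c} \\ S_{1,c} \\ S_{2,c} \\ S_{3,c} \end{pmatrix}
\quad \text{for } 0 \le c < Nb \tag{A.8}
\]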
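That the InvMixColumns matrix undoes the MixColumns matrix can be checked with a small Python sketch. As before, the GF(2^8) reduction polynomial 0x11B is an assumption taken from FIPS 197, and the names `apply_matrix`, `MIX`, and `INV_MIX` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


MIX = ((0x02, 0x03, 0x01, 0x01), (0x01, 0x02, 0x03, 0x01),
       (0x01, 0x01, 0x02, 0x03), (0x03, 0x01, 0x01, 0x02))
INV_MIX = ((0x0E, 0x0B, 0x0D, 0x09), (0x09, 0x0E, 0x0B, 0x0D),
           (0x0D, 0x09, 0x0E, 0x0B), (0x0B, 0x0D, 0x09, 0x0E))


def apply_matrix(matrix, column):
    """Multiply a 4-byte column by a 4x4 matrix over GF(2^8)."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out


column = [0xDB, 0x13, 0x53, 0x45]        # illustrative column
mixed = apply_matrix(MIX, column)
restored = apply_matrix(INV_MIX, mixed)
print(restored == column)
```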

InvAddRoundKey

The InvAddRoundKey transformation is the inverse of AddRoundKey. Because XOR is its own inverse, it applies the same XOR operation with the same subkey.

A.2 ECDSA

Elliptic Curve Digital Signature Algorithm (ECDSA) [67] is an elliptic curve variant of the Digital Signature Algorithm (DSA). The algorithm was proposed in 1992 by Scott Vanstone [68]. Today, ECDSA is included in ISO, ANSI, IEEE, and FIPS standards. ECDSA is defined for two types of arithmetic fields: the prime field $F_p$ and the binary field $F_{2^m}$. NIST [69] recommends the curves P-256 and P-384 for use with ECDSA.

Key generation

An ECDSA key pair consists of a private key d and a public key Q. To generate the key pair, the elliptic curve domain parameters D must be set. The domain parameters are defined as D = (q, FR, a, b, G, n, h), where q is the field size, FR indicates the basis used, a and b are the coefficients defining the elliptic curve equation, G is the base point, n is the order of the base point, and h is the cofactor (the order of the curve divided by n). The domain parameters can be made public. The private key is created by generating a random integer d, where d ∈ [1, n − 1]. The public key is then set to

Q = [d]G. (A.9)

ECDSA Digital Signature Generation

An ECDSA digital signature is generated using the domain parameters (D), a private key (d), a per-message secret integer (k), an approved hash function (SHA-1 or SHA-2), and an approved random bit generator. Suppose that A is the signer and M is the message to sign. To generate a signature for M, A must do the following:

Step 1: Generate a random integer k ∈ [1, n − 1]

Step 2: Calculate $[k]G = (x_1, y_1)$ and convert $x_1$ to an integer.

Step 3: Calculate $r = x_1 \bmod n$. If r = 0, return to step 1.

Step 4: Calculate $k^{-1} \bmod n$.

Step 5: Calculate hash(M) and convert it to an integer.

Step 6: Calculate $s = k^{-1}(hash(M) + dr) \bmod n$. If s = 0, return to step 1.

Result: A’s digital signature for message M is (r, s). The ECDSA signature generation algorithm is shown in figure A.6.

Figure A.6: The ECDSA signature generation.

ECDSA Digital Signature Verification To verify A’s signature (r, s) on the message M the verifier B should do the following:

Step 1: Obtain an authentic copy of A’s domain parameters D and the public key Q.

Step 2: Verify that r and s are integers and r, s ∈ [1, n − 1].

Step 3: Calculate hash(M) and convert it to an integer.

Step 4: Calculate $w = s^{-1} \bmod n$.

Step 5: Calculate $u_1 = hash(M) \cdot w \bmod n$ and $u_2 = r \cdot w \bmod n$.

Step 6: Calculate the point $T = [u_1]G + [u_2]Q$. If T is the point at infinity, reject the signature.

Step 7: Convert the x coordinate of T, $x_T$, to an integer and calculate $v = x_T \bmod n$. Accept the signature only if v = r.

Result: The verification of the signature either succeeds or fails. The ECDSA signature verification algorithm is shown in figure A.7.
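The key generation, signing, and verification steps can be traced end to end with a small Python sketch. It makes several assumptions for illustration only: it uses a textbook toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with base point G = (5, 1) of order n = 19 instead of P-256, it reduces the SHA-256 digest modulo n, and all function names are hypothetical. A group this small offers no security.

```python
import hashlib
import secrets

# Toy domain parameters (assumption, NOT secure): y^2 = x^3 + 2x + 2 over F_17,
# base point G = (5, 1) of prime order n = 19.
P_FIELD, A_COEFF = 17, 2
G, N = (5, 1), 19
INFINITY = None


def add(p1, p2):
    """Add two points of the toy curve (None is the point at infinity)."""
    if p1 is INFINITY:
        return p2
    if p2 is INFINITY:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return INFINITY
    if p1 == p2:
        lam = (3 * x1 * x1 + A_COEFF) * pow(2 * y1, -1, P_FIELD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD)
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return (x3, (lam * (x1 - x3) - y1) % P_FIELD)


def mul(k, point):
    """Compute [k]point with the double-and-add method."""
    result = INFINITY
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def digest(message):
    """hash(M) as an integer, reduced mod n because the toy n is tiny."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def generate_keys():
    d = secrets.randbelow(N - 1) + 1          # d in [1, n - 1]
    return d, mul(d, G)                       # Q = [d]G


def sign(message, d):
    while True:
        k = secrets.randbelow(N - 1) + 1      # step 1
        r = mul(k, G)[0] % N                  # steps 2-3
        if r == 0:
            continue
        s = pow(k, -1, N) * (digest(message) + d * r) % N   # steps 4-6
        if s != 0:
            return r, s


def verify(message, signature, q):
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):       # step 2
        return False
    w = pow(s, -1, N)                         # step 4
    u1, u2 = digest(message) * w % N, r * w % N
    t = add(mul(u1, G), mul(u2, q))           # step 6
    return t is not INFINITY and t[0] % N == r


d, q = generate_keys()
print(verify(b"message", sign(b"message", d), q))
```

With only 18 possible private keys, forged or altered messages are accepted with noticeable probability on this curve; the sketch is meant to show the data flow, not the security level.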

Figure A.7: The ECDSA signature verification.

A.3 RSA

RSA is a commonly used public key cryptosystem [70] invented by Ron Rivest, Adi Shamir, and Len Adleman. There are two keys in a public key cryptosystem, one public key for encryption and one private key for decryption. The public key can be shared over an insecure channel since there is no practical way to derive the private key from the public key. RSA is based on the factoring problem: the difficulty of factoring the product of two large prime integers.

Key generation

An RSA key pair consists of two keys: a public key and a private key. The first step of generating the RSA key pair is to generate two distinct large primes p and q. If the key length is chosen to be n bits, then each prime is n/2 bits. RSA can be used with any key length n, although a larger key increases the security of the encryption at the cost of algorithm speed. NIST recommends a key length of 2048 bits for RSA [69]. Next, p and q are used to calculate N as follows:

N = pq (A.10)

Then the encryption exponent e, which is relatively prime to

ρ(N) = (p − 1)(q − 1), (A.11)

is chosen, i.e. e satisfies gcd(e, ρ(N)) = 1. The value of e is often chosen as 3 or 65537 [71]. These values make the modular exponentiation faster, since 3 and 65537 each contain only two bits of value 1 in their binary form [72]. The decryption exponent d is then the multiplicative inverse of e modulo ρ(N). This can be written as

e · d ≡ 1 mod ρ(N). (A.12)

Both the public and private key are composed of a pair of positive integers and can be written as ⟨N, e⟩ and ⟨N, d⟩ respectively.

Digital Signature Scheme

RSA can be used to generate and verify digital signatures. A message M is signed by computing

\[
S = M^d \bmod N \tag{A.13}
\]

using the private key ⟨N, d⟩. The signature can then be verified by using the signer’s corresponding public key ⟨N, e⟩ to compute

\[
M = S^e \bmod N. \tag{A.14}
\]
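Key generation and the signing and verification equations can be tried out with deliberately tiny numbers. The primes 61 and 53, the exponent 17, and the message value 65 below are illustrative choices only; real keys use primes of about 1024 bits each and sign an encoded message representative rather than the raw message.

```python
from math import gcd

# Toy parameters for illustration only.
p, q = 61, 53
N = p * q                          # modulus, equation A.10
totient = (p - 1) * (q - 1)        # (p - 1)(q - 1), equation A.11
e = 17                             # public exponent
if gcd(e, totient) != 1:
    raise ValueError("e must be relatively prime to (p - 1)(q - 1)")
d = pow(e, -1, totient)            # private exponent, equation A.12


def sign(message):
    """S = M^d mod N, using the private key <N, d>."""
    return pow(message, d, N)


def verify(message, signature):
    """Accept if S^e mod N recovers the message, using the public key <N, e>."""
    return pow(signature, e, N) == message


M = 65
S = sign(M)
print(N, d, verify(M, S))
```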

In practical implementations of RSA digital signature schemes, the value that is signed is usually a message representative, not the message M itself. A message representative can be created by encoding the message M with the Probabilistic Signature Scheme (PSS) [73]. An approach based on PSS is used in the RSA standard PKCS (Public-Key Cryptography Standards) #1 v2.2 [74].

PSS

PSS [73] is an encoding method for signatures with appendix. PSS does not embed the message M in the encoded message representative m. Therefore the message M must be transmitted along with the signature S, as an appendix. The encoding method consists of a hash function H, a mask generation function G, and a salt r. The value k is the RSA key length. Two values $k_0$ and $k_1$ are predefined: the salt r is $k_0$ bits long and the hash output w is $k_1$ bits long. The encoding operation first generates a random salt r, then w is set to

\[
w = H(M \,\|\, r) \tag{A.15}
\]

where M is the message to be signed. Next, $r^*$ is set to

\[
r^* = G_1(w) \oplus r \tag{A.16}
\]

where $G_1(x)$ is a function that outputs the first $k_0$ bits of G(x). The encoded message representative m is then computed by

\[
m = 0 \,\|\, w \,\|\, r^* \,\|\, G_2(w) \tag{A.17}
\]

where $G_2(x)$ is a function that outputs the remaining $k - k_0 - k_1 - 1$ bits of G(x), so that m is exactly k bits long. The PSS procedure is shown in figure A.8.

Figure A.8: Encoding with PSS.

A.4 SHA-256

The Secure Hash Algorithms, known as the SHA hash functions, are standardized by NIST as part of the Secure Hash Standard (SHS). The SHA hash functions are designed by the United States National Security Agency (NSA). SHA-2 is a set of cryptographic hash algorithms specified in SHS. Hash algorithms such as SHA-256, SHA-384, and SHA-512 are members of the SHA-2 family. SHA-256 uses the Merkle-Damgård structure, and the compression function is based on Davies-Meyer [75]. The SHA documentation (FIPS 180-4) provided by NIST is available at [76].

SHA-256 is an iterative, one-way hash function that takes any input M of length l bits, where $0 \le l < 2^{64}$, and produces a 256-bit message digest. The SHA-256 hash algorithm is divided into two stages: preprocessing and hash computation. In the preprocessing stage, the message to be hashed is padded and parsed into 512-bit blocks. The message digest is then calculated in the hash computation stage. The following variables are used in the preprocessing and hash computation:

• A message schedule of 64 32-bit words, denoted $W_0, W_1, \ldots, W_{63}$

• Eight working variables of 32 bits each, denoted a, b, c, d, e, f, g, and h

• A hash value of eight 32-bit words, denoted $H_0^{(i)}, H_1^{(i)}, \ldots, H_7^{(i)}$

A visual representation of the SHA-256 hash algorithm is shown in figure A.9.

Figure A.9: The overall structure of SHA-256.

SHA-256 Operations

SHA-256 operates on 32-bit words. The following operations are used in the SHA-256 algorithm:

1. Bitwise logical operations: AND (∧), OR (∨), XOR (⊕), and NOT (¬).

2. Addition (+) modulo $2^{32}$.

3. Right shift of x by n bits, denoted $SHR^n(x)$.

4. Right rotation of x by n bits, denoted $ROTR^n(x)$.

5. Concatenation of binary words A and B, denoted $A \,\|\, B$.

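SHA-256 Functions

Six logical functions are used in the hash computation stage, listed below.

\[
ch(x, y, z) = (x \wedge y) \oplus (\neg x \wedge z) \tag{A.18}
\]

\[
maj(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z) \tag{A.19}
\]

\[
\Sigma_0^{\{256\}}(x) = ROTR^{2}(x) \oplus ROTR^{13}(x) \oplus ROTR^{22}(x) \tag{A.20}
\]

\[
\Sigma_1^{\{256\}}(x) = ROTR^{6}(x) \oplus ROTR^{11}(x) \oplus ROTR^{25}(x) \tag{A.21}
\]

\[
\sigma_0^{\{256\}}(x) = ROTR^{7}(x) \oplus ROTR^{18}(x) \oplus SHR^{3}(x) \tag{A.22}
\]

\[
\sigma_1^{\{256\}}(x) = ROTR^{17}(x) \oplus ROTR^{19}(x) \oplus SHR^{10}(x) \tag{A.23}
\]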

SHA-256 Constants

SHA-256 uses 64 constant 32-bit words, $K_0^{\{256\}}, K_1^{\{256\}}, \ldots, K_{63}^{\{256\}}$. The constant $K_t^{\{256\}}$ is used in iteration t of the hash computation stage. The constants in hexadecimal are listed below (from left to right).

428a2f98 71374491 b5c0fbcf e9b5dba5 3956c25b 59f111f1 923f82a4 ab1c5ed5
d807aa98 12835b01 243185be 550c7dc3 72be5d74 80deb1fe 9bdc06a7 c19bf174
e49b69c1 efbe4786 0fc19dc6 240ca1cc 2de92c6f 4a7484aa 5cb0a9dc 76f988da
983e5152 a831c66d b00327c8 bf597fc7 c6e00bf3 d5a79147 06ca6351 14292967
27b70a85 2e1b2138 4d2c6dfc 53380d13 650a7354 766a0abb 81c2c92e 92722c85
a2bfe8a1 a81a664b c24b8b70 c76c51a3 d192e819 d6990624 f40e3585 106aa070
19a4c116 1e376c08 2748774c 34b0bcb5 391c0cb3 4ed8aa4a 5b9cca4f 682e6ff3
748f82ee 78a5636f 84c87814 8cc70208 90befffa a4506ceb bef9a3f7 c67178f2

SHA-256 Preprocessing

The SHA-256 preprocessing consists of three steps: message padding, message parsing, and setting the initial hash value $H^{(0)}$. The purpose of the padding process is to ensure that the length of the message is a multiple of 512 bits. It starts by converting the message M, of length l, to bits. The bit "1" is then appended to the end of the message, followed by k zero bits, where k is the smallest non-negative integer that satisfies the equation:

l + 1 + k ≡ 448 mod 512 (A.24)

Finally, a 64-bit block that is equal to l (in binary) is appended to the end of the message. The next step is to parse the message into N 512-bit blocks, $M^{(1)}, M^{(2)}, \ldots, M^{(N)}$. Each block is expressed as sixteen 32-bit words: the first 32 bits of message block i are denoted $M_0^{(i)}$, the next 32 bits are denoted $M_1^{(i)}$, and so on up to $M_{15}^{(i)}$. The initial hash value $H^{(0)}$ needs to be set before the hash computation. It consists of eight 32-bit words, which in hexadecimal are:

\[
\begin{aligned}
H_0^{(0)} &= \texttt{6a09e667} \\
H_1^{(0)} &= \texttt{bb67ae85} \\
H_2^{(0)} &= \texttt{3c6ef372} \\
H_3^{(0)} &= \texttt{a54ff53a} \\
H_4^{(0)} &= \texttt{510e527f} \\
H_5^{(0)} &= \texttt{9b05688c} \\
H_6^{(0)} &= \texttt{1f83d9ab} \\
H_7^{(0)} &= \texttt{5be0cd19}
\end{aligned}
\tag{A.25}
\]
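The padding rule of equation A.24 can be sketched in Python as follows. The sketch assumes that the message is a whole number of bytes, so the appended "1" bit and the first seven zero bits fit in the single byte 0x80; the function name `sha256_pad` is hypothetical.

```python
def sha256_pad(message):
    """Return the padded message and the number k of zero bits that were appended."""
    l = len(message) * 8                     # message length in bits
    k = (448 - (l + 1)) % 512                # smallest k >= 0 with l + 1 + k = 448 mod 512
    # 0x80 is the "1" bit followed by seven zero bits; the remaining
    # k - 7 zero bits are added as whole zero bytes.
    padding = b"\x80" + b"\x00" * ((k - 7) // 8)
    return message + padding + l.to_bytes(8, "big"), k


padded, k = sha256_pad(b"abc")
print(k, len(padded) * 8)                    # zero bits appended, padded length in bits
```

For the 24-bit message "abc" this gives k = 423, so the padded message fills exactly one 512-bit block.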

SHA-256 Hash Computation

The SHA-256 hash computation uses the operations, functions, and constants described in the previous sections. The message blocks $M^{(1)}, M^{(2)}, \ldots, M^{(N)}$ are processed one at a time using the following steps:

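1. Prepare the message schedule, $W_t$:

\[
W_t =
\begin{cases}
M_t^{(i)} & 0 \le t \le 15 \\
\sigma_1^{\{256\}}(W_{t-2}) + W_{t-7} + \sigma_0^{\{256\}}(W_{t-15}) + W_{t-16} & 16 \le t \le 63
\end{cases}
\]

2. Initialize the eight working variables $a, b, c, d, e, f, g, h$ with the $(i-1)$th intermediate hash value:

\[
\begin{aligned}
a &= H_0^{(i-1)} & b &= H_1^{(i-1)} & c &= H_2^{(i-1)} & d &= H_3^{(i-1)} \\
e &= H_4^{(i-1)} & f &= H_5^{(i-1)} & g &= H_6^{(i-1)} & h &= H_7^{(i-1)}
\end{aligned}
\]

3. Compression: for $t = 0$ to $63$ do

\[
\begin{aligned}
T_1 &= h + \Sigma_1^{\{256\}}(e) + \mathrm{ch}(e, f, g) + K_t^{\{256\}} + W_t \\
T_2 &= \Sigma_0^{\{256\}}(a) + \mathrm{maj}(a, b, c) \\
h &= g \\
g &= f \\
f &= e \\
e &= d + T_1 \\
d &= c \\
c &= b \\
b &= a \\
a &= T_1 + T_2
\end{aligned}
\]

4. Calculate the $i$th intermediate hash value $H^{(i)}$:

\[
\begin{aligned}
H_0^{(i)} &= a + H_0^{(i-1)} & H_1^{(i)} &= b + H_1^{(i-1)} & H_2^{(i)} &= c + H_2^{(i-1)} & H_3^{(i)} &= d + H_3^{(i-1)} \\
H_4^{(i)} &= e + H_4^{(i-1)} & H_5^{(i)} &= f + H_5^{(i-1)} & H_6^{(i)} &= g + H_6^{(i-1)} & H_7^{(i)} &= h + H_7^{(i-1)}
\end{aligned}
\]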

Result: The hash $H$ of the message $M$ is determined as follows:

\[ H = H_0^{(N)} \,\|\, H_1^{(N)} \,\|\, H_2^{(N)} \,\|\, H_3^{(N)} \,\|\, H_4^{(N)} \,\|\, H_5^{(N)} \,\|\, H_6^{(N)} \,\|\, H_7^{(N)} \tag{A.26} \]
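The four steps above can be put together in a short Python sketch. It is meant as an illustration of the data flow only, not as the implementation measured in this thesis: it assumes byte-aligned input, takes the rotation amounts of the $\sigma$ and $\Sigma$ functions from FIPS 180-4, and uses illustrative names (`sha256`, `rotr`). All additions are performed modulo $2^{32}$.

```python
import hashlib

MASK = 0xFFFFFFFF
# The 64 round constants K_t and the initial hash value H^(0) listed in this appendix.
K = [int(w, 16) for w in """
428a2f98 71374491 b5c0fbcf e9b5dba5 3956c25b 59f111f1 923f82a4 ab1c5ed5
d807aa98 12835b01 243185be 550c7dc3 72be5d74 80deb1fe 9bdc06a7 c19bf174
e49b69c1 efbe4786 0fc19dc6 240ca1cc 2de92c6f 4a7484aa 5cb0a9dc 76f988da
983e5152 a831c66d b00327c8 bf597fc7 c6e00bf3 d5a79147 06ca6351 14292967
27b70a85 2e1b2138 4d2c6dfc 53380d13 650a7354 766a0abb 81c2c92e 92722c85
a2bfe8a1 a81a664b c24b8b70 c76c51a3 d192e819 d6990624 f40e3585 106aa070
19a4c116 1e376c08 2748774c 34b0bcb5 391c0cb3 4ed8aa4a 5b9cca4f 682e6ff3
748f82ee 78a5636f 84c87814 8cc70208 90befffa a4506ceb bef9a3f7 c67178f2
""".split()]
H0 = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19]


def rotr(x: int, n: int) -> int:
    """Rotate the 32-bit word x right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def sha256(message: bytes) -> bytes:
    # Preprocessing: padding to a multiple of 512 bits (byte-aligned input assumed).
    l = len(message) * 8
    zero_bytes = ((448 - (l + 1)) % 512 - 7) // 8
    padded = message + b"\x80" + b"\x00" * zero_bytes + l.to_bytes(8, "big")
    H = list(H0)
    for i in range(0, len(padded), 64):
        # Step 1: message schedule W_0 .. W_63.
        W = [int.from_bytes(padded[i + 4 * t:i + 4 * t + 4], "big") for t in range(16)]
        for t in range(16, 64):
            s0 = rotr(W[t - 15], 7) ^ rotr(W[t - 15], 18) ^ (W[t - 15] >> 3)
            s1 = rotr(W[t - 2], 17) ^ rotr(W[t - 2], 19) ^ (W[t - 2] >> 10)
            W.append((s1 + W[t - 7] + s0 + W[t - 16]) & MASK)
        # Step 2: working variables from the previous intermediate hash value.
        a, b, c, d, e, f, g, h = H
        # Step 3: 64 compression rounds.
        for t in range(64):
            big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
            ch = (e & f) ^ (~e & g)
            t1 = (h + big_s1 + ch + K[t] + W[t]) & MASK
            big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
            maj = (a & b) ^ (a & c) ^ (b & c)
            t2 = (big_s0 + maj) & MASK
            h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
        # Step 4: new intermediate hash value.
        H = [(x + y) & MASK for x, y in zip(H, (a, b, c, d, e, f, g, h))]
    # Result: concatenation of the eight final words.
    return b"".join(x.to_bytes(4, "big") for x in H)
```

A simple sanity check is to compare the output with Python's built-in `hashlib.sha256` for a few inputs, for example `sha256(b"abc") == hashlib.sha256(b"abc").digest()`.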

A.5 SM2

SM2 is an asymmetric public key cryptosystem based on elliptic curves and was published by the Chinese State Cryptography Administration in 2010. SM2 is based on the elliptic curve discrete logarithm problem (ECDLP): the difficulty of finding an integer $l$ which satisfies $Q = [l]P$, where $P$ and $Q$ are points on the elliptic curve. The cryptosystem consists of three parts: a digital signature algorithm, a key exchange protocol, and a public key encryption algorithm [77]. An English translation of SM2 is available at [78].

Digital signature algorithm

The SM2 digital signature algorithm is an elliptic curve digital signature algorithm. A signer generates a digital signature with the signer's private key, and a verifier can then verify the signature with the corresponding public key. The algorithm can be defined over either prime fields $F_p$ or binary fields $F_{2^m}$. $F_p$ and $F_{2^m}$ are defined in section 3.4 Elliptic Curve Cryptography.

System parameters of elliptic curve

The system parameters of an elliptic curve are inputs to the different algorithms in the SM2 cryptosystem. The parameters can be made public without affecting the security of the algorithms [78]. An elliptic curve can be defined over either a prime field or a binary field, which means that the system parameters of the elliptic curve also vary. The required system parameters of an elliptic curve over $F_p$ are listed below.

• The field size $q = p$, where $p$ is a prime greater than 3.

• The coefficients $a, b \in F_q$ which define the equation of $E(F_p)$.

• The base point $G = (x_G, y_G)$, where $G \neq O$.

• The base point order $n$, which satisfies $n > 2^{191}$ and $n > 4p^{1/2}$.

The required system parameters of an elliptic curve over $F_{2^m}$ are listed below.

• The field size $q = 2^m$, together with the irreducible polynomial of degree $m$ over $F_2$.

• The coefficients $a, b$ in the equation $y^2 + xy = x^3 + ax^2 + b$ which defines $E(F_{2^m})$.

• The base point $G = (x_G, y_G) \in E(F_{2^m})$.

• The base point order $n$, which satisfies $n > 2^{191}$ and $n > 2^{2 + m/2}$.

Key generation

An SM2 key pair consists of a private key and a public key. The key pair is determined as follows:

Choose an elliptic curve $E$ over the finite field. Select a base point $G = (x_G, y_G) \in E$ of order $n$, where $n$ is a prime. Then generate an integer $d \in [1, n - 2]$ using a random number generator. The public key is calculated using the following formula:

\[ P = [d]G = (x_P, y_P) \tag{A.27} \]

The recommended key length is 256 bits for a private key, and 512 bits for a public key [78]. However, the key length is not fixed and can vary.
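The key generation can be illustrated with a small Python sketch. To keep the numbers readable it uses a textbook toy curve, $y^2 = x^3 + 2x + 2$ over $F_{17}$ with base point $G = (5, 1)$ of prime order $n = 19$, instead of the recommended 256-bit SM2 curve. This curve is an assumption made purely for illustration and violates the requirement $n > 2^{191}$, so it offers no security. The function names are hypothetical.

```python
import secrets

# Toy curve parameters (illustration only, NOT the SM2 recommended curve).
P_FIELD, A, B = 17, 2, 2      # y^2 = x^3 + 2x + 2 over F_17
G, N = (5, 1), 19             # base point and its prime order


def point_add(p1, p2):
    """Add two points on the curve; None represents the point at infinity O."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None                                   # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_FIELD, -1, P_FIELD) % P_FIELD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return (x3, (lam * (x1 - x3) - y1) % P_FIELD)


def scalar_mult(k, point):
    """Compute [k]point with the double-and-add method."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def generate_key_pair():
    """Return (d, P) with d drawn uniformly from [1, n - 2] and P = [d]G."""
    d = secrets.randbelow(N - 2) + 1
    return d, scalar_mult(d, G)
```

Calling `generate_key_pair()` returns a private key `d` and the public point `P = [d]G`. A production implementation would additionally use constant-time arithmetic, which this sketch does not attempt.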

Other information about the signer

Let A denote the signer. User A holds the private key $d_A$ and the corresponding public key $P_A$. User A also has an identifier $ID_A$ with a length of $len_A$ bits. $LEN_A$ is the two-byte string obtained by converting the integer $len_A$. The hash value of user A's identifier is determined as follows:

\[ Z_A = H_{256}(LEN_A \,\|\, ID_A \,\|\, a \,\|\, b \,\|\, x_G \,\|\, y_G \,\|\, x_A \,\|\, y_A) \tag{A.28} \]

where $a, b$ are the parameters defined in the elliptic curve equation, $(x_G, y_G)$ are the coordinates of $G$, and $(x_A, y_A)$ are the coordinates of user A's public key. Note that $a, b, x_G, y_G, x_A, y_A$ are in bit string format.

Digital signature generation algorithm

Suppose that $M$ is the message to be signed. To generate the signature for message $M$, user A should follow the steps described below.

Step 1: Set $\bar{M} = Z_A \,\|\, M$, i.e. $\bar{M}$ is the concatenation of $Z_A$ and $M$, where $Z_A$ is the hash value of the identifier of user A.

Step 2: Calculate $e = H_v(\bar{M})$ and convert the data type of $e$ to an integer, where $H_v$ is a hash function with a message digest of $v$ bits.

Step 3: Generate a random integer $k \in [1, n - 1]$.

Step 4: Calculate $(x_1, y_1) = [k]G$. Then convert $x_1$ to an integer. In this step, the base point is denoted $G$, and $[k]G$ is the point obtained by adding $G$ to itself $k$ times, i.e. $[k]G = \underbrace{G + G + \cdots + G}_{k \text{ times}}$.

Step 5: Calculate $r = (e + x_1) \bmod n$. If $r = 0$ or $r + k = n$, return to step 3.

Step 6: Calculate $s = ((1 + d_A)^{-1} \cdot (k - r \cdot d_A)) \bmod n$. If $s = 0$, return to step 3.

Step 7: Convert the data type of $r$ and $s$ to bit strings.

Result: The signature of message $M$ is $(r, s)$. The signature generation algorithm is shown in figure A.10.

Figure A.10: SM2 Digital Signature Generation Algorithm. Figure adapted from figure 1 in [4].

Digital Signature Verification Algorithm

Let $M'$ denote the received message and $(r', s')$ the received digital signature. To verify the received message, the verifier should follow the process described below.

Step 1: Check whether $r' \in [1, n - 1]$. The verification fails if $r' \notin [1, n - 1]$.

Step 2: Check whether $s' \in [1, n - 1]$. The verification fails if $s' \notin [1, n - 1]$.

Step 3: Set $\bar{M}' = Z_A \,\|\, M'$. Note that if $Z_A$ is not user A's hash value (from the signature generation algorithm), then the verification will fail.

Step 4: Calculate $e' = H_v(\bar{M}')$ and convert the data type of $e'$ to an integer.

Step 5: Convert the data type of $r'$ and $s'$ to integers and calculate $t = (r' + s') \bmod n$. The verification fails if $t = 0$.

Step 6: Calculate $(x_1', y_1') = [s']G + [t]P_A$.

Step 7: Convert the data type of $x_1'$ to an integer. Then determine $R = (e' + x_1') \bmod n$ and check whether $R = r'$. The verification succeeds if $R = r'$; otherwise the verification fails.

Result: Verification succeeded or failed. The signature verification algorithm is shown in figure A.11.
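The verification works because $[s']G + [t]P_A = [s' + (r' + s')d_A]G = [s'(1 + d_A) + r'd_A]G = [k]G$ for a valid signature, so the verifier recovers the same $x_1$ as the signer. The following Python sketch shows signature generation and verification end to end. It rests on several assumptions made for illustration only: the toy curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with $G = (5, 1)$ and $n = 19$ replaces the recommended SM2 curve, SHA-256 from `hashlib` stands in for the hash function $H_v$ (SM2 is normally used with SM3), and all function names are hypothetical.

```python
import hashlib
import secrets

# Toy curve (illustration only, NOT the SM2 recommended curve): y^2 = x^3 + 2x + 2 over F_17.
P_FIELD, A = 17, 2
G, N = (5, 1), 19                         # base point and its prime order


def point_add(p1, p2):
    """Add two curve points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_FIELD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_FIELD, -1, P_FIELD) % P_FIELD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_FIELD, -1, P_FIELD) % P_FIELD
    x3 = (lam * lam - x1 - x2) % P_FIELD
    return (x3, (lam * (x1 - x3) - y1) % P_FIELD)


def scalar_mult(k, point):
    """Compute [k]point with double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def digest_to_int(z_a: bytes, message: bytes) -> int:
    """Steps 1-2: e = H_v(Z_A || M) as an integer (SHA-256 is a stand-in for H_v)."""
    return int.from_bytes(hashlib.sha256(z_a + message).digest(), "big")


def sign(d: int, e: int):
    """Signature generation, steps 3-6, for private key d and digest integer e."""
    while True:
        k = secrets.randbelow(N - 1) + 1              # step 3: k in [1, n - 1]
        x1, _ = scalar_mult(k, G)                     # step 4
        r = (e + x1) % N                              # step 5
        if r == 0 or r + k == N:
            continue
        s = pow(1 + d, -1, N) * (k - r * d) % N       # step 6
        if s != 0:
            return r, s


def verify(public_key, e: int, signature) -> bool:
    """Signature verification, steps 1-2 and 5-7, for digest integer e."""
    r, s = signature
    if not (1 <= r <= N - 1 and 1 <= s <= N - 1):     # steps 1-2
        return False
    t = (r + s) % N                                   # step 5
    if t == 0:
        return False
    point = point_add(scalar_mult(s, G), scalar_mult(t, public_key))  # step 6
    if point is None:
        return False
    return (e + point[0]) % N == r                    # step 7
```

A typical call sequence is `e = digest_to_int(z_a, message)`, `signature = sign(d, e)`, and `verify(public_key, e, signature)`, where `public_key = scalar_mult(d, G)` and `z_a` is a placeholder byte string for $Z_A$.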

Figure A.11: SM2 Digital Signature Verification Algorithm. Figure adapted from figure 2 in [4].

A.6 SM3

SM3 is a hash algorithm, published by OSCCA in 2010, that produces a hash value of 256 bits from a message $M$ of length $l$ bits ($l < 2^{64}$). The SM3 algorithm has a Merkle-Damgård construction [79]. A Merkle-Damgård construction first applies padding to a message, and then divides the message into blocks of equal length. The blocks are processed one at a time by a compression function, where the output serves as input to the compression of the next block [80]. The compression function of SM3 follows a Davies-Meyer design [19]. A Davies-Meyer compression function encrypts the chaining value $V_i$ using a message block as the block cipher key. The output ciphertext is then XORed with the chaining value $V_i$ to produce the next chaining value $V_{i+1}$ [81]. The overall structure of the SM3 algorithm can be seen in figure A.12.

Figure A.12: SM3 algorithm. Figure adapted from figure 1 in [5].

An English translation of the SM3 specification is available at [82].

Message preprocessing

The message preprocessing starts with padding the message $M$ of length $l$ so that the length will be a multiple of 512 bits. The padding works by first appending the bit "1" to the end of message $M$. Then $k$ "0" bits are appended, where $k$ is the smallest non-negative integer satisfying

\[ l + 1 + k \equiv 448 \bmod 512. \tag{A.29} \]

Finally, the 64-bit binary representation of the message length $l$ is appended. The length of the padded message $M'$ will be a multiple of 512. The padded message is then divided into $n$ 512-bit blocks, denoted $M' = B^{(0)} B^{(1)} \ldots B^{(n-1)}$.

Before the hash computation, consisting of message expansion and message compression, can begin, the initial value is set to

\[ IV = \texttt{7380166f 4914b2b9 172442d7 da8a0600 a96f30bc 163138aa e38dee4d b0fb0e4e} \tag{A.30} \]

Message expansion

The message expansion function expands each block $B^{(i)}$ into 132 32-bit words $W_0, W_1, \ldots, W_{67}, W_0', \ldots, W_{63}'$. The words are generated according to the following steps:

Step 1: Split block $B^{(i)}$ into 16 32-bit words $W_0, W_1, \ldots, W_{15}$.

Step 2: Generate $W_{16}, W_{17}, \ldots, W_{67}$ by:

    for j = 16 to 67 do
        W_j = P1(W_{j−16} ⊕ W_{j−9} ⊕ (W_{j−3} ≪ 15)) ⊕ (W_{j−13} ≪ 7) ⊕ W_{j−6}
    end for

where $P_1(X) = X \oplus (X \lll 15) \oplus (X \lll 23)$ and $x \lll n$ (written ≪ above) denotes the circular shift of $x$ by $n$ bits to the left.

Step 3: Generate $W_0', W_1', \ldots, W_{63}'$ by:

    for j = 0 to 63 do
        W'_j = W_j ⊕ W_{j+4}
    end for
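The three expansion steps translate almost line by line into Python. The sketch below is an illustration with hypothetical function names (`rotl`, `p1`, `expand`); it takes one 512-bit block as 64 bytes and returns the two word lists.

```python
MASK = 0xFFFFFFFF


def rotl(x: int, n: int) -> int:
    """Circular left shift of the 32-bit word x by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p1(x: int) -> int:
    """Permutation P1 used in the message expansion."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def expand(block: bytes):
    """Expand one 64-byte block into W_0..W_67 and W'_0..W'_63."""
    if len(block) != 64:
        raise ValueError("a block must be exactly 512 bits")
    # Step 1: split the block into sixteen big-endian 32-bit words.
    w = [int.from_bytes(block[4 * j:4 * j + 4], "big") for j in range(16)]
    # Step 2: generate W_16 .. W_67.
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    # Step 3: generate W'_0 .. W'_63.
    w_prime = [w[j] ^ w[j + 4] for j in range(64)]
    return w, w_prime
```

The 68 + 64 = 132 words returned by `expand` are exactly the inputs consumed by the 64 rounds of the compression function.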

Message compression

The chaining variable $V^{(0)}$ is first initialized to the predefined 256-bit value $IV$. The iterative compression process then alters the chaining variable $V^{(i)}$ based on the expanded message blocks, until all blocks have been used. The message compression will process each expanded block according to:

    for i = 0 to n − 1 do
        V(i+1) = CF(V(i), B(i))
    end for

The compression function $CF$ consists of 64 iterations, where $A, B, C, D, E, F, G,$ and $H$ are word registers, and $SS1$, $SS2$, $TT1$, and $TT2$ are intermediate variables. During the compression of the first block, the registers $ABCDEFGH$ are initialized to the predefined 256-bit value $IV$. The compression function $CF$ is defined as:

    ABCDEFGH = V(i)
    for j = 0 to 63 do
        SS1 = ((A ≪ 12) + E + (T_j ≪ (j mod 32))) ≪ 7
        SS2 = SS1 ⊕ (A ≪ 12)
        TT1 = FF_j(A, B, C) + D + SS2 + W'_j
        TT2 = GG_j(E, F, G) + H + SS1 + W_j
        D = C
        C = B ≪ 9
        B = A
        A = TT1
        H = G
        G = F ≪ 19
        F = E
        E = P0(TT2)
    end for
    V(i+1) = ABCDEFGH ⊕ V(i)

where the constant $T_j$ is defined as

\[
T_j =
\begin{cases}
\texttt{79cc4519}, & \text{for } 0 \le j \le 15 \\
\texttt{7a879d8a}, & \text{for } 16 \le j \le 63
\end{cases}
\tag{A.31}
\]

and the functions $FF_j$, $GG_j$, and $P_0$ are defined as

\[
\begin{aligned}
FF_j(X, Y, Z) &=
\begin{cases}
X \oplus Y \oplus Z, & \text{for } 0 \le j \le 15 \\
(X \wedge Y) \vee (X \wedge Z) \vee (Y \wedge Z), & \text{for } 16 \le j \le 63
\end{cases} \\
GG_j(X, Y, Z) &=
\begin{cases}
X \oplus Y \oplus Z, & \text{for } 0 \le j \le 15 \\
(X \wedge Y) \vee (\neg X \wedge Z), & \text{for } 16 \le j \le 63
\end{cases} \\
P_0(X) &= X \oplus (X \lll 9) \oplus (X \lll 17)
\end{aligned}
\tag{A.32}
\]

When all $n$ blocks have been processed by the compression procedure, it will output the 256-bit hash value equal to $V^{(n)}$.
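Putting preprocessing, expansion, and compression together gives the following Python sketch of SM3. It is an illustration under stated assumptions rather than the code benchmarked in this thesis: the input is byte-aligned, the constant $T_j$ for $0 \le j \le 15$ is taken as 79cc4519 (the value given in the SM3 specification), and the names `sm3`, `rotl`, and `p0` are hypothetical.

```python
MASK = 0xFFFFFFFF
IV = [0x7380166f, 0x4914b2b9, 0x172442d7, 0xda8a0600,
      0xa96f30bc, 0x163138aa, 0xe38dee4d, 0xb0fb0e4e]


def rotl(x: int, n: int) -> int:
    """Circular left shift of the 32-bit word x by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p0(x: int) -> int:
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def p1(x: int) -> int:
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def sm3(message: bytes) -> bytes:
    # Preprocessing: pad to a multiple of 512 bits (byte-aligned input assumed).
    l = len(message) * 8
    zero_bytes = ((448 - (l + 1)) % 512 - 7) // 8
    padded = message + b"\x80" + b"\x00" * zero_bytes + l.to_bytes(8, "big")
    v = list(IV)
    for i in range(0, len(padded), 64):
        # Message expansion: W_0..W_67 and W'_0..W'_63.
        w = [int.from_bytes(padded[i + 4 * j:i + 4 * j + 4], "big") for j in range(16)]
        for j in range(16, 68):
            w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                     ^ rotl(w[j - 13], 7) ^ w[j - 6])
        w_prime = [w[j] ^ w[j + 4] for j in range(64)]
        # Compression function CF: 64 iterations over the eight registers.
        a, b, c, d, e, f, g, h = v
        for j in range(64):
            t_j = 0x79cc4519 if j < 16 else 0x7a879d8a
            ss1 = rotl((rotl(a, 12) + e + rotl(t_j, j % 32)) & MASK, 7)
            ss2 = ss1 ^ rotl(a, 12)
            if j < 16:
                ff, gg = a ^ b ^ c, e ^ f ^ g
            else:
                ff = (a & b) | (a & c) | (b & c)
                gg = (e & f) | (~e & g)
            tt1 = (ff + d + ss2 + w_prime[j]) & MASK
            tt2 = (gg + h + ss1 + w[j]) & MASK
            a, b, c, d = tt1, a, rotl(b, 9), c
            e, f, g, h = p0(tt2), e, rotl(f, 19), g
        # Davies-Meyer feed-forward: XOR the registers with the previous chaining value.
        v = [x ^ y for x, y in zip((a, b, c, d, e, f, g, h), v)]
    return b"".join(x.to_bytes(4, "big") for x in v)
```

The result of `sm3(message).hex()` can be compared with the test vectors published in the SM3 specification to validate an implementation.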

A.7 SM4

SM4, formerly SMS4, is a symmetric block cipher with a block size and a key length of 128 bits. The overall structure of SM4 is an unbalanced Feistel network. A Feistel network is a cryptographic technique used in the construction of several block ciphers, including DES, Blowfish, and RC5. In a Feistel network, the input block is divided into two parts, L and R. An F-function is then applied to R, and the result is XORed with L. After each round, L and the modified R swap sides [83]. A Feistel network is "unbalanced" if the lengths of L and R are not equal [84]. An English translation of the SM4 specification is available at [85] and [86].

Round Function

The round function $F$ is defined as

\[ F(X_0, X_1, X_2, X_3, rk) = X_0 \oplus T(X_1 \oplus X_2 \oplus X_3 \oplus rk) \tag{A.33} \]

where $rk$ is a round key, derived from the 128-bit cipher key, and $T$ is a permutation defined as $T(x) = L(\tau(x))$. The round keys $rk$ are created using the key expansion algorithm described in the next section. The permutation $T$ is an invertible transformation which consists of the nonlinear transformation $\tau$ and the linear transformation $L$. The nonlinear transformation $\tau$ creates confusion by applying S-boxes to the input, while the linear transformation $L$ creates diffusion using cyclic shifts as permutation. $\tau$ consists of four S-box operations and is defined as

\[ \tau(A) = (\mathrm{Sbox}(a_0), \mathrm{Sbox}(a_1), \mathrm{Sbox}(a_2), \mathrm{Sbox}(a_3)) \tag{A.34} \]

where $A = (a_0, a_1, a_2, a_3)$ and each $a_i$ is 8 bits. The S-box can be seen in table A.4. The S-box takes an 8-bit input $xy$ and outputs the corresponding 8-bit value in the S-box at row $x$, column $y$. The linear transformation $L$ is defined as

\[ L(B) = B \oplus (B \lll 2) \oplus (B \lll 10) \oplus (B \lll 18) \oplus (B \lll 24) \tag{A.35} \]

where $B$ is a 32-bit word and $B \lll i$ denotes the circular shift of $B$ by $i$ bits to the left.

Key Expansion Algorithm

The round keys are generated from the 128-bit cipher key by the key expansion algorithm. The key expansion creates asymmetry in the round keys and obscures the relationship between the cipher key and round keys, in order to resist some attacks, such as related-key attacks.

Let the cipher key be $MK = (MK_0, MK_1, MK_2, MK_3)$ where each $MK_i$ is 32 bits. The key expansion can then be described as

\[
\begin{aligned}
(K_0, K_1, K_2, K_3) &= (MK_0 \oplus FK_0, MK_1 \oplus FK_1, MK_2 \oplus FK_2, MK_3 \oplus FK_3) \\
rk_i = K_{i+4} &= K_i \oplus T'(K_{i+1} \oplus K_{i+2} \oplus K_{i+3} \oplus CK_i) \quad \text{where } i = 0, 1, \ldots, 31
\end{aligned}
\tag{A.36}
\]

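Table A.4: SM4 S-box. Rows are indexed by the high nibble x and columns by the low nibble y of the input byte.

| x \ y | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | a | b | c | d | e | f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | d6 | 90 | e9 | fe | cc | e1 | 3d | b7 | 16 | b6 | 14 | c2 | 28 | fb | 2c | 05 |
| 1 | 2b | 67 | 9a | 76 | 2a | be | 04 | c3 | aa | 44 | 13 | 26 | 49 | 86 | 06 | 99 |
| 2 | 9c | 42 | 50 | f4 | 91 | ef | 98 | 7a | 33 | 54 | 0b | 43 | ed | cf | ac | 62 |
| 3 | e4 | b3 | 1c | a9 | c9 | 08 | e8 | 95 | 80 | df | 94 | fa | 75 | 8f | 3f | a6 |
| 4 | 47 | 07 | a7 | fc | f3 | 73 | 17 | ba | 83 | 59 | 3c | 19 | e6 | 85 | 4f | a8 |
| 5 | 68 | 6b | 81 | b2 | 71 | 64 | da | 8b | f8 | eb | 0f | 4b | 70 | 56 | 9d | 35 |
| 6 | 1e | 24 | 0e | 5e | 63 | 58 | d1 | a2 | 25 | 22 | 7c | 3b | 01 | 21 | 78 | 87 |
| 7 | d4 | 00 | 46 | 57 | 9f | d3 | 27 | 52 | 4c | 36 | 02 | e7 | a0 | c4 | c8 | 9e |
| 8 | ea | bf | 8a | d2 | 40 | c7 | 38 | b5 | a3 | f7 | f2 | ce | f9 | 61 | 15 | a1 |
| 9 | e0 | ae | 5d | a4 | 9b | 34 | 1a | 55 | ad | 93 | 32 | 30 | f5 | 8c | b1 | e3 |
| a | 1d | f6 | e2 | 2e | 82 | 66 | ca | 60 | c0 | 29 | 23 | ab | 0d | 53 | 4e | 6f |
| b | d5 | db | 37 | 45 | de | fd | 8e | 2f | 03 | ff | 6a | 72 | 6d | 6c | 5b | 51 |
| c | 8d | 1b | af | 92 | bb | dd | bc | 7f | 11 | d9 | 5c | 41 | 1f | 10 | 5a | d8 |
| d | 0a | c1 | 31 | 88 | a5 | cd | 7b | bd | 2d | 74 | d0 | 12 | b8 | e5 | b4 | b0 |
| e | 89 | 69 | 97 | 4a | 0c | 96 | 77 | 7e | 65 | b9 | f1 | 09 | c5 | 6e | c6 | 84 |
| f | 18 | f0 | 7d | ec | 3a | dc | 4d | 20 | 79 | ee | 5f | 3e | d7 | cb | 39 | 48 |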

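$FK$ is a constant system parameter defined as

\[
FK_0 = \texttt{A3B1BAC6}_{16}, \quad FK_1 = \texttt{56AA3350}_{16}, \quad FK_2 = \texttt{677D9197}_{16}, \quad FK_3 = \texttt{B27022DC}_{16}
\]

$T'$ is defined as $T'(x) = L'(\tau(x))$ where

\[ L'(B) = B \oplus (B \lll 13) \oplus (B \lll 23). \tag{A.37} \]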

$CK_i$ is a constant parameter, which can be calculated by the formulas:

\[
\begin{aligned}
CK_i &= (ck_{i,0}, ck_{i,1}, ck_{i,2}, ck_{i,3}) \\
ck_{i,j} &= (4i + j) \times 7 \pmod{256}
\end{aligned}
\tag{A.38}
\]

Encryption and Decryption

The encryption process alters the plaintext block by repeating the round function 32 times and then applying the reverse transformation $R$. If $(X_0, X_1, X_2, X_3)$ is the 128-bit plaintext block, $(Y_0, Y_1, Y_2, Y_3)$ is the 128-bit cipher block, and $rk_i$ is a 32-bit round key, the encryption process can then be represented as

\[
\begin{aligned}
X_{i+4} &= F(X_i, X_{i+1}, X_{i+2}, X_{i+3}, rk_i) \quad \text{where } i = 0, 1, \ldots, 31 \\
(Y_0, Y_1, Y_2, Y_3) &= R(X_{32}, X_{33}, X_{34}, X_{35}) = (X_{35}, X_{34}, X_{33}, X_{32})
\end{aligned}
\tag{A.39}
\]

A visual representation of one round of the encryption algorithm can be seen in figure A.13. The decryption process is the same as the encryption, except that the round keys are used in reverse order, i.e. if the encryption round keys are $(rk_0, rk_1, \ldots, rk_{31})$ then the decryption round keys are $(rk_{31}, rk_{30}, \ldots, rk_0)$.
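The round function, key expansion, and encryption and decryption of a single block can be sketched in Python as follows. This is an illustrative, unoptimized sketch with hypothetical function names; it uses the S-box from table A.4 and the rotation amounts 2, 10, 18, and 24 for $L$ as given in the SM4 specification. It processes exactly one 128-bit block, so modes of operation such as ECB, CBC, and CTR are outside its scope.

```python
MASK = 0xFFFFFFFF
# S-box from table A.4, stored row by row so that SBOX[byte] is row x, column y.
SBOX = bytes.fromhex(
    "d6 90 e9 fe cc e1 3d b7 16 b6 14 c2 28 fb 2c 05 2b 67 9a 76 2a be 04 c3 aa 44 13 26 49 86 06 99 "
    "9c 42 50 f4 91 ef 98 7a 33 54 0b 43 ed cf ac 62 e4 b3 1c a9 c9 08 e8 95 80 df 94 fa 75 8f 3f a6 "
    "47 07 a7 fc f3 73 17 ba 83 59 3c 19 e6 85 4f a8 68 6b 81 b2 71 64 da 8b f8 eb 0f 4b 70 56 9d 35 "
    "1e 24 0e 5e 63 58 d1 a2 25 22 7c 3b 01 21 78 87 d4 00 46 57 9f d3 27 52 4c 36 02 e7 a0 c4 c8 9e "
    "ea bf 8a d2 40 c7 38 b5 a3 f7 f2 ce f9 61 15 a1 e0 ae 5d a4 9b 34 1a 55 ad 93 32 30 f5 8c b1 e3 "
    "1d f6 e2 2e 82 66 ca 60 c0 29 23 ab 0d 53 4e 6f d5 db 37 45 de fd 8e 2f 03 ff 6a 72 6d 6c 5b 51 "
    "8d 1b af 92 bb dd bc 7f 11 d9 5c 41 1f 10 5a d8 0a c1 31 88 a5 cd 7b bd 2d 74 d0 12 b8 e5 b4 b0 "
    "89 69 97 4a 0c 96 77 7e 65 b9 f1 09 c5 6e c6 84 18 f0 7d ec 3a dc 4d 20 79 ee 5f 3e d7 cb 39 48"
)
FK = [0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC]
# CK_i = (ck_{i,0}, ..., ck_{i,3}) with ck_{i,j} = (4i + j) * 7 mod 256.
CK = [int.from_bytes(bytes((4 * i + j) * 7 % 256 for j in range(4)), "big") for i in range(32)]


def rotl(x: int, n: int) -> int:
    """Circular left shift of the 32-bit word x by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def tau(a: int) -> int:
    """Nonlinear transformation: apply the S-box to each of the four bytes."""
    return int.from_bytes(bytes(SBOX[b] for b in a.to_bytes(4, "big")), "big")


def t_round(x: int) -> int:
    """T(x) = L(tau(x)), used in the round function."""
    b = tau(x)
    return b ^ rotl(b, 2) ^ rotl(b, 10) ^ rotl(b, 18) ^ rotl(b, 24)


def t_key(x: int) -> int:
    """T'(x) = L'(tau(x)), used in the key expansion."""
    b = tau(x)
    return b ^ rotl(b, 13) ^ rotl(b, 23)


def expand_key(key: bytes) -> list:
    """Derive the 32 round keys rk_0 .. rk_31 from a 128-bit cipher key."""
    k = [int.from_bytes(key[4 * i:4 * i + 4], "big") ^ FK[i] for i in range(4)]
    for i in range(32):
        k.append(k[i] ^ t_key(k[i + 1] ^ k[i + 2] ^ k[i + 3] ^ CK[i]))
    return k[4:]


def crypt_block(block: bytes, round_keys: list) -> bytes:
    """Apply 32 rounds of F and the reverse transformation R to one block."""
    x = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(4)]
    for rk in round_keys:
        x.append(x[-4] ^ t_round(x[-3] ^ x[-2] ^ x[-1] ^ rk))
    return b"".join(word.to_bytes(4, "big") for word in reversed(x[-4:]))


def encrypt_block(block: bytes, key: bytes) -> bytes:
    return crypt_block(block, expand_key(key))


def decrypt_block(block: bytes, key: bytes) -> bytes:
    # Decryption is the same procedure with the round keys in reverse order.
    return crypt_block(block, expand_key(key)[::-1])
```

Because decryption only reverses the round key order, `decrypt_block(encrypt_block(block, key), key)` returns the original block for any 16-byte `block` and `key`.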

Figure A.13: SM4 encryption round i.

Appendix B Mann Whitney U Test

Table B.1 provides explanations of the table headers.

Table B.1: Explanation of Mann-Whitney U test table headers.

| Table header | Explanation |
|---|---|
| Library | The cryptographic library used |
| Operation | Specifies the algorithm operation |
| Algorithm 1 | The algorithm of sample 1 |
| Algorithm 2 | The algorithm of sample 2 |
| N1 | Sample size of algorithm 1 |
| N2 | Sample size of algorithm 2 |
| R1 | Rank sum of algorithm 1's samples |
| R2 | Rank sum of algorithm 2's samples |
| U1 | U value of algorithm 1's samples |
| U2 | U value of algorithm 2's samples |
| p | The p-value that is compared with α = 0.05 |
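The quantities in table B.1 can be computed with a few lines of Python. The sketch below is an illustration under explicit assumptions, since this appendix does not state the formulas: it uses the convention $U_1 = N_1 N_2 + N_1(N_1 + 1)/2 - R_1$, which reproduces the tabulated pairs such as $R_1 = 4095$, $U_1 = 8100$ for $N_1 = N_2 = 90$, and it obtains the two-sided p-value from the normal approximation with continuity correction and without tie correction. The function name `mann_whitney_u` is hypothetical.

```python
import math


def mann_whitney_u(sample1, sample2):
    """Return (R1, R2, U1, U2, p) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    pooled = sorted([(v, 1) for v in sample1] + [(v, 2) for v in sample2])
    # Assign 1-based ranks; tied values share the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 1)
    r2 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 2)
    # U values (assumed convention, see the lead-in); note that U1 + U2 = N1 * N2.
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    # Two-sided p-value from the normal approximation with continuity correction.
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max(0.0, (abs(u1 - mean) - 0.5) / sd)
    p = math.erfc(z / math.sqrt(2))
    return r1, r2, u1, u2, p
```

When every value in sample 1 is smaller than every value in sample 2 and both samples have 90 observations, the function returns rank sums of 4095 and 12195 and U values of 8100 and 0, the pattern seen in many rows of the tables in this appendix. H0 is rejected when the returned p is below α = 0.05.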

B.1 Digital Signature

In table B.2 it can be seen that H0 was rejected when comparing the real-time, CPU time, and CPU cycles samples in both Botan and GmSSL during key generation, signing, and verification. This indicates that there is evidence of a significant difference between the data samples, which means that the differences in mean and median values presented in section 5.2.1 are significant.

The Mann-Whitney U test results for the RSS in table B.2 reject H0 for key generation, signing, and verification when comparing SM2 with RSA. This result suggests that there is a significant difference between the RSS of SM2 and RSA. When comparing SM2 with ECDSA, however, the results are inconsistent between the libraries. In GmSSL H0 is rejected for signing but not for key generation and verification, while in Botan H0 is rejected for key generation and signing but not for verification.


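Table B.2: Results of Digital Signature Mann-Whitney U Tests.

| Metric | Library | Operation | Algorithm 1 | Algorithm 2 | N1 | N2 | R1 | R2 | U1 | U2 | p (two-sided) | Result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Real-time | GmSSL | Key | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| Real-time | GmSSL | Sign | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| Real-time | GmSSL | Verify | SM2 | RSA | 90 | 90 | 9980 | 10310 | 2215 | 5885 | 1.53e-07 | p < α (reject H0) |
| Real-time | GmSSL | Key | SM2 | ECDSA | 90 | 90 | 4372 | 11918 | 7823 | 277 | 3.72e-27 | p < α (reject H0) |
| Real-time | GmSSL | Sign | SM2 | ECDSA | 90 | 90 | 9070 | 7220 | 3125 | 4975 | 0.0082 | p < α (reject H0) |
| Real-time | GmSSL | Verify | SM2 | ECDSA | 90 | 90 | 7524 | 8766 | 4671 | 3429 | 0.076 | p < α (reject H0) |
| Real-time | Botan | Key | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| Real-time | Botan | Sign | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| Real-time | Botan | Verify | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| Real-time | Botan | Key | SM2 | ECDSA | 90 | 90 | 12195 | 4095 | 0 | 8100 | 4.89e-31 | p < α (reject H0) |
| Real-time | Botan | Sign | SM2 | ECDSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| Real-time | Botan | Verify | SM2 | ECDSA | 90 | 90 | 12163 | 4127 | 32 | 8068 | 1.42e-31 | p < α (reject H0) |
| CPU time | GmSSL | Key | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU time | GmSSL | Sign | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU time | GmSSL | Verify | SM2 | RSA | 90 | 90 | 9980 | 10310 | 2215 | 5885 | 1.53e-31 | p < α (reject H0) |
| CPU time | GmSSL | Key | SM2 | ECDSA | 90 | 90 | 4374 | 11916 | 7821 | 279 | 3.96e-27 | p < α (reject H0) |
| CPU time | GmSSL | Sign | SM2 | ECDSA | 90 | 90 | 9199 | 7091 | 2996 | 5104 | 0.0026 | p < α (reject H0) |
| CPU time | GmSSL | Verify | SM2 | ECDSA | 90 | 90 | 7535 | 8755 | 4660 | 3440 | 0.081 | p < α (reject H0) |
| CPU time | Botan | Key | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU time | Botan | Sign | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU time | Botan | Verify | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU time | Botan | Key | SM2 | ECDSA | 90 | 90 | 12195 | 4095 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU time | Botan | Sign | SM2 | ECDSA | 90 | 90 | 12195 | 4095 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU time | Botan | Verify | SM2 | ECDSA | 90 | 90 | 12164 | 4126 | 31 | 8069 | 1.37e-31 | p < α (reject H0) |
| CPU cycles | GmSSL | Key | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU cycles | GmSSL | Sign | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU cycles | GmSSL | Verify | SM2 | RSA | 90 | 90 | 11737 | 4553 | 458 | 7642 | 9.13e-25 | p < α (reject H0) |
| CPU cycles | GmSSL | Key | SM2 | ECDSA | 90 | 90 | 4104 | 12186 | 8091 | 9 | 6.60e-31 | p < α (reject H0) |
| CPU cycles | GmSSL | Sign | SM2 | ECDSA | 90 | 90 | 10654 | 5636 | 1541 | 10559 | 7.14e-13 | p < α (reject H0) |
| CPU cycles | GmSSL | Verify | SM2 | ECDSA | 90 | 90 | 10956 | 9334 | 5239 | 2861 | 0.00067 | p < α (reject H0) |
| CPU cycles | Botan | Key | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU cycles | Botan | Sign | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU cycles | Botan | Verify | SM2 | RSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU cycles | Botan | Key | SM2 | ECDSA | 90 | 90 | 12195 | 4095 | 0 | 8100 | 4.89e-31 | p < α (reject H0) |
| CPU cycles | Botan | Sign | SM2 | ECDSA | 90 | 90 | 4095 | 12195 | 8100 | 0 | 4.89e-31 | p < α (reject H0) |
| CPU cycles | Botan | Verify | SM2 | ECDSA | 90 | 90 | 12195 | 4095 | 0 | 8100 | 4.89e-31 | p < α (reject H0) |
| RSS | GmSSL | Key | SM2 | RSA | 100 | 100 | 14692 | 5408 | 358 | 9642 | 8.16e-30 | p < α (reject H0) |
| RSS | GmSSL | Sign | SM2 | RSA | 100 | 100 | 14974.5 | 5125.5 | 75.5 | 9924.5 | 2.29e-33 | p < α (reject H0) |
| RSS | GmSSL | Verify | SM2 | RSA | 100 | 100 | 14869 | 14869 | 181 | 9819 | 5.01e-32 | p < α (reject H0) |
| RSS | GmSSL | Key | SM2 | ECDSA | 100 | 100 | 9275.5 | 10824.5 | 5774.5 | 4225.5 | 0.059 | p > α (fail to reject H0) |
| RSS | GmSSL | Sign | SM2 | ECDSA | 100 | 100 | 8920.5 | 11179.5 | 10129.5 | 3870.5 | 0.0057 | p < α (reject H0) |
| RSS | GmSSL | Verify | SM2 | ECDSA | 100 | 100 | 9518 | 10582 | 5532 | 4468 | 0.19 | p > α (fail to reject H0) |
| RSS | Botan | Key | SM2 | RSA | 100 | 100 | 15050 | 5050 | 0 | 10000 | 2.45e-34 | p < α (reject H0) |
| RSS | Botan | Sign | SM2 | RSA | 100 | 100 | 15050 | 5050 | 0 | 10000 | 2.396e-34 | p < α (reject H0) |
| RSS | Botan | Verify | SM2 | RSA | 100 | 100 | 15050 | 5050 | 0 | 10000 | 2.228e-34 | p < α (reject H0) |
| RSS | Botan | Key | SM2 | ECDSA | 100 | 100 | 7140 | 12960 | 7910 | 2090 | 1.15e-12 | p < α (reject H0) |
| RSS | Botan | Sign | SM2 | ECDSA | 100 | 100 | 12282.5 | 7817.5 | 2767.5 | 7232.5 | 4.834e-8 | p < α (reject H0) |
| RSS | Botan | Verify | SM2 | ECDSA | 100 | 100 | 9265 | 10835 | 5785 | 4215 | 0.055 | p > α (fail to reject H0) |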

B.2 Hash Function

In table B.3 it can be seen that H0 was rejected when comparing real-time, CPU time, and CPU cycles samples from SM3 and SHA-256. The U values of SM3 are very low or even 0 in all these tests, which means that almost all or all data samples for SM3 are larger than those of SHA-256. These Mann-Whitney U test results suggest that the differences in mean and median presented in section 5.2.2 are significant.

The Mann-Whitney U tests comparing the RSS samples of SM3 and SHA-256 result in rejection of H0 in Botan but not in OpenSSL. This means that the data suggest that there is a significant difference in RSS in Botan between the hash algorithms. However, for OpenSSL H0 cannot be rejected.

Table B.3: Results of Hash Functions Mann-Whitney U Tests.

| Metric | Library | Algorithm 1 | Algorithm 2 | N1 | N2 | R1 | R2 | U1 | U2 | p (two-sided) | Result |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Real-time | OpenSSL | SM3 | SHA-256 | 90 | 90 | 12195 | 4095 | 0 | 8100 | 4.89e-31 | p < α (reject H0) |
| Real-time | Botan | SM3 | SHA-256 | 90 | 90 | 12109 | 4181 | 86 | 8014 | 8.38e-30 | p < α (reject H0) |
| CPU time | OpenSSL | SM3 | SHA-256 | 90 | 90 | 12195 | 4095 | 0 | 8100 | 4.89e-31 | p < α (reject H0) |
| CPU time | Botan | SM3 | SHA-256 | 90 | 90 | 12109 | 4181 | 86 | 8014 | 8.38e-30 | p < α (reject H0) |
| CPU cycles | OpenSSL | SM3 | SHA-256 | 90 | 90 | 12195 | 4095 | 0 | 8100 | 4.89e-31 | p < α (reject H0) |
| CPU cycles | Botan | SM3 | SHA-256 | 90 | 90 | 12195 | 4095 | 0 | 8100 | 4.89e-31 | p < α (reject H0) |
| RSS | OpenSSL | SM3 | SHA-256 | 100 | 100 | 9686.5 | 10413.5 | 5363.5 | 4636.5 | 0.374 | p > α (fail to reject H0) |
| RSS | Botan | SM3 | SHA-256 | 100 | 100 | 11695 | 8405 | 3355 | 6645 | 5.74e-5 | p < α (reject H0) |

B.3 Block Cipher

Every block cipher test in tables B.4, B.5, and B.6 shows that p < α, and H0 is rejected when comparing the real-time, CPU time, and CPU cycles of SM4 with AES-128 and of SM4 with AES-128-NI. The tables also show that the U values of SM4 are all 0, which indicates that all data samples of SM4 are larger than those of AES-128 or AES-128-NI. This means that there is a significant difference between the algorithms in both libraries.

Table B.4: Results of Block Cipher Real-time Mann-Whitney U Tests. Algorithm 1 is SM4 in every row. The table has 24 rows, one for each combination of library (OpenSSL, Botan), operation (encryption, decryption), mode (ECB, CBC, CTR), and Algorithm 2 (AES-128, AES-128-NI), and all 24 rows contain the same values:

| Metric | N1 | N2 | R1 | R2 | U1 | U2 | p (two-sided) | Result |
|---|---|---|---|---|---|---|---|---|
| Real-time | 90 | 90 | 12195 | 4095 | 0 | 8100 | 4.89e-31 | p < α (reject H0) |

Table B.5: Results of Block Cipher CPU time Mann-Whitney U Tests. Algorithm 1 is SM4 in every row. The table has 24 rows, one for each combination of library (OpenSSL, Botan), operation (encryption, decryption), mode (ECB, CBC, CTR), and Algorithm 2 (AES-128, AES-128-NI), and all 24 rows contain the same values:

| Metric | N1 | N2 | R1 | R2 | U1 | U2 | p (two-sided) | Result |
|---|---|---|---|---|---|---|---|---|
| CPU time | 90 | 90 | 12195 | 4095 | 0 | 8100 | 4.89e-31 | p < α (reject H0) |

Table B.6: Results of Block Cipher CPU cycles Mann-Whitney U Tests. Algorithm 1 is SM4 in every row. The table has 24 rows, one for each combination of library (OpenSSL, Botan), operation (encryption, decryption), mode (ECB, CBC, CTR), and Algorithm 2 (AES-128, AES-128-NI), and all 24 rows contain the same values:

| Metric | N1 | N2 | R1 | R2 | U1 | U2 | p (two-sided) | Result |
|---|---|---|---|---|---|---|---|---|
| CPU cycles | 90 | 90 | 12195 | 4095 | 0 | 8100 | 4.89e-31 | p < α (reject H0) |

The results from table B.7 show that SM4 requires more memory than AES-128 and AES-128-NI while encrypting and decrypting with OpenSSL. SM4 uses more memory than AES-128 when encrypting with Botan. However, H0 is rejected for encryption when comparing SM4 with AES-128-NI and for decryption when comparing SM4 with AES-128 and AES-128-NI. This indicates that AES-128 and AES-128-NI require more memory than SM4 in Botan.

Table B.7: Results of Block Cipher RSS Mann-Whitney U Tests.

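| Metric | Library | Operation | Algorithm 1 | Algorithm 2 | N1 | N2 | R1 | R2 | U1 | U2 | p (two-sided) | Result |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RSS | OpenSSL | Encryption | SM4 | AES-128 | 100 | 100 | 10515.5 | 9584.5 | 4534.5 | 5465.5 | 0.255 | p > α (fail to reject H0) |
| RSS | OpenSSL | Encryption | SM4 | AES-128-NI | 100 | 100 | 9994 | 10106 | 5056 | 4944 | 0.892 | p > α (fail to reject H0) |
| RSS | OpenSSL | Decryption | SM4 | AES-128 | 100 | 100 | 11261.5 | 8838.5 | 3788.5 | 6211.5 | 0.00306 | p < α (reject H0) |
| RSS | OpenSSL | Decryption | SM4 | AES-128-NI | 100 | 100 | 10593.5 | 9506.5 | 4456.5 | 5543.5 | 0.184 | p > α (fail to reject H0) |
| RSS | Botan | Encryption | SM4 | AES-128 | 100 | 100 | 10037.5 | 10062.5 | 5012.5 | 4987.5 | 0.977 | p > α (fail to reject H0) |
| RSS | Botan | Encryption | SM4 | AES-128-NI | 100 | 100 | 11280 | 8820 | 3770 | 6230 | 0.00264 | p < α (reject H0) |
| RSS | Botan | Decryption | SM4 | AES-128 | 100 | 100 | 9192 | 10908 | 5858 | 4142 | 0.0360 | p < α (reject H0) |
| RSS | Botan | Decryption | SM4 | AES-128-NI | 100 | 100 | 10891.5 | 9208.5 | 4158.5 | 5841.5 | 0.0397 | p < α (reject H0) |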
