Title: Cryptographic coprocessors for embedded systems
Author(s): Hamilton, Mark
Publication date: 2014
Original citation: Hamilton, M. 2014. Cryptographic coprocessors for embedded systems. PhD Thesis, University College Cork.
Type of publication: Doctoral thesis
Rights: © 2014, Mark Hamilton. http://creativecommons.org/licenses/by-nc-nd/3.0/
Embargo information: No embargo required
Item downloaded from: http://hdl.handle.net/10468/1770
Downloaded on: 2017-02-12T10:18:32Z

Cryptographic Coprocessors for Embedded Systems

Mark Hamilton
Department of Electrical and Electronic Engineering
National University of Ireland, Cork

Research Supervisor: Dr. William P. Marnane
Head of Department: Prof. Nabeel Riza

A thesis submitted for the degree of Doctor of Philosophy
January 24, 2014

Abstract

In the field of embedded systems design, coprocessors play an important role as a component to increase performance. Many embedded systems are built around a small General Purpose Processor (GPP). If the GPP cannot meet the performance requirements for a certain operation, a coprocessor can be included in the design. The GPP can then offload the computationally intensive operation to the coprocessor, thus increasing the performance of the overall system. A common application of coprocessors is the acceleration of cryptographic algorithms. The work presented in this thesis discusses coprocessor architectures for various cryptographic algorithms that are found in many cryptographic protocols. Their performance is then analysed on a Field Programmable Gate Array (FPGA) platform. Firstly, the acceleration of Elliptic Curve Cryptography (ECC) algorithms is investigated through the use of instruction set extension of a GPP. The performance of these algorithms in a full hardware implementation is then investigated, and an architecture for the acceleration of the ECC-based digital signature algorithm is developed. Hash functions are also an important component of a cryptographic system.
The FPGA implementations of recent hash function designs from the SHA-3 competition are discussed, and a fair comparison methodology for hash functions is presented. Many cryptographic protocols involve the generation of random data, for keys or nonces. This requires a True Random Number Generator (TRNG) to be present in the system. Various TRNG designs are discussed and a secure implementation, including post-processing and failure detection, is introduced. Finally, a coprocessor for the acceleration of operations at the protocol level is discussed; a novel aspect of the design is the secure method in which private-key data is handled.

I, Mark Hamilton, certify that this thesis is my own work and I have not obtained a degree in this university or elsewhere on the basis of the work submitted in this thesis.

Mark Hamilton

Acknowledgements

I would like to thank all those who have helped me during the course of completing this thesis, including, but not limited to, my supervisor Liam Marnane for giving me the opportunity to pursue this PhD, and for his guidance over the past few years; Christophe Negre and Emanuel Popovici for taking the time to read this thesis; the staff of the electrical engineering department in UCC; all of the postgraduate students and postdocs that I have worked with over the past few years; Arnaud Tisserand for inviting me to collaborate with him in Lannion; and my family for their support over the last few years.

Contents

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Publications
2 Cryptography for Embedded Systems
  2.1 Introduction
  2.2 Introduction to Cryptographic Systems
  2.3 Mathematical Background
    2.3.1 Groups
    2.3.2 Rings
    2.3.3 Fields
    2.3.4 Finite Fields
  2.4 Generating Random Numbers
  2.5 Private-Key Cryptography
  2.6 Public-Key Cryptography
    2.6.1 Public Key Infrastructure
      2.6.1.1 Digital Signatures
      2.6.1.2 Digital Certificates
  2.7 Cryptographic Protocols
  2.8 Transport Layer Security
    2.8.1 TLS Record Protocol
    2.8.2 TLS Alert Protocol
    2.8.3 TLS Change Cipher Spec Protocol
    2.8.4 TLS Application Data Protocol
    2.8.5 TLS Handshake Protocol
  2.9 Field Programmable Gate Arrays
    2.9.1 Microblaze Processor
    2.9.2 FSL Bus
  2.10 FPGAs and Cryptography
  2.11 Side Channel Attacks
  2.12 Related Work
    2.12.1 Isobe et al.
    2.12.2 Wang et al.
    2.12.3 Instruction Set Extension
    2.12.4 Secure Key Management
  2.13 Discussion
3 Hardware-Software Co-Design for Elliptic Curve Cryptography
  3.1 Introduction
  3.2 Background to ECC
    3.2.1 Group operations on Elliptic Curves
    3.2.2 Jacobian Coordinates
    3.2.3 Co-Z Arithmetic
  3.3 Point Scalar Multiplication
    3.3.1 SPA Resistant Point Scalar Multiplication
      3.3.1.1 Combined double-add operation
      3.3.1.2 (X,Y)-only operations
  3.4 Montgomery Multiplication
  3.5 Instruction Set Extension for ECC
    3.5.1 Software
  3.6 Custom Hardware Acceleration
    3.6.1 Montgomery Multiplication in Hardware
    3.6.2 Instruction Set Extension Results
  3.7 Optimisations for the q = 2^n − 1 case
    3.7.1 Serial Multiplier
    3.7.2 Booth Multiplier
    3.7.3 Multiplier with BRAMs and DSP48Es
      3.7.3.1 Multiplier Architecture
      3.7.3.2 Decomposing the Multiplicands
      3.7.3.3 DSP Blocks
      3.7.3.4 Block RAM
      3.7.3.5 The Adder
      3.7.3.6 Controller
      3.7.3.7 Multiplier Operation
    3.7.4 Results
  3.8 Discussion
4 FPGA Implementation of an ECDSA Coprocessor
  4.1 Introduction
  4.2 ECC Processor
    4.2.1 F_q Addition/Subtraction
    4.2.2 F_q Inversion
  4.3 Comparing Coordinate Performance
  4.4 Applications of Elliptic Curves in TLS
    4.4.1 Elliptic Curve Diffie-Hellman Key Exchange (ECDH)
    4.4.2 Elliptic Curve Digital Signature Algorithm (ECDSA)
    4.4.3 DPA resistant ECDSA
    4.4.4 Simultaneous multiple point multiplication
  4.5 Related Work
  4.6 ECDSA Processor Architecture
  4.7 Implementation Results
  4.8 Discussion
5 Hash Functions and their Applications
  5.1 Introduction
  5.2 Hash Function Design
    5.2.1 Implementation Options
  5.3 Hash Function Usage
  5.4 Hash Functions and TLS
    5.4.1 HMAC Function
    5.4.2 TLS Pseudorandom Function
    5.4.3 Key derivation
    5.4.4 Finished Message Calculation
  5.5 SHA Algorithms
    5.5.1 SHA256
  5.6 SHA-3 Competition
  5.7 Blue Midnight Wish
  5.8 Hamsi
  5.9 CubeHash
  5.10 Fair Comparison Methodology
    5.10.1 Wrapper Overview
    5.10.2 Communications Protocol
    5.10.3 Padding Protocol
  5.11 Implementation Results
  5.12 Discussion
6 True Random Number Generators
  6.1 Introduction
  6.2 Implementation of TRNGs
    6.2.1 Analysing the Quality of TRNG Output Data
  6.3 Vasyltsov et al.
  6.4 Varchola and Drutarovský
  6.5 Dichtl and Golić
  6.6 Comparing the Results
  6.7 TRNG Failure Detection
    6.7.1 FPGA Implementation
  6.8 Post-processing of TRNGs
  6.9 Secure Architecture Implementation Results
  6.10 Discussion
7 Coprocessor Design For the Protocol Level
  7.1 Introduction
  7.2 Designing a Secure Coprocessor
  7.3 Requirements of a TLS Coprocessor
    7.3.1 Public-key algorithms
    7.3.2 Private-key Algorithms
    7.3.3 Hashing Operations
    7.3.4 Operations Involving Private Keys
  7.4 Encryption for TLS
    7.4.1 Cipher Block Chaining
    7.4.2 AES Implementation
  7.5 SHA256 Implementation for TLS
  7.6 Design Overview
  7.7 Hardware/Software Partition
  7.8 Coprocessor Operation
  7.9 Test Platform
    7.9.1 Microblaze Configuration
  7.10 Implementation Results
  7.11 Conclusions
8 Conclusions and Future Work
  8.1 Contribution to the Field
  8.2 Future Work
A Co-Z Algorithms
  A.1 Point Doubling Formulæ with Update in Homogeneous