Finite Field Arithmetic and Its Application in Cryptography
UCLA
UCLA Electronic Theses and Dissertations

Title: Finite Field Arithmetic and its Application in Cryptography
Permalink: https://escholarship.org/uc/item/7gj7w5mz
Author: Ansari, Bijan
Publication Date: 2012
Peer reviewed | Thesis/dissertation

University of California
Los Angeles

Finite Field Arithmetic and its Application in Cryptography

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering

by

Bijan Ansari

2012

© Copyright by Bijan Ansari 2012

Abstract of the Dissertation

Finite Field Arithmetic and its Application in Cryptography
by Bijan Ansari
Doctor of Philosophy in Electrical Engineering
University of California, Los Angeles, 2012
Professor Frank Chang, Co-chair
Professor Ingrid Verbauwhede, Co-chair

The groundbreaking idea of public key cryptography and the rapid expansion of the internet in the 80s opened a new research area for finite field arithmetic. The large size of fields in cryptography demands new algorithms for efficient arithmetic and new metrics for estimating finite field operation performance. The area, power, and timing constraints on hand-held and embedded devices necessitate accurate models to achieve expected goals. Additionally, cryptosystems need to protect their secrets and hide their internal operation states against side-channel attacks. Fault-injection attacks or random errors reduce the security of a cryptosystem and can help a cryptanalyst to extract a system’s secrets. This dissertation covers various aspects of finite field arithmetic to provide predictable, efficient, and secure elements for cryptography. We provide an architecture for an elliptic curve processor (ECP), which is essentially a finite field processor. We also provide finite field multipliers over polynomial and optimal normal bases for pipeline and parallel architectures.
To further analyze the behavior of finite field multipliers, we formalize timing, area, and energy consumption over binary extension fields. To ensure robustness of the multiplication operation, we provide concurrent error detection (CED) schemes for polynomial and normal basis multipliers and provide the probability of error detection.

The dissertation of Bijan Ansari is approved.

Babak Daneshrad
Dejan Marković
Milos Ercegovac
Ingrid Verbauwhede, Committee Co-chair
Frank Chang, Committee Co-chair

University of California, Los Angeles
2012

To my homeland heroes who defended their country during the 1980-1988 war, ... and to the victims of the Iranian Green movement.

Table of Contents

1 Introduction
2 High Performance Architecture of Elliptic Curve Scalar Multiplication
  2.1 Introduction
  2.2 Review of the Montgomery Scalar Multiplication
  2.3 Architecture for Scalar Multiplication
  2.4 Implementation
  2.5 Conclusion
3 Concurrent Error Detection for Finite Field Multiplication
  3.1 Introduction
  3.2 Parity Definition and its properties
  3.3 Error Detection in Sequential multipliers
  3.4 Error Detection in Parallel Polynomial Basis Multiplication
  3.5 Error Detection in Digit Serial Multipliers over Extension Fields
  3.6 Error Model and Coverage
  3.7 Conclusion
4 Concurrent Error Detection for Type II Optimal Normal Basis Multipliers
  4.1 Introduction
  4.2 Normal Basis Multiplication using Polynomials
  4.3 Concurrent Error Detection
  4.4 Conclusion
5 FO4-based Models for Area, Delay and Energy of Polynomial Multiplication over Binary Fields
  5.1 Introduction
  5.2 Analysis of Polynomial Multiplication
  5.3 The Karatsuba-Ofman Multiplication Algorithm
  5.4 Future Work
  5.5 Conclusion
6 Efficiency Metrics for Elliptic Curve Crypto-Processors
  6.1 Introduction
  6.2 Energy Consumption in ECPs
  6.3 Normalized Energy
  6.4 Comparison of ECPs
  6.5 Conclusion
7 Conclusion and Future Work
References

List of Figures

2.1 Parallel execution of multiplication and addition/squaring in one iteration
2.2 Parallel with no idle cycle in the middle of the iteration
2.3 Parallel with no idle cycle in entire scalar multiplication loop
2.4 Implemented Architecture
2.5 Pseudo-pipelined finite field multiplier
2.6 Timing of scalar multiplication with no idle cycle in entire loop
2.7 FPGA synthesis for various finite fields GF(2^m)
2.8 FPGA synthesis for various w/m for GF(2^163)
2.9 Swapping mechanism
3.1 Error detection model for sequential multiplication over GF(q^m)
3.2 HED for finite field multiplication. The dash-dot line indicates the critical path for the error signal.
3.3 Implementation results
3.4 CED for digit serial finite field multiplication
4.1 A polynomial multiplier with bi-residue parity protection and j + k parity bits (PPM).
4.2 Parallel ONB-II multiplier with CED
4.3 Parallel ONB-II with CED, using a single polynomial multiplier.
5.1 Functional diagram of the polynomial multiplication over F_2[x], where critical path is highlighted.
5.2 Gates in the critical path of 16 × 4, 128 × 32 and 256 × 64 bit multipliers, produced by the Synopsys Design Compiler synthesis tool.
5.3 Buffers driving NAND gates in a polynomial multiplier. The last buffers can drive 3 NAND gates.
5.4 Gate delay of 26 synthesized multipliers compared with delay equations (5.8), (5.2) and (5.3).
5.5 Area evaluation
5.6 XOR tree and the switching probability at each node
5.7 Energy consumption for one polynomial multiplication: Comparing Synopsys power report with the estimated energy in (5.12).
5.8 Predicted Energy/Area for Polynomial Multiplication over F_2[x]
5.9 Gate delay for multiplication using KOA
5.10 Conventional polynomial multiplication delay versus KOA delay for m ∈ {8, 16, 32, 64, 128}
5.11 Equalizing delay path in parallel polynomial multiplier

List of Tables

2.1 Typical Number of Clock Cycles of Basic Finite Field Operations
2.2 Performance with Various Enhancement Methods Assuming That ⌈m/w⌉ = 4, A = S = 3 Clock Cycles
2.3 k versus Critical Paths
2.4 State Diagram of the Pseudo-pipelined Finite Field Multiplier
2.5 Synthesis, Place and Route Results for Virtex4:XC4VLX200 Using Two Different Synthesizers
2.6 FPGA Synthesis for Various Finite Fields GF(2^m)
2.7 ⌈m/w⌉ versus Slices over GF(2^163) for Virtex2:XC2V2000
2.8 Performance of the Scalar Multipliers
3.1 Delay and area for the HED over GF(2^m)
3.2 FPGA implementation results for HED vs. LUM over GF(2^163)
3.3 Delay and area overhead comparison over GF(2^163)
4.1 Delay and gate count for modules in Fig. 4.1 and 4.2
4.2 Various ONB II multipliers with concurrent error detection
4.3 FPGA implementation over GF(2^173) using KOA for g(x) = x^k + 1
5.1 Estimates of Normalized Input and Intrinsic Capacitance of Logic Gates
6.1 Technology Scaling (S = 0.7)
6.2 Power/Energy consumption of ECPs

Acknowledgments

I would like to thank my committee members and my supervisor. I would also like to thank Deeona Columbia, Office of Student Affairs Manager, for her support.

Vita

2004 M.S. (Electrical Engineering), University of Windsor, Canada.
2004–2006 Research Assistant, Centre for Applied Cryptographic Research, University of Waterloo, Canada.
2008–present Qualcomm Inc., San Diego, CA

Publications

Bijan Ansari and Ingrid Verbauwhede. “FO4-based Models for Area, Delay and Energy of Polynomial Multiplication over Binary Fields.” In IEEE Workshop on Signal Processing Systems: Design and Implementation, pp. 420–425, 2010.

Bijan Ansari and Ingrid Verbauwhede. “A Hybrid Scheme for Concurrent Error Detection of Multiplication over Finite Fields.” In Proceedings of the 2010 IEEE 25th International Symposium on Defect and Fault Tolerance in VLSI Systems, DFT ’10, pp. 399–407, Washington, DC, USA, 2010. IEEE Computer Society.

Bijan Ansari and Ingrid Verbauwhede. “A new Parallel Scheme for Type II ONB Multiplication with Concurrent Error Detection.” Under submission.

Bijan Ansari and Ingrid Verbauwhede. “Comments On ‘On Concurrent Detection of Errors in Polynomial Basis Multiplication’.” Under submission.

Bijan Ansari and Ingrid Verbauwhede. “Concurrent Error Detection for Polynomial Basis Multipliers over GF(q^m).” Under submission.