Copyright by Gokhan Sayilar 2014 the Thesis Committee for Gokhan Sayilar Certiﬁes That This Is the Approved Version of the Following Thesis

Copyright by Gokhan Sayilar 2014 The Thesis Committee for Gokhan Sayilar Certifies that this is the approved version of the following thesis: Cryptoraptor: High Throughput Reconfigurable Cryptographic Processor for Symmetric Key Encryption and Cryptographic Hash Functions APPROVED BY SUPERVISING COMMITTEE: Derek Chiou, Supervisor Mohit Tiwari Cryptoraptor: High Throughput Reconfigurable Cryptographic Processor for Symmetric Key Encryption and Cryptographic Hash Functions by Gokhan Sayilar, B.S. THESIS Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE IN ENGINEERING THE UNIVERSITY OF TEXAS AT AUSTIN December 2014 To my family and many friends... Acknowledgments A major research project like this is never the work of anyone alone. I would like to extend my appreciation especially to the following. First and foremost I offer my sincerest gratitude to my supervisor, Dr. Derek Chiou, for his excellent guidance, caring, and patience. I would also like to thank him for being an open person to ideas, encouraging and helping me to shape my interest and ideas, and giving me the freedom to work in my own way. He’s the funniest advisor and one of the smartest people I know. Besides my advisor, I would like to thank to my second reader, Dr. Mohit Tiwari, for his advises and insightful comments. I am also thankful to my friends in US, Turkey, and other parts of the World for being sources of laughter, joy, and support. Last but not least, I would like to thank my parents and my brother for their continuous love and unconditional support in any decision that I make. I also want to thank to Semiconductor Research Corporation and Freescale Semiconductor, Inc for their financial support which allowed me to undertake this research For any errors or inadequacies that may remain in this work, of course, the responsibility is entirely my own. v Cryptoraptor: High Throughput Reconfigurable Cryptographic Processor for Symmetric Key Encryption and Cryptographic Hash Functions Gokhan Sayilar, M.S.E The University of Texas at Austin, 2014 Supervisor: Derek Chiou In cryptographic processor design, the selection of functional primitives and connection structures between these primitives are extremely crucial to maximize throughput and flexibility. Hence, detailed analysis on the specifications and requirements of existing crypto-systems plays a crucial role in cryptographic processor design. This thesis provides the most comprehensive literature review that we are aware of on the widest range of existing cryptographic algorithms, their specifications, requirements, and hardware structures. In the light of this analysis, it also describes a high performance, low power, and highly flexible cryptographic processor, Cryptoraptor, that is de- signed to support both today’s and tomorrow’s encryption standards. To the best of our knowledge, the proposed cryptographic processor supports the widest range of cryptographic algorithms compared to other solutions in the literature and is the only crypto-specific processor targeting the future standards as well. Unlike previous work, we aim for maximum throughput for all vi known encryption standards, and to support future standards as well. Our 1GHz design achieves a peak throughput of 128Gbps for AES-128 which is competitive with ASIC designs and has 25X and 160X higher throughput per area than CPU and GPU solutions, respectively. vii Table of Contents Acknowledgments v Abstract vi List of Tables xi List of Figures xiii Chapter 1. Introduction and Motivation 1 1.1 Introduction . 1 1.2 Our Contributions . 4 1.3 Thesis Outline . 5 Chapter 2. Related Work 7 2.1 Instruction Set Architecture Extensions . 8 2.2 Algorithm Specific Hardware . 10 2.3 Domain Independent Configurable Processors . 11 2.4 Configurable Cryptographic Processors . 13 Chapter 3. Cryptographic Algorithm Analysis 17 3.1 Existing Workload Characterizations . 17 3.2 Algorithm Selection . 19 3.3 Analysis Methodology . 23 3.4 Detailed Analysis . 24 3.4.1 Operation Classes . 24 3.4.2 Table Lookup Structure . 28 3.4.3 Bundled Operation Patterns . 31 3.4.4 Special Functional Units . 33 3.4.5 Processing Element Width . 37 viii 3.4.6 Connection Structures . 39 3.4.7 Storage Requirements . 41 Chapter 4. Cryptographic Algorithm Instrumentation 43 4.1 Existing Binary Instrumentation of Cryptographic Algorithms 43 4.2 Instrumentation Methodology . 45 4.3 Detailed Analysis . 47 Chapter 5. Cryptoraptor: Reconfigurable Cryptographic Pro- cessor 51 5.1 Design Methodology . 51 5.2 Cryptoraptor . 53 5.3 Execution Tile . 54 5.4 Connection Row . 56 5.5 Processing Element Row . 59 5.6 Processing Element . 60 5.7 Functional Units . 64 5.7.1 Logical Operation Unit (LOU) . 64 5.7.2 Table Lookup Unit (TLU) . 67 5.7.3 Arithmetic Unit (AU) . 69 5.7.4 Permutation/Expansion Unit (PEU) . 71 5.7.5 Shifter/Rotator Unit (SRU) . 72 Chapter 6. Processor Analysis 75 6.1 Implementation . 75 6.2 Timing Analysis . 77 6.3 Area Analysis . 79 6.4 Power Analysis . 82 6.5 Performance Analysis . 86 6.6 Resource Utilization . 93 6.7 Current Algorithm Coverage . 95 6.8 Limitations . 97 ix Chapter 7. Cryptographic Algorithm Mapping 100 7.1 Block Ciphers . 103 7.1.1 Advanced Encryption Standard (AES) . 104 7.1.2 Blowfish . 107 7.1.3 Camellia . 109 7.1.4 CAST-128 . 112 7.1.5 Data Encryption Standard (DES) . 115 7.1.6 GOST . 118 7.1.7 Kasumi . 119 7.1.8 Rivest Cipher 5 (RC5) . 122 7.1.9 SEED . 124 7.1.10 Twofish . 126 7.2 Stream Ciphers . 129 7.2.1 Rivest Cipher 4 (RC4) . 129 7.2.2 Phelix . 131 7.3 Cryptographic Hash Functions . 133 7.3.1 Message Digest Algorithm-4 (MD4) . 134 7.3.2 Message Digest Algorithm-5 (MD5) . 136 7.3.3 Secure Hash Algorithm-1 (SHA1) . 138 7.3.4 Secure Hash Algorithm-2 (SHA2) . 139 Chapter 8. Future Work 143 Chapter 9. Conclusion 145 Appendices 147 Appendix A. Detailed Operation Classes Usage 148 Appendix B. Operation Clusters 153 Appendix C. Operation Bundles 155 Appendix D. Detailed Processing Element Width Usage 157 Bibliography 159 x List of Tables 3.1 The distribution number of parallel lookup operation in cryptographic algorithms . 31 3.2 The distribution of XOR and SBOX patterns in cryptographic algorithms . 33 3.3 The distribution of Shift/rotate and logic operation patterns in cryptographic algorithms . 33 3.4 The distribution of XOR and Arithmetic operation patterns in cryptographic algorithms . 34 3.5 The special functional unit requirements in cryptographic algorithms . 34 3.6 The modular arithmetic base distribution in cryptographic algorithms . 36 4.1 Instruction Classes ......................... 46 4.2 Instruction Class Frequencies .................... 47 4.3 Operation Class Frequencies .................... 47 4.4 Distribution of Memory Accesses . 49 4.5 Distribution of Data Read/Write Granularities . 49 5.1 The control structure of one PE connector . 57 5.2 The input selection structure of PE connector (least significant 4 bits of 6 selection bits) . 58 5.3 The input structure of PE . 61 5.4 The output structure of PE . 62 5.5 The control signal structure of PE . 62 5.6 The Configurable Logic Block functionality . 66 5.7 The Table Lookup Unit functionality . 68 5.8 The Arithmetic Unit functionality . 70 5.9 The Bit Selector control structure . 72 5.10 The Shifter/Rotator Unit functionality . 73 xi 5.11 The Operation Block functionality . 74 6.1 The cycle time of functional units in PE . 77 6.2 The cycle time comparison of functional units with bundles . 78 6.3 The cycle time of sub-modules in Cryptoraptor . 79 6.4 The area comparison between Design Compiler and CACTI . 80 6.5 The area of functional units in PE . 80 6.6 The area comparison of functional units with bundles . 81 6.7 The area of sub-modules in Cryptoraptor . 81 6.8 The power usage comparison for memory blocks . 82 6.9 The power usage of functional units in PE . 83 6.10 The power usage comparison of functional units with bundles . 84 6.11 The power usage of modules in Cryptoraptor . 84 6.12 Power usage comparison of GPPs . 85 6.13 AES Performance comparison of ASIC solutions . 88 6.14 AES Performance comparison of FPGA solutions . 90 6.15 AES Performance comparison of GPP solutions . 91 6.16 Performance summary of algorithms on Cryptoraptor . 92 6.17 Resource utilization summary of mapped algorithms on Cryptoraptor 94 6.18 Resource utilization summary of mapped algorithms on Cryptoraptor 94 6.19 The current coverage of cryptographic algorithms . 96 7.1 Algorithm summary and selection for mapping process . 101 7.2 Instruction List . 102 A.1 The special functional unit requirements in cryptographic algorithms . 148 B.1 Operation clusters and patterns . 153 C.1 Operation patterns . 155 D.1 Operation width (PE way) . 157 xii List of Figures 2.1 The distribution of energy dissipation in an in-order RISC processor [92] . 12 3.1 The use of operation classes in cryptographic algorithm classes 25 3.2 The ratio of different table sizes used in cryptographic algorithms 29 3.3 The ratio of different table entry widths used in cryptographic algorithms . 30 3.4 The distribution of logical operation patterns in cryptographic algorithms . 32 3.5 The coverage ratio of algorithms that require modular arithmetic 37 3.6 The distribution of algorithms requires 1, 2, 4, 8 and 16-way processing elements . 38 3.7 The trend of connection structure among processing elements used for implementing algorithms . 40 4.1 Instruction and Operation Class Distribution . 48 5.1 The internal structure of Cryptoraptor . 53 5.2 The high level structure of Execution Tile .

Load more