Analysis and Implementation of eSTREAM and SHA-3 Cryptographic Algorithms
COOPER UNION FOR THE ADVANCEMENT OF SCIENCE AND ART
ALBERT NERKEN SCHOOL OF ENGINEERING

Analysis and Implementation of eSTREAM and SHA-3 Cryptographic Algorithms

by Deian Stefan

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering

May 10, 2011

Advisor: Dr. Fred L. Fontaine

This thesis was prepared under the direction of the Candidate's Thesis Advisor and has received approval. It was submitted to the Dean of the School of Engineering and the full Faculty, and was approved as partial fulfillment of the requirements for the degree of Master of Engineering.

Dr. Simon Ben Avi, Dean, School of Engineering
Dr. Fred L. Fontaine, Candidate's Thesis Advisor

Abstract

Invaluable benchmarking efforts have been made to measure the performance of eSTREAM portfolio stream ciphers and SHA-3 hash function candidates on multiple architectures. In this thesis we contribute to these efforts; we evaluate the performance of all eSTREAM ciphers and all second-round SHA-3 candidates on NVIDIA Graphics Processing Units (GPUs).

Complementarily, we present the first implementation of the cube attack in a multi-GPU setting. Our framework proves useful in the practical analysis of algorithms by providing a generic black-box interface and speedup factors over 100×. Demonstrating its use, we analyze two eSTREAM stream ciphers, MICKEY v2 and Trivium. We find that MICKEY is not susceptible to low-degree cube attacks, while our Trivium analysis confirms previous results and yields several new equations applicable to partial key recovery.

We also extend the linear differential cryptanalysis framework introduced by Brier, Khazaei, Meier and Peyrin at ASIACRYPT 2009 using two new trail search algorithms and several optimizations.
We find several collision and second preimage attacks on simplified and round-reduced variants of BLAKE and CubeHash, two second-round SHA-3 candidates. Using the extended framework, we also present collision attacks on CubeHash that improve on previous results. In combination with the condition function concept, our new trail search algorithms lead to much faster collision attacks. We demonstrate this by providing a real collision for CubeHash-5/96. Additionally, our randomized trail search finds highly probable linear differential trails and leads to significantly better attacks for up to eight rounds of CubeHash.

Acknowledgments

I would like to thank my advisor Fred L. Fontaine for his encouragement and support on this work and my Electrical Engineering studies at The Cooper Union. I am especially grateful for his efforts in providing an excellent environment, the S*ProCom² lab, in which I have flourished as both a student and researcher; our collaboration is one of the most rewarding and influential experiences to date.

I am grateful to Peter Cooper and The Cooper Union for providing me with the opportunity to gain an exceptional education. I am thankful for having had the opportunity to work with and study under Jeff Hakner, Danfeng Yao, Kausik Chatterjee, William Donahue, and Alan Berenbaum. They have in many ways influenced my choice to pursue research studies.

I would like to thank all my coauthors for the fruitful discussions and great moments we exchanged. In particular, I would like to thank David B. Nummey, Christopher Mitchell, Jared Harwayne-Gidansky, Ishaan L. Dalal, and Matthew Epstein. Furthermore, I am especially grateful to Arjen K. Lenstra, Joppe Boss and Shahram Khazaei for my summer study at LACAL, the work of which led to this thesis.

Finally, I would like to thank my friends and family for supporting me during my studies. Most of all, I am grateful to my parents for their ongoing support and encouragement.
Contents

List of Figures
List of Tables
List of Algorithms
1 Introduction
    1.1 Stream Ciphers
    1.2 Hash Functions
    1.3 High-Performance Cryptologic Computing
    1.4 Related Work
    1.5 Contributions
        1.5.1 Implementation Contributions
        1.5.2 Analysis Contributions
    1.6 Thesis Outline
2 Target primitives
    2.1 eSTREAM Stream Ciphers
        2.1.1 Trivium
        2.1.2 MICKEY v2
    2.2 SHA-3 Candidates
        2.2.1 BLAKE
        2.2.2 CubeHash
3 SMP and GPU Parallel Programming
    3.1 OpenMP and SMPs
        3.1.1 SMP Architectures
        3.1.2 OpenMP Programming
    3.2 CUDA and GPUs
        3.2.1 GPU Architectures
        3.2.2 CUDA Programming
4 Cube Attack
    4.1 Preliminaries
    4.2 Preprocessing
        4.2.1 Finding Maxterms
        4.2.2 Superpoly Reconstruction
    4.3 Online Attack
5 Linear Differential Cryptanalysis
    5.1 Constructing Differential Trails
        5.1.1 Notation
        5.1.2 Raw Probability
        5.1.3 Forward Differential Trails
        5.1.4 Reverse Differential Trails
        5.1.5 Randomized Differential Trails
    5.2 Finding Collisions Using Condition Functions
    5.3 Freedom Degrees Use: Dependency Table
6 Cryptography and Cryptanalysis on GPUs
    6.1 GPU Implementation of eSTREAM Ciphers
        6.1.1 gSTREAM Framework
        6.1.2 Implementation of eSTREAM Ciphers
    6.2 GPU Implementation of SHA-3 Candidates
        6.2.1 AES-Inspired SHA-3 Candidates
        6.2.2 Other SHA-3 Candidates
    6.3 Multi-GPU Implementation of the Cube Attack
        6.3.1 Finding Maxterms
        6.3.2 Superpoly Reconstruction
        6.3.3 Performance Measurements
7 Cryptanalysis Results
    7.1 Applying the Cube Attack
        7.1.1 Trivium
        7.1.2 MICKEY
    7.2 Applying Linear Differential Cryptanalysis
        7.2.1 BLAKE
        7.2.2 CubeHash
8 Conclusion
A BLAKE Constants
B MICKEY v2 Constants
C Software Implementation of Grain
D gSTREAM API and Implementations
    D.1 gSTREAM API
    D.2 MICKEY v2 Example Implementation
    D.3 Trivium Example Implementation
E XOR-Shift RNG Implementation
F Differential Trails
    F.1 BLAKE differential trails
    F.2 CubeHash-*/{10, 20, 24, 36, 96} differential trails
Bibliography

List of Figures

3.1 Uniform memory access architectures
3.2 Non-uniform memory access architectures
3.3 Fork-join programming model
3.4 GT200 Texture Processor Cluster
3.5 Multi-block parallel reduction using shared memory
4.1 Preprocessing phase of the cube attack
4.2 On-line phase of the cube attack
6.1 Program flow using the gSTREAM framework
6.2 32-bit Grain Core i7 960 (2.8 GHz) benchmarking results
6.3 Finding a maxterm for high-dimensional cube
6.4 Finding a maxterm for medium-dimensional cube
7.1 Linear differential framework using OpenMP
7.2 CubeHash's Compress^r_lin

List of Tables

2.1 eSTREAM portfolio algorithms
2.2 SHA-3 second-round candidates
3.1 Commonly used parallel construct clauses
3.2 Clauses supported by the work-share constructs
3.3 Commonly used synchronization constructs
3.4 Commonly used OpenMP runtime functions
6.1 Performance estimates for eSTREAM ciphers
6.2 GPU performance results and estimates for eSTREAM ciphers
6.3 Mix-column estimates
6.4 Count of AES-like operations in SHA-3 candidate designs
6.5 GPU performance estimates for non-AES SHA-3 candidates
6.6 GPU performance results for SHA-3 non-AES based candidates
6.7 Cube attack finding maxterms performance
6.8 Cube attack superpoly reconstruction performance
7.1 Trivium analysis, confirming existing results
7.2 Trivium analysis, new results
7.3 BLAKE32 analysis results
7.4 BLAKE64 analysis results
7.5 Minimal number of conditions found for l = 3
7.6 Minimal number of conditions y found with the randomized search
7.7 Logarithmic theoretical complexities cD of improved collision attacks
7.8 CubeHash (t, y)
7.9 CubeHash collision complexities cD with 1-bit modifications
7.10 CubeHash collision complexities cD with 2-bit modifications
7.11 CubeHash collision complexities cD with 4-bit modifications
7.12 CubeHash collision complexities cD with 8-bit modifications
7.13 Number of conditions per round theoretical complexities
7.14 Partition sets for CubeHash-5/96 collision trail
A.1 BLAKE initial values
A.2 BLAKE permutation function
A.3 BLAKE constant values

List of Algorithms

2.1 Trivium state update function
2.2 MICKEY state update function
2.3 Clock_r - clocking MICKEY's r register
2.4 Clock_s - clocking MICKEY's s register
2.5 BLAKE's compression function
2.6 BLAKE-32 G_i function
2.7 Round function for CubeHash
4.1 Finding a maxterm
4.2 Superpoly reconstruction
5.1 Calculating the probabilistic-effect and dependency tables
5.2 Creating input and output partitions of a condition function
5.3 Tree-based backtracking preimage search
5.4 Computing adaptive backtrack steps

Chapter 1

Introduction

Security has become a crucial aspect of the design and use of computer systems and networks. Whether one is designing a wireless communication system, web application, or network protocol, addressing security is an essential engineering criterion. Though a well-designed system is built from a multitude of components, the use of cryptography as a building block is nearly universal.