Privacy Preserving Computations Accelerated Using FPGA Overlays

Privacy Preserving Computations Accelerated using FPGA Overlays A Dissertation Presented by Xin Fang to The Department of Electrical and Computer Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Engineering Northeastern University Boston, Massachusetts August 2017 To my family. i Contents List of Figures v List of Tables vi Acknowledgments vii Abstract of the Dissertation ix 1 Introduction 1 1.1 Garbled Circuits . 1 1.1.1 An Example: Computing Average Blood Pressure . 2 1.2 Heterogeneous Reconfigurable Computing . 3 1.3 Contributions . 3 1.4 Remainder of the Dissertation . 5 2 Background 6 2.1 Garbled Circuits . 6 2.1.1 Garbled Circuits Overview . 7 2.1.2 Garbling Phase . 8 2.1.3 Evaluation Phase . 10 2.1.4 Optimization . 11 2.2 SHA-1 Algorithm . 12 2.3 Field-Programmable Gate Array . 13 2.3.1 FPGA Architecture . 13 2.3.2 FPGA Overlays . 14 2.3.3 Heterogeneous Computing Platform using FPGAs . 16 2.3.4 ProceV Board . 17 2.4 Related Work . 17 2.4.1 Garbled Circuit Algorithm Research . 17 2.4.2 Garbled Circuit Implementation . 19 2.4.3 Garbled Circuit Acceleration . 20 ii 3 System Design Methodology 23 3.1 Garbled Circuit Generation System . 24 3.2 Software Structure . 26 3.2.1 Problem Generation . 26 3.2.2 Layer Extractor . 30 3.2.3 Problem Parser . 31 3.2.4 Host Code Generation . 32 3.3 Simulation of Garbled Circuit Generation . 33 3.3.1 FPGA Overlay Architecture . 33 3.3.2 Garbled Circuit AND Overlay Cell . 34 3.3.3 Garbled Circuit XOR Overlay Cell . 36 3.3.4 Embedded Memory . 38 3.3.5 Workload Dispatcher and Data Controller . 39 3.4 Heterogeneous Computing System using ProceV Board . 42 3.4.1 Communicating between Host and FPGA . 42 3.4.2 Accessing DDR Memory On Board . 44 3.4.3 Accessing On-Chip Registers . 47 3.4.4 Workload Dispatcher and Data Controller . 47 3.4.5 AND and XOR Overlay Cells . 48 3.5 Architecture Improvement . 49 3.5.1 Hybrid Memory Hierarchy . 49 3.5.2 Host to FPGA Communication . 52 3.5.3 AND and XOR Overlay Cells . 54 3.5.4 System Parameters . 56 4 Experiments and Results 57 4.1 System Workflow . 57 4.2 Problem Analysis . 60 4.3 Simulation Results . 60 4.3.1 Testbench Generation . 60 4.3.2 Performance Results . 61 4.4 Heterogeneous Computing System Results . 63 4.4.1 Problem Analysis . 64 4.4.2 Performance Results . 66 5 Conclusion and Future work 72 5.1 Conclusion . 72 5.2 Future Work . 73 5.2.1 Performance Improvement within one Node . 73 5.2.2 Map Garbled Circuit Problem onto Multiple Nodes . 74 5.2.3 Other Platform Exploration for Garbled Circuit . 74 A Example of Garbled Circuit with Garbled Values 76 B Interface Between Host and ProceV Board 83 iii List of Acronyms 84 Bibliography 86 iv List of Figures 2.1 Yao’s Protocol . 9 2.2 A Garbled AND Gate . 11 2.3 ALM Architecture for Stratix V Family FPGAs [1] . 14 2.4 FPGA Overlay Architecture . 15 2.5 ProceV Block Diagram [2] . 18 3.1 A Garbled Circuit Generation System . 24 3.2 Software Workflow . 27 3.3 Layer Extractor Output Visualization . 31 3.4 Garbled Circuit Problem Parser . 32 3.5 Overlay Architecture for Garbled Circuit . 34 3.6 A Garbled AND Overlay Cell . 35 3.7 Optimized Garbled AND Overlay Cell . 37 3.8 A Garbled XOR Overlay Cell . 38 3.9 A Standard RAM Interface . 39 3.10 Workload Dispatcher and Data Controller Timing information . 41 3.11 Garbled Circuit AND Gate Operations Sequence . 41 3.12 Garbled Circuit XOR Gate Operations Sequence . 42 3.13 Heterogeneous Computing System with DDR Memory . 43 3.14 ProceV Board DDR Memory Interface . 46 3.15 Coarse Granularity CPU and FPGA Communication . 48 3.16 Hybrid Memory System . 50 3.17 Data transmission for XOR Operation without Check using BRAM . 53 3.18 Reduce of Number of Registers . 54 3.19 Fine Granularity CPU and FPGA Communication . 55 4.1 System of Garbled Circuits . 58 4.2 SHA-1 Message Padding . 61 v List of Tables 1.1 Garbled Circuit Time Information . 3 2.1 Comparison among Different Approaches . 21 4.1 Problem Switching Time . 59 4.2 Size of the Examples . 62 4.3 Clock Cycle Comparison . 62 4.4 Speedup compared with FlexSC . 63 4.5 Resource Utilization . 63 4.6 Gate Information for Problems . 64 4.7 Wire Information for Problems . 65 4.8 Wire Percent for Problems . 66 4.9 3 AND 1 XOR Overlay Cells Realization for Garbled Circuit . 67 4.10 Increase Number of AND Overlay Cells . 67 4.11 Results for Removing Host XOR Operation Check . 68 4.12 Directly-Used Policy using Register and DDR Hybrid Memory . 68 4.13 Directly-Used Policy using BRAM and DDR Hybrid Memory . 69 4.14 Most-Frequently-Used Policy . 69 4.15 Influence of Clock Frequency of Hardware . 69 4.16 Influence of Number of Gates . 70 4.17 Using 2 Address Registers for 3 Addresses . 70 4.18 Speedup Results . 71 vi Acknowledgments I want to express my appreciation to those who have supported me during the process of my Ph.D. study at Northeastern University. Ph.D. study is a marathon which requires the determination and endeavors to face the challenge in research and every life. Without the help and encouragement from them, I cannot think of finishing this journey while enjoying it. So I thank them at the beginning of this dissertation. First and foremost, I want to thank my research advisor Prof. Miriam Leeser who I have been working with for the last six years. Several projects are successfully finished with the help of her, including open-source variable precision floating point library, side-channel analysis and countermeasure and privacy-preserving computation using FPGA overlays. Looking back, I am deeply thankful for all the time that Prof. Leeser spends on my projects. She is very knowledgeable in many areas and so helpful that she does not hold back any things to help me improve. She comes up with insightful ideas during every one-on-one meeting. She is always there to help me. Words are powerless to express my gratitude to her. I am very thankful that Prof. Stratis Ioannidis is my co-advisor for this project. He is an expert on secure function evaluation and has made significant contributions to the garbled circuit platforms including GraphSC and applications in data mining. Exploring.

Privacy Preserving Computations Accelerated Using FPGA Overlays

A Fast and Verified Software Stackfor Secure Function Evaluation

Week 1: an Overview of Circuit Complexity 1 Welcome 2

Reusable Garbled Gates for New Fully Homomorphic Encryption Service Xu an Wang* Fatos Xhafa Jianfeng Ma Yunfei Cao and Dianhua T

Rigorous Cache Side Channel Mitigation Via Selective Circuit Compilation

Reusable Garbled Circuit Implementation of AES to Avoid Power Analysis Attacks Venkata Lakshmi Sudheera Bommakanti Clemson University

Probabilistic Boolean Logic, Arithmetic and Architectures

Improved Garbled Circuit: Free XOR Gates and Applications

Reaction Systems and Synchronous Digital Circuits

Garbled Protocols and Two-Round MPC from Bilinear Maps

Private Set Intersection: Are Garbled Circuits Better Than Custom Protocols?

SOTERIA: in Search of Efficient Neural Networks for Private Inference

Reducing Garbled Circuit Size While Preserving Circuit Gate Privacy ⋆