Privacy Preserving Computations Accelerated Using FPGA Overlays

Privacy Preserving Computations Accelerated using FPGA Overlays A Dissertation Presented by Xin Fang to The Department of Electrical and Computer Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Engineering Northeastern University Boston, Massachusetts August 2017 ProQuest Number:10623795 All rights reserved INFORMATION TO ALL USERS The quality of this reproduction is dependent upon the quality of the copy submitted. In the unlikely event that the author did not send a complete manuscript and there are missing pages, these will be noted. Also, if material had to be removed, a note will indicate the deletion. ProQuest 10623795 Published by ProQuest LLC ( 2017). Copyright of the Dissertation is held by the Author. All rights reserved. This work is protected against unauthorized copying under Title 17, United States Code Microform Edition © ProQuest LLC. ProQuest LLC. 789 East Eisenhower Parkway P.O. Box 1346 Ann Arbor, MI 48106 - 1346 To my family. i Contents List of Figures v List of Tables vi Acknowledgments vii Abstract of the Dissertation ix 1 Introduction 1 1.1 Garbled Circuits . 1 1.1.1 An Example: Computing Average Blood Pressure . 2 1.2 Heterogeneous Reconfigurable Computing . 3 1.3 Contributions . 3 1.4 Remainder of the Dissertation . 5 2 Background 6 2.1 Garbled Circuits . 6 2.1.1 Garbled Circuits Overview . 7 2.1.2 Garbling Phase . 8 2.1.3 Evaluation Phase . 10 2.1.4 Optimization . 11 2.2 SHA-1 Algorithm . 12 2.3 Field-Programmable Gate Array . 13 2.3.1 FPGA Architecture . 13 2.3.2 FPGA Overlays . 14 2.3.3 Heterogeneous Computing Platform using FPGAs . 16 2.3.4 ProceV Board . 17 2.4 Related Work . 17 2.4.1 Garbled Circuit Algorithm Research . 17 2.4.2 Garbled Circuit Implementation . 19 2.4.3 Garbled Circuit Acceleration . 20 ii 3 System Design Methodology 23 3.1 Garbled Circuit Generation System . 24 3.2 Software Structure . 26 3.2.1 Problem Generation . 26 3.2.2 Layer Extractor . 30 3.2.3 Problem Parser . 31 3.2.4 Host Code Generation . 32 3.3 Simulation of Garbled Circuit Generation . 33 3.3.1 FPGA Overlay Architecture . 33 3.3.2 Garbled Circuit AND Overlay Cell . 34 3.3.3 Garbled Circuit XOR Overlay Cell . 36 3.3.4 Embedded Memory . 38 3.3.5 Workload Dispatcher and Data Controller . 39 3.4 Heterogeneous Computing System using ProceV Board . 42 3.4.1 Communicating between Host and FPGA . 42 3.4.2 Accessing DDR Memory On Board . 44 3.4.3 Accessing On-Chip Registers . 47 3.4.4 Workload Dispatcher and Data Controller . 47 3.4.5 AND and XOR Overlay Cells . 48 3.5 Architecture Improvement . 49 3.5.1 Hybrid Memory Hierarchy . 49 3.5.2 Host to FPGA Communication . 52 3.5.3 AND and XOR Overlay Cells . 54 3.5.4 System Parameters . 56 4 Experiments and Results 57 4.1 System Workflow . 57 4.2 Problem Analysis . 60 4.3 Simulation Results . 60 4.3.1 Testbench Generation . 60 4.3.2 Performance Results . 61 4.4 Heterogeneous Computing System Results . 63 4.4.1 Problem Analysis . 64 4.4.2 Performance Results . 66 5 Conclusion and Future work 72 5.1 Conclusion . 72 5.2 Future Work . 73 5.2.1 Performance Improvement within one Node . 73 5.2.2 Map Garbled Circuit Problem onto Multiple Nodes . 74 5.2.3 Other Platform Exploration for Garbled Circuit . 74 A Example of Garbled Circuit with Garbled Values 76 B Interface Between Host and ProceV Board 83 iii List of Acronyms 84 Bibliography 86 iv List of Figures 2.1 Yao’s Protocol . 9 2.2 A Garbled AND Gate . 11 2.3 ALM Architecture for Stratix V Family FPGAs [1] . 14 2.4 FPGA Overlay Architecture . 15 2.5 ProceV Block Diagram [2] . 18 3.1 A Garbled Circuit Generation System . 24 3.2 Software Workflow . 27 3.3 Layer Extractor Output Visualization . 31 3.4 Garbled Circuit Problem Parser . 32 3.5 Overlay Architecture for Garbled Circuit . 34 3.6 A Garbled AND Overlay Cell . 35 3.7 Optimized Garbled AND Overlay Cell . 37 3.8 A Garbled XOR Overlay Cell . 38 3.9 A Standard RAM Interface . 39 3.10 Workload Dispatcher and Data Controller Timing information . 41 3.11 Garbled Circuit AND Gate Operations Sequence . 41 3.12 Garbled Circuit XOR Gate Operations Sequence . 42 3.13 Heterogeneous Computing System with DDR Memory . 43 3.14 ProceV Board DDR Memory Interface . 46 3.15 Coarse Granularity CPU and FPGA Communication . 48 3.16 Hybrid Memory System . 50 3.17 Data transmission for XOR Operation without Check using BRAM . 53 3.18 Reduce of Number of Registers . 54 3.19 Fine Granularity CPU and FPGA Communication . 55 4.1 System of Garbled Circuits . 58 4.2 SHA-1 Message Padding . 61 v List of Tables 1.1 Garbled Circuit Time Information . 3 2.1 Comparison among Different Approaches . 21 4.1 Problem Switching Time . 59 4.2 Size of the Examples . 62 4.3 Clock Cycle Comparison . 62 4.4 Speedup compared with FlexSC . 63 4.5 Resource Utilization . 63 4.6 Gate Information for Problems . 64 4.7 Wire Information for Problems . 65 4.8 Wire Percent for Problems . 66 4.9 3 AND 1 XOR Overlay Cells Realization for Garbled Circuit . 67 4.10 Increase Number of AND Overlay Cells . 67 4.11 Results for Removing Host XOR Operation Check . 68 4.12 Directly-Used Policy using Register and DDR Hybrid Memory . 68 4.13 Directly-Used Policy using BRAM and DDR Hybrid Memory . 69 4.14 Most-Frequently-Used Policy . 69 4.15 Influence of Clock Frequency of Hardware . 69 4.16 Influence of Number of Gates . 70 4.17 Using 2 Address Registers for 3 Addresses . 70 4.18 Speedup Results . 71 vi Acknowledgments I want to express my appreciation to those who have supported me during the process of my Ph.D. study at Northeastern University. Ph.D. study is a marathon which requires the determination and endeavors to face the challenge in research and every life. Without the help and encouragement from them, I cannot think of finishing this journey while enjoying it. So I thank them at the beginning of this dissertation. First and foremost, I want to thank my research advisor Prof. Miriam Leeser who I have been working with for the last six years. Several.

Privacy Preserving Computations Accelerated Using FPGA Overlays

A Fast and Verified Software Stackfor Secure Function Evaluation

Week 1: an Overview of Circuit Complexity 1 Welcome 2

Rigorous Cache Side Channel Mitigation Via Selective Circuit Compilation

Probabilistic Boolean Logic, Arithmetic and Architectures

Cray XD1™ Supercomputer Release 1.3

Privacy Preserving Computations Accelerated Using FPGA Overlays

Reaction Systems and Synchronous Digital Circuits

Merrimac – High-Performance and Highly-Efficient Scientific Computing with Streams

SOTERIA: in Search of Efficient Neural Networks for Private Inference

Pubtex Output 2006.05.15:1001

The Gemini Network

The Size and Depth of Layered Boolean Circuits