Privacy Preserving Computations Accelerated Using FPGA Overlays
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
A Fast and Verified Software Stackfor Secure Function Evaluation
Session I4: Verifying Crypto CCS’17, October 30-November 3, 2017, Dallas, TX, USA A Fast and Verified Software Stack for Secure Function Evaluation José Bacelar Almeida Manuel Barbosa Gilles Barthe INESC TEC and INESC TEC and FCUP IMDEA Software Institute, Spain Universidade do Minho, Portugal Universidade do Porto, Portugal François Dupressoir Benjamin Grégoire Vincent Laporte University of Surrey, UK Inria Sophia-Antipolis, France IMDEA Software Institute, Spain Vitor Pereira INESC TEC and FCUP Universidade do Porto, Portugal ABSTRACT as OpenSSL,1 s2n2 and Bouncy Castle,3 as well as prototyping We present a high-assurance software stack for secure function frameworks such as CHARM [1] and SCAPI [31]. More recently, a evaluation (SFE). Our stack consists of three components: i. a veri- series of groundbreaking cryptographic engineering projects have fied compiler (CircGen) that translates C programs into Boolean emerged, that aim to bring a new generation of cryptographic proto- circuits; ii. a verified implementation of Yao’s SFE protocol based on cols to real-world applications. In this new generation of protocols, garbled circuits and oblivious transfer; and iii. transparent applica- which has matured in the last two decades, secure computation tion integration and communications via FRESCO, an open-source over encrypted data stands out as one of the technologies with the framework for secure multiparty computation (MPC). CircGen is a highest potential to change the landscape of secure ITC, namely general purpose tool that builds on CompCert, a verified optimizing by improving cloud reliability and thus opening the way for new compiler for C. It can be used in arbitrary Boolean circuit-based secure cloud-based applications. -
Week 1: an Overview of Circuit Complexity 1 Welcome 2
Topics in Circuit Complexity (CS354, Fall’11) Week 1: An Overview of Circuit Complexity Lecture Notes for 9/27 and 9/29 Ryan Williams 1 Welcome The area of circuit complexity has a long history, starting in the 1940’s. It is full of open problems and frontiers that seem insurmountable, yet the literature on circuit complexity is fairly large. There is much that we do know, although it is scattered across several textbooks and academic papers. I think now is a good time to look again at circuit complexity with fresh eyes, and try to see what can be done. 2 Preliminaries An n-bit Boolean function has domain f0; 1gn and co-domain f0; 1g. At a high level, the basic question asked in circuit complexity is: given a collection of “simple functions” and a target Boolean function f, how efficiently can f be computed (on all inputs) using the simple functions? Of course, efficiency can be measured in many ways. The most natural measure is that of the “size” of computation: how many copies of these simple functions are necessary to compute f? Let B be a set of Boolean functions, which we call a basis set. The fan-in of a function g 2 B is the number of inputs that g takes. (Typical choices are fan-in 2, or unbounded fan-in, meaning that g can take any number of inputs.) We define a circuit C with n inputs and size s over a basis B, as follows. C consists of a directed acyclic graph (DAG) of s + n + 2 nodes, with n sources and one sink (the sth node in some fixed topological order on the nodes). -
Rigorous Cache Side Channel Mitigation Via Selective Circuit Compilation
RiCaSi: Rigorous Cache Side Channel Mitigation via Selective Circuit Compilation B Heiko Mantel, Lukas Scheidel, Thomas Schneider, Alexandra Weber( ), Christian Weinert, and Tim Weißmantel Technical University of Darmstadt, Darmstadt, Germany {mantel,weber,weissmantel}@mais.informatik.tu-darmstadt.de, {scheidel,schneider,weinert}@encrypto.cs.tu-darmstadt.de Abstract. Cache side channels constitute a persistent threat to crypto implementations. In particular, block ciphers are prone to attacks when implemented with a simple lookup-table approach. Implementing crypto as software evaluations of circuits avoids this threat but is very costly. We propose an approach that combines program analysis and circuit compilation to support the selective hardening of regular C implemen- tations against cache side channels. We implement this approach in our toolchain RiCaSi. RiCaSi avoids unnecessary complexity and overhead if it can derive sufficiently strong security guarantees for the original implementation. If necessary, RiCaSi produces a circuit-based, hardened implementation. For this, it leverages established circuit-compilation technology from the area of secure computation. A final program analysis step ensures that the hardening is, indeed, effective. 1 Introduction Cache side channels are unintended communication channels of programs. Cache- side-channel leakage might occur if a program accesses memory addresses that depend on secret information like cryptographic keys. When these secret-depen- dent memory addresses are loaded into a shared cache, an attacker might deduce the secret information based on observing the cache. Such cache side channels are particularly dangerous for implementations of block ciphers, as shown, e.g., by attacks on implementations of DES [58,67], AES [2,11,57], and Camellia [59,67,73]. -
Probabilistic Boolean Logic, Arithmetic and Architectures
PROBABILISTIC BOOLEAN LOGIC, ARITHMETIC AND ARCHITECTURES A Thesis Presented to The Academic Faculty by Lakshmi Narasimhan Barath Chakrapani In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the School of Computer Science, College of Computing Georgia Institute of Technology December 2008 PROBABILISTIC BOOLEAN LOGIC, ARITHMETIC AND ARCHITECTURES Approved by: Professor Krishna V. Palem, Advisor Professor Trevor Mudge School of Computer Science, College Department of Electrical Engineering of Computing and Computer Science Georgia Institute of Technology University of Michigan, Ann Arbor Professor Sung Kyu Lim Professor Sudhakar Yalamanchili School of Electrical and Computer School of Electrical and Computer Engineering Engineering Georgia Institute of Technology Georgia Institute of Technology Professor Gabriel H. Loh Date Approved: 24 March 2008 College of Computing Georgia Institute of Technology To my parents The source of my existence, inspiration and strength. iii ACKNOWLEDGEMENTS आचायातर् ्पादमादे पादं िशंयः ःवमेधया। पादं सॄचारयः पादं कालबमेणच॥ “One fourth (of knowledge) from the teacher, one fourth from self study, one fourth from fellow students and one fourth in due time” 1 Many people have played a profound role in the successful completion of this disser- tation and I first apologize to those whose help I might have failed to acknowledge. I express my sincere gratitude for everything you have done for me. I express my gratitude to Professor Krisha V. Palem, for his energy, support and guidance throughout the course of my graduate studies. Several key results per- taining to the semantic model and the properties of probabilistic Boolean logic were due to his brilliant insights. -
Cray XD1™ Supercomputer Release 1.3
CRAY XD1 DATASHEET Cray XD1™ Supercomputer Release 1.3 ■ Purpose-built for HPC — delivers exceptional application The Cray XD1 supercomputer combines breakthrough performance interconnect, management and reconfigurable computing ■ Affordable power — designed for a broad range of HPC technologies to meet users’ demands for exceptional workloads and budgets performance, reliability and usability. Designed to meet ■ Linux, 32 and 64-bit x86 compatible — runs a wide variety of ISV the requirements of highly demanding HPC applications applications and open source codes in fields ranging from product design to weather prediction to ■ Simplified system administration — automates configuration and scientific research, the Cray XD1 system is an indispensable management functions tool for engineers and scientists to simulate and analyze ■ Highly reliable — monitors and maintains system health faster, solve more complex problems, and bring solutions ■ Scalable to hundreds of compute nodes — high bandwidth and to market sooner. low latency let applications scale Direct Connected Processor Architecture Cray XD1 System Highlights The Cray XD1 system is based on the Direct Connected Processor (DCP) architecture, harnessing many processors into a single, unified system to deliver new levels of application � � � ���� � � � Compute� � � Processors performance. ���� � Cray’s implementation of the DCP architecture optimizes message-passing applications by 12 AMD Opteron™ 64-bit single directly linking processors to each other through a high performance interconnect fabric, or dual core processors run eliminating shared memory contention and PCI bus bottlenecks. Linux and are organized as six nodes of 2 or 4-way SMPs to deliver up to 106 GFLOPS* per chassis. Matching memory and I/O performance removes bottlenecks and maximizes processor performance. -
Privacy Preserving Computations Accelerated Using FPGA Overlays
Privacy Preserving Computations Accelerated using FPGA Overlays A Dissertation Presented by Xin Fang to The Department of Electrical and Computer Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Engineering Northeastern University Boston, Massachusetts August 2017 To my family. i Contents List of Figures v List of Tables vi Acknowledgments vii Abstract of the Dissertation ix 1 Introduction 1 1.1 Garbled Circuits . 1 1.1.1 An Example: Computing Average Blood Pressure . 2 1.2 Heterogeneous Reconfigurable Computing . 3 1.3 Contributions . 3 1.4 Remainder of the Dissertation . 5 2 Background 6 2.1 Garbled Circuits . 6 2.1.1 Garbled Circuits Overview . 7 2.1.2 Garbling Phase . 8 2.1.3 Evaluation Phase . 10 2.1.4 Optimization . 11 2.2 SHA-1 Algorithm . 12 2.3 Field-Programmable Gate Array . 13 2.3.1 FPGA Architecture . 13 2.3.2 FPGA Overlays . 14 2.3.3 Heterogeneous Computing Platform using FPGAs . 16 2.3.4 ProceV Board . 17 2.4 Related Work . 17 2.4.1 Garbled Circuit Algorithm Research . 17 2.4.2 Garbled Circuit Implementation . 19 2.4.3 Garbled Circuit Acceleration . 20 ii 3 System Design Methodology 23 3.1 Garbled Circuit Generation System . 24 3.2 Software Structure . 26 3.2.1 Problem Generation . 26 3.2.2 Layer Extractor . 30 3.2.3 Problem Parser . 31 3.2.4 Host Code Generation . 32 3.3 Simulation of Garbled Circuit Generation . 33 3.3.1 FPGA Overlay Architecture . 33 3.3.2 Garbled Circuit AND Overlay Cell . -
Reaction Systems and Synchronous Digital Circuits
molecules Article Reaction Systems and Synchronous Digital Circuits Zeyi Shang 1,2,† , Sergey Verlan 2,† , Ion Petre 3,4 and Gexiang Zhang 1,∗ 1 School of Electrical Engineering, Southwest Jiaotong University, Chengdu 611756, Sichuan, China; [email protected] 2 Laboratoire d’Algorithmique, Complexité et Logique, Université Paris Est Créteil, 94010 Créteil, France; [email protected] 3 Department of Mathematics and Statistics, University of Turku, FI-20014, Turku, Finland; ion.petre@utu.fi 4 National Institute for Research and Development in Biological Sciences, 060031 Bucharest, Romania * Correspondence: [email protected] † These authors contributed equally to this work. Received: 25 March 2019; Accepted: 28 April 2019 ; Published: 21 May 2019 Abstract: A reaction system is a modeling framework for investigating the functioning of the living cell, focused on capturing cause–effect relationships in biochemical environments. Biochemical processes in this framework are seen to interact with each other by producing the ingredients enabling and/or inhibiting other reactions. They can also be influenced by the environment seen as a systematic driver of the processes through the ingredients brought into the cellular environment. In this paper, the first attempt is made to implement reaction systems in the hardware. We first show a tight relation between reaction systems and synchronous digital circuits, generally used for digital electronics design. We describe the algorithms allowing us to translate one model to the other one, while keeping the same behavior and similar size. We also develop a compiler translating a reaction systems description into hardware circuit description using field-programming gate arrays (FPGA) technology, leading to high performance, hardware-based simulations of reaction systems. -
Merrimac – High-Performance and Highly-Efficient Scientific Computing with Streams
MERRIMAC – HIGH-PERFORMANCE AND HIGHLY-EFFICIENT SCIENTIFIC COMPUTING WITH STREAMS A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY Mattan Erez May 2007 c Copyright by Mattan Erez 2007 All Rights Reserved ii I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (William J. Dally) Principal Adviser I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (Patrick M. Hanrahan) I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy. (Mendel Rosenblum) Approved for the University Committee on Graduate Studies. iii iv Abstract Advances in VLSI technology have made the raw ingredients for computation plentiful. Large numbers of fast functional units and large amounts of memory and bandwidth can be made efficient in terms of chip area, cost, and energy, however, high-performance com- puters realize only a small fraction of VLSI’s potential. This dissertation describes the Merrimac streaming supercomputer architecture and system. Merrimac has an integrated view of the applications, software system, compiler, and architecture. We will show how this approach leads to over an order of magnitude gains in performance per unit cost, unit power, and unit floor-space for scientific applications when compared to common scien- tific computers designed around clusters of commodity general-purpose processors. -
SOTERIA: in Search of Efficient Neural Networks for Private Inference
SOTERIA: In Search of Efficient Neural Networks for Private Inference Anshul Aggarwal, Trevor E. Carlson, Reza Shokri, Shruti Tople National University of Singapore (NUS), Microsoft Research {anshul, trevor, reza}@comp.nus.edu.sg, [email protected] Abstract paper, we focus on private computation of inference over deep neural networks, which is the setting of machine learning-as- ML-as-a-service is gaining popularity where a cloud server a-service. Consider a server that provides a machine learning hosts a trained model and offers prediction (inference) ser- service (e.g., classification), and a client who needs to use vice to users. In this setting, our objective is to protect the the service for an inference on her data record. The server is confidentiality of both the users’ input queries as well as the not willing to share the proprietary machine learning model, model parameters at the server, with modest computation and underpinning her service, with any client. The clients are also communication overhead. Prior solutions primarily propose unwilling to share their sensitive private data with the server. fine-tuning cryptographic methods to make them efficient We consider an honest-but-curious threat model. In addition, for known fixed model architectures. The drawback with this we assume the two parties do not trust, nor include, any third line of approach is that the model itself is never designed to entity in the protocol. In this setting, our first objective is to operate with existing efficient cryptographic computations. design a secure protocol that protects the confidentiality of We observe that the network architecture, internal functions, client data as well as the prediction results against the server and parameters of a model, which are all chosen during train- who runs the computation. -
Pubtex Output 2006.05.15:1001
Cray XD1™ Release Description Private S–2453–14 © 2006 Cray Inc. All Rights Reserved. Unpublished Private Information. This unpublished work is protected to trade secret, copyright and other laws. Except as permitted by contract or express written permission of Cray Inc., no part of this work or its content may be used, reproduced or disclosed in any form. U.S. GOVERNMENT RESTRICTED RIGHTS NOTICE The Computer Software is delivered as "Commercial Computer Software" as defined in DFARS 48 CFR 252.227-7014. All Computer Software and Computer Software Documentation acquired by or for the U.S. Government is provided with Restricted Rights. Use, duplication or disclosure by the U.S. Government is subject to the restrictions described in FAR 48 CFR 52.227-14 or DFARS 48 CFR 252.227-7014, as applicable. Technical Data acquired by or for the U.S. Government, if any, is provided with Limited Rights. Use, duplication or disclosure by the U.S. Government is subject to the restrictions described in FAR 48 CFR 52.227-14 or DFARS 48 CFR 252.227-7013, as applicable. Autotasking, Cray, Cray Channels, Cray Y-MP, GigaRing, LibSci, UNICOS and UNICOS/mk are federally registered trademarks and Active Manager, CCI, CCMT, CF77, CF90, CFT, CFT2, CFT77, ConCurrent Maintenance Tools, COS, Cray Ada, Cray Animation Theater, Cray APP, Cray Apprentice2, Cray C++ Compiling System, Cray C90, Cray C90D, Cray CF90, Cray EL, Cray Fortran Compiler, Cray J90, Cray J90se, Cray J916, Cray J932, Cray MTA, Cray MTA-2, Cray MTX, Cray NQS, Cray Research, Cray SeaStar, Cray S-MP, -
The Gemini Network
The Gemini Network Rev 1.1 Cray Inc. © 2010 Cray Inc. All Rights Reserved. Unpublished Proprietary Information. This unpublished work is protected by trade secret, copyright and other laws. Except as permitted by contract or express written permission of Cray Inc., no part of this work or its content may be used, reproduced or disclosed in any form. Technical Data acquired by or for the U.S. Government, if any, is provided with Limited Rights. Use, duplication or disclosure by the U.S. Government is subject to the restrictions described in FAR 48 CFR 52.227-14 or DFARS 48 CFR 252.227-7013, as applicable. Autotasking, Cray, Cray Channels, Cray Y-MP, UNICOS and UNICOS/mk are federally registered trademarks and Active Manager, CCI, CCMT, CF77, CF90, CFT, CFT2, CFT77, ConCurrent Maintenance Tools, COS, Cray Ada, Cray Animation Theater, Cray APP, Cray Apprentice2, Cray C90, Cray C90D, Cray C++ Compiling System, Cray CF90, Cray EL, Cray Fortran Compiler, Cray J90, Cray J90se, Cray J916, Cray J932, Cray MTA, Cray MTA-2, Cray MTX, Cray NQS, Cray Research, Cray SeaStar, Cray SeaStar2, Cray SeaStar2+, Cray SHMEM, Cray S-MP, Cray SSD-T90, Cray SuperCluster, Cray SV1, Cray SV1ex, Cray SX-5, Cray SX-6, Cray T90, Cray T916, Cray T932, Cray T3D, Cray T3D MC, Cray T3D MCA, Cray T3D SC, Cray T3E, Cray Threadstorm, Cray UNICOS, Cray X1, Cray X1E, Cray X2, Cray XD1, Cray X-MP, Cray XMS, Cray XMT, Cray XR1, Cray XT, Cray XT3, Cray XT4, Cray XT5, Cray XT5h, Cray Y-MP EL, Cray-1, Cray-2, Cray-3, CrayDoc, CrayLink, Cray-MP, CrayPacs, CrayPat, CrayPort, Cray/REELlibrarian, CraySoft, CrayTutor, CRInform, CRI/TurboKiva, CSIM, CVT, Delivering the power…, Dgauss, Docview, EMDS, GigaRing, HEXAR, HSX, IOS, ISP/Superlink, LibSci, MPP Apprentice, ND Series Network Disk Array, Network Queuing Environment, Network Queuing Tools, OLNET, RapidArray, RQS, SEGLDR, SMARTE, SSD, SUPERLINK, System Maintenance and Remote Testing Environment, Trusted UNICOS, TurboKiva, UNICOS MAX, UNICOS/lc, and UNICOS/mp are trademarks of Cray Inc. -
The Size and Depth of Layered Boolean Circuits
The Size and Depth of Layered Boolean Circuits Anna G´al? and Jing-Tang Jang?? Dept. of Computer Science, University of Texas at Austin, Austin, TX 78712-1188, USA fpanni, [email protected] Abstract. We consider the relationship between size and depth for layered Boolean circuits, synchronous circuits and planar circuits as well as classes of circuits with small separators. In particular, we show that every layered Boolean circuitp of size s can be simulated by a lay- ered Boolean circuit of depth O( s log s). For planar circuits and syn- p chronous circuits of size s, we obtain simulations of depth O( s). The best known result so far was by Paterson and Valiant [16], and Dymond and Tompa [6], which holds for general Boolean circuits and states that D(f) = O(C(f)= log C(f)), where C(f) and D(f) are the minimum size and depth, respectively, of Boolean circuits computing f. The proof of our main result uses an adaptive strategy based on the two-person peb- ble game introduced by Dymond and Tompa [6]. Improving any of our results by polylog factors would immediately improve the bounds for general circuits. Key words: Boolean circuits, circuit size, circuit depth, pebble games 1 Introduction In this paper, we study the relationship between the size and depth of fan-in 2 Boolean circuits over the basis f_; ^; :g. Given a Boolean circuit C, the size of C is the number of gates in C, and the depth of C is the length of the longest path from any input to the output.