Practical Secure Two-Party Computation and Applications

Thomas Schneider

Estonian Winter School in Computer Science 2016 Overview

Lecture 1: Introduction to Secure Two-Party Computation

Lecture 2: Private Set Intersection

Lecture 3: Tools and Applications

Lecture 4: Hardware-assisted Cryptographic Protocols

2 The Engineering Cryptographic Protocols Group (ENCRYPTO)

Thomas Daniel Ágnes Michael Schneider Demmler Kiss Zohner

Info: http://encrypto.de

3 Interested in Practical Secure Computation?

We have an open, fully funded position as Ph.D. Student / Research Assistant in Engineering Scalable Secure Computation

Darmstadt - 30km south of FRA - 150,000 inhabitants (5.8 Mio in Frankfurt/Rhine-Main Metro Area) - 40,000 students

TU Darmstadt - Ranked #1 for IT security research in Germany (#5 in Europe) - Among Top 5 universies for computer science in Germany http://encrypto.de/jobs

4 Practical Secure Two-Party Computation and Applications

Lecture 1: Introduction

Estonian Winter School in Computer Science 2016 The Web of Services

Our life moves into the web...

... and so does our data.

6 How were web services used yesterday?

http://www.google.de “heart disease”

heart disease attacker can eavesdrop or modify communication

7 How should web services be used today?

https://www.google.de “heart disease”

secure channel protects communication heart disease against external attackers

HTTPS per default since

01/2010 02/2011 11/2012

8 Data breaches happen every day...

June 2, 2011: Google attacked from China Computer hackers in China broke into the Gmail accounts of several hundred people, including senior ... from outsiders US government officials, military personnel and political activists.

November 29, 2010: New WikiLeaks Publication WikiLeaks releases US State Department communiqués that offer an extraordinary look at the ... or insiders inner workings, and sharp elbows of diplomacy.

October 16, 2012: Espionage Malware MiniFlame Kaspersky Labs discover that MiniFlame is most likely a targeted cyberweapon to conduct in-depth ... or malware. surveillance and cyber-espionage.

9 How could web services be used tomorrow?

httpp://www.google.de encrypted query

process under encryption heart disease

encrypted response sensitive data remains encrypted

➪ Privacy-Preserving Web Services

10 Vision: Privacy-Preserving Web Services process sensitive data without any data leakage, e.g.,

Privacy-Preserving Medical Diagnostics Services give health recommendations without direct access to patient’s data.

Privacy-Preserving Face Recognition Services detect criminals without allowing to trace honest citizens.

Privacy-Preserving Cloud Computing Services allow to store and process data at untrusted service providers.

11 Is this possible at all?

Andrew Chi-Chi Yao 1986: Any efficiently computable function can be evaluated securely.

➪ Secure Computation

12 Secure Two-Party Computation

x y f(x,y) f

All Lectures: Semi-Honest (Passive) Adversaries

13 Secure Two-Party Computation

public function f( , ) Is C • compute arbitrary function f · · richer? • on private data x, y x > y Client Server • without trusted third party C S • reveal nothing but result z = f(x,y)

private data x private data y x = $2 Mio y = $1 Mio S2PC Example: Yao’s Millionaires’ Problem

true z = f(x, y)

14 Secure Two-Party Computation

Auctions [NaorPS99], ...

Remote Diagnostics [BrickellPSW07], ...

DNA Searching [Troncoso-PastorizaKC07], ...

Biometric Identification [ErkinFGKLT09], ...

Medical Diagnostics [BarniFKLSS09], ...

15 (OT)

(x0, x1) r

OT xr

1-out-of-2 OT is an essential building block for secure computation.

16 How to Measure Efficiency of a Protocol?

✓ Runtime (depends on implementation & scenario) ✓ Communication • # bits sent (important for networks with low bandwidth) • # rounds (important for networks with high latency) ? Computation • Usually: count # crypto operations, e.g., • # modular exponentiations • # point multiplications • # hash function evaluations (SHA) • # block cipher evaluations (AES) faster • # One-Time Pad evaluations • But also non-cryptographic operations do matter!

17 Overview of this lecture

Part 1: Yao vs. GMW

Special Purpose Protocols Generic Protocols

Arithmetic Circuit Boolean Circuit

Homomorphic Encryption Yao GMW

OT

Public Key Crypto >> Symmetric Crypto >> One-Time Pad

Part 2: Efficient OT Extensions 18 Part 1: Yao vs. GMW and Efficient Circuits

T. Schneider, M. Zohner: GMW vs. Yao? Efficient secure two-party computation with low depth circuits. In FC’13.

19 Yao’s Garbled Circuits Protocol [Yao86]

f( , ) e.g., x < y · · Client Server C S private data x = x1, .., xn private data y = y1, .., yn

xn yn x2 y2 x1 y1 • Circuit < ... c2 < c1 < z xn yn x2 y2 x1 y1

• Garbled ... c2 c1 Setup Circuit C Phase C z 0 0 g(0,0) Online y E(x1, y1; c1 ) e c0, c1 E(x0, y1; cg(0,1)) Phase (x; ) OT(x;(x0, x1)) 1 1 1 1 1 Garbled e1 e0 eg(1,0) ? e E(x1, y1; c1 ) Values E(xe1, ye1; ecg(1,1)) f(x, y)=C(x, y) Part 2: Efficient OT e e 1 1 1 e e e Garblede e eTable

e e e 20 e e e Garbled Circuits [Yao86] Conventional circuit Garbled circuit

01

keys look random

0 1

01

01 01

given input keys, can compute output key only

(Slide from Viet-Tung Hoang) 21 Garbled Gate [Yao86] X Y

Y 0

X 1 given two input keys, can compute only output key X 2

X 3

A B C D

(Slide from Viet-Tung Hoang) 22 Overview of Efficient Garbled Circuit Constructions

1990 Point-and-Permute [BeaverMicaliRogaway]

1999 3-row reduction [NaorPinkasSumner]

2008 Free-XOR [KolesnikovSchneider]

2009 2-row reduction [PinkasSchneiderSmartWilliams]

2012 Garbling via AES [KreuterShelatShen]

2013 Fixed-key AES [BellareHoangKeelveedhiRogaway]

2014 FleXor [KolesnikovMohasselRosulek]

2015 HalfGates [ZahurRosulekEvans]

(Slide from Payman Mohassel) 23 Summary of Garbled Circuit Constructions

size (× t) garble cost (AES) eval cost (AES)

XOR AND XOR AND XOR AND

Classical large 8 5

P&P 4 4 1

GRR3 3 4 1

Free XOR 0 3 0 4 0 1

HalfGates 0 2 0 4 0 2

t: symmetric security parameter, e.g., t=128 (Slide from Mike Rosulek) 24 Summary: Yao - the Apple

How to eat an apple? bite-by-bite

+ Yao has constant #rounds - Evaluating a garbled gate requires symmetric crypto in the online phase

25 The GMW Protocol [GMW87]

Secret share inputs: a = a1 ⊕ a2 a b b = b1 ⊕ b2 Non-Interactive XOR gates: c1 = a1 ⊕ b1 ; c2 = a2 ⊕ b2 c

Interactive AND gates: c1,b1 c2,b2 ^ AND d1 ∧ d2 d

Recombine outputs: d = d1 ⊕ d2

26 Evaluating ANDs via Multiplication Triples [Beaver91]

Part 2: Efficient OTs Setup phase: Generate multiplication triple (a1⊕a2) (b1⊕b2) = c1⊕c2 for each AND via 2 OTs:

1) P1: m0, m1 ∈R {0,1}; P2: a2 ∈R {0,1} 2) P1 and P2 run OT, where P1 inputs (m0, m1), P2 inputs a2 and gets u2=ma2 3) P1 sets b1 = m0 ⊕ m1; v1 = m0 4) P1 and P2 repeat steps 1-3 with reversed roles to obtain (a1, u1); (b2, v2) 5) Pi sets ci = (ai bi) ⊕ ui ⊕ vi

Online phase: x1, y1 xc2, y,b2 P1 → P2: d1=x1⊕a1; e1=y1⊕b1 c1,b1 2 2 AND P1 ← P2: d2=x2⊕a2; e2=y2⊕b2 ∧ dz11 zd2 2 P1, P2: d=d1⊕d2; e=e1⊕e2 P1: z1=db1⊕ea1⊕c1⊕de P2: z2=db2⊕ea2⊕c2

27 Summary: GMW - the Orange

How to eat an orange? 1) peel (almost all the effort) Setup phase: - precompute multiplication triples for each AND gate using 2 R-OTs and constant #rounds + no need to know function, only max. #ANDs

2) eat (easy) Online phase: + evaluating circuit needs OTP operations only - 2x2 bit communication per layer of AND gates

28 Benchmarks of an optimized GMW implementation [SZ13]

Runtime in seconds for 512-bit multiplication circuit (800k AND gates, depth 38) over Gigabit LAN.

29 Benchmarks of an optimized GMW implementation [SZ13]

Runtime in seconds for 512-bit multiplication circuit (800k AND gates, depth 38) over Gigabit LAN.

Interactive AND gates via Beaver’s multiplication triples [D. Beaver. Efficient multiparty protocols using circuit randomization. CRYPTO’91.]

setup phase: 1-out-of-4 OT online phase: 2 independent 2-bit messages (sent in parallel) => 1x network latency per layer of AND gates

30 Benchmarks of an optimized GMW implementation [SZ13]

Runtime in seconds for 512-bit multiplication circuit (800k AND gates, depth 38) over Gigabit LAN.

Use AES-based PRF for OT extensions (instead of SHA-1).

31 Benchmarks of an optimized GMW implementation [SZ13]

Runtime in seconds for 512-bit multiplication circuit (800k AND gates, depth 38) over Gigabit LAN.

Load Balancing: • Run half of the precomputed OTs in each direction (in parallel). • Run base OTs twice (in parallel).

=> Each party has exactly the same workload.

32 Benchmarks of an optimized GMW implementation [SZ13]

Runtime in seconds for 512-bit multiplication circuit (800k AND gates, depth 38) over Gigabit LAN.

Use GMP instead of NTL for base OTs.

33 Benchmarks of an optimized GMW implementation [SZ13]

Runtime in seconds for 512-bit multiplication circuit (800k AND gates, depth 38) over Gigabit LAN.

Process data in chunks of bytes (instead of bits).

34 Benchmarks of an optimized GMW implementation [SZ13]

Runtime in seconds for 512-bit multiplication circuit (800k AND gates, depth 38) over Gigabit LAN.

Use assembly implementation of OpenSSL for SHA-1 (instead of C implementation of PolarSSL).

35 Benchmarks of an optimized GMW implementation [SZ13]

Runtime in seconds for 512-bit multiplication circuit (800k AND gates, depth 38) over Gigabit LAN.

Single Instruction Multiple Data: Evaluate multiple circuits in parallel (here 32).

(inspired by Sharemind)

36 Remaining Bottlenecks in LAN Setting

1% 0.1% (Base OTs) 1.4% 0.8% 7% 3% 3%

20% 32% 35% 47% 98% 37% 16%

37 Yao vs. GMW

Yao GMW

Free XOR

S: 4, R: 2 (online) symmetric crypto per AND setup: S: 6, R: 6

setup: S→R:t || R→S:t S→R: 2t communication [bit] per AND online: S→R:2 || R→S:2

setup: O(1) O(1) rounds online: O(ANDdepth(f))

t memory per wire [bit] 1

t: symmetric security parameter 38 Efficient Circuit Constructions for Secure Computation

Classical circuit design: - few gates (⇒ small chip area)

- low depth (⇒ high clock frequency)

Circuits for secure computation: - low ANDsize (#non-XORs ⇒ communication and symmetric crypto)

- low ANDdepth (#rounds in GMW’s online phase)

Automatically generate optimized circuits from high-level descriptions: E. M. Songhori, S. U. Hussain, A.-R. Sadeghi, T. Schneider, F. Koushanfar: TinyGarble: Highly compressed and scalable sequential garbled circuits. In IEEE S&P’15.

D. Demmler, G. Dessouky, F. Koushanfar, A.-R. Sadeghi, T. Schneider, S. Zeitouni: Automated Synthesis of Optimized Circuits for Secure Computation. In ACM CCS’15. 39 Chapter 3 Circuit Optimizations and Constructions

Table 3.2: Size: Ecient Circuit Constructions (for n unsigned `-bit values) Circuit Standard Free XOR #gates 2-input 3-input 2-input non-XOR ADD, SUB ( 3.3.1) 2 2` 2 ` ADDSUB (§3.3.1.3) ` +1 2` ` § MUL (Textbook) ( 3.3.2.1) `2 +2` 2 2`2 4` +2 2`2 ` MUL (Karatsuba) (§3.3.2.2) 9`1.6 13` 34` § ⇡ CMP ( 3.3.3.1) 1 ` 1 ` MUX (§3.3.3.2) 0 ` ` MIN, MAX (§3.3.3.3) n 1 2`(n 1) + 2 2`(n 1) + n +1 §

The first 1-bit has constant input c1 = 0 and can be replaced by a smaller half- adder with two inputs. Each 1-bit adder has as inputs the carry-in bit ci from the previous 1-bit adder and the two input bits xi,yi. The outputs are the carry-out bit c =(x y ) (x c ) (y c )=(x ,y,c)[00010111] and the sum bit i+1 i ^ i _ i ^ i _ i ^ i i i i si = xi yi ci =(xi,yi,ci)[01101001]. All occurring gates are even and can be optimized to a small number of XOR gates. An equivalent construction for computing Exampleci+1 with the Circuit: same number Addition of non-XOR gates was given in [BPP00, BDP00].

x` y` x2 y2 x1 y1 Ripple-Carry-Adder ADD c3 c2 + ... + + 0

s`+1 s` s2 s1

si = xi ⊕ yi ⊕ ci Figure 3.3: Circuit: Addition (ADD) ci+1 = ((xi ⊕ yi) ∧ (xi ⊕ ci)) ⊕ xi [BoyarPeraltaPochuev00] ANDsize = ℓ, ANDdepth = ℓ

x4 y4 x3 y3 x2 y2 x1 y1 Ladner-Fischer-Adder3.3.1.2 Subtraction [LF80] (SUB) p4,0 p3,0 p2,0 p1,0 pi,0=xi⊕yi, ci,0=xi∧yi c 4,0 c 3,0 c 2,0 c 1,0 Subtraction in two’s complement representation is defined as x y = x+ y+1. Hence, ¬ asubtractioncircuit(SUB) can bep4,1 constructedp2,1 analogously to the addition circuit from c 4,1 c 2,1 1-bit subtractors ( ) as shown in Fig. 3.4. Each 1-bit subtractor computes the carry- pi,j=pi,j-1∧pk,j-1 p4,2 p3,2 out bit ci+1 =(xi yi) (xi ci) ( yi ci)=(xi,yi,ci))[01001101]ci,j=(pi,j-1∧ck,j-1 and)∨c thei,j-1 di↵erence c 4,2 c 3,2 bit d = x y ^c¬ =(_x ,y^,c)[10010110]._ ¬ ^ The size of SUB is equal to that of ADD. i i ¬ i i i i i

s5 s4 s3 s2 s1

3.3.1.3 ControlledANDsize Addition/Subtraction = ℓ+1.25 ℓ log2(ℓ), (ADDSUB ANDdepth) = 1+2 log2(ℓ) 40 A controlled addition/subtraction circuit (ADDSUB)whichcanaddorsubtracttwo unsigned `-bit values x and y depending on a control input bit ctrl can be naturally constructed as combination of ADD, SUB, and controlled inversion (CNOT) which is

40 290 T. Schneider and M. Zohner

ASummaryofCircuitBuildingBlocks Example Circuits Summarized in [SchneiderZohner13] Table 7. Size and Depth of Circuit Constructions (dH :Hammingweight)

Circuit Size S Depth D Addition ℓ Ripple-carry ADD/SUBRC ℓ ℓ ℓ Ladner-Fischer ADDLF 1.25ℓ log2 ℓ + ℓ 2 log2 ℓ +1 ℓ ⌈ ⌉ ⌈ ⌉ LF subtraction SUBLF 1.25ℓ log2 ℓ +2ℓ 2 log2 ℓ +2 (ℓ,3) ⌈ ⌉ ℓ ⌈ ⌉ℓ Carry-save ADDCSA ℓ + S(ADD ) D(ADD )+1 RC network ADD(ℓ,n) ℓn ℓ + n log n 1 log n 1 + ℓ RC − −⌈ 2 ⌉− ⌈ 2 − ⌉ (ℓ,n) ℓn 2ℓ + n log2 n log2 n 1 CSA network ADDCSA − ℓ+−⌈log n ⌉ ⌈ ℓ+−log⌉ n ⌈ 2 ⌉ ⌈ 2 ⌉ +S(ADDLF ) +D(ADDLF ) Multiplication RCN school method MULℓ 2ℓ2 ℓ 2ℓ 1 RC − − CSN school method MULℓ 2ℓ2 +1.25ℓ log ℓ ℓ +2 3 log ℓ +4 CSN ⌈ 2 ⌉− ⌈ 2 ⌉ RC squaring SQRℓ ℓ2 ℓ 2ℓ 3 RC − − LF squaring SQRℓ ℓ2 +1.25ℓ log ℓ 1.5ℓ 2 3 log ℓ +3 LF ⌈ 2 ⌉− − ⌈ 2 ⌉ Comparison ℓ Equality EQ ℓ 1 log2 ℓ ℓ − ⌈ ⌉ Sequential greater than GTS ℓ ℓ D&C greater than GTℓ 3ℓ log ℓ 2 log ℓ +1 DC −⌈ 2 ⌉− ⌈ 2 ⌉ Selection Multiplexer MUXℓ ℓ 1 Minimum MIN(ℓ,n) (n 1)(S(GTℓ)+ℓ) log n (D(GTℓ)+1) − ⌈ 2 ⌉ Minimum indexCan MIN( ℓ,ntrade-off) (n larger1)(S(GT sizeℓ)+ℓ +forlog bettern ) depth.log n (D(GTℓ)+1) IDX − ⌈ 2 ⌉ ⌈ 2 ⌉ 41 Set Operations Set union ℓ ℓ 1 ∪ Set intersection ℓ ℓ 1 ∩ Set inclusion ℓ 2ℓ 1 log ℓ +1 ⊆ − ⌈ 2 ⌉ Count ℓ Full Adder count CNTFA 2ℓ log2 ℓ 2 log2 ℓ ℓ −⌈ ⌉− ⌈ ⌉ Boyar-Peralta count CNT ℓ dH (ℓ) log ℓ BP − ⌊ 2 ⌋ Distances ℓ ℓ (ℓ,3) ℓ (ℓ,3) Manhattan distance DSTM 2S(SUB )+S(ADD )+1 D(SUB )+D(ADD )+1 2S(SUBℓ)+2S(SQRℓ) D(SUBℓ) Euclidean distance DSTℓ E +S(ADD(2ℓ,4))+2S(MUXℓ) +D(SQRℓ)+3

BDepthEfficient Distance Circuits B.1 Manhattan Distance ℓ ℓ ℓ The Manhattan distance DSTM between two points p1 =(x1,y1)andp2 = ℓ ℓ (x2,y2) is the distance in a two dimensional space allowing only horizontal and ℓ ℓ ℓ ℓ vertical moves and is computed as x1 x2 + y1 y2 . [5] give such a circuit ℓ ℓ | − | | − ℓ| DSTM,C with size S(DSTM,C)=9ℓ and depth D(DSTM,C)=2ℓ +2.Theyuse ℓ ℓ ℓ 4multiplexercircuitsMUX (cf. [18]), 2 GTS circuits ( 3.3), 2 SUBRC circuits (cf. [18]), and one ADDℓ circuit ( 3.1). § RC § Part 2: Efficient OTs

http://encrypto.de/code/OTExtension

G. Asharov, Y. Lindell, T. Schneider, M. Zohner: More efficient oblivious transfer and extensions for faster secure computation. In ACM CCS’13.

42 Oblivious Transfer (OT)

(x0, x1) r

OT xr

1-out-of-2 OT is an essential building block for secure computation.

43 OT - Bad News

- [ImpagliazzoRudich89]: there’s no black-box reduction from OT to OWFs

- Several OT protocols based on public-key - e.g., [NaorPinkas01] yields ~1,000 OTs per second

- Since public-key crypto is expensive, OT was believed to be inefficient

44 A Public-Key Based OT Protocol: [NaorPinkas01]

Common input: G= of prime order q

input: x0, x1 input: b

t ∈R [0,q) C C= gt k ∈R [0,q) k PKb = g PK0 PK1-b = C/PKb PK1=C/PK0 r0, r1 ∈R [0,q) r0 r0 E0= r1 r1 E1= E0, E1 Eb= k rb h=H(L )=H((PKb) ) xb=h⊕R

output: xb 45 OT - Good News

- [Beaver95]: OTs can be precomputed (only OTP in online phase)

- OT Extensions (similar to hybrid encryption): use symmetric crypto to stretch few “real” OTs into longer/many OTs - [Beaver96]: OT on long strings from short seeds - [IshaiKilianNissimPetrank03]: many OTs from few OTs

l-bit k-bit

[Beaver96] k OTs “real” OTs

m OTs

[IKNP03]

46 OT Extension of [IKNP03] (1)

- Alice inputs m pairs of ℓ-bit strings (xi,0 , xi,1)

- Bob inputs m-bit string r and obtains xi,ri in i-th OT

47 OT Extension of [IKNP03] (2)

- Alice and Bob perform k “real” OTs on random seeds with reverse roles (k: security parameter)

48 OT Extension of [IKNP03] (3)

- Bob generates a random m x k bit matrix T and masks his choices r

- The matrix is masked with the stretched seeds of the “real” OTs

PRG: pseudo-random generator (instantiated with AES)

49 OT Extension of [IKNP03] (4)

- Transpose matrices V and T

- Alice masks her inputs and obliviously sends them to Bob

H: correlation robust function (instantiated with hash function)

50 Computation Complexity of OT Extension

Per OT:

1 # PRG evaluations 2

2 # H evaluations 1

Time distribution for 10 Mio. OTs (in 21s): 1 % 10 % "real" OTs 33 % H (SHA-1) PRG (AES) 42 % Transpose 14 % Misc (Snd/Rcv/XOR)

Non-crypto part was bottleneck!!!

51 Algorithmic Optimization: Efficient Matrix Transposition

- Naive matrix transposition performs mk load/process/store operations

- [Eklundh72]’s algorithm reduces number of operations to O(m log2 k) swaps - Swap whole registers instead of bits - Transposing 10 times faster

52 Algorithmic Optimization: Parallelization

- OT extension can easily be parallelized by splitting the T matrix into sub-matrices

- Since columns are independent, OT is highly parallelizable

53 Communication Complexity of OT Extension

Per OT:

2ℓ Bits sent 2k

Yao: ℓ = k = 128 GMW: ℓ = 1, k = 128

Alice

Alice Bob

Bob

54 Protocol Optimization: General OT Extension

- Instead of generating a random T matrix, we derive it from sj,0 (similar to garbled 3-row reduction)

- Reduces data sent by Bob by factor 2

55 Specific OT Functionalities

- Secure computation protocols often require a specific OT functionality - Yao with free XORs requires strings x0, x1 to be XOR-correlated - GMW with multiplication triples can use random strings

- Correlated OT: random x0 and x1 = x0 ⊕ x - Random OT: random x0 and x1

Correlated OT Random OT

e.g., for Yao e.g., for GMW

56 Specific OT Functionalities: Correlated OT (C-OT)

- Choose xi,0 as random output of H (modeled as RO here), similar to garbled 3-row reduction

- Compute xi,1 as xi,0 ⊕ xi to obliviously transfer XOR-correlated values

- Reduces data sent by Alice by factor 2

57 Specific OT Functionalities: Random OT (R-OT)

- Choose xi,0 and xi,1 as random outputs of H (modeled as RO here), similar to garbled 3-row reduction

- No data sent by Alice

58 Performance Evaluation Performance - Performance for Mio. OTs10 - Performance strings 80-bit on OT of [SZ13] implementing - C++ implementation of [IKNP03] extension Runtime in s 10 20 30 40 0 20,6 Orig 30,7 14,4 Performance for Mio. OTs10 Performance strings 80-bit on EMT 30,5 Gigabit LAN Gigabit 13,9 G-OT : Original Implementation 29,4 10,6 C-OT 14,4 10,0 WiFi 802.11g R-OT 14,2 5,0 2T 14,2 2,6 4T 14,2

59 Performance Evaluation Performance - Only decreases runtime in LAN where computation is computation where LAN in runtime - Only decreases - Efficient computation – improves matrix transposition Runtime in s 10 20 30 40 0 20,6 Orig 30,7 14,4 Performance for Mio. OTs10 Performance strings 80-bit on EMT 30,5 Gigabit LAN Gigabit 13,9 G-OT : Efficient TranspositionMatrix 29,4 10,6 C-OT 14,4 10,0 WiFi 802.11g R-OT 14,2

the the bottleneck 5,0 2T 14,2 2,6 4T 14,2 60 Performance Evaluation Performance - Runtimes only slightly faster (bottleneck: communication communication faster (bottleneck: slightly only - Runtimes → Bob Alice - Generate T → Bob communication – improves matrix from seeds Alice Runtime in s 10 20 30 40 0 20,6 Orig 30,7 14,4 Performance for Mio. OTs10 Performance strings 80-bit on EMT 30,5 Gigabit LAN Gigabit 13,9 G-OT : General OT 29,4 10,6 C-OT 14,4 10,0 WiFi 802.11g R-OT 14,2 5,0 2T 14,2 2,6 4T ) 14,2

61 Performance Evaluation: OTCorrelated/Random Performance - WiFi runtime faster by factor 2 (bottleneck: communication Bob → Bob - faster communication WiFi by factor runtime 2 (bottleneck: OT- Correlated/Random communication – improved → Bob Alice Runtime in s 10 20 30 40 0 20,6 Orig 30,7 14,4 Performance for Mio. OTs10 Performance strings 80-bit on EMT 30,5 Gigabit LAN Gigabit 13,9 G-OT 29,4 10,6 C-OT 14,4 10,0 WiFi 802.11g R-OT 14,2 5,0 2T 14,2

2,6 Alice) 4T 14,2 62 Performance Evaluation: Parallelization

Gigabit LAN WiFi 802.11g 40 30,7 30,5 29,4 30 20,6 20 14,4 13,9 14,4 14,2 14,2 14,2 10,6 10,0

Runtimein s 10 5,0 2,6 0 Orig EMT G-OT C-OT R-OT 2T 4T Performance for 10 Mio. OTs on 80-bit strings

- Parallel OT extension with 2 and 4 threads – improved computation

- LAN runtime decreases linear in # of threads

- WiFi runtime remains the same (bottleneck: communication)

63 Performance Evaluation: Summary

Gigabit LAN WiFi 802.11g 40 30,7 30,5 29,4 30 20,6 20 14,4 13,9 14,4 14,2 14,2 14,2 10,6 10,0

Runtimein s 10 5,0 2,6 0 Orig EMT G-OT C-OT R-OT 2T 4T Performance for 10 Mio. OTs on 80-bit strings

- OT is very efficient

- Communication is the bottleneck for OT (even without using AES-NI)

64 Summary

Part 1: Yao vs. GMW - can trade-off size for depth - Yao has constant #rounds ⇒ good for high-latency networks (Internet) - GMW can precompute all crypto, good for low-latency networks (LAN)

Part 2: OT extension - send 1 ciphertext + |payload| - communication is the bottleneck

Bottleneck of today’s secure computation protocols is communication.

65 EXERCISE 1

Measure speed of crypto operations with the “openssl speed” command and order them according to throughput: • aes-128-cbc (block cipher) • dsa2048 (public-key crypto using modular exponentiation) • ecdsap256 (public-key crypto using point multiplication on elliptic curve) • rsa2048 (public-key crypto using modexp in RSA group) • sha256 (hash function)

66 Literature

[ALSZ13] G. Asharov, Y. Lindell, T. Schneider, M. Zohner: More efficient oblivious transfer and extensions for faster secure computation. In ACM CCS’13. [BarniFKLSS09] M. Barni, P. Failla, V. Kolesnikov, R. Lazzeretti, A.-R. Sadeghi, T. Schneider: Secure Evaluation of Private Linear Branching Programs with Medical Applications. In ESORICS’09. [Beaver91] D. Beaver: Efficient multiparty protocols using circuit randomization. In CRYPTO’91. [Beaver95] D. Beaver: Precomputing oblivious transfer. In CRYPTO’95. [BrickellPSW07] J. Brickell, D. E. Porter, V. Shmatikov, E. Witchel. Privacy-preserving remote diagnostics. In ACM CCS’07. [DDKSSZ15] D. Demmler, G. Dessouky, F. Koushanfar, A.-R. Sadeghi, T. Schneider, S. Zeitouni: Automated Synthesis of Optimized Circuits for Secure Computation. In ACM CCS’15. [CHKMR12] S. G. Choi, K.-W. Hwang, J. Katz, T. Malkin, D. Rubinstein: Secure multi-party computation of Boolean circuits with applications to privacy in on-line marketplaces. In CT-RSA’12. [Eklundh72] J. O. Eklundh. A fast computer method for matrix transposing. In IEEE Transactions on Computers, 1972. [ErkinFGKLT09] Z. Erkin, M. Franz, J. Guajardo, S. Katzenbeisser, I. Lagendijk, T. Toft: Privacy-preserving face recognition. In PETS’09. [GMW87] O. Goldreich, S. Micali, A. Wigderson: How to play any mental game or a completeness theorem for protocols with honest majority. In STOC’87. [IKNP03] Y. Ishai, J. Kilian, K. Nissim, E. Petrank: Extending oblivious transfers efficiently. In CRYPTO’03. [ImpagliazzoRudich89] R. Impagliazzo, S. Rudich. Limits on the provable consequences of one-way permutations. In STOC’89. [NaorPinkas01] M. Naor, B. Pinkas: Efficient oblivious transfer protocols. In SODA’01. [NaorPS99] M. Naor, B. Pinkas, R. Sumner: Privacy preserving auctions and mechanism design. In EC’99. [SHSSK15] E. M. Songhori, S. U. Hussain, A.-R. Sadeghi, T. Schneider, F. Koushanfar: TinyGarble: Highly compressed and scalable sequential garbled circuits. In IEEE S&P’15. [SZ13] T. Schneider, M. Zohner: GMW vs. Yao? Efficient secure two-party computation with low depth circuits. In FC’13. [Troncoso-PastorizaKC07] J. R. Troncoso-Pasoriza, S. Katzenbeisser, M. U. Celik: Privacy preserving error resilient DNA searching through oblivious automata. In ACM CCS’07. [Yao86] A. C. Yao. How to generate and exchange secrets. In FOCS’86.

67