Optimizations of Generic Protocols for Semi-Honest Adversaries

Thomas Schneider (TU Darmstadt)

5th Bar-Ilan Winter School, Feb 2015

Secure Two-Party Computation

[Figure: two parties with private inputs x and y jointly compute f(x, y).]

This Lecture: Semi-Honest (Passive) Adversaries

Secure Two-Party Computation: Applications

• Auctions [NaorPS99], ...
• Remote Diagnostics [BrickellPSW07], ...
• DNA Searching [Troncoso-PastorizaKC07], ...
• Biometric Identification [ErkinFGKLT09], ...
• Medical Diagnostics [BarniFKLSS09], ...

Oblivious Transfer (OT)

Sender inputs (x0, x1); receiver inputs choice bit r and obtains xr.

1-out-of-2 OT is an essential building block for secure computation.
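The input/output behaviour of 1-out-of-2 OT can be sketched as an ideal functionality, i.e., a trusted party that hands the receiver xr and nothing else. This is only a model of what OT computes, not a secure protocol; the function name `ideal_ot` is an assumption made for illustration.

```python
def ideal_ot(sender_messages, choice_bit):
    """Ideal 1-out-of-2 OT (trusted-party model, NOT a real protocol).

    The receiver learns only x_r; the sender learns nothing about r.
    """
    x0, x1 = sender_messages
    if choice_bit not in (0, 1):
        raise ValueError("choice bit must be 0 or 1")
    return x1 if choice_bit else x0
```

A real instantiation would replace the trusted party by a cryptographic protocol, which is the topic of Part 2.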

How to Measure Efficiency of a Protocol?

✓ Runtime (depends on implementation & scenario)
✓ Communication
  • # bits sent (important for networks with low bandwidth)
  • # rounds (important for networks with high latency)
? Computation
  • Usually: count # crypto operations, e.g. (listed from slowest to fastest):
    • # modular exponentiations
    • # point multiplications
    • # hash function evaluations (SHA)
    • # block cipher evaluations (AES)
    • # One-Time Pad evaluations
  • But non-cryptographic operations matter too!

Overview of this Lecture

Generic Protocols: Boolean Circuit → Yao / GMW → OT

• Part 1: Efficient Garbled Circuits
• Part 2: Efficient OTs
• Part 3: Efficient Circuits and Yao vs. GMW

Public Key Crypto >> Symmetric Crypto >> One-Time Pad

Yao’s Garbled Circuits Protocol [Yao86]

Two parties want to compute f(·, ·), e.g., x < y, on private data x = x1, .., xn and private data y = y1, .., yn.

1. Circuit: f is represented as a Boolean circuit C, e.g., a chain of 1-bit comparison gates (<) with inputs xi, yi, intermediate wires c1, c2, ... and output z. (Part 3: Efficient Circuits)
2. Garbled values: each wire, e.g. c1, is assigned two keys c̃1^0, c̃1^1 that represent the values 0 and 1.
3. Garbled table: for each gate g, e.g. with input wires x1, y1 and output wire c1, the table holds the four entries E(x̃1^i, ỹ1^j; c̃1^g(i,j)) for i, j ∈ {0, 1}. The garbled tables of all gates form the garbled circuit C̃. (Part 1: Efficient Garbled Circuits)
4. The party holding y sends the garbled circuit C̃ and the garbled values ỹ of its input.
5. The party holding x obtains the garbled values x̃ of its input via OT, with inputs x and (x̃^0, x̃^1). (Part 2: Efficient OTs)
6. The party holding x evaluates C̃ on x̃ and ỹ and obtains f(x, y) = C(x, y).

The GMW Protocol [GMW87]

Example circuit: c = a ⊕ b, d = c ∧ b

Secret share inputs: a = a1 ⊕ a2, b = b1 ⊕ b2

Non-Interactive XOR gates: c1 = a1 ⊕ b1; c2 = a2 ⊕ b2

Interactive AND gates: P1 inputs (c1, b1), P2 inputs (c2, b2); they obtain shares d1, d2 of d = c ∧ b (Part 3: Efficient Circuits)

Recombine outputs: d = d1 ⊕ d2
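The non-interactive part of GMW (sharing, XOR gates, recombination) can be sketched as follows. This is a minimal single-process illustration in which both parties' shares live in one tuple; the helper names `share`, `xor_gate` and `reconstruct` are assumptions, and AND gates are left to the multiplication-triple technique.

```python
import secrets

def share(bit):
    """Split a bit into two XOR shares (share of P1, share of P2)."""
    s1 = secrets.randbits(1)
    return s1, bit ^ s1

def xor_gate(a_shares, b_shares):
    """Each party XORs its own shares locally; no communication is needed."""
    return a_shares[0] ^ b_shares[0], a_shares[1] ^ b_shares[1]

def reconstruct(shares):
    """Recombine an output: d = d1 XOR d2."""
    return shares[0] ^ shares[1]
```

Each share on its own is a uniformly random bit, so a single party learns nothing about the shared value before recombination.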

Evaluating ANDs via Multiplication Triples [Beaver91]

The Aim: Generate a multiplication triple (a1⊕a2)(b1⊕b2) = c1⊕c2

• P1’s output: a1,b1,c1
• P2’s output: a2,b2,c2
• Property: (a1⊕a2)(b1⊕b2) = c1⊕c2
• Observe that c1⊕c2 = a1b1⊕a2b1⊕a1b2⊕a2b2

The Protocol:

1. P1: choose m0,m1 ∈R {0,1}; P2: choose a2 ∈R {0,1}
2. P1 and P2 run OT: P1 inputs (m0,m1), P2 inputs a2 and gets u2 = m_a2
3. P1 sets b1 = m0⊕m1; v1 = m0
   • Observe: v1⊕u2 = m0⊕m_a2
   • If a2=0 then v1⊕u2 = m0⊕m0 = 0 = a2b1
   • If a2=1 then v1⊕u2 = m0⊕m1 = b1 = a2b1
4. P1 and P2 repeat steps 1-3 with reversed roles to obtain (a1,u1); (b2,v2)
5. Pi sets ci = (aibi) ⊕ ui ⊕ vi

Observe: c1⊕c2 = a1b1⊕u1⊕v1 ⊕ a2b2⊕u2⊕v2 = a1b1⊕a2b1⊕a1b2⊕a2b2
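The five protocol steps can be sketched in a single process as follows. The OT is replaced by an ideal stand-in (`ideal_ot`, an assumption for illustration), so the sketch only demonstrates correctness of the triple, not security.

```python
import secrets

def ideal_ot(messages, choice):
    """Stand-in for a real OT protocol: returns the chosen message."""
    return messages[choice]

def half_triple():
    """Steps 1-3: the OT sender gets (b, v), the OT receiver gets (a, u),
    such that u XOR v == a AND b."""
    m0, m1 = secrets.randbits(1), secrets.randbits(1)  # sender's random bits
    a = secrets.randbits(1)                            # receiver's random bit
    u = ideal_ot((m0, m1), a)                          # receiver learns m_a
    b, v = m0 ^ m1, m0
    return (b, v), (a, u)

def generate_triple():
    """Steps 1-5: returns (a1, b1, c1) for P1 and (a2, b2, c2) for P2."""
    (b1, v1), (a2, u2) = half_triple()  # P1 is OT sender, P2 is receiver
    (b2, v2), (a1, u1) = half_triple()  # step 4: roles reversed
    c1 = (a1 & b1) ^ u1 ^ v1            # step 5 for P1
    c2 = (a2 & b2) ^ u2 ^ v2            # step 5 for P2
    return (a1, b1, c1), (a2, b2, c2)
```

Each triple costs two OTs on single bits, which is why efficient OT matters for GMW.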

Evaluating ANDs via Multiplication Triples [Beaver91]

The aim: Compute AND using multiplication triple

Given: a1,b1,c1 and a2,b2,c2 such that c1⊕c2 = a1b1⊕a2b1⊕a1b2⊕a2b2

Setting: P1 holds shares x1, y1 and P2 holds shares x2, y2 of the AND gate's inputs; they obtain shares z1, z2 of its output.

P1 sends P2: d1 = x1⊕a1; e1 = y1⊕b1
P2 sends P1: d2 = x2⊕a2; e2 = y2⊕b2

P1 and P2 locally compute: d = d1⊕d2; e = e1⊕e2
P1 outputs: z1 = db1⊕ea1⊕c1⊕de
P2 outputs: z2 = db2⊕ea2⊕c2

On the board: it holds z1⊕z2 = (x1⊕x2)(y1⊕y2)
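The claimed identity can also be checked mechanically. The sketch below evaluates one AND gate from a given triple and exhaustively tests all share and triple combinations; the function names are assumptions for illustration. The algebra behind it: with d = x⊕a and e = y⊕b, the sum db⊕ea⊕c⊕de expands to xy because every term containing a or b appears an even number of times.

```python
from itertools import product

def and_with_triple(x_shares, y_shares, triple1, triple2):
    """Evaluate one AND gate on XOR-shared inputs using a multiplication triple."""
    (x1, x2), (y1, y2) = x_shares, y_shares
    (a1, b1, c1), (a2, b2, c2) = triple1, triple2
    d1, e1 = x1 ^ a1, y1 ^ b1          # sent by P1
    d2, e2 = x2 ^ a2, y2 ^ b2          # sent by P2
    d, e = d1 ^ d2, e1 ^ e2            # both parties compute these locally
    z1 = (d & b1) ^ (e & a1) ^ c1 ^ (d & e)
    z2 = (d & b2) ^ (e & a2) ^ c2
    return z1, z2

def check_all():
    """Exhaustively verify z1 XOR z2 == (x1 XOR x2) AND (y1 XOR y2)."""
    for x1, x2, y1, y2, a1, a2, b1, b2, c1 in product((0, 1), repeat=9):
        c2 = ((a1 ^ a2) & (b1 ^ b2)) ^ c1   # complete a valid triple
        z1, z2 = and_with_triple((x1, x2), (y1, y2), (a1, b1, c1), (a2, b2, c2))
        if (z1 ^ z2) != ((x1 ^ x2) & (y1 ^ y2)):
            return False
    return True
```

The online phase sends only the masked bits d1, e1, d2, e2, i.e., two bits per party and AND gate, and uses no cryptographic operations.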

Part 1: Efficient Garbled Circuits

Garbled Circuits [Yao86]

[Figure: a conventional circuit with bits 0/1 on its wires next to the corresponding garbled circuit with keys on its wires.]

• In the garbled circuit, the keys look random.
• Given the input keys, one can compute only the output key.

Garbled Gate [Yao86]

[Figure: a gate with input keys X, Y and a garbled table with rows 0 to 3 that encrypt the output keys.]

• Given two input keys, one can compute only the output key.

(Slide from Viet-Tung Hoang)

Formalization: Garbling Schemes [BellareHoangRogaway12]

A garbling scheme consists of the algorithms Gb, En, Ev, De and ev:

• Gb(1^k, f) → (F, e, d): garbles the function f
• En(e, x) → X: encodes the input x
• Ev(F, X) → Y: evaluates the garbled function F on the garbled input X
• De(d, Y) → y: decodes the garbled output Y
• ev(f, x) → y: evaluates f in the clear

Correctness: for all f, x, k: if (F, e, d) ← Gb(1^k, f), X ← En(e, x), Y ← Ev(F, X), y ← De(d, Y), then y = ev(f, x)

Privacy (informal): Given (F, X, d), learn nothing but y = f(x).

(Slide from Viet-Tung Hoang)

Overview of Efficient Garbled Circuit Constructions

1990 Point-and-Permute [BeaverMicaliRogaway]

1999 3-row reduction [NaorPinkasSumner]

2008 Free-XOR [KolesnikovSchneider]

2009 2-row reduction [PinkasSchneiderSmartWilliams]

2012 Garbling via AES [KreuterShelatShen]

2013 Fixed-key AES [BellareHoangKeelveedhiRogaway]

2014 FleXor [KolesnikovMohasselRosulek]

2015 HalfGates [ZahurRosulekEvans]

(Slide from Payman Mohassel)

1) Encryption with Efficiently Verifiable Range [LindellPinkas04]

Wires:

Assign random keys ki, Ki to all wires i (ki represents 0, Ki represents 1)

Garbling:

For each gate with input key pairs (a, A), (b, B) and output key pair (x, X), use double-encryption and randomly permute the entries. For an AND gate:

Ea(Eb(x))
Ea(EB(x))
EA(Eb(x))
EA(EB(X))

Outputs:

For each output wire i: provide mapping [(0, ki), (1, Ki)]

Evaluator needs to know which entry was decrypted successfully

⇒ Use encryption function with efficiently verifiable range:

Ek(m) = [r, fk(r) ⊕ (m || 0^n)], where f is a pseudo-random function (by pseudorandomness of f, the probability of obtaining 0^n with an incorrect k is negligible)

⇒ Need to decrypt multiple entries until decryption succeeds

§ Expected number of decryptions required is 2.5
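A minimal sketch of this classic garbled gate with trial decryption is shown below. It assumes SHAKE-256 over key || r as a stand-in for the pseudo-random function f (an assumption, chosen only because it yields variable-length output from the standard library); all names are illustrative. Since the correct row is equally likely to be at each of the four positions, the expected number of rows tried is (1+2+3+4)/4 = 2.5.

```python
import hashlib
import secrets

N = 16  # length in bytes of keys, nonces r, and the zero padding 0^n

def prf(key, r, length):
    # Stand-in for f_k(r); SHAKE-256 is an assumption, not the slide's choice.
    return hashlib.shake_256(key + r).digest(length)

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def enc(key, m):
    """E_k(m) = [r, f_k(r) XOR (m || 0^n)]."""
    r = secrets.token_bytes(N)
    padded = m + bytes(N)
    return r + xor(prf(key, r, len(padded)), padded)

def dec(key, ct):
    """Return m if the zero padding verifies, else None."""
    r, body = ct[:N], ct[N:]
    padded = xor(prf(key, r, len(body)), body)
    return padded[:-N] if padded[-N:] == bytes(N) else None

def garble_and():
    """Garble one AND gate with input wires a, b and output wire x."""
    wires = {w: (secrets.token_bytes(N), secrets.token_bytes(N)) for w in "abx"}
    table = [enc(wires["a"][i], enc(wires["b"][j], wires["x"][i & j]))
             for i in (0, 1) for j in (0, 1)]
    secrets.SystemRandom().shuffle(table)  # randomly permute the entries
    return wires, table

def evaluate(table, ka, kb):
    """Try rows until both decryptions verify; return (output key, rows tried)."""
    for tries, row in enumerate(table, start=1):
        inner = dec(ka, row)
        if inner is None:
            continue
        out = dec(kb, inner)
        if out is not None:
            return out, tries
    raise ValueError("no row decrypted successfully")
```

The zero padding also increases the ciphertext size, which is one reason the techniques that follow avoid it.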

2) Point and Permute [BeaverMicaliRogaway90]

For every wire i, choose a random signal bit pi together with the key.

The signal bits of the input wires determine which entry to decrypt.

Example: the garbled table has entries #0 to #3. If the two input keys carry the signal bits 1 and 0, then 2 * 1 + 0 = 2 => decrypt entry #2.

Advantages of point-and-permute:
• Exactly one entry needs to be decrypted
• Simplifies output decryption

If output permutation pi = 0 then output is the permutation bit of the obtained key.

If output permutation pi = 1 then output is the negated permutation bit of the obtained key.

⇒ Sender simply reveals for each output wire the bit pi to receiver.

In the following we always assume usage of point and permute. p(k) is the permutation bit of key k.
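The table indexing can be sketched as follows. The sketch stores the signal bit in the least significant bit of each key and masks each row with a SHA-256 hash of both input keys and a gate identifier; both choices, and all names, are assumptions made for illustration.

```python
import hashlib
import secrets

N = 16  # key length in bytes

def H(ka, kb, gate_id):
    # Row mask derived from both input keys (hash-based stand-in).
    return hashlib.sha256(ka + kb + gate_id.to_bytes(4, "big")).digest()[:N]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def sig(key):
    """Signal (permutation) bit p(k): here the lsb of the key."""
    return key[-1] & 1

def new_wire():
    """Return keys (k0, k1) for values 0/1 with random, distinct signal bits."""
    p = secrets.randbits(1)
    k0 = bytearray(secrets.token_bytes(N))
    k1 = bytearray(secrets.token_bytes(N))
    k0[-1] = (k0[-1] & 0xFE) | p
    k1[-1] = (k1[-1] & 0xFE) | (1 - p)
    return bytes(k0), bytes(k1)

def garble_gate(g, a, b, out, gate_id):
    """Place the row for inputs (i, j) at position 2*p(ka) + p(kb)."""
    table = [None] * 4
    for i in (0, 1):
        for j in (0, 1):
            ka, kb = a[i], b[j]
            table[2 * sig(ka) + sig(kb)] = xor(out[g(i, j)], H(ka, kb, gate_id))
    return table

def eval_gate(table, ka, kb, gate_id):
    """Decrypt exactly one row, selected by the two signal bits."""
    return xor(table[2 * sig(ka) + sig(kb)], H(ka, kb, gate_id))
```

For an output wire, the sender reveals sig(out[0]); the receiver then decodes the result as the signal bit of the key it obtained XOR the revealed bit.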

3) 3-Row Reduction [NaorPinkasSumner99]

Encryption function: E^T(kl, kr; ko) = ko ⊕ F(kl, p(kl) || T) ⊕ F(kr, p(kr) || T), where F is a pseudo-random function, e.g., instantiated with AES.

Idea: Eliminate the first table entry by fixing it to be 0:

E^T(kl, kr; c) = c ⊕ F(kl, p(kl) || T) ⊕ F(kr, p(kr) || T) = 0 (required)

⇒ c = F(kl, p(kl) || T) ⊕ F(kr, p(kr) || T).
⇒ One of the two output keys is derived from the input keys.

The remaining 3 table entries are computed as before:

E^T(a, B; c)
E^T(A, b; c)
E^T(A, B; C)

⇒ Communication is reduced from 4 to 3 table entries.
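The row reduction can be sketched on top of point-and-permute. To keep the sketch short, the row mask is a single SHA-256 hash of both input keys and the gate identifier rather than the two PRF calls of the encryption function above; this substitution and all names are assumptions for illustration.

```python
import hashlib
import secrets

N = 16  # key length in bytes

def H(ka, kb, gate_id):
    # Hash-based row mask, swapped in for the two PRF calls (assumption).
    return hashlib.sha256(ka + kb + gate_id.to_bytes(4, "big")).digest()[:N]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def sig(key):
    return key[-1] & 1  # signal bit stored in the lsb

def new_wire():
    p = secrets.randbits(1)
    k0 = bytearray(secrets.token_bytes(N))
    k1 = bytearray(secrets.token_bytes(N))
    k0[-1] = (k0[-1] & 0xFE) | p
    k1[-1] = (k1[-1] & 0xFE) | (1 - p)
    return bytes(k0), bytes(k1)

def garble_gate_grr3(g, a, b, gate_id):
    """Return (output key pair, 3-row table); row 0 is implicitly all-zero."""
    ia, ib = sig(a[0]), sig(b[0])            # indices of the keys with signal bit 0
    derived = H(a[ia], b[ib], gate_id)       # forces row 0 to encrypt to zero
    other = bytearray(secrets.token_bytes(N))
    other[-1] = (other[-1] & 0xFE) | (1 - sig(derived))
    out = [None, None]
    out[g(ia, ib)] = derived
    out[1 - g(ia, ib)] = bytes(other)
    table = [None] * 4
    for i in (0, 1):
        for j in (0, 1):
            pos = 2 * sig(a[i]) + sig(b[j])
            table[pos] = xor(out[g(i, j)], H(a[i], b[j], gate_id))
    assert table[0] == bytes(N)              # first entry is 0, so it is not sent
    return tuple(out), table[1:]

def eval_gate_grr3(table3, ka, kb, gate_id):
    pos = 2 * sig(ka) + sig(kb)
    row = bytes(N) if pos == 0 else table3[pos - 1]
    return xor(row, H(ka, kb, gate_id))
```

Note that the output keys are now produced by the garbling of the gate itself instead of being chosen in advance.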

4) Free XOR [KolesnikovSchneider08]

Encryption function: E^T(kl, kr; ko) = ko ⊕ H(kl || kr || T), where H is a random oracle, e.g., instantiated with SHA-2

Idea: Choose keys s.t. each pair has distance R (unknown to evaluator).
R = a ⊕ A = b ⊕ B = c ⊕ C = …

Garble XOR: set output key c = a ⊕ b

Evaluate XOR: set output key kc = ka ⊕ kb

Correctness:
c = a ⊕ b = (R ⊕ a) ⊕ (R ⊕ b) = A ⊕ B
C = c ⊕ R = a ⊕ b ⊕ R = a ⊕ B = A ⊕ b

Security (intuitively): Evaluator knows one key per wire, so never learns R
§ Requires random oracle or non-standard circularity assumption

Can be combined with 3-row reduction.
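The correctness argument for XOR gates can be checked with a few lines. The sketch below only covers key generation with a global offset R and XOR gates; the names are assumptions for illustration.

```python
import secrets

N = 16  # key length in bytes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def new_wire(R):
    """Key pair (k, K) with K = k XOR R for the global secret offset R."""
    k0 = secrets.token_bytes(N)
    return k0, xor(k0, R)

def garble_xor(a, b, R):
    """Garble XOR: c = a XOR b and C = c XOR R; no table is produced."""
    c0 = xor(a[0], b[0])
    return c0, xor(c0, R)

def eval_xor(ka, kb):
    """Evaluate XOR: kc = ka XOR kb; no cryptographic operation is needed."""
    return xor(ka, kb)
```

XOR gates thus cost neither communication nor cryptographic operations, which is why circuits are optimized for few AND gates.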

5) Garbling via AES

Since 2008 many Intel and AMD CPUs have hardware support for AES: Advanced Encryption Standard New Instructions (AES-NI)

[KreuterShelatShen12]: E^T(kl, kr; ko) = ko ⊕ AES-256(kl || kr; T)
=> Needs to run expensive AES key schedule per gate
=> Also relies on a related-key assumption (not great for AES)

[BellareHoangKeelveedhiRogaway13]: Choose fixed key X and run AES key schedule once
E^T(kl, kr; ko) = ko ⊕ AES-128(X; K) ⊕ K with K = 2kl ⊕ 4kr ⊕ T

Requires an “ideal cipher” assumption on AES
Can be combined with free XOR and 3-row reduction

6) HalfGates - Today’s Most Efficient Scheme [ZRE14]

procedure Gb(1^k, f):
    R ←$ {0,1}^(k-1) || 1
    for i ∈ Inputs(f) do
        Wi^0 ←$ {0,1}^k
        Wi^1 ← Wi^0 ⊕ R
        ei ← Wi^0
    for i ∉ Inputs(f) {in topo. order} do
        {a, b} ← GateInputs(f, i)
        if i ∈ XorGates(f) then
            Wi^0 ← Wa^0 ⊕ Wb^0        {Free XOR}
        else
            (Wi^0, TGi, TEi) ← GbAnd(Wa^0, Wb^0)
            Fi ← (TGi, TEi)
        end if
        Wi^1 ← Wi^0 ⊕ R
    for i ∈ Outputs(f) do
        di ← lsb(Wi^0)
    return (F̂, ê, d̂)

private procedure GbAnd(Wa^0, Wb^0):
    pa ← lsb Wa^0; pb ← lsb Wb^0
    j ← NextIndex(); j' ← NextIndex()
    {First half gate}
    TG ← H(Wa^0, j) ⊕ H(Wa^1, j) ⊕ pb R
    WG^0 ← H(Wa^0, j) ⊕ pa TG
    {Second half gate}
    TE ← H(Wb^0, j') ⊕ H(Wb^1, j') ⊕ Wa^0
    WE^0 ← H(Wb^0, j') ⊕ pb (TE ⊕ Wa^0)
    {Combine halves}
    W^0 ← WG^0 ⊕ WE^0
    return (W^0, TG, TE)

procedure En(ê, x̂):
    for ei ∈ ê do
        Xi ← ei ⊕ xi R
    return X̂

procedure De(d̂, Ŷ):
    for di ∈ d̂ do
        yi ← di ⊕ lsb Yi
    return ŷ

procedure Ev(F̂, X̂):
    for i ∈ Inputs(F̂) do
        Wi ← Xi
    for i ∉ Inputs(F̂) {in topo. order} do
        {a, b} ← GateInputs(F̂, i)
        if i ∈ XorGates(F̂) then
            Wi ← Wa ⊕ Wb
        else
            sa ← lsb Wa; sb ← lsb Wb
            j1 ← NextIndex(); j2 ← NextIndex()
            (TGi, TEi) ← Fi
            WGi ← H(Wa, j1) ⊕ sa TGi
            WEi ← H(Wb, j2) ⊕ sb (TEi ⊕ Wa)
            Wi ← WGi ⊕ WEi
        end if
    for i ∈ Outputs(F̂) do
        Yi ← Wi
    return Ŷ

Figure 2: Our complete garbling scheme. NextIndex is a stateful procedure that simply increments an internal counter.

Following the description in Section 3.1, we garble each gate using a composition of two half-gates. Conceptually, WGi^b and WEi^b denote the output wire labels for these two half-gates (generator-side and evaluator-side, respectively) that comprise the ith gate. The final logical output wire label for the ith gate is then set to be Wi^0 = WGi^0 ⊕ WEi^0. Similarly, we use TGi and TEi to denote the single garbled row transmitted for each half gate used in the ith gate. The first rows of Figure 1 show the function being computed by each half gate. In (a), generator knows pb while in (b) the evaluator knows vb ⊕ pb = lsb Wb. The second rows show the two ciphertexts of each half-gate, before they are permuted according to their select bits (in case of (a)) and before garbled row reduction (GRR) is applied. Here, we have expanded WGc^f(x,pb) to WGc^0 ⊕ f(x, pb)R to make the row reduction clearer in the next step. The third rows show the final result.

The complete scheme. The full garbling procedure for an entire circuit is shown in Figure 2. For simplicity of discussion and proof, we assume all gates are either AND or XOR.
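The AND-gate part of Figure 2 can be sketched as follows for a single gate. Labels are modelled as 128-bit Python integers and H is instantiated with truncated SHA-256; both choices, the explicit index parameters in place of NextIndex, and all names are assumptions for illustration rather than the authors' implementation.

```python
import hashlib
import secrets

K = 128  # label length in bits

def H(label, index):
    # Hash of (label, gate index); truncated SHA-256 is an assumed stand-in.
    data = label.to_bytes(K // 8, "big") + index.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[: K // 8], "big")

def lsb(x):
    return x & 1

def gen_offset():
    """Global offset R with lsb(R) = 1, so W^0 and W^1 have different lsbs."""
    return secrets.randbits(K) | 1

def gb_and(wa0, wb0, R, j, j2):
    """Garble one AND gate; returns (W^0, TG, TE), i.e. two rows per gate."""
    pa, pb = lsb(wa0), lsb(wb0)
    wa1, wb1 = wa0 ^ R, wb0 ^ R
    # First half gate (generator knows pb)
    tg = H(wa0, j) ^ H(wa1, j) ^ (pb * R)
    wg0 = H(wa0, j) ^ (pa * tg)
    # Second half gate (evaluator knows vb XOR pb)
    te = H(wb0, j2) ^ H(wb1, j2) ^ wa0
    we0 = H(wb0, j2) ^ (pb * (te ^ wa0))
    # Combine halves
    return wg0 ^ we0, tg, te

def ev_and(wa, wb, tg, te, j, j2):
    """Evaluate the AND gate from one label per input wire."""
    sa, sb = lsb(wa), lsb(wb)
    wg = H(wa, j) ^ (sa * tg)
    we = H(wb, j2) ^ (sb * (te ^ wa))
    return wg ^ we
```

Decoding follows De in Figure 2: the output bit is lsb(W^0) XOR lsb of the label obtained by the evaluator.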

4 Security

We now prove the security of our scheme, using the prv.sim and obv.sim security definitions of Bellare, Hoang, and Rogaway [5]. The scheme shown in Figure 2 does not provide authenticity, simply because authenticity is not required in many use cases including semi-honest Yao’s circuits. However, there are well-known, standard modifications to the decoding procedure that can add authenticity, which we describe separately in Section 4.3. Finally, since we only consider circuits with just AND and XOR gates, everything about the function f is public and we do not define a separate function Φ(f) to extract public information about f.

6) HalfGates - Today’s Most Efficient Scheme [ZRE14]

- 4 calls of H for garbling AND
- 2 calls of H for evaluating AND
- 2 table entries per AND!

procedure Gb(1^k, f):
    R ←$ {0,1}^(k−1) 1
    for i ∈ Inputs(f) do
        W_i^0 ←$ {0,1}^k
        W_i^1 ← W_i^0 ⊕ R
        e_i ← W_i^0
    for i ∉ Inputs(f) {in topo. order} do
        {a, b} ← GateInputs(f, i)
        if i ∈ XorGates(f) then          {Free XOR}
            W_i^0 ← W_a^0 ⊕ W_b^0
        else
            (W_i^0, T_Gi, T_Ei) ← GbAnd(W_a^0, W_b^0)
            F_i ← (T_Gi, T_Ei)
        end if
        W_i^1 ← W_i^0 ⊕ R
    for i ∈ Outputs(f) do
        d_i ← lsb(W_i^0)
    return (F̂, ê, d̂)

private procedure GbAnd(W_a^0, W_b^0):
    p_a ← lsb W_a^0; p_b ← lsb W_b^0
    j ← NextIndex(); j′ ← NextIndex()
    {First half gate}
    T_G ← H(W_a^0, j) ⊕ H(W_a^1, j) ⊕ p_b R
    W_G^0 ← H(W_a^0, j) ⊕ p_a T_G
    {Second half gate}
    T_E ← H(W_b^0, j′) ⊕ H(W_b^1, j′) ⊕ W_a^0
    W_E^0 ← H(W_b^0, j′) ⊕ p_b (T_E ⊕ W_a^0)
    {Combine halves}
    W^0 ← W_G^0 ⊕ W_E^0
    return (W^0, T_G, T_E)

procedure En(ê, x̂):
    for e_i ∈ ê do
        X_i ← e_i ⊕ x_i R
    return X̂

procedure De(d̂, Ŷ):
    for d_i ∈ d̂ do
        y_i ← d_i ⊕ lsb Y_i
    return ŷ

procedure Ev(F̂, X̂):
    for i ∈ Inputs(F̂) do
        W_i ← X_i
    for i ∉ Inputs(F̂) {in topo. order} do
        {a, b} ← GateInputs(F̂, i)
        if i ∈ XorGates(F̂) then
            W_i ← W_a ⊕ W_b
        else
            s_a ← lsb W_a; s_b ← lsb W_b
            j_1 ← NextIndex(); j_2 ← NextIndex()
            (T_Gi, T_Ei) ← F_i
            W_Gi ← H(W_a, j_1) ⊕ s_a T_Gi
            W_Ei ← H(W_b, j_2) ⊕ s_b (T_Ei ⊕ W_a)
            W_i ← W_Gi ⊕ W_Ei
        end if
    for i ∈ Outputs(F̂) do
        Y_i ← W_i
    return Ŷ

Figure 2: Our complete garbling scheme. NextIndex is a stateful procedure that simply increments an internal counter.

Following the description in Section 3.1, we garble each gate using a composition of two half-gates. Conceptually, W_Gi^b and W_Ei^b denote the output wire labels for these two half-gates (generator-side and evaluator-side, respectively) that comprise the ith gate. The final logical output wire label for the ith gate is then set to be W_i^0 = W_Gi^0 ⊕ W_Ei^0. Similarly, we use T_Gi and T_Ei to denote the single garbled row transmitted for each half gate used in the ith gate.

The first rows of Figure 1 show the function being computed by each half gate. In (a), generator knows p_b while in (b) the evaluator knows v_b ⊕ p_b = lsb W_b. The second rows show the two ciphertexts of each half-gate, before they are permuted according to their select bits (in case of (a)) and before garbled row reduction (GRR) is applied. Here, we have expanded W_Gc^f(x,p_b) to W_Gc^0 ⊕ f(x, p_b)R to make the row reduction clearer in the next step. The third rows show the final result.

The complete scheme. The full garbling procedure for an entire circuit is shown in Figure 2. For simplicity of discussion and proof, we assume all gates are either AND or XOR.

4 Security

We now prove the security of our scheme, using the prv.sim_S and obv.sim_S security definitions of Bellare, Hoang, and Rogaway [5]. The scheme shown in Figure 2 does not provide authenticity, simply because authenticity is not required in many use cases including semi-honest Yao’s circuits. However, there are well-known, standard modifications to the decoding procedure that can add authenticity, which we describe separately in Section 4.3. Finally, since we only consider circuits with just AND and XOR gates, everything about the function f is public and we do not define a separate function Φ(f) to extract public information about f.
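The GbAnd and Ev steps of Figure 2 for a single AND gate can be sketched in Python as follows. This is only an illustration under stated assumptions: H is replaced by truncated SHA-256 (not the instantiation from [ZRE14]), labels are 16 bytes long, and the helper names (`gb_and`, `ev_and`, `times`, `random_offset`) are made up for this sketch.

```python
import hashlib
import secrets

K = 16  # label length in bytes (k = 128 bits); an illustrative choice


def H(label: bytes, index: int) -> bytes:
    """Stand-in for H(W, j): SHA-256 over label and index, truncated to K bytes."""
    return hashlib.sha256(label + index.to_bytes(8, "big")).digest()[:K]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def lsb(label: bytes) -> int:
    return label[-1] & 1


def times(bit: int, value: bytes) -> bytes:
    """bit * value: value if bit is 1, the all-zero string otherwise."""
    return value if bit else bytes(len(value))


def random_offset() -> bytes:
    """Global offset R with lsb(R) = 1, as in R <- {0,1}^(k-1) 1."""
    r = bytearray(secrets.token_bytes(K))
    r[-1] |= 1
    return bytes(r)


def gb_and(wa0: bytes, wb0: bytes, R: bytes, j: int, j2: int):
    """Garble one AND gate: 4 calls of H, 2 table entries (TG, TE)."""
    pa, pb = lsb(wa0), lsb(wb0)
    wa1, wb1 = xor(wa0, R), xor(wb0, R)
    # First half gate (generator knows pb)
    tg = xor(xor(H(wa0, j), H(wa1, j)), times(pb, R))
    wg0 = xor(H(wa0, j), times(pa, tg))
    # Second half gate (evaluator knows lsb of its label on wire b)
    te = xor(xor(H(wb0, j2), H(wb1, j2)), wa0)
    we0 = xor(H(wb0, j2), times(pb, xor(te, wa0)))
    # Combine halves
    return xor(wg0, we0), tg, te


def ev_and(wa: bytes, wb: bytes, tg: bytes, te: bytes, j: int, j2: int) -> bytes:
    """Evaluate one garbled AND gate: 2 calls of H."""
    sa, sb = lsb(wa), lsb(wb)
    wg = xor(H(wa, j), times(sa, tg))
    we = xor(H(wb, j2), times(sb, xor(te, wa)))
    return xor(wg, we)
```

For input bits a and b, the evaluator holds W_a^0 ⊕ aR and W_b^0 ⊕ bR, and `ev_and` should return W^0 ⊕ (a ∧ b)R, i.e. the output label that encodes a ∧ b.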

Part 2: Efficient OTs

http://encrypto.de/code/OTExtension

G. Asharov, Y. Lindell, T. Schneider, M. Zohner: More efficient oblivious transfer and extensions for faster secure computation. In ACM CCS’13.

OT - Bad News

- [ImpagliazzoRudich89]: there’s no black-box reduction from OT to OWFs

- Several OT protocols based on public-key cryptography
  - e.g., [NaorPinkas01] yields ~1,000 OTs per second

- Since public-key crypto is expensive, OT was believed to be inefficient

OT - Good News

- [Beaver95]: OTs can be precomputed (only OTP in online phase)

- OT Extensions (similar to hybrid encryption): use symmetric crypto to stretch few “real” OTs into longer/many OTs
  - [Beaver96]: OT on long strings from short seeds
  - [IshaiKilianNissimPetrank03]: many OTs from few OTs

[Figure: k “real” OTs on k-bit strings are stretched to ℓ-bit strings via [Beaver96] and to m OTs via [IKNP03]]

OT Extension of [IKNP03] (1)

- Alice inputs m pairs of ℓ-bit strings (x_{i,0}, x_{i,1})

- Bob inputs m-bit string r and obtains x_{i,r_i} in i-th OT

OT Extension of [IKNP03] (2)

- Alice and Bob perform k “real” OTs on random seeds with reverse roles (k: security parameter)

OT Extension of [IKNP03] (3)

- Bob generates a random m × k bit matrix T and masks his choices r

- The matrix is masked with the stretched seeds of the “real” OTs

PRG: pseudo-random generator (instantiated with AES)

OT Extension of [IKNP03] (4)

- Transpose matrices V and T

- Alice masks her inputs and obliviously sends them to Bob

H: correlation robust function (instantiated with hash function)
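Steps (1) to (4) can be put together in a small single-process simulation. The sketch below is not the [IKNP03] reference implementation: the k “real” OTs are replaced by directly handing Alice the seed she would have chosen, the PRG and H are both stood in for by SHAKE-128 (the slides use AES and a hash function), and all function names are hypothetical. Matrix columns and rows are stored as Python integers.

```python
import hashlib
import secrets


def prg(seed: bytes, nbits: int) -> int:
    """Stretch a seed to nbits pseudo-random bits (SHAKE-128 as a stand-in PRG)."""
    out = hashlib.shake_128(seed).digest((nbits + 7) // 8)
    return int.from_bytes(out, "big") & ((1 << nbits) - 1)


def H(i: int, row: int, nbytes: int) -> bytes:
    """Stand-in for the correlation robust function H (requires k <= 256)."""
    data = i.to_bytes(8, "big") + row.to_bytes(32, "big")
    return hashlib.shake_128(data).digest(nbytes)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def transpose(cols, nrows):
    """cols[j] holds column j (bit i = entry (i, j)); returns rows (bit j = entry (i, j))."""
    return [sum(((cols[j] >> i) & 1) << j for j in range(len(cols)))
            for i in range(nrows)]


def ot_extension(x_pairs, r_bits, k=80):
    """Simulate both parties; returns Bob's outputs x_{i, r_i}."""
    m, nbytes = len(x_pairs), len(x_pairs[0][0])
    r = sum(bit << i for i, bit in enumerate(r_bits))

    # (2) k "real" OTs with reverse roles, simulated ideally:
    # Bob holds seed pairs, Alice learns the seed selected by her bit s_j.
    s_bits = [secrets.randbits(1) for _ in range(k)]
    seeds = [(secrets.token_bytes(16), secrets.token_bytes(16)) for _ in range(k)]
    alice_seeds = [seeds[j][s_bits[j]] for j in range(k)]

    # (3) Bob picks a random m x k matrix T (by columns) and sends it masked.
    t_cols = [secrets.randbits(m) for _ in range(k)]
    masked = [(prg(k0, m) ^ t, prg(k1, m) ^ t ^ r)
              for (k0, k1), t in zip(seeds, t_cols)]
    # Alice can unmask exactly one column per pair: v^j = t^j xor (s_j * r).
    v_cols = [masked[j][s_bits[j]] ^ prg(alice_seeds[j], m) for j in range(k)]

    # (4) Transpose V and T; row i of V equals row i of T xor (r_i * s).
    t_rows, v_rows = transpose(t_cols, m), transpose(v_cols, m)
    s = sum(bit << j for j, bit in enumerate(s_bits))

    outputs = []
    for i, (x0, x1) in enumerate(x_pairs):
        # Alice masks her inputs and sends (y0, y1).
        y0 = xor(x0, H(i, v_rows[i], nbytes))
        y1 = xor(x1, H(i, v_rows[i] ^ s, nbytes))
        # Bob can remove only the mask that matches his choice bit.
        y = y1 if r_bits[i] else y0
        outputs.append(xor(y, H(i, t_rows[i], nbytes)))
    return outputs
```

Calling `ot_extension(pairs, r_bits)` with m pairs of equal-length byte strings should return, for each i, the string selected by `r_bits[i]`.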

Computation Complexity of OT Extension

Per OT:

                      Alice   Bob
# PRG evaluations       1      2
# H evaluations         2      1

Time distribution for 10 Million OTs (in 21s): 2.1 microseconds per OT

- “real” OTs: 1%
- PRG (AES): 10%
- H (SHA-1): 33%
- Transpose: 42%
- Misc (Snd/Rcv/XOR): 14%

Non-crypto part was bottleneck!!!

Algorithmic Optimization: Efficient Matrix Transposition

- Naive matrix transposition performs mk load/process/store operations

- Eklundh's algorithm reduces number of operations to O(m log_2 k) swaps
  - Swap whole registers instead of bits
  - Transposing 10 times faster
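The block-swap idea behind Eklundh's algorithm can be sketched for a square n × n bit matrix (n a power of two) whose rows are stored as integers, so that one shift/XOR touches a whole row ("register") at once. This is a simplified illustration, not the implementation evaluated in the slides; handling an m × k matrix block by block is left out, and the function names are hypothetical.

```python
def naive_transpose(rows, n):
    """Bit-by-bit transposition: n * n single-bit operations."""
    return [sum(((rows[i] >> c) & 1) << i for i in range(n)) for c in range(n)]


def eklundh_transpose(rows, n):
    """Transpose an n x n bit matrix in log2(n) passes of whole-row swaps.

    rows[i] is an integer whose bit c is the matrix entry (i, c).
    In the pass with block size j, every entry (k, c + j) is exchanged
    with entry (k + j, c) for all k, c whose j-bit is 0.
    """
    a = list(rows)
    j = n // 2
    while j:
        # Columns whose j-bit is 0: j ones, j zeros, repeated.
        mask = sum(1 << c for c in range(n) if not c & j)
        for k in range(n):
            if not k & j:
                t = ((a[k] >> j) ^ a[k + j]) & mask
                a[k + j] ^= t        # lower-left block receives upper-right bits
                a[k] ^= t << j       # upper-right block receives lower-left bits
        j //= 2
    return a
```

Each pass performs n/2 row-pair swaps, so the total is (n/2) · log_2 n word operations instead of n² single-bit operations.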

Algorithmic Optimization: Parallelization

- OT extension can easily be parallelized by splitting the T matrix into sub-matrices

- Since columns are independent, OT is highly parallelizable

Communication Complexity of OT Extension

Per OT:

              Alice   Bob
Bits sent      2ℓ     2k

Yao: ℓ = k = 80
GMW: ℓ = 1, k = 80

[Figure: communication sent by Alice and by Bob in the two settings]

Protocol Optimization: General OT Extension

- Instead of generating a random T matrix, we derive it from s_{j,0}

- Reduces data sent by Bob by factor 2

Specific OT Functionalities

- Secure computation protocols often require a specific OT functionality
  - Yao with free XORs requires strings x_0, x_1 to be XOR-correlated
  - GMW with multiplication triples can use random strings

- Correlated OT (e.g., for Yao): random x_0 and x_1 = x_0 ⊕ x
- Random OT (e.g., for GMW): random x_0 and x_1

Specific OT Functionalities: Correlated OT (C-OT)

- Choose x_{i,0} as random output of H (modeled as RO here)

- Compute x_{i,1} as x_{i,0} ⊕ x_i to obliviously transfer XOR-correlated values

- Reduces data sent by Alice by factor 2
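The final C-OT step for one OT can be sketched as below. The sketch assumes that the matrix phase of OT extension has already given Alice a row v_i and secret s, and Bob a row t_i, with v_i = t_i ⊕ (r_i · s); the names `v_i`, `t_i`, `cot_sender` and `cot_receiver` are illustrative, and SHAKE-128 stands in for H.

```python
import hashlib
import secrets


def H(row: int, nbytes: int) -> bytes:
    """Stand-in for H (modeled as a random oracle on the slide)."""
    return hashlib.shake_128(row.to_bytes(32, "big")).digest(nbytes)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cot_sender(v_i: int, s: int, delta: bytes):
    """Alice: derive x0 from H, set x1 = x0 xor delta, send a single string y."""
    x0 = H(v_i, len(delta))
    x1 = xor(x0, delta)
    y = xor(x1, H(v_i ^ s, len(delta)))
    return x0, x1, y


def cot_receiver(t_i: int, r_i: int, y: bytes) -> bytes:
    """Bob: H(t_i) is x0 directly if r_i = 0, otherwise it unmasks y to x1."""
    h = H(t_i, len(y))
    return xor(y, h) if r_i else h
```

Only y is transmitted, which is where the factor-2 saving over sending two masked strings comes from.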

Specific OT Functionalities: Random OT (R-OT)

- Choose x_{i,0} and x_{i,1} as random outputs of H (modeled as RO here)

- No data sent by Alice
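The per-OT communication of the variants can be tallied with a few lines of arithmetic. The sketch assumes that the optimizations are stacked in the order of the slides (C-OT and R-OT both include the General OT saving on Bob's side); the variant labels and function names are chosen for this example only.

```python
def bits_per_ot(variant: str, l: int, k: int):
    """Return (bits sent by Alice, bits sent by Bob) per OT.

    Orig: Alice 2l, Bob 2k.
    G-OT: T derived from seeds, Bob's data halved.
    C-OT: additionally, Alice's data halved.
    R-OT: additionally, no data sent by Alice.
    """
    alice = {"Orig": 2 * l, "G-OT": 2 * l, "C-OT": l, "R-OT": 0}[variant]
    bob = 2 * k if variant == "Orig" else k
    return alice, bob


def total_megabytes(variant: str, l: int, k: int, num_ots: int) -> float:
    """Total traffic in both directions, in units of 10^6 bytes."""
    alice, bob = bits_per_ot(variant, l, k)
    return (alice + bob) * num_ots / 8 / 1e6


if __name__ == "__main__":
    for variant in ("Orig", "G-OT", "C-OT", "R-OT"):
        print(variant, bits_per_ot(variant, 80, 80),
              total_megabytes(variant, 80, 80, 10_000_000))
```

With ℓ = k = 80 and 10 million OTs, this model gives 400 MB for the original protocol and 100 MB for R-OT.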

Performance Evaluation: Original Implementation

- C++ implementation of [SZ13] implementing OT extension of [IKNP03]

- Performance for 10 Million OTs on 80-bit strings

Runtime in s:

Variant   Gigabit LAN   WiFi 802.11g
Orig         20.6          30.7
EMT          14.4          30.5
G-OT         13.9          29.4
C-OT         10.6          14.4
R-OT         10.0          14.2
2T            5.0          14.2
4T            2.6          14.2

Performance Evaluation: Efficient Matrix Transposition

- Efficient matrix transposition (EMT) – improves computation

- Only decreases runtime in LAN where computation is the bottleneck

Performance Evaluation: General OT

- Generate T matrix from seeds (G-OT) – improves communication Bob → Alice

- Runtimes only slightly faster (bottleneck: communication Alice → Bob)

Performance Evaluation: Correlated/Random OT

- Correlated/Random OT (C-OT, R-OT) – improved communication Alice → Bob

- WiFi runtime faster by factor 2 (bottleneck: communication Bob → Alice)

Performance Evaluation: Parallelization

- Parallel OT extension with 2 and 4 threads (2T, 4T) – improved computation

- LAN runtime decreases linearly in # of threads

- WiFi runtime remains the same (bottleneck: communication)

Performance Evaluation: Summary

Performance for 10 Mio. OTs on 80-bit strings

- OT is very efficient

- Communication is the bottleneck for OT (even without using AES-NI)

Part 3: Efficient Circuits and Yao vs. GMW

T. Schneider, M. Zohner: GMW vs. Yao? Efficient secure two-party computation with low depth circuits. In FC’13.

Yao - the Apple

How to eat an apple? bite-by-bite

+ Yao has constant #rounds
- Evaluating a garbled gate requires symmetric crypto in the online phase

GMW - the Orange

How to eat an orange?

1) peel (almost all the effort)
   Setup phase:
   - precompute multiplication triples for each AND gate using 2 R-OTs and constant #rounds
   + no need to know function, only max. #ANDs

2) eat (easy)
   Online phase:
   + evaluating circuit needs OTP operations only
   - communication per layer of AND gates
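The online phase of a GMW AND gate with a precomputed multiplication triple can be sketched as follows. As a simplification, the triple comes from a hypothetical trusted dealer (`deal_triple`) instead of the 2 R-OTs named on the slide, and both parties are simulated in one process; share tuples are (share of party 0, share of party 1).

```python
import secrets


def share(bit: int):
    """XOR-share a bit between the two parties."""
    s0 = secrets.randbits(1)
    return s0, s0 ^ bit


def deal_triple():
    """Setup phase stand-in: shares of random a, b and of c = a AND b."""
    a, b = secrets.randbits(1), secrets.randbits(1)
    return share(a), share(b), share(a & b)


def xor_gate(x_sh, y_sh):
    """XOR gates are free: each party XORs its shares locally."""
    return x_sh[0] ^ y_sh[0], x_sh[1] ^ y_sh[1]


def and_gate(x_sh, y_sh, triple):
    """Online AND: each party sends 2 bits (its shares of d and e)."""
    a_sh, b_sh, c_sh = triple
    d_sh = [x_sh[i] ^ a_sh[i] for i in (0, 1)]
    e_sh = [y_sh[i] ^ b_sh[i] for i in (0, 1)]
    # The masked values d = x xor a and e = y xor b are opened to both parties.
    d = d_sh[0] ^ d_sh[1]
    e = e_sh[0] ^ e_sh[1]
    # Local one-time-pad style operations only: z = de xor db xor ea xor c.
    z = [(d & b_sh[i]) ^ (e & a_sh[i]) ^ c_sh[i] for i in (0, 1)]
    z[0] ^= d & e  # the public term is added by one party only
    return z[0], z[1]
```

Since d and e are masked by the random bits a and b, opening them reveals nothing about x and y, and all AND gates of one layer can exchange their bits in the same round.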

Yao vs. GMW

                               Yao                    GMW
Free XOR                       ✓                      ✓
symmetric crypto per AND       S: 4, R: 2 (online)    setup: S: 6, R: 6
communication [bit] per AND    S→R: 2t                setup: S→R: t || R→S: t
                                                      online: S→R: 2 || R→S: 2
rounds                         O(1)                   setup: O(1)
                                                      online: O(ANDdepth(f))
memory per wire [bit]          t                      1

t: symmetric security parameter

Efficient Circuit Constructions for Secure Computation

Classical circuit design:
- few gates (⇒ small chip area)

- low depth (⇒ high clock frequency)

Circuits for secure computation:
- low ANDsize (#non-XORs ⇒ communication and symmetric crypto)

- low ANDdepth (#rounds in GMW’s online phase)
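How ANDsize and ANDdepth enter the costs can be illustrated with a small cost model built from the per-AND figures of the Yao vs. GMW comparison. The function names are made up for this sketch, and the example circuit (1000 AND gates, AND depth 10, t = 80) is hypothetical.

```python
def yao_comm_bits(and_size: int, t: int) -> int:
    """Yao with half gates: 2 table entries of t bits per AND, sent S -> R."""
    return 2 * t * and_size


def gmw_comm_bits(and_size: int, t: int) -> dict:
    """GMW: t bits per direction in the setup, 2 bits per direction online."""
    return {"setup": 2 * t * and_size, "online": 4 * and_size}


def gmw_online_rounds(and_depth: int) -> int:
    """One communication round per layer of AND gates."""
    return and_depth


if __name__ == "__main__":
    and_size, and_depth, t = 1000, 10, 80
    print("Yao bits:", yao_comm_bits(and_size, t))
    print("GMW bits:", gmw_comm_bits(and_size, t))
    print("GMW online rounds:", gmw_online_rounds(and_depth))
```

In this model XOR gates cost nothing in either protocol, so only the non-XOR count and the number of AND layers matter.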

Example Circuit: Addition

Ripple-Carry-Adder

[Figure 3.3: Circuit: Addition (ADD). Chain of 1-bit adders (+) with inputs x_1, y_1, ..., x_ℓ, y_ℓ, carries c_2, c_3, ... (first carry-in 0), and outputs s_1, ..., s_{ℓ+1}]

s_i = x_i ⊕ y_i ⊕ c_i
c_{i+1} = ((x_i ⊕ y_i) ∧ (x_i ⊕ c_i)) ⊕ x_i   [BoyarPeraltaPochuev00]
ANDsize = ℓ, ANDdepth = ℓ

Ladner-Fischer-Adder [LF80]

[Figure: 4-bit Ladner-Fischer adder with inputs x_4, y_4, ..., x_1, y_1, intermediate values p_{i,j} and c_{i,j}, and outputs s_5, ..., s_1]

p_{i,0} = x_i ⊕ y_i,  c_{i,0} = x_i ∧ y_i
p_{i,j} = p_{i,j-1} ∧ p_{k,j-1}
c_{i,j} = (p_{i,j-1} ∧ c_{k,j-1}) ∨ c_{i,j-1}
ANDsize = ℓ + 1.25 ℓ log_2(ℓ), ANDdepth = 1 + 2 log_2(ℓ)

Chapter 3: Circuit Optimizations and Constructions

Table 3.2: Size: Efficient Circuit Constructions (for n unsigned ℓ-bit values)

Circuit                       Standard #gates   Standard #gates      Free XOR #gates
                              2-input           3-input              2-input non-XOR
ADD, SUB (§3.3.1)             2                 2ℓ − 2               ℓ
ADDSUB (§3.3.1.3)             ℓ + 1             2ℓ                   ℓ
MUL (Textbook) (§3.3.2.1)     ℓ² + 2ℓ − 2       2ℓ² − 4ℓ + 2         2ℓ² − ℓ
MUL (Karatsuba) (§3.3.2.2)                                           ≈ 9ℓ^1.6 − 13ℓ − 34
CMP (§3.3.3.1)                1                 ℓ − 1                ℓ
MUX (§3.3.3.2)                0                 ℓ                    ℓ
MIN, MAX (§3.3.3.3)           n − 1             2ℓ(n − 1) + 2        2ℓ(n − 1) + n + 1

The first 1-bit adder has constant input c_1 = 0 and can be replaced by a smaller half-adder with two inputs. Each 1-bit adder has as inputs the carry-in bit c_i from the previous 1-bit adder and the two input bits x_i, y_i. The outputs are the carry-out bit c_{i+1} = (x_i ∧ y_i) ∨ (x_i ∧ c_i) ∨ (y_i ∧ c_i) = (x_i, y_i, c_i)[00010111] and the sum bit s_i = x_i ⊕ y_i ⊕ c_i = (x_i, y_i, c_i)[01101001]. All occurring gates are even and can be optimized to a small number of XOR gates. An equivalent construction for computing c_{i+1} with the same number of non-XOR gates was given in [BPP00, BDP00].

3.3.1.2 Subtraction (SUB)

Subtraction in two’s complement representation is defined as x − y = x + ¬y + 1. Hence, a subtraction circuit (SUB) can be constructed analogously to the addition circuit from 1-bit subtractors (−) as shown in Fig. 3.4. Each 1-bit subtractor computes the carry-out bit c_{i+1} = (x_i ∧ ¬y_i) ∨ (x_i ∧ c_i) ∨ (¬y_i ∧ c_i) = (x_i, y_i, c_i)[01001101] and the difference bit d_i = x_i ⊕ ¬y_i ⊕ c_i = (x_i, y_i, c_i)[10010110]. The size of SUB is equal to that of ADD.

3.3.1.3 Controlled Addition/Subtraction (ADDSUB)

A controlled addition/subtraction circuit (ADDSUB) which can add or subtract two unsigned ℓ-bit values x and y depending on a control input bit ctrl can be naturally constructed as combination of ADD, SUB, and controlled inversion (CNOT) which is
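The ripple-carry formulas s_i = x_i ⊕ y_i ⊕ c_i and c_{i+1} = ((x_i ⊕ y_i) ∧ (x_i ⊕ c_i)) ⊕ x_i can be checked on plaintext bits with a short sketch that also counts the AND gates. The helper names and the least-significant-bit-first ordering are choices made for this example.

```python
def to_bits(value: int, length: int):
    """Little-endian bit list: index 0 holds the least significant bit."""
    return [(value >> i) & 1 for i in range(length)]


def from_bits(bits) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(x_bits, y_bits):
    """Add two l-bit values with one AND per bit position.

    Returns the (l+1)-bit sum and the number of AND gates used.
    Every carry depends on the previous one, so the ANDs form a chain
    of length l (ANDdepth = l).
    """
    carry, sum_bits, and_count = 0, [], 0
    for x, y in zip(x_bits, y_bits):
        sum_bits.append(x ^ y ^ carry)           # XOR only: free
        carry = ((x ^ y) & (x ^ carry)) ^ x      # exactly one AND
        and_count += 1
    sum_bits.append(carry)
    return sum_bits, and_count
```

For ℓ = 4, `ripple_carry_add(to_bits(9, 4), to_bits(7, 4))` yields the bits of 16 with 4 AND gates, matching ANDsize = ℓ.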

40 290 T. Schneider and M. Zohner

ASummaryofCircuitBuildingBlocks Example Circuits Summarized in [SchneiderZohner13] Table 7. Size and Depth of Circuit Constructions (dH :Hammingweight)

Circuit Size S Depth D Addition ℓ Ripple-carry ADD/SUBRC ℓ ℓ ℓ Ladner-Fischer ADDLF 1.25ℓ log2 ℓ + ℓ 2 log2 ℓ +1 ℓ ⌈ ⌉ ⌈ ⌉ LF subtraction SUBLF 1.25ℓ log2 ℓ +2ℓ 2 log2 ℓ +2 (ℓ,3) ⌈ ⌉ ℓ ⌈ ⌉ℓ Carry-save ADDCSA ℓ + S(ADD ) D(ADD )+1 RC network ADD(ℓ,n) ℓn ℓ + n log n 1 log n 1 + ℓ RC − −⌈ 2 ⌉− ⌈ 2 − ⌉ (ℓ,n) ℓn 2ℓ + n log2 n log2 n 1 CSA network ADDCSA − ℓ+−⌈log n ⌉ ⌈ ℓ+−log⌉ n ⌈ 2 ⌉ ⌈ 2 ⌉ +S(ADDLF ) +D(ADDLF ) Multiplication RCN school method MULℓ 2ℓ2 ℓ 2ℓ 1 RC − − CSN school method MULℓ 2ℓ2 +1.25ℓ log ℓ ℓ +2 3 log ℓ +4 CSN ⌈ 2 ⌉− ⌈ 2 ⌉ RC squaring SQRℓ ℓ2 ℓ 2ℓ 3 RC − − LF squaring SQRℓ ℓ2 +1.25ℓ log ℓ 1.5ℓ 2 3 log ℓ +3 LF ⌈ 2 ⌉− − ⌈ 2 ⌉ Comparison ℓ Equality EQ ℓ 1 log2 ℓ ℓ − ⌈ ⌉ Sequential greater than GTS ℓ ℓ D&C greater than GTℓ 3ℓ log ℓ 2 log ℓ +1 DC −⌈ 2 ⌉− ⌈ 2 ⌉ Selection Multiplexer MUXℓ ℓ 1 Minimum MIN(ℓ,n) (n 1)(S(GTℓ)+ℓ) log n (D(GTℓ)+1) − ⌈ 2 ⌉ Minimum indexCan MIN( ℓtrade-off,n) (n larger1)(S(GT sizeℓ)+ℓ +forlog bettern ) depth.log n (D(GTℓ)+1) IDX − ⌈ 2 ⌉ ⌈ 2 ⌉ 51 Set Operations Set union ℓ ℓ 1 ∪ Set intersection ℓ ℓ 1 ∩ Set inclusion ℓ 2ℓ 1 log ℓ +1 ⊆ − ⌈ 2 ⌉ Count ℓ Full Adder count CNTFA 2ℓ log2 ℓ 2 log2 ℓ ℓ −⌈ ⌉− ⌈ ⌉ Boyar-Peralta count CNT ℓ dH (ℓ) log ℓ BP − ⌊ 2 ⌋ Distances ℓ ℓ (ℓ,3) ℓ (ℓ,3) Manhattan distance DSTM 2S(SUB )+S(ADD )+1 D(SUB )+D(ADD )+1 2S(SUBℓ)+2S(SQRℓ) D(SUBℓ) Euclidean distance DSTℓ E +S(ADD(2ℓ,4))+2S(MUXℓ) +D(SQRℓ)+3

BDepthEfficient Distance Circuits B.1 Manhattan Distance ℓ ℓ ℓ The Manhattan distance DSTM between two points p1 =(x1,y1)andp2 = ℓ ℓ (x2,y2) is the distance in a two dimensional space allowing only horizontal and ℓ ℓ ℓ ℓ vertical moves and is computed as x1 x2 + y1 y2 . [5] give such a circuit ℓ ℓ | − | | − ℓ| DSTM,C with size S(DSTM,C)=9ℓ and depth D(DSTM,C)=2ℓ +2.Theyuse ℓ ℓ ℓ 4multiplexercircuitsMUX (cf. [18]), 2 GTS circuits ( 3.3), 2 SUBRC circuits (cf. [18]), and one ADDℓ circuit ( 3.1). § RC § 290 T. Schneider and M. Zohner

ASummaryofCircuitBuildingBlocks Example Circuits Summarized in [SchneiderZohner13] Table 7. Size and Depth of Circuit Constructions (dH :Hammingweight)

Circuit Size S Depth D Addition ℓ Ripple-carry ADD/SUBRC ℓ ℓ ℓ Ladner-Fischer ADDLF 1.25ℓ log2 ℓ + ℓ 2 log2 ℓ +1 ℓ ⌈ ⌉ ⌈ ⌉ LF subtraction SUBLF 1.25ℓ log2 ℓ +2ℓ 2 log2 ℓ +2 (ℓ,3) ⌈ ⌉ ℓ ⌈ ⌉ℓ Carry-save ADDCSA ℓ + S(ADD ) D(ADD )+1 RC network ADD(ℓ,n) ℓn ℓ + n log n 1 log n 1 + ℓ RC − −⌈ 2 ⌉− ⌈ 2 − ⌉ (ℓ,n) ℓn 2ℓ + n log2 n log2 n 1 CSA network ADDCSA − ℓ+−⌈log n ⌉ ⌈ ℓ+−log⌉ n ⌈ 2 ⌉ ⌈ 2 ⌉ +S(ADDLF ) +D(ADDLF ) Multiplication RCN school method MULℓ 2ℓ2 ℓ 2ℓ 1 RC − − CSN school method MULℓ 2ℓ2 +1.25ℓ log ℓ ℓ +2 3 log ℓ +4 CSN ⌈ 2 ⌉− ⌈ 2 ⌉ RC squaring SQRℓ ℓ2 ℓ 2ℓ 3 RC − − LF squaring SQRℓ ℓ2 +1.25ℓ log ℓ 1.5ℓ 2 3 log ℓ +3 LF ⌈ 2 ⌉− − ⌈ 2 ⌉ Comparison ℓ Equality EQ ℓ 1 log2 ℓ ℓ − ⌈ ⌉ Sequential greater than GTS ℓ ℓ D&C greater than GTℓ 3ℓ log ℓ 2 log ℓ +1 DC −⌈ 2 ⌉− ⌈ 2 ⌉ Selection Multiplexer MUXℓ ℓ 1 Minimum MIN(ℓ,n) (n 1)(S(GTℓ)+ℓ) log n (D(GTℓ)+1) − ⌈ 2 ⌉ Minimum indexCan MIN( ℓtrade-off,n) (n larger1)(S(GT sizeℓ)+ℓ +forlog bettern ) depth.log n (D(GTℓ)+1) IDX − ⌈ 2 ⌉ ⌈ 2 ⌉ 51 Set Operations Set union ℓ ℓ 1 ∪ Set intersection ℓ ℓ 1 ∩ Set inclusion ℓ 2ℓ 1 log ℓ +1 ⊆ − ⌈ 2 ⌉ Count ℓ Full Adder count CNTFA 2ℓ log2 ℓ 2 log2 ℓ ℓ −⌈ ⌉− ⌈ ⌉ Boyar-Peralta count CNT ℓ dH (ℓ) log ℓ BP − ⌊ 2 ⌋ Distances ℓ ℓ (ℓ,3) ℓ (ℓ,3) Manhattan distance DSTM 2S(SUB )+S(ADD )+1 D(SUB )+D(ADD )+1 2S(SUBℓ)+2S(SQRℓ) D(SUBℓ) Euclidean distance DSTℓ E +S(ADD(2ℓ,4))+2S(MUXℓ) +D(SQRℓ)+3

BDepthEfficient Distance Circuits B.1 Manhattan Distance ℓ ℓ ℓ The Manhattan distance DSTM between two points p1 =(x1,y1)andp2 = ℓ ℓ (x2,y2) is the distance in a two dimensional space allowing only horizontal and ℓ ℓ ℓ ℓ vertical moves and is computed as x1 x2 + y1 y2 . [5] give such a circuit ℓ ℓ | − | | − ℓ| DSTM,C with size S(DSTM,C)=9ℓ and depth D(DSTM,C)=2ℓ +2.Theyuse ℓ ℓ ℓ 4multiplexercircuitsMUX (cf. [18]), 2 GTS circuits ( 3.3), 2 SUBRC circuits (cf. [18]), and one ADDℓ circuit ( 3.1). § RC § 290 T. Schneider and M. Zohner

ASummaryofCircuitBuildingBlocks Example Circuits Summarized in [SchneiderZohner13] Table 7. Size and Depth of Circuit Constructions (dH :Hammingweight)

Circuit Size S Depth D Addition ℓ Ripple-carry ADD/SUBRC ℓ ℓ ℓ Ladner-Fischer ADDLF 1.25ℓ log2 ℓ + ℓ 2 log2 ℓ +1 ℓ ⌈ ⌉ ⌈ ⌉ LF subtraction SUBLF 1.25ℓ log2 ℓ +2ℓ 2 log2 ℓ +2 (ℓ,3) ⌈ ⌉ ℓ ⌈ ⌉ℓ Carry-save ADDCSA ℓ + S(ADD ) D(ADD )+1 RC network ADD(ℓ,n) ℓn ℓ + n log n 1 log n 1 + ℓ RC − −⌈ 2 ⌉− ⌈ 2 − ⌉ (ℓ,n) ℓn 2ℓ + n log2 n log2 n 1 CSA network ADDCSA − ℓ+−⌈log n ⌉ ⌈ ℓ+−log⌉ n ⌈ 2 ⌉ ⌈ 2 ⌉ +S(ADDLF ) +D(ADDLF ) Multiplication RCN school method MULℓ 2ℓ2 ℓ 2ℓ 1 RC − − CSN school method MULℓ 2ℓ2 +1.25ℓ log ℓ ℓ +2 3 log ℓ +4 CSN ⌈ 2 ⌉− ⌈ 2 ⌉ RC squaring SQRℓ ℓ2 ℓ 2ℓ 3 RC − − LF squaring SQRℓ ℓ2 +1.25ℓ log ℓ 1.5ℓ 2 3 log ℓ +3 LF ⌈ 2 ⌉− − ⌈ 2 ⌉ Comparison ℓ Equality EQ ℓ 1 log2 ℓ ℓ − ⌈ ⌉ Sequential greater than GTS ℓ ℓ D&C greater than GTℓ 3ℓ log ℓ 2 log ℓ +1 DC −⌈ 2 ⌉− ⌈ 2 ⌉ Selection Multiplexer MUXℓ ℓ 1 Minimum MIN(ℓ,n) (n 1)(S(GTℓ)+ℓ) log n (D(GTℓ)+1) − ⌈ 2 ⌉ Minimum indexCan MIN( ℓtrade-off,n) (n larger1)(S(GT sizeℓ)+ℓ +forlog bettern ) depth.log n (D(GTℓ)+1) IDX − ⌈ 2 ⌉ ⌈ 2 ⌉ 51 Set Operations Set union ℓ ℓ 1 ∪ Set intersection ℓ ℓ 1 ∩ Set inclusion ℓ 2ℓ 1 log ℓ +1 ⊆ − ⌈ 2 ⌉ Count ℓ Full Adder count CNTFA 2ℓ log2 ℓ 2 log2 ℓ ℓ −⌈ ⌉− ⌈ ⌉ Boyar-Peralta count CNT ℓ dH (ℓ) log ℓ BP − ⌊ 2 ⌋ Distances ℓ ℓ (ℓ,3) ℓ (ℓ,3) Manhattan distance DSTM 2S(SUB )+S(ADD )+1 D(SUB )+D(ADD )+1 2S(SUBℓ)+2S(SQRℓ) D(SUBℓ) Euclidean distance DSTℓ E +S(ADD(2ℓ,4))+2S(MUXℓ) +D(SQRℓ)+3

Summary


Part 1: In Garbled Circuits, each non-XOR gate:
- small # of fixed-key AES evaluations
- send 2 ciphertexts
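As a rough illustration of the "2 ciphertexts per non-XOR gate" cost, the sketch below estimates the size of the garbled tables sent for a circuit. The function name and the 128-bit ciphertext length are assumptions for illustration, not values from the slides; the estimate ignores the transfer of input labels and the output decoding information.

```python
def garbled_tables_bytes(non_xor_gates: int, ciphertext_bits: int = 128) -> int:
    """Bytes sent for garbled tables: 2 ciphertexts per non-XOR gate.

    XOR gates are not counted, matching the per-gate cost stated above.
    ciphertext_bits = 128 is an assumed ciphertext length.
    """
    return non_xor_gates * 2 * ciphertext_bits // 8


if __name__ == "__main__":
    # Hypothetical circuit with one million non-XOR gates.
    print(garbled_tables_bytes(1_000_000), "bytes")
```

Under these assumptions, a hypothetical circuit with 1,000,000 non-XOR gates needs 32,000,000 bytes of garbled tables.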

Part 2: OT extension
- send 1 ciphertext + |payload| per OT
- communication is essentially the bottleneck
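The OT extension cost can be turned into a similar back-of-the-envelope estimate. In this sketch, `payload_bits` stands for whatever the slide denotes by |payload|, and the 128-bit ciphertext length is an assumption; the function name is hypothetical, and the small constant cost of the base OTs is left out.

```python
def ot_extension_bits(num_ots: int, payload_bits: int,
                      ciphertext_bits: int = 128) -> int:
    """Bits sent for num_ots extended OTs: 1 ciphertext + |payload| each.

    ciphertext_bits = 128 is an assumed ciphertext length; base OTs are
    not included in this estimate.
    """
    return num_ots * (ciphertext_bits + payload_bits)


if __name__ == "__main__":
    # Hypothetical batch: one million OTs with a 256-bit payload each.
    print(ot_extension_bits(1_000_000, 256) / 8 / 1e6, "MB")
```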

Part 3: Circuits and Yao vs. GMW
- can trade off size for depth
- Yao has constant #rounds ⇒ good for networks with high latency (Internet)
- GMW can precompute all crypto, good for low-latency networks (LAN)
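A minimal latency-only model shows why depth matters for GMW but not for Yao. It assumes that GMW needs one communication round per layer of AND gates and that Yao needs a small constant number of rounds (2 is assumed here); bandwidth and computation are ignored, and all names and numbers are illustrative.

```python
def yao_latency_ms(round_time_ms: int, rounds: int = 2) -> int:
    """Latency cost of Yao: a constant number of rounds (assumed 2)."""
    return rounds * round_time_ms


def gmw_latency_ms(and_depth: int, round_time_ms: int) -> int:
    """Latency cost of GMW online phase: one round per AND layer (assumed)."""
    return and_depth * round_time_ms


if __name__ == "__main__":
    # Depths 32 and 11 are ADD_RC and ADD_LF from Table 7 for l = 32.
    for network, round_time in (("LAN", 1), ("Internet", 100)):
        print(network,
              "Yao:", yao_latency_ms(round_time), "ms,",
              "GMW depth 32:", gmw_latency_ms(32, round_time), "ms,",
              "GMW depth 11:", gmw_latency_ms(11, round_time), "ms")
```

With an assumed 100 ms per round, a depth-32 circuit costs GMW 3200 ms in latency alone and a depth-11 circuit 1100 ms, against 200 ms for Yao; at 1 ms per round the gap shrinks to a few tens of milliseconds.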

Symmetric crypto is so efficient that communication is the bottleneck.

Thanks for your attention!

Questions?

Contact: http://encrypto.de