7: Password Hashing

- Passwords, passphrases, and "Personal Identification Numbers" (PINs) are needed all the time.

[login box: user name: Max, password: ********]

- Can be part of a "two-factor authentication" (e.g., chip card + PIN)
- Adversaries try to "guess" them:
  - How probable is a given password?
  - How to attack an unknown pwd by making X attempts to "guess" it?
  - Which passwords are the X most probable ones?
- How to choose a pwd to thwart such attacks?
- . . . while still being able to remember the pwd?

Our goal: make the usage of passwords as secure as possible.

Preliminaries
. . . speaking words of wisdom, let it be, let it be, . . . (The Beatles)

- QUESTION: Which of the following passwords is OK? “53cur3 pa55w0rd”, “ci90n38!P0??3ah9vhv”, or “123456”?


- ANSWER: None! (Not after being shown on my slides . . . )
- Passwords must be unpredictable . . .
- BUT this is actually a property of password generation and password handling (random choices, not published on slides, . . . )
- Information theory: a "password source" must have high entropy!
- Very informally: k bits of entropy mean ≥ 2^(k−1) "guesses" for the attacker

Wisdom #1: Choose "high-entropy" passwords!
. . . this is harder than it looks

Wisdom #2: Do not allow Offline Attacks!
. . . whenever you can avoid them


Online:
- access to the server
- adversary sends a pwd "guess"
- server accepts or rejects the pwd "guess"
- X: as large as the server allows and can handle

Offline:
- a function F
- adversary can compute F without the server's aid
- F(pwd) = 1 ⇔ pwd accepted
- X: as large as the adversary can afford

Wisdom #3: Passwords in the Clear are bad!
. . . worse than leaving the key to your flat under the door-mat

name      pwd
Anakin    skywalker
Dagobert  moneymoneymoney
Donald    enwo34qindk!d
Luke      skywalker
Tick      mysecretpassword
Trick     mysecretpassword
Track     mysecretpassword

Wisdom #4: Hashing Passwords helps
. . . but is not good enough

name      H(pwd)
Anakin    viqwbnqwtomwm
Dagobert  wer4mnrt4rnrm
Donald    r034jionksioe
Luke      viqwbnqwtomwm
Tick      sdjklasdle9nr
Trick     sdjklasdle9nr
Track     sdjklasdle9nr

Best attack:
- assume a "dictionary" with the N most "common" passwords
- compute a second dictionary with the N hashed passwords (once)
- attack each account in time O(1), if its password is "common"

Wisdom #5: Salt and Hash Passwords

name      salt    H(salt, pwd)
Anakin    34892   4unuiio8nuue7
Dagobert  29495   ksni9m8k89kiu
Donald    09858   cdk5jkambydyu
Luke      45888   xumun6muzyqjo
Tick      19495   cnjk9mk3msdfk
Track     27849   dekcexcidklc7
Trick     90479   yei7kmdkx2dcx

Best attack:
- assume a "dictionary" with the N most "common" passwords
- attack each account in time O(N), if its password is "common" (and if the salt never repeats)
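To make the difference to the unsalted case concrete, here is a minimal Python sketch (illustrative only: H is instantiated with SHA-256 over "|"-joined inputs, and the toy dictionary stands in for the N "common" passwords):

import hashlib

def H(*parts: bytes) -> bytes:
    return hashlib.sha256(b"|".join(parts)).digest()

dictionary = [b"123456", b"skywalker", b"mysecretpassword"]  # N "common" pwds

# Without salt: hash the dictionary ONCE ...
table = {H(pwd): pwd for pwd in dictionary}
leaked = H(b"skywalker")                  # e.g., Anakin's leaked hash
print(table.get(leaked))                  # ... then O(1) lookup per account

# With salt: the precomputed table is useless, O(N) hashes per account.
def attack_salted(salt: bytes, leaked_hash: bytes):
    for pwd in dictionary:                # N attempts for one single account
        if H(salt, pwd) == leaked_hash:
            return pwd

print(attack_salted(b"34892", H(b"34892", b"skywalker")))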

7.1: Key Stretching

Wisdom #6: Perform "Key Stretching"

- stretching by k bits: attacks slow down 2^k times
- "virtual" entropy goes from β to β + k
- unfortunately, the defender's operations may also slow down 2^k times
- so the idea is to
  - choose k as large as possible
  - without annoying the defender

Generate Password Hashes with Stretching

iteration: stretch by log_2(N) bits
input: pwd, salt, N
  X ← H(salt, pwd)
  for i ← 1 to N do
    X ← H(salt, X)
  end for
  return X

pepper: stretch by p bits
input: pwd, salt, p
  choose random R < 2^p
  X ← H(salt, R, pwd)
  return X
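A minimal Python sketch of both generators, assuming H = SHA-256 over the "|"-joined inputs (the slides leave H abstract; the function names are mine):

import hashlib, secrets

def H(*parts: bytes) -> bytes:
    return hashlib.sha256(b"|".join(parts)).digest()

def gen_iteration(pwd: bytes, salt: bytes, N: int) -> bytes:
    X = H(salt, pwd)
    for _ in range(N):                    # stretch by log2(N) bits
        X = H(salt, X)
    return X

def gen_pepper(pwd: bytes, salt: bytes, p: int) -> bytes:
    R = secrets.randbelow(2 ** p)         # random pepper, stored NOWHERE
    return H(salt, R.to_bytes((p + 7) // 8, "big"), pwd)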

Verify Password Hashes

verify iteration
input: pwd, salt, N, X′
  X ← H(salt, pwd)
  for i ← 1 to N do
    X ← H(salt, X)
  end for
  accept if X = X′, else reject

verify pepper
input: pwd, salt, p, X′
  choose random R′ < 2^p
  for i ← 0 to 2^p − 1 do
    R″ ← (R′ + i) mod 2^p
    X ← H(salt, R″, pwd)
    accept if X = X′
  end for
  reject

Advantages and disadvantages:
+ iteration: generation and verification are the same
+ pepper: fast generation
+ pepper: parallelizable (why is this an advantage for the defender, and not a disadvantage?)
− pepper: R must be random – and even secret (why?)
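The matching verifiers, continuing the sketch above (H and gen_iteration as defined there; the pepper search starts at a random offset R′):

import secrets

def verify_iteration(pwd: bytes, salt: bytes, N: int, X1: bytes) -> bool:
    return gen_iteration(pwd, salt, N) == X1

def verify_pepper(pwd: bytes, salt: bytes, p: int, X1: bytes) -> bool:
    R1 = secrets.randbelow(2 ** p)             # random starting offset R'
    for i in range(2 ** p):                    # 2^p candidates; parallelizable
        R2 = (R1 + i) % (2 ** p)
        if H(salt, R2.to_bytes((p + 7) // 8, "big"), pwd) == X1:
            return True                        # found the secret pepper R
    return False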

Iteration – Without Knowing the Count
just let the user decide for herself

unknown iteration
input: pwd, salt
  X ← H(salt, pwd)
  while true do
    X ← H(salt, X)
  end while
  when exception {user: ctrl-c}: return X

verify unknown iteration
input: pwd, salt, X
  X′ ← H(salt, pwd)
  while X′ ≠ X do
    X′ ← H(salt, X′)
  end while
  accept
  when exception {user: ctrl-c}: reject
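In Python, the {user: ctrl-c} exception maps naturally onto KeyboardInterrupt; a sketch with the same abstract H as before:

import hashlib

def H(*parts: bytes) -> bytes:
    return hashlib.sha256(b"|".join(parts)).digest()

def gen_unknown(pwd: bytes, salt: bytes) -> bytes:
    X = H(salt, pwd)
    try:
        while True:
            X = H(salt, X)
    except KeyboardInterrupt:          # the user decides when to stop
        return X

def verify_unknown(pwd: bytes, salt: bytes, X: bytes) -> bool:
    X1 = H(salt, pwd)
    try:
        while X1 != X:
            X1 = H(salt, X1)
        return True                    # the chain reached the stored hash
    except KeyboardInterrupt:          # the user gives up waiting
        return False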

Usage Scenarios for Password Scramblers

- user authentication
- key derivation
- proof of work
- . . .

These uses imply different attack models / security requirements!

History until 2010

1960s: Wilkes: plain passwords are bad → store a hash and compare
1978: crypt
  - 25 iterations of a DES-like operation
  - 12-bit salt to hinder dictionary attacks
1980s: shadow passwords
  - store (user, salt, dummy) in File A
  - store H(pwd, salt) in File B
1995: Abadi, Lomas, Needham: pepper
1997: Kelsey, Schneier, Hall, Wagner: analyzed iteration
2007: Boyen: unknown iteration count
2010: Turan, Barker, Burr, Chen: first standard for KDFs (PBKDF1/2 – Password-Based Key Derivation Function)

Practical Password Scramblers and KDFs until 2010

1978: crypt: in UNIX-based systems; based on DES (25 iterations), 12-bit salt
1995: md5crypt by Poul-Henning Kamp: 64-bit salt, 1000 iterations of MD5
1999: bcrypt by Provos and Mazières: based on Blowfish (Schneier, 1993); needs a significant (but constant) amount of memory: S-boxes (4 × 1024 bytes) + subkey (72 bytes)
2010: PBKDF2 by NIST: the first standard for KDFs, can use a hash function or a block cipher

PBKDF2 – Password-Based Key Derivation Function Two

F(pwd, salt, N, i)
  U ← PRF(pwd, salt || i_32)
  X ← U
  for I ← 1 to N do
    U ← PRF(pwd, U)   (∗)
    X ← X ⊕ U
  end for
  return X

- F is the core function; PRF(Key, Input) is modelled as a "pseudorandom function"
- Why is pwd used in every round? Isn't that dangerous?
- Should we replace line (∗) by U ← PRF(U)?
- PRF could be instantiated by
  - a block cipher E: PRF(K, U) = E_K(U)
  - a MAC M: PRF(K, U) = M_K(U)
  - a hash function H: PRF(K, U) = H(K || U)
  - the HMAC construction, using H in a nested way:

PRF(K, U) = H((K ⊕ const_2) || H((K ⊕ const_1) || U))
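A sketch of the core function F with PRF = HMAC-SHA-1 (hmac and hashlib are standard-library modules). Note: with the loop bound as written above, F makes N + 1 PRF calls in total, so it matches Python's built-in PBKDF2 when that is run with N + 1 iterations:

import hmac, hashlib

def prf(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha1).digest()

def F(pwd: bytes, salt: bytes, N: int, i: int) -> bytes:
    U = prf(pwd, salt + i.to_bytes(4, "big"))    # PRF(pwd, salt || i_32)
    X = U
    for _ in range(N):
        U = prf(pwd, U)                          # line (*)
        X = bytes(a ^ b for a, b in zip(X, U))   # X <- X xor U
    return X

assert F(b"pwd", b"salt", 99, 1) == hashlib.pbkdf2_hmac("sha1", b"pwd", b"salt", 100)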

PBKDF2 can Generate Large Outputs
relevant for key derivation

- For a required number of output bits, PBKDF2 concatenates F(pwd, salt, N, 1) || F(pwd, salt, N, 2) || . . . , truncating the final call F(pwd, salt, N, i) to the required number of bits
- Example WPA2:
  - PBKDF2 with PRF = HMAC-SHA-1 and N = 4096 iterations
  - output: 256 bits, but HMAC-SHA-1 provides 160 bits
  - thus, call F twice:
    - use all 160 bits from F(pwd, salt, N, 1)
    - and the first 96 bits from F(pwd, salt, N, 2)
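The WPA2 derivation, via the standard library; in WPA2 the salt is the network name (SSID), and both the passphrase and "MyHomeWLAN" below are made-up example values:

import hashlib

pmk = hashlib.pbkdf2_hmac("sha1", b"correct horse battery", b"MyHomeWLAN",
                          4096, dklen=32)    # 256-bit key, i.e., two F-calls
print(pmk.hex())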

7.2: Considering Memory

Timeline

2009: scrypt (Percival)
2013: Catena (Forler, Lucks, Wenzel)
2013–2015: Password Hashing Competition (PHC)
2015: Argon2 (Biryukov, Dinu, Khovratovich) wins the competition
2015–now: theoretical results on amortised costs

The Advance of Massively Parallel Commodity Hardware

http://www.nvidia.com/object/what-is-gpu-computing.html

- commodity hardware with an abundance of parallel cores
- the attacker can try out any number of passwords in parallel
- the defender is hashing a single password
- also, the defender does not always have so many cores

Wisdom #7: Storage is Expensive
. . . use this to make the adversary's life harder

[figure: CPU (multiple cores) vs. GPU (hundreds of cores) – similar cache and memory sizes]

Adversaries with cheap off-the-shelf parallelizable hardware (GPUs, FPGAs, . . . ) do not have much memory – especially not fast cache memory. The cost of expensive special-purpose hardware is driven up by memory costs.

Memory-Hard Functions
Percival (2009)

- let f be a function which can be computed in time T using space S
- consider a machine with S/k units of memory, instead of S

Definition 12
A function f is memory-hard if computing f(x) on an input of size n needs S(n) space and T′(n) operations, where

S(n) · T′(n) ∈ Ω(T′(n)^(2−ε))

for ε > 0.

≈ computing f with S/k units of memory takes kT operations

Sequentially Memory-Hard Functions
Percival (2009)

- if we consider physical "time" rather than the number of operations, the speed-up on a parallel machine can be a concern

Definition 13
A function f is sequentially memory-hard if (1) it is memory-hard and (2) it cannot be computed on a machine with S(n) processors and S(n) space in expected time T(n), where

S(n) · T(n) = O(T(n)^(2−ε))

for any ε > 0.

≈ computing f with T/k units of memory takes time kT – even with any number of parallel cores

The Outer Layer of scrypt
Percival (2009)

scrypt(pwd, salt, N)
  B ← PBKDF2(pwd, salt, 1)
  B ← ROMix(B, N)
  B ← PBKDF2(pwd, B, 1)
  return B

- ROMix is the core operation (see below)
- framed by two calls to PBKDF2 (HMAC-SHA-256, one iteration each)
- the input B of ROMix is password-dependent
- the postprocessing needs the pwd again (fix: use the zero string instead)
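scrypt ships in Python's standard library (OpenSSL-backed), so the complete construction can be exercised directly; the parameters below are merely illustrative, not a recommendation:

import hashlib

key = hashlib.scrypt(b"password", salt=b"NaCl", n=2**14, r=8, p=1, dklen=32)
print(key.hex())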

ROMix, the Core of scrypt
Percival (2009)

ROMix(B, N)
  X ← B
  for i ← 0 to N − 1 do {initialize V_0, . . . , V_{N−1}}
    V_i ← X
    X ← H(X)
  end for
  for i ← 0 to N − 1 do {read V_0, . . . , V_{N−1} at random points}
    j ← X mod N
    X ← H(X ⊕ V_j)
  end for
  return X

- ROMix actually uses a complex "BlockMix" operation for H, based on the stream cipher Salsa20/8. But one can use any good hash function instead.
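A minimal ROMix sketch with H = SHA-256 in place of scrypt's BlockMix (as noted above, any good hash function works), wrapped in the outer layer from the previous slide:

import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def romix(B: bytes, N: int) -> bytes:
    X, V = B, []
    for _ in range(N):                       # initialize V_0 .. V_{N-1}
        V.append(X)
        X = H(X)
    for _ in range(N):                       # N password-dependent reads
        j = int.from_bytes(X, "big") % N                  # j <- X mod N
        X = H(bytes(a ^ b for a, b in zip(X, V[j])))      # X <- H(X xor V_j)
    return X

def scrypt_sketch(pwd: bytes, salt: bytes, N: int) -> bytes:
    B = hashlib.pbkdf2_hmac("sha256", pwd, salt, 1)
    return hashlib.pbkdf2_hmac("sha256", pwd, romix(B, N), 1)

print(scrypt_sketch(b"pwd", b"salt", 2**10).hex())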

Properties of ROMix

Theorem 14
1. Given sufficient storage for all N values V_i, ROMix can be computed in time O(N).
2. Given sufficient storage for N/k of the V_i, ROMix can be computed in time O(kN).
3. ROMix is sequentially memory-hard.

The proof of the third property is surprisingly involved, and Percival's original proof is flawed. But the core idea of the proof is simple and sound (→ blackboard).

Garbage-Collector Attack (GCA)

phase 1 (initialization):
  X ← B
  for i ← 0 to N − 1 do
    V_i ← X
    X ← H(X)
  end for

phase 2 (mixing):
  for i ← 0 to N − 1 do
    j ← X mod N
    X ← H(X ⊕ V_j)
  end for

- After ROMix, V_0 = B = H(pwd, salt) still resides in RAM.

- An adversary with access to the RAM may use V_0 to search for the password (very time- and memory-efficiently).

Countermeasures:
1. zeroize V (V ← 0) (beware of optimizing compilers!)
2. a memory-hard function updating V

Cache-Timing Attacks (CTAs): Shared Cache

[figure: processes P1 and P2 share one cache in front of the main memory]

cache: fast memory, holding partial copies of data from the slow main memory
shared cache: different processes share the same cache (even on different cores)
security: no process reads data of another process (without specific authorisation)

Cache-Timing Attacks (CTAs): Setting

[figure: victim and spy, interacting through the shared cache in four steps]

(1) the victim allocates memory; a copy is stored in the cache
(2) the spy allocates memory; the victim's data is ejected from the cache
(3) the victim executes its algorithm and reads some data, ejecting some of the spy's data from the cache
(4) the spy finds out which of its data have been ejected (a cache-hit is much faster than a cache-miss)

CTAs: Application to ROMix
recover the password-dependent memory-access pattern (MAP)

victim (2nd phase of ROMix):
  for i ← 0 to N − 1 do
    j ← X mod N
    X ← H(X ⊕ V_j)
  end for

spy:
(1) wait for the 1st phase of ROMix
(2) allocate an array w[0 . . . N − 1]; for all i: write something to w[i]
(3) allow ROMix to run its 2nd phase
(4) for all i: w[i] ← time(read w[i])

If w[i] is small, then cache-hit: the 2nd phase did not read V_i.

CTA Exploits, given the MAP

fast sieve: interrupt the 2nd phase after k ≪ N iterations (at most a (k/N)-fraction of the V_i has been read)
memory-efficient sieve: at the end of the 2nd phase (a 1/e-fraction of the V_i has never been read):
- compute the V_j on the fly, when needed
- stop immediately if some j occurs with V_j unread
- space O(1)
- time 1.8N (the "legal" hash would take time 2N)
de-anonymize: compare the MAPs from different (anonymous) log-ins

A completely new attack vector:

off-line attacks without knowing the password hashes

(this would be impossible when using, say, PBKDF2!)

7.3: The Pebble Game
an all but forgotten methodology to analyze storage bounds
main results here are based on Lengauer & Tarjan, 1979 and 1982

- a program for a function F without any data-dependent branches (a so-called "straight-line program")
- the data flow forms a directed acyclic graph (DAG):
  - vertices without ancestors: inputs
  - vertices with ancestors represent the result of an operation; e.g., if v_3 = f(v_1, v_2), then v_1 → v_3 and v_2 → v_3
  - edges represent the data flow
  - the fan-in of every vertex is bounded, e.g., by 2, if all operations are of the form v_3 = f(v_1, v_2) or v_4 = g(v_5)

Note that DAGs can be topologically sorted, i.e., the vertices v_i can be arranged such that i > max{i_1, i_2, . . . , i_j} holds whenever v_i = f(v_{i_1}, v_{i_2}, . . . , v_{i_j}).

Rules for the Pebble Game

Let v be a vertex.
pebble game moves: if all direct ancestors of v are pebbled, then
- a new pebble can be placed on v, or
- a pebble can be moved from a direct ancestor to v
(thus, an input vertex can be pebbled at any time)
pebble recycling: at any time, a pebble can be removed
pebble game goal: pebble all vertices in the graph (or pebble one or more vertices marked as "output")
resources (algorithmic goal: minimize the resources):
- memory M = max(#pebbles)
- (sequential) time T = #moves
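To make the rules concrete, here is a small referee sketch (my own toy encoding, not from the slides): the graph maps each vertex to the list of its direct ancestors; moves are ("place", v), ("move", v, ancestor), and ("remove", v):

def play(graph, moves):
    pebbled, M, T = set(), 0, 0
    for op, v, *src in moves:
        if op == "remove":                 # pebble recycling, free of charge
            pebbled.discard(v)
            continue
        assert all(a in pebbled for a in graph[v]), "ancestors unpebbled"
        if op == "move":                   # slide a pebble up from an ancestor
            pebbled.discard(src[0])
        pebbled.add(v)
        T += 1                             # time T = number of moves
        M = max(M, len(pebbled))           # memory M = max number of pebbles
    return M, T

# tiny example: v3 = f(v1, v2)
g = {"v1": [], "v2": [], "v3": ["v1", "v2"]}
print(play(g, [("place", "v1"), ("place", "v2"), ("move", "v3", "v1")]))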

Example: Balanced Binary Tree (BBT)

- height t, one root, 2^t leaves
- n = 2^(t+1) − 1 vertices, n − 1 edges
- can be pebbled in time n and space n − 1 (trivial)
- can be pebbled in time n and space t + 1 (see below)

[figure: a BBT of height 3]

Theorem 15
A height-t BBT can be pebbled with t + 1 pebbles, but not with fewer.

Sketch of Proof: The tricky part is "not with fewer", i.e., that with only t pebbles we cannot pebble the BBT: before pebbling the root, every leaf-to-root path must carry at least one pebble. Consider the first time all these paths carry a pebble. This happens when pebbling an input vertex v – so the path from v to the root carries no pebble except the one on v. Each of the t vertices above v on this path has a sibling subtree, the paths through these siblings are pairwise disjoint, and each needs at least one pebble of its own. Count the pebbles: at least t + 1.

Remark
Depending on the operations, other graphs that compute the same function with fewer pebbles can exist. Consider, e.g., "+".

Permutation Graphs

- π: a permutation over {0, . . . , n − 1}; its graph is G(π)
- n upper vertices σ_0, . . . , σ_{n−1} and n lower vertices τ_0, . . . , τ_{n−1}
- input vertex σ_0, output vertex τ_{n−1}
- n − 1 horizontal edges σ_{i−1} → σ_i
- n vertical edges σ_i → τ_{π(i)}
- n − 1 horizontal edges τ_{i−1} → τ_i

Pebbling Permutation Graphs

Theorem 16
G(π) can be pebbled
- with M = n + 1 pebbles in time T_{n+1} = 3n − 2 ∈ O(n), and
- with M = 2 pebbles in time T_2 ≤ n² + n ∈ O(n²).


Theorem 17
With M + 1 pebbles, G(π) can be pebbled in time T_{M+1} ∈ O(n²/M).

The Bit-Reversal Graph G(rev)

rev_n(a_0, a_1, . . . , a_{n−1}) = (a_{n−1}, . . . , a_1, a_0); e.g., rev_4(13) = rev(1101) = (1011) = 11.
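The reversal of the n-bit representation can be checked directly in Python (a trivial helper, not part of the slides):

def rev(x: int, n: int) -> int:
    # reverse the n-bit binary representation of x
    return int(format(x, f"0{n}b")[::-1], 2)

assert rev(13, 4) == 11    # rev(1101) = 1011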

A Property of the Bit-Reversal Graph

- pick 2^x consecutive top vertices, starting at j = k · 2^x:
  v_{k·2^x}, v_{k·2^x+1}, . . . , v_{k·2^x+2^x−1}
- their n − x most significant bits are the same
- thus, the n − x least significant bits of the rev(v_i) are the same
- thus, the rev(v_i) are 2^(n−x) steps apart

The Same Property in Reverse

Pebbling the Bit-Reversal Graph G(rev)

Theorem 18
G(rev) provides the same time-memory tradeoff as ROMix:

S · T = Θ(n²).

7.4: Catena
Forler, Lucks, Wenzel (2013)

- We started this research in 2012.
- Countermeasure against cache-timing attacks: a memory-hard function with a password-independent MAP.
- Formalism to analyze the function (and to prove some form of memory-hardness): the pebble game.

Specification for the Catena Core

Catena_H(c, X)
  for i ← 0 to 2^c − 1 do {initialize V_0, . . . , V_{2^c−1}}
    V_i ← X
    X ← H(X)
  end for
  for i ← 0 to 2^c − 1 do
    j ← F(i) {j depends on i, but not on X!}
    X ← H(X, V_j)
  end for
  return X

Q: What choices for F do you recommend?
A: We recommend variants based on
1. the Bit-Reversal Graph (BRG), and
2. the Double-Butterfly Graph (DBG).
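A hedged sketch of the core above with F = bit reversal, i.e., a single Catena-BRG-like layer; instantiating H as SHA-512 over the concatenated inputs is my assumption for illustration:

import hashlib

def H(*parts: bytes) -> bytes:
    return hashlib.sha512(b"".join(parts)).digest()

def rev(x: int, c: int) -> int:
    return int(format(x, f"0{c}b")[::-1], 2)

def catena_core(c: int, X: bytes) -> bytes:
    V = []
    for _ in range(2 ** c):        # initialize V_0 .. V_{2^c - 1}
        V.append(X)
        X = H(X)
    for i in range(2 ** c):        # j = F(i) depends on i, but not on X:
        j = rev(i, c)              # the memory accesses are password-independent
        X = H(X, V[j])
    return X

print(catena_core(10, b"pwd-and-salt-hash").hex())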

Bit-Reversal Graph (BRG)

Idea (borrowed from Lengauer and Tarjan [12]): construct a (hash) graph that exploits the bit-reversal permutation τ (3-bit example: τ(6) = τ(110) = 011 = 3).

Bit-Reversal Hashing
[figure; blue: initialization, red: mixing]

Memory Hardness

- In [LengauerT82], Lengauer and Tarjan have shown that the BRG is memory-hard (a pebble-game proof)
- ⇒ BRH is memory-hard
- ⇒ Catena-BRG is memory-hard (Catena with F(g, x) = BRH(g, x))

Sweet Spotting
Catena-BRG is not sequentially memory-hard

Observation
- For S = √G, we have T = Ω(G^1.5).
- O(√G) cores, each with S = O(1), can compute BRH in parallel:
  - total number of hash-function calls: O(G^1.5)
  - runtime: O(G)
- In general, sequential memory-hardness seems to be impossible with a password-independent MAP.

λ-Memory-Hard Functions

- Perhaps we can "punish" the memory-saving adversary by over-proportionally increasing the time?

Definition 19
A function f is λ-memory-hard if computing f(x) with an input of size n needs S(n) space and T(n) operations, where

S(n) · T(n) ∈ Ω(T(n)^(λ+1−ε))

for ε > 0.

≈ computing f with T/k units of memory takes k^λ · T operations

!!! memory-hard = 1-memory-hard

λ-Memory Hardness

Q: Is a stack of (a constant) λ BRGs λ-memory-hard?
A: We did hope so. Alas, even a stack of λ BRGs is only 1-memory-hard (Biryukov, Khovratovich, 2014).

Double Butterfly Graph

Double Butterfly Graph (DBG): two interlocked Cooley-Tukey FFT graphs (omitting one middle row) + a sequential layer.

Superconcentrator

Q: Why the Double Butterfly Graph?
A: Because it is a "superconcentrator".
Lengauer and Tarjan (1982): a stack of λ superconcentrators is λ-memory-hard.

7.5: The Password Hashing Competition (PHC)

- ran from 2013 to 2015 as an open competition
- 24 candidates, including Catena
- selected one winner, Argon2:
  - a heavily tweaked version of the original Argon submission
  - two variants:
    - data-dependent Argon2d (somewhat similar to ROMix)
    - data-independent Argon2i (cache-timing resistant like Catena-BRH, but with a pseudo-random function F)
- Catena was awarded a special recognition for its agile framework approach and side-channel resistance

7.6: Amortised Costs for Parallel Pebbling
the PHC is over, the research has just started

Some authors, such as Alwen and Blocki at Crypto 2016, propose a parallel Random Oracle Model (pROM) as a (more realistic) approach to measure resources/costs:

- σ_{t−1} ⊆ V denotes the pebbled vertices before step t
- q_t ⊆ V denotes the vertices to be pebbled in step t (if v ∈ q_t, all parents of v must be in σ_{t−1}!)
- σ_t ⊆ σ_{t−1} ∪ q_t denotes the pebbled vertices after step t

Resources

parallel time: max t (the number of steps)

amortised memory (cmc: cumulative memory complexity):
cmc = Σ_t |σ_t|

amortised work (crc: cumulative random-oracle complexity):
crc = Σ_t |q_t|

power consumption: for some constants c_1 and c_2:
c_1 · cmc + c_2 · crc

Note that the model neglects the fixed costs of the hardware and the overhead for communication/synchronisation!

Example

- height d, one root, 2^d leaves; n = 2^(d+1) − 1 vertices, n − 1 edges
- can be pebbled in parallel time d:
  1. pebble the vertices at level 0 (the leaves)
  2. for i in {1, . . . , d}:
    2.1 pebble the vertices at level i
    2.2 remove the pebbles from level i − 1

[figure: a BBT of height 3]

cmc = Σ_{i=0}^{d} 2^i = 2^(d+1) − 1        crc = Σ_{i=1}^{d} 2^i

A Randomly Generated Example DAG

[figure: the generated DAG on vertices 0 . . . 11]

>>> import random
>>> for i in range(12):
...     if i > 1:
...         print(i, random.randrange(0, i-1))

Remove Some Vertices and the Connected Edges

[figures: three copies of the example DAG, with the removed vertices crossed out]

all vertices; longest path: (0,1,2,3,4,5,6,7,8,9,10,11), depth 12

removed: 0, 6; longest paths: (1,2,3,9,10,11) and (1,7,8,9,10,11), depth 6

removed: 0, 3, 6, 9; longest paths: (1,2,4,5) and (1,2,7,8), depth 4

Depth Reducibility


Definition 20
A DAG G = (V, E) is (s, d)-depth reducible with respect to a set of vertices S ⊆ V if |S| = s and G − S has depth d. Otherwise, G is (s, d)-depth robust.

As we have seen, our random example graph is both (2, 6)-depth reducible and (4, 4)-depth reducible.

Observation


Consider an (s, d)-depth reducible graph G = (V, E), with respect to S.
- If all vertices in S are pebbled, we can pebble all other vertices in parallel time ≤ d.
- If V is topologically sorted and the vertices v_j ∈ S are pebbled, we can pebble all vertices v_i ∈ V with i > j in parallel time ≤ d.

Core Idea to Save Power
when pebbling a depth-reducible graph

Consider an (s, d)-depth reducible graph G = (V, E) with constant fan-in.
- Keep all pebbles on the vertices from S.
- Switch between "balloon" and "light" phases.
- Each "balloon" phase: up to |V| pebbles.
- Each "light" phase:
  - removes all pebbles, except
    - those on vertices in S
    - and the direct parents of the v_i pebbled in the light phase
  - then consecutively pebbles up to d vertices

Password verification takes less power, but is slower!

Possible Implementation
one "balloon" engine (switched off most of the time), one "light" engine

Parallel Computation

Consider an (s, d)-depth reducible graph G = (V, E) with constant fan-in.
- one central core with B memory for the balloon phases
- k decentralized cores with L memory each for the light phases
- total memory: B + kL

Possible Implementation
one "balloon" engine, k "light" engines

Negative Results

For N storage units, the defender would need time and space N, with cmc N². At Crypto 2016, Alwen and Blocki found algorithms with improved cmc:
- Catena (all versions): cmc N^1.67, using specific properties of the double-butterfly and the bit-reversal graphs
- Argon2i (expected): cmc N^1.75, using statistical properties of random graphs
- generic, for password-independent functions: cmc N²/log²(N)

Positive Result

Theorem 21 (Alwen, Chen, Pietrzak, Reyzin, Tessaro; Eurocrypt 2017)
The cmc for evaluating scrypt is Θ(N²).

Sketch of Proof.
Recall that scrypt uses N memory units V_i = H(V_{i−1}). Consider an adversary A with M < N units of memory. The proof consists of three steps.
1. Single-shot game: a pROM game where A is given a random j and must compute H(V_j). If M cells are pebbled, this needs parallel time ≥ N/(2M) (with probability ≥ 1/2).
2. Multi-challenge game: a pROM game where A is given a random j_i after producing H(V_{i−1}). After Q such challenges, the pebbling cmc is in Ω(NQ) (with overwhelming probability).
3. Derive the memory cmc from the pebbling cmc.

Remark
The need for the third proof step is surprising. H is a random oracle; pebbling H(i) is the same as storing H(i), isn't it?

And if we need x values H(1), . . . , H(x), and we only stored y < x of these values, we must call H at least x − y times, don't we?

Well, this is wrong! For simplicity, consider x = 2 and y = 0.

We store neither H(1) nor H(2), but we do store Z = H(1) ⊕ H(2). Now we can compute both H(1) and H(2) by calling H only once.

As it turns out, if H is a random oracle, then no adversary A can benefit in any way from "compressing" the outputs of H.

This may seem obvious, but it is actually far from trivial.

Note that A is allowed to output wrong results! A just needs to output the correct result with some non-negligible probability.

Discussion (1)

Is the amortised approach really practical?
- lengthy discussion among experts, no definitive conclusion
- I am not an expert on hardware,
- but the power-saving approach seems plausible to me
- the k-way parallel approach, not so much (it seems to require cost and power consumption in O(k))

Discussion (2)

Is the price for password-independent MAPs too high?
- I am not sure!
- Memory-demanding password scramblers have been introduced to improve the defender's security, and to hinder the attackers.
- For a given budget of "time", scrypt-like password scramblers with a password-dependent MAP can utilize the memory much better.
- Thus, scrypt and its fellows give better security if the password hash has been compromised.
- On the other hand, thanks to their cache-timing vulnerability, scrypt and its fellows can be attacked even when the password hash has not been compromised.

Discussion (3)

What is the right model for memory-demanding functions, anyway?
- I am not sure!
- Remember that such functions are also used for proofs of work/space, digital currencies, etc.
- Probably, the model has to depend on the specific application and its threat model.

Plenty of open questions for further research!
