
arXiv:2009.10060v5 [cs.CR] 16 Aug 2021

Password Strength Signaling: A Counter-Intuitive Defense Against Password Cracking

Wenjie Bai, Jeremiah Blocki, Ben Harsha
Dept. of Computer Science, Purdue University, West Lafayette, USA
[email protected]

Abstract—We introduce password strength signaling as a potential defense against password cracking. Recent large-scale data breaches have exposed billions of user passwords to the dangerous threat of offline password cracking. An offline attacker can quickly check millions (or sometimes billions/trillions) of password guesses by comparing their hash value with the stolen hash from a breached authentication server. The attacker is limited only by the resources he is willing to invest. We explore the feasibility of applying ideas from Bayesian Persuasion to password authentication. Our idea is to have the authentication server store a (noisy) signal about the strength of each user password for an offline attacker to find. Surprisingly, we show that the noise distribution for the signal can often be tuned so that a rational (profit-maximizing) attacker will crack fewer passwords. The signaling scheme exploits the fact that password cracking is not a zero-sum game i.e., the attacker's profit is given by the value of the cracked passwords minus the total guessing cost. Thus, a well-defined signaling strategy will encourage the attacker to reduce his guessing costs by cracking fewer passwords. We use an evolutionary algorithm to compute the optimal signaling scheme for the defender. We evaluate our signaling mechanism on several password datasets and show that it can reduce the total number of cracked passwords by up to 12% (resp. 5%) of all users in defending against offline (resp. online) attacks. While we view the current results as a positive proof-of-concept for password strength signaling, we stress that there are important societal concerns that would need to be considered before adopting our solution.

Index Terms—Bayesian Persuasion, Password Authentication, Stackelberg Game, Defense Against Password Cracking

I. INTRODUCTION

In the last decade, large scale data-breaches have exposed billions of user passwords to the dangerous threat of offline password cracking. An offline attacker who has obtained the (salted) cryptographic hash hu = H(saltu, pwu) of a user u's password can attempt to crack the password by comparing this hash value with the hashes of likely password guesses i.e., by checking if hu = H(saltu, pw′) for each guess pw′. The attacker can check as many guesses as he wants offline — without interacting with the authentication server. The only limit is the resources the attacker is willing to invest in trying to crack the password. A rational password cracker [1], [2] will choose the number of guesses that maximizes his utility.

Password hashing serves as a last line of defense against an offline attacker. A good password hash function H should be moderately expensive to compute so that it becomes prohibitively expensive to check millions or billions of password guesses. However, we cannot make H too expensive to compute as the honest authentication server needs to evaluate H every time a user authenticates. In this paper, we explore a highly counter-intuitive defense which does not impact password hashing costs: signaling! 2 In particular, we apply Bayesian Persuasion [3] to password authentication. Specifically, we propose to have the authentication server store a (noisy) signal sigu which is correlated with the strength of the user's password pwu.

2 The proposal may be less counter-intuitive to those familiar with prior work in the area of Bayesian Persuasion [3].

Traditionally, an authentication server stores the tuple (u, saltu, hu) for each user u, where saltu is a random salt value and hu = H(saltu, pwu) is the salted hash of user u's password. We propose to have the authentication server instead store the tuple (u, saltu, sigu, hu), where the (noisy) signal sigu is sampled based on the strength of the user's password pwu. The signal sigu is simply recorded for an offline attacker to find if the authentication server is breached. In fact, the authentication server never even uses the signal sigu when the user u authenticates. If a user u attempts to login with a password pw′u, the authentication server will lookup (u, saltu, sigu, hu) and accept pw′u if and only if hu = H(saltu, pw′u). The attacker will only use the signal sigu if it is beneficial — at minimum the attacker could always choose to ignore the signal. 1

1 Password strength signaling is a defense against rational attackers.

It is natural, but incorrect, to imagine that password cracking is a zero-sum game i.e., that the attacker's gain is proportional to the defender's loss. In a zero-sum game there would be no benefit to directly leaking information about your actions e.g., in rock-paper-scissors [4] there is no benefit in signaling. However, we stress that password cracking is not a zero-sum game. The defender's utility (as the sender of the strength signal) is inversely proportional to the fraction of user passwords that are cracked. By contrast, it is possible that the attacker's utility is marginal even when he cracks a password i.e., when the (expected) value of the password is offset by the (expected) guessing costs. In particular, the attacker's utility is given by the value of all cracked passwords minus his total guessing costs. Thus, it is possible that password strength signaling would persuade the attacker to crack fewer passwords to reduce guessing costs. Indeed, we show that the signal distribution can be tuned so that a rational (profit-maximizing) attacker will crack fewer passwords.
To provide some intuition of why information signaling might be beneficial, we give two examples.

a) Example 1: Suppose that we add a signal sigu = 1 to indicate that user u's password pwu is uncrackable (e.g., the entropy of the password is over 60-bits) and we add the signal sigu = 0 otherwise. In this case, the attacker will simply choose to ignore accounts with sigu = 1 to reduce his total guessing cost. However, the number of cracked user passwords stays unchanged.

b) Example 2: Suppose that we modify the signaling scheme above so that even when the user's password pwu is not deemed to be uncrackable we still signal sigu = 1 with probability ǫ and sigu = 0 otherwise. If the user's password is uncrackable we always signal sigu = 1. Assuming that ǫ is not too large a rational attacker might still choose to ignore any account with sigu = 1 i.e., the attacker's expected reward will decrease slightly, but the attacker's guessing costs will also be reduced. In this example, the fraction of cracked user passwords is reduced by up to ǫ i.e., any lucky user u with sigu = 1 will not have their password cracked.

In this work, we explore the following questions: Can information signaling be used to protect passwords against rational attackers? If so, how can we compute the optimal signaling strategy?

A. Contributions

We introduce password information signaling as a novel, counter-intuitive defense against rational password attackers. We adapt a Stackelberg game-theoretic model of Blocki and Datta [1] to characterize the behavior of a rational password adversary and the optimal signaling strategy for an authentication server (defender). We analyze the performance of password information signaling using several large password datasets: Bfield, Brazzers, Clixsense, CSDN, Neopets, 000webhost, RockYou, Yahoo! [5], [6], and LinkedIn [7]. We analyze our mechanism both in the idealistic setting, where the defender has perfect knowledge of the user password distribution P and the attacker's value v for each cracked password, as well as in a more realistic setting where the defender is only given approximations of P and v. In our experiments, we analyze the fraction xsig(v) (resp. xno−sig(v)) of passwords that a rational attacker would crack if the authentication server uses (resp. does not use) password information signaling. We find that the reduction in the number of cracked passwords can be substantial e.g., xno−sig(v) − xsig(v) ≈ 8% under the empirical distribution and 13% under the Monte Carlo distribution. We also show that information signaling can be used to help deter online attacks when CAPTCHAs are used for throttling.

An additional advantage of our information signaling method is that it is independent of the password hashing method and requires no additional hashing work. Implementation involves some determination of which signal to attach to a certain account, but beyond that, any future authentication attempts are handled exactly as they were before i.e., the signal information is ignored.

We conclude by discussing several societal and ethical issues that would need to be addressed before password strength signaling is used. While password strength signaling decreases the total number of compromised accounts, there may be a few users whose accounts are cracked because they were assigned an "unlucky" signal. One possible solution might be to allow users to opt-in (resp. opt-out). Another approach might try to constrain the solution space to ensure that there are no "unlucky" users.

II. RELATED WORK

The human tendency to pick weaker passwords has been well documented e.g., [5]. Convincing users to select stronger passwords is a difficult task [8]–[13]. One line of research uses password strength meters to nudge users to select strong passwords [14]–[16], though a common finding is that users were not persuaded to select a stronger password [15], [16]. Another approach is to require users to follow stringent guidelines when they create their password. However, it has been shown that these methods also suffer from usability issues [12], [17]–[19], and in some cases can even lead to users selecting weaker passwords [9], [20].

Offline attacks have been around for decades [21]. There is a large body of research on password cracking techniques. State of the art cracking methods employ methods like Probabilistic Context-Free Grammars [22]–[24], Markov models [25]–[28], and neural networks [29]. Further work [30] has described methods of retrieving guessing numbers from commonly used tools like Hashcat [31] and John the Ripper [32].

A good password hashing algorithm should be moderately expensive so that it is prohibitively expensive for an offline attacker to check billions of password guesses. Password hashing algorithms like BCRYPT [33] or PBKDF2 [34] attempt to increase guessing costs by iterating the hash function some number of times. However, Blocki et al. [2] argued that hash iteration cannot adequately deter an offline attacker due to the existence of sophisticated ASICs (e.g., Bitcoin miners) which can compute the underlying hash function trillions of times per second. Instead, they advocate for the use of Memory Hard Functions (MHFs) for password hashing.

MHFs at their core require some large amount of memory to compute in addition to longer computation times. Candidate MHFs include SCRYPT [35], Balloon hashing [36], and Argon2 [37] (the winner of the Password Hashing Competition [38]). MHFs can be split into two distinct categories or modes of operation — data-independent MHFs (iMHFs) and data-dependent MHFs (dMHFs) (along with the hybrid idMHF, which runs in both modes). dMHFs like SCRYPT are maximally memory hard [39], although they have the issue of possible side-channel attacks. Closely related to the notion of memory hardness is that of depth-robustness — a property of directed acyclic graphs (DAGs). Alwen and Blocki showed that a depth robust DAG is both necessary [40] and sufficient [41] to construct a data-independent memory-hard function. Recent work has proposed candidate iMHF constructions that show resistance to currently-known attacks [42]. Harsha and Blocki introduced a memory-hard KDF which accepts the input passwords as a stream so that the hashing algorithm can perform extra computation while the user is typing the password [43].
Blocki and Datta [1] used a Stackelberg game to model the behavior of a rational (profit-motivated) attacker against a cost-asymmetric secure hashing (CASH) scheme. However, the CASH mechanism is not easily integrated with modern memory-hard functions. By contrast, information signaling does not require any changes to the password hashing algorithm.

Distributed hashing methods (e.g. [44]–[47]) offer a method to distribute storage and/or computation over multiple servers. Thus, an attacker who only breaches one server would not be able to mount an offline attack. Juels and Rivest proposed the inclusion of several false entries per user, with authentication attempts checked against an independent server to see if the correct entry was selected [48]. These "Honeyword" passwords serve as an alarm that an offline cracking attack is being attempted. Other methods of slowing down attackers include requiring some hard (for computers) problem to be solved after several failed authentication attempts (e.g. by using a CAPTCHA) [49]–[51]. An orthogonal line of research aims to protect users against online guessing attacks [52], [53].

A large body of research has focused on alternatives to text passwords. Alternatives have included one time passwords [54]–[56], challenge-response constructions [57], [58], hardware tokens [59], [60], and biometrics [61]–[63]. While all of these offer possible alternatives to traditional passwords, it has been noted that none of these strategies outperforms passwords in all areas [64]. Furthermore, it has been noted that despite the shortcomings of passwords they remain the dominant method of authentication even today, and research should acknowledge this fact and seek to better understand traditional password use [65].

Password strength signaling is closely related to the literature on Bayesian Persuasion. Kamenica and Gentzkow [3] first introduced the notion of Bayesian Persuasion where a person (sender) chooses a signal to reveal to a receiver in an attempt to convince the receiver to take an action that positively impacts the welfare of both parties. Alonso and Camara [66] studied the (sufficient) conditions under which a sender can benefit from persuasion. Dughmi et al. [67] and Hoefer et al. [68] study Bayesian Persuasion from an algorithmic standpoint in different contexts. There are a few prior results applying Bayesian Persuasion in security contexts, e.g., patrols [69], honeypots [70], with the sender (resp. receiver) playing the roles of defender (resp. attacker). To the best of our knowledge Bayesian Persuasion has never been applied in the context of password authentication. In the most general case it is computationally intractable to compute the sender's optimal strategy [67]. Most prior works use linear programming to find (or approximate) the sender's optimal signaling strategy. We stress that there are several unique challenges in the context of password authentication: (1) the action space of the receiver (attacker) is exponential in the size of (the support of) the password distribution, and (2) the sender's objective function is non-linear.

III. PRELIMINARIES

A. Password Representation

We use P to denote the set of all passwords that a user might select and we use P to denote a distribution over user selected passwords i.e., a new user will select the password pw ∈ P with probability Prx∼P[x = pw] — we typically write Pr[pw] for notational simplicity.

1) Password Datasets: Given a set of N users U = {u1, ..., uN}, the corresponding password dataset Du is given by the multiset Du = {pwu1, ..., pwuN} where pwui denotes the password selected by user ui. Fixing a password dataset D, we let fi denote the number of users who selected the ith most popular password in the dataset. We note that f1 ≥ f2 ≥ ... and that Σi fi = N gives the total number N of users in the original dataset.

2) Empirical Password Distribution: Viewing our dataset D as N independent samples from the (unknown) distribution P, we use fi/N as an empirical estimate of the probability of the ith most common password pwi and Df = (f1, f2, ...) as the corresponding frequency list. In addition, De is used to denote the corresponding empirical distribution i.e., Prx∼De[x = pwi] = fi/N. Because the real distribution P is unknown we will typically work with the empirical distribution De. We remark that when fi ≫ 1 the empirical estimate will be close to the actual probability i.e., Pr[pwi] ≈ fi/N, but when fi is small the empirical estimate will likely diverge from the true probability value. Thus, while the empirical distribution is useful to analyze the performance of information signaling when the password value v is small, the analysis will be less accurate for larger values of v i.e., once the rational attacker has an incentive to start cracking passwords with lower frequency.

3) Monte Carlo Password Distribution: Following [71] we also use the Monte Carlo Password Distribution Dm to evaluate the performance of our password signaling mechanism when v is large. The Monte Carlo distribution is derived by subsampling passwords from our dataset D, generating guessing numbers from state of the art password cracking models, and fitting a distribution to the resulting guessing curve. See more details in section VIII.

4) Password Equivalence Set: It is often convenient to group passwords having (approximately) equal probability into an equivalence set es. Suppose there are N′ equivalence sets; we typically have N′ ≪ N. Thus, an algorithm whose running time scales with N′ is much faster than an algorithm whose running time scales with N, see Appendix A.
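As a small illustration of the definitions above (our sketch, not code from the paper), the frequency list, the empirical probabilities fi/N, and the equivalence sets can all be derived directly from a password multiset:

```python
from collections import Counter

def empirical_distribution(dataset):
    """Return (freq_list, probs) for a password dataset (a list of strings).

    freq_list is (f1, f2, ...) in descending order, and
    probs[pw] = f_pw / N is the empirical probability of pw.
    """
    n = len(dataset)
    counts = Counter(dataset)
    freq_list = sorted(counts.values(), reverse=True)
    probs = {pw: c / n for pw, c in counts.items()}
    return freq_list, probs

def equivalence_sets(dataset):
    """Group passwords with equal empirical probability into equivalence sets."""
    counts = Counter(dataset)
    by_freq = {}
    for pw, c in counts.items():
        by_freq.setdefault(c, []).append(pw)
    # one (frequency, members) pair per equivalence set, most popular first
    return sorted(by_freq.items(), reverse=True)

data = ["123456", "123456", "123456", "password", "password", "abc123", "dragon"]
freqs, probs = empirical_distribution(data)
# freqs == [3, 2, 1, 1] and probs["123456"] == 3/7
sets = equivalence_sets(data)
# the two singleton passwords fall into one equivalence set
```

Grouping by frequency is what makes N′ ≪ N: an algorithm that iterates over equivalence sets touches each distinct frequency once rather than each user once.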
B. Differential Privacy and Count Sketches

As part of our information signaling, we need a way for the authentication server to estimate the strength of each user's password. We propose to do this with a (differentially private) Count-Sketch data structure, which allows us to approximately determine how many users have selected each particular password. As a side-benefit, the authentication server could also use the Count-Sketch data structure to identify/ban overly popular passwords [72] and to defend against online guessing attacks [52], [53]. We first introduce the notion of differential privacy.

1) ǫ-Differential Privacy: ǫ-Differential Privacy [73] is a mechanism that provides strong information-theoretic privacy guarantees for all individuals in a dataset. Formally, an algorithm A preserves ǫ-differential privacy iff for all datasets D and D′ that differ by only one element and all subsets S of Range(A):

Pr[A(D) ∈ S] ≤ e^ǫ · Pr[A(D′) ∈ S].

In our context, we can think of D (resp. D′) as a password dataset which does (resp. does not) include our user u's password pwu, and we can think of A as a randomized algorithm that outputs a noisy count-sketch. Intuitively, differential privacy guarantees that an attacker cannot even tell if pwu was included when the count-sketch was generated. In particular, (up to a small multiplicative factor e^ǫ) the attacker cannot tell the difference between A(D) and A(D′), the count-sketches we sample when pwu was (resp. was not) included. Thus, whatever the attacker hopes to learn about u's password from A(D) the attacker could have learned from A(D′).

2) Count-sketch: A count sketch over some domain E is a probabilistic data structure that stores some information about the frequency of items seen in a stream of data — in our password context we will use the domain E = P. A count-sketch functions as a table T with ws columns and ds rows. Initially, T[i, j] = 0 for all i ≤ ds and j ≤ ws. Each row i is associated with a hash function Hi : P → [ws], with each of the hash functions used in the sketch being pairwise independent.

To insert an element pw ∈ P into the count sketch we update T[i, Hi(pw)] ← T[i, Hi(pw)] + 1 for each i ≤ ds. 3 To estimate the frequency of pw we would output f(T[1, H1(pw)], ..., T[ds, Hds(pw)]) for some function f : N^ds → N. In our experiments we instantiate a Count-Mean-Min Sketch where

f = median{ T[i, Hi(pw)] − (#total − T[i, Hi(pw)])/(ws − 1) : i = 1, ..., ds }

(#total is the total number of elements being inserted) so that the bias is subtracted from the overall estimate. Other options are available too, e.g., f = min (Count-Min), f = mean (Count-Mean-Sketch) and f = median (Count-Median). 4

Observe that adding a password only alters the value of T[i, j] at ds locations. Thus, to preserve ǫ-differential privacy we can initialize each cell T[i, j] by adding Laplace noise with scaling parameter ds/ǫ [74].

3 In some instantiations of a count sketch we would instead set T[i, Hi(pw)] ← T[i, Hi(pw)] + Gi(pw) where the hash function Gi : P → {−1, 1}.
4 The Count-Median Sketch uses a different insertion method.

C. Other Notation

Given a permutation π over all allowable passwords P, we let λ(π, B) := Σ_{i=1}^{B} Pr[pw_i^π] denote the probability that a randomly sampled password pw ∈ P would be cracked by an attacker who checks the first B guesses according to the order π — here pw_i^π is the ith password in the sequence π. Given a randomized algorithm A and a random string r, we use y ← A(x; r) to denote the output when we run A with input x fixing the outcome of the random coins to be r. We use y ←$ A(x) to denote a random sample drawn by sampling the random coins uniformly at random. Given a randomized (signaling) algorithm A : P → [0, b − 1] (where b is the total number of signals) we define the conditional probability Pr[pw | y] := Pr_{x∼P,r}[x = pw | y = A(x; r)] and

λ(π, B; y) := Σ_{i=1}^{B} Pr[pw_i^π | y].

We remark that Pr[pw | y] can be evaluated using Bayes Law given knowledge of the signaling algorithm A(x).

IV. INFORMATION SIGNALING AND PASSWORD STORAGE

In this section, we overview our basic signaling mechanism, deferring until later how to optimally tune the parameters of the mechanism to minimize the number of cracked passwords.

A. Account Creation and Signaling

When users create their accounts they provide a user name u and password pwu. First, the server runs the canonical password storage procedure — randomly selecting a salt value saltu and calculating the hash value hu = H(saltu, pwu). Next, the server calculates the (estimated) strength stru ← getStrength(pwu) of password pwu and samples the signal sigu ←$ getSignal(stru). Finally, the server stores the tuple (u, saltu, sigu, hu) — later if the user u attempts to login with a password pw′ the authentication server will accept pw′ if and only if hu = H(saltu, pw′). The account creation process is formally presented in Algorithm 1.

Algorithm 1 Signaling during Account Creation
Input: u, pwu, L, d
1: saltu ←$ {0, 1}^L
2: hu ← H(saltu, pwu)
3: stru ← getStrength(pwu)
4: sigu ←$ getSignal(stru)
5: StoreRecord(u, saltu, sigu, hu)

A traditional password hashing solution would simply store the tuple (u, saltu, hu) i.e., excluding the signal sigu. Our mechanism requires two additional subroutines getStrength() and getSignal() to generate this signal. The first algorithm is deterministic. It takes the user's password pwu as input and outputs stru — (an estimate of) the password strength. The second randomized algorithm takes the (estimated) strength parameter stru and outputs a signal sigu. The whole signaling algorithm is the composition of these two subroutines i.e., A = getSignal(getStrength(pw)). We use si,j to denote the probability of observing the signal sigu = j given that the estimated strength level was stru = i.
Thus, getSignal() can be encoded using a signaling matrix S of dimension a × b, i.e.,

S = ( s0,0     s0,1     ···  s0,b−1
      s1,0     s1,1     ···  s1,b−1
      ...      ...      ···  ...
      sa−1,0   sa−1,1   ···  sa−1,b−1 ),

where a is the number of strength levels with which passwords can be labeled, b is the number of signals the server can generate, and S[i, j] = si,j.

We remark that for some signaling matrices (e.g., if S[i, 0] = 1 for all i 5) the actual signal sigu is uncorrelated with the password pwu. In this case our mechanism is equivalent to the traditional (salted) password storage mechanism where getSignal() is replaced with a constant/null function. getStrength() is a password strength oracle that outputs the actual/estimated strength of a password. We discuss ways that getStrength() could be implemented in Section VIII. For now, we omit the implementation details of the strength oracle getStrength() for sake of readability.

5 The index of matrix elements starts from 0.

B. Generating Signals

We use [a] = {0, 1, ..., a − 1} (resp. [b] = {0, 1, ..., b − 1}) to denote the range of getStrength() (resp. getSignal()). For example, if [a] = {0, 1, 2} then 0 would correspond to weak passwords, 2 would correspond to strong passwords and 1 would correspond to medium strength passwords. To generate a signal for pwu, the server first invokes the subroutine getStrength(pwu) to get the strength level stru = i ∈ [a] of pwu, then signals sigu = j ∈ [b] with probability Pr[getSignal(pwu) = j | getStrength(pwu) = i] = S[i, j] = si,j.

a) Bayesian Update: An attacker who breaks into the authentication server will be able to observe the signal sigu and S. After observing the signal sigu = y and S the attacker can perform a Bayesian update. In particular, given any password pw ∈ P with strength i = getStrength(pw) we have

Pr[pw | y] = Pr[pw] · S[i, y] / ( Σ_{pw′∈P} Pr[getSignal(getStrength(pw′)) = y] · Pr[pw′] )
           = Pr[pw] · S[i, y] / ( Σ_{i′∈[a]} Pr_{pw′∼P}[getStrength(pw′) = i′] · S[i′, y] ).   (1)

If the attacker knew the original password distribution P then s/he can compute the posterior distribution Py with Pr_{x∼Py}[x = pw] := Pr[pw | y]. We extend our notation, letting λ(π, B; y) = Σ_{i=1}^{B} Pr[pw_i^π | y] where pw_i^π is the ith password in the ordering π. Intuitively, λ(π, B; y) is the conditional probability of cracking the user's password by checking the first B guesses in the permutation π.

C. Delayed Signaling

In some instances, the authentication server might implement the password strength oracle getStrength() by training a (differentially private) Count-Sketch based on the user-selected passwords pwu ∼ P. In this case, the strength estimation will not be accurate until a larger number N of users have registered, so the authentication server may want to delay signaling until after the Count-Sketch has been initialized. In particular, the authentication server will store the tuple (u, saltu, sigu = ⊥, hu). During the next (successful) login with the password pwu the server can update sigu = getSignal(getStrength(pwu)).

V. ADVERSARY MODEL

We adapt the economic model of [1] to capture the behavior of a rational attacker. We also make several assumptions: (1) there is a value vu for each password pwu that the attacker cracks; (2) the attacker is untargeted and the value vu = v is the same for each user u ∈ U; (3) by Kerckhoffs's principle, the password distribution P and the signaling matrix are known to the attacker.

a) Value/Cost Estimates: One can derive a range of estimates for v based on black market studies e.g., Symantec reported that passwords generally sell for $4–$30 [75] and [76] reported that Yahoo! e-mail passwords sold for ≈ $1. Similarly, we assume that the attacker pays a cost k each time he evaluates the hash function H to check a password guess. We remark that one can estimate k ≈ $1 × 10^−7 if we use a memory-hard function. 6

6 The energy cost of transferring 1GB of memory between RAM and cache is approximately 0.3J [77], which translates to an energy cost of ≈ $3 × 10^−8 per evaluation. Similarly, if we assume that our MHF can be evaluated in 1 second [37], [78] then evaluating the hash function 6.3 × 10^7 times will tie up a 1GB RAM chip for 2 years. If it costs $5 to rent a 1GB RAM chip for 2 years (equivalently, purchase a RAM chip which lasts for 2 years for $5) then the capital cost is ≈ $8 × 10^−8. Thus, our total cost would be around $10^−7 per password guess.

A. Adversary Utility: No Signaling

We first discuss how a rational adversary would behave when no signal is available (traditional hashing). We defer the discussion of how the adversary would update his strategy after observing a signal y to the next section. In the no-signaling case, the attacker's strategy (π, B) is given by an ordering π over passwords P and a threshold B. Intuitively, this means that the attacker will check the first B guesses in π and then give up. The expected reward for the attacker is given by the simple formula v × λ(π, B), i.e., the probability that the password is cracked times the value v. Similarly, the expected guessing cost of the attacker is

C(k, π, B) = k · Σ_{i=1}^{B} (1 − λ(π, i − 1)).   (2)

Intuitively, (1 − λ(π, i − 1)) denotes the probability that the adversary actually has to check the ith password guess at cost k. With probability λ(π, i − 1) the attacker will find the password in the first i − 1 guesses and will not have to check the ith password guess pw_i^π. As a special case, we define λ(π, 0) = 0. The adversary's expected utility is the difference of expected gain and expected cost, namely,

Uadv(v, k, π, B) = v · λ(π, B) − C(k, π, B).   (3)

Sometimes we omit parameters in the parentheses and just write Uadv for short when v, k and B are clear from context.
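Equations (2) and (3) are straightforward to evaluate for a concrete distribution. The sketch below (ours; the toy probabilities and the values of v and k are illustrative) also scans for the budget B that maximizes Uadv, anticipating the optimal-strategy discussion that follows:

```python
def lam(probs, B):
    """λ(π,B): probability the password is among the first B guesses,
    where π checks passwords in descending order of probability."""
    return sum(probs[:B])

def guessing_cost(probs, k, B):
    """Equation (2): C(k,π,B) = k · Σ_{i=1..B} (1 − λ(π,i−1))."""
    return k * sum(1.0 - lam(probs, i - 1) for i in range(1, B + 1))

def utility(probs, v, k, B):
    """Equation (3): U_adv(v,k,π,B) = v·λ(π,B) − C(k,π,B)."""
    return v * lam(probs, B) - guessing_cost(probs, k, B)

# Toy distribution: two popular passwords and a long flat tail (sums to 1).
probs = [0.5, 0.2] + [0.003] * 100
v, k = 4.0, 1.0
best_B = max(range(len(probs) + 1), key=lambda B: utility(probs, v, k, B))
# The rational attacker stops after the two popular passwords: best_B == 2.
```

The marginal utility of guess i is v·Pr[pw_i] − k·(1 − λ(π, i−1)); once it turns (and stays) negative the attacker is better off giving up, which is what the scan over B recovers.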
B. Optimal Attacker Strategy: No Signaling

A rational adversary would choose (π*, B*) ∈ arg max Uadv(v, k, π, B). It is easy to verify that the optimal ordering π* is always to check passwords in descending order of probability. The probability that a random user's account is cracked is

Padv = λ(π*, B*).   (4)

We remark that in practice arg max Uadv(v, k, π, B) usually returns a singleton set (π*, B*). If instead the set contains multiple strategies then we break ties adversarially i.e.,

Padv = max_{(π*,B*) ∈ arg max Uadv(v,k,π,B)} λ(π*, B*).

VI. INFORMATION SIGNALING AS A STACKELBERG GAME

We model the interaction between the authentication server (leader) and the adversary (follower) as a two-stage Stackelberg game. In a Stackelberg game, the leader moves first and then the follower may select its action after observing the action of the leader.

In our setting the action of the defender is to commit to a signaling matrix S as well as the implementation of getStrength() which maps passwords to strength levels. The attacker responds by selecting a cracking strategy (~π, ~B) = {(π0, B0), ..., (πb−1, Bb−1)}. Intuitively, this strategy means that whenever the attacker observes a signal y he will check the top By guesses according to the ordering πy.

A. Attacker Utility

If the attacker checks the top By guesses according to the order πy then the attacker will crack the password with probability λ(πy, By; y). Recall that λ(πy, By; y) denotes the probability of the first By passwords in πy according to the posterior distribution Py obtained by applying Bayes Law after observing a signal y. Extrapolating from the no-signal case, the expected utility of the adversary conditioned on observing the signal y is

Uadv(v, k, πy, By; S, y) = v · λ(πy, By; y) − k · Σ_{i=1}^{By} (1 − λ(πy, i − 1; y)),   (5)

where By and πy are now both functions of the signal y. Intuitively, (1 − λ(πy, i − 1; y)) denotes the probability that the attacker has to pay cost k to make the ith guess. We use U^s_adv(v, k, {S, (~π, ~B)}) to denote the expected utility of the adversary with information signaling,

U^s_adv(v, k, {S, (~π, ~B)}) = Σ_{y∈[b]} Pr[Sig = y] · Uadv(v, k, πy, By; S, y),   (6)

where

Pr[Sig = y] = Σ_{i∈[a]} Pr_{pw∼P}[getStrength(pw) = i] · S[i, y].

B. Optimal Attacker Strategy

Now we discuss how to find the optimal strategy (~π*, ~B*). Since the attacker's strategies in response to different signals are independent, it suffices to find (πy*, By*) ∈ arg max_{By,πy} Uadv(v, k, πy, By; y) for each signal y. We first remark that the adversary can obtain the optimal checking sequence πy* for the passwords associated with signal y by sorting all pw ∈ P in descending order of posterior probability according to the posterior distribution Py. Given the optimal guessing order πy*, the adversary can determine the optimal budget By* for signal y such that By* = arg max_{By} Uadv(v, k, πy*, By; y). Each of the password distributions we analyze has a compact representation allowing us to apply techniques from [71] to further speed up the computation of the attacker's optimal strategy πy* and By* — see discussion in the appendix.

We observe that an adversary who sets πy = π and By = B for all y ∈ [b] is effectively ignoring the signal and is equivalent to an adversary in the no signal case. Thus,

max_{~π,~B} U^s_adv(v, k, {S, (~π, ~B)}) ≥ max_{π,B} Uadv(v, k, π, B), ∀S,   (7)

implying that the adversary's expected utility will never decrease by adapting its strategy according to the signal.

C. Optimal Signaling Strategy

Once the function getStrength() is fixed we want to find the optimal signaling matrix S. We begin by introducing the defender's utility function. Intuitively, the defender wants to minimize the total number of cracked passwords.

Let P^s_adv(v, k, S) denote the expected adversary success rate with information signaling when the adversary plays his/her optimal strategy; then

P^s_adv(v, k, S) = Σ_{y∈[b]} Pr[Sig = y] · λ(πy*, By*; S, y),   (8)

where (πy*, By*) is the optimal strategy of the adversary when receiving the signal y, namely,

(πy*, By*) = arg max_{πy,By} Uadv(v, k, πy, By; S, y).

If arg max_{πy,By} Uadv(v, k, πy, By; y) returns a set, we break ties adversarially.

The objective of the server is to minimize P^s_adv(v, k, S), therefore we define

U^s_ser(v, k, {S, (~π*, ~B*)}) = −P^s_adv(v, k, S).   (9)

The focus of this paper is to find the optimal signaling strategy, namely the signaling matrix S* such that S* = arg min_S P^s_adv(v, k, S). Finding the optimal signaling matrix S* is equivalent to solving for the mixed strategy Subgame Perfect Equilibrium (SPE) of the Stackelberg game. At SPE no player has an incentive to deviate from his/her strategy. Namely,

U^s_ser(v, k, {S*, (~π*, ~B*)}) ≥ U^s_ser(v, k, {S, (~π*, ~B*)}), ∀S,
U^s_adv(v, k, {S*, (~π*, ~B*)}) ≥ U^s_adv(v, k, {S*, (~π, ~B)}), ∀(~π, ~B).   (10)

Notice that a signaling matrix of dimension a × b can be fully specified by a(b − 1) variables since the elements in each row sum up to 1. Fixing v and k, we define f : R^{a(b−1)} → R to be the map from S to P^s_adv(v, k, S). Then we can formulate the optimization problem as

min f(s0,0, ..., s0,(b−2), ..., s(a−1),0, ..., s(a−1),(b−2))
s.t. 0 ≤ si,j ≤ 1, ∀ 0 ≤ i ≤ a − 1, 0 ≤ j ≤ b − 2,
     Σ_{j=0}^{b−2} si,j ≤ 1, ∀ 0 ≤ i ≤ a − 1.   (11)

Therefore the expected utility is

Uadv(v, k, π, B) = R(v, k, B) − C(k, π, B) = (v − 2k)(1 − 2^−B).

A profit-motivated adversary is interested in calculating B* = arg max_B Uadv(v, k, π, B). With our sample distribution we have B* = 0 if v < 2k, and B* = ∞ if v ≥ 2k.

Since we assume that v = 2k + ǫ > 2k, the attacker's optimal strategy is B* = ∞, meaning that 100% of passwords will be cracked.

c) Signaling Strategy: Suppose that getStrength is defined such that getStrength(pw1) = 0 and getStrength(pwi) = 1 for each i > 1. Intuitively, the strength level is 0 if and only if we sampled the weakest password from the distribution. Now suppose that we select our signaling matrix S to be

S = ( 1/2  1/2
      0    1 ),

such that Pr[Sig = 0 | pw = pw1] = 1/2 = Pr[Sig = 1 | pw = pw1] and Pr[Sig = 1 | pw ≠ pw1] = 1.

d) Optimal Attacker Strategy with Information Signaling:
We now analyze the behavior of a rational attacker under sig- j=0 X naling when given this signal matrix and password distribution The feasible region is a a(b − 1)-dimensional probability assuming that the attacker’s value v =2k + ǫ is the same. simplex. Notice that in 2-D (a = b = 2), the second We first note that attacker observes the signal constraint would be equivalent to the first constraint. In our Sig = 0 we know for sure that the user selected the most common experiments we will treat f as a black box and use derivative- password as so as long as free optimization methods to find good signaling matrices S. Pr[pw = pw1 | Sig = 0] = 1 v ≥ k the attacker will crack the password. VII. THEORETICAL EXAMPLE Next we consider the case that the attacker observes the Having presented our Stackelberg Game model for informa- signal Sig =1. We have the following posterior probabilities tion signaling we now give an (admittedly contrived) example for each of the passwords in the distribution: of a password distribution where information signaling can dramatically reduce the percentage of cracked passwords. We 0.5 × 0.5 1 Pr[pw = pw1 | Sig =1]= = , assume that the attacker has value v =2k+ǫ for each cracked 0.75 3 1 × 2−i 4 × 2−i password where the cost of each password guess is k and ǫ> 0 Pr[pw = pwi,i> 1 | Sig =1]= = . is a small constant. 0.75 3 P a) Password Distribution: Suppose that = {pwi}i≥1 Now we compute the attacker’s expected costs conditioned −i and that each password pwi has probability 2 i.e., on Sig =1. −i Pr [pw = i]=2 . The weakest password pw1 would be B B pw∼P 1 4 2 4 selected with probability 1/2. C(k,π,B; S, 1) = k + i ∗ 2−i + kB − 2−i 3 3 3 3 b) Optimal Attacker Strategy without Signaling: By i=2 ! i=2 !! X X checking passwords in descending order of probability (the 7 23−B = k − , checking sequence is π) the adversary has an expected cost 3 3 B   The expected gain of the attacker is C(k,π,B)= k i × 2−i +2−B × B × k =2k(1 − 2−B). 
i=1 B X 1 4 22−B R(v,k,π,B; S, 1) = v + 2−i = v 1 − , The expression above is equivalent to equation (2), the attacker 3 3 3 i=2 ! succeeds in ith guess with probability 2−i at the cost of ik; X   the attacker fails after making B guess with probability 2−B Thus, the attacker’s utility is given by: at the cost of Bk. The the expected gain is Uadv(v,k,π,B; S, 1) = R(v,k,π,B; S, 1) − C(k,π,B; S, 1) B −i −B 22−B 7 23−B R(v,k,π,B)= v i2 = v 1 − 2 . = v 1 − − k − . i=1 3 3 3 X      1 Assuming that ǫ< 3 k the attacker will have negative utility differential privacy [6] and other privacy-preserving measures Uadv(2k + ǫ,k,π,B; S, 1) < 0 whenever B > 1. Thus, when [5]. All the other datasets come from server breaches. the signal is Sig =1 the optimal attacker strategy is to select a) Empirical Distribution: For all 9 datasets we ∗ B =0 (i.e., don’t attack) to ensure zero utility. In particular, can derive an empirical password distribution De where the attacker cracks the password if and only if Sig = 0 Prpw∼De [pwi] = fi/N. Here, N is the number of users in which happens with probability 1 − Pr[Sig = 1] = 0.25 the dataset and fi is the number of occurences of pwi in the since Pr[Sig = 1] = Pr[pw = pw1] Pr[Sig = 1 | pw = dataset. We remark that for datasets like Yahoo! and LinkedIn 3 pw1]+Pr[pw 6= pw1] Pr[Sig =1 | pw 6= pw1]= 4 . Thus, the where the datasets only include frequencies fi without the attacker will only crack 25% of passwords when v =2k + ǫ7. original plaintext password we can derive a distribution simply e) Discussion: In our example an attacker with value by generating unique strings for each password. The empirical v = 2k + ǫ cracks 100% of passwords when we don’t use distribution is useful to analyze the performance of informa- information signaling. 
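The contrived example above can be checked numerically. The sketch below (an illustration under stated assumptions: a truncated geometric distribution with N = 40 passwords, k = 1 and ε = 0.1 < k/3) recomputes the attacker's best response with and without the signaling matrix S:

```python
# Numerical check of the Section VII example: Pr[pw_i] = 2^{-i} (truncated),
# v = 2k + eps with eps < k/3.  All parameter choices here are illustrative.
N = 40
prior = [2.0 ** -(i + 1) for i in range(N)]   # prior[i] = Pr[pw_{i+1}]
k, eps = 1.0, 0.1
v = 2 * k + eps

def best_response(post, v, k):
    """Best budget B* and utility when guessing in descending posterior order:
    U(B) = v * lambda(B) - k * sum_{i<=B} (1 - lambda(i-1))."""
    probs = sorted(post, reverse=True)
    best_u, best_b = 0.0, 0        # B = 0 (don't attack) guarantees utility 0
    util, lam = 0.0, 0.0
    for b, p in enumerate(probs, start=1):
        util += v * p - k * (1.0 - lam)   # marginal utility of the b-th guess
        lam += p
        if util > best_u:
            best_u, best_b = util, b
    return best_u, best_b

# Without signaling the marginal utility of every guess is 2^{-b}(v - 2k) > 0,
# so the rational attacker guesses the entire (truncated) list.
_, b_no_signal = best_response(prior, v, k)

# With signaling: Sig = 0 w.p. 1/2 iff pw = pw_1, otherwise Sig = 1.
pr_sig1 = 0.5 * prior[0] + sum(prior[1:])                 # = 3/4
post1 = [0.5 * prior[0] / pr_sig1] + [p / pr_sig1 for p in prior[1:]]
u1, b1 = best_response(post1, v, k)                       # deterred: B* = 0
cracked = (1.0 - pr_sig1) * 1.0                           # only Sig = 0 accounts
print(b_no_signal, b1, round(cracked, 3))
```

Running this reports a budget of 40 (attack everything) without signaling, a budget of 0 under Sig = 1, and an overall crack rate of 0.25, matching the analysis above.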
However, if our information signaling mechanism (above) were deployed, the attacker would only crack 25% of passwords, a reduction of 75%! Given this (contrived) example it is natural to ask whether or not information signaling produces similar results for more realistic password distributions. We explore this question in the next sections.

VIII. EXPERIMENTAL DESIGN

We now describe our empirical experiments to evaluate the performance of information signaling. Fixing the parameters v, k, a, b, a password distribution D and the strength oracle getStrength(·), we define a procedure S* ← genSigMat(v, k, a, b, D) which uses derivative-free optimization to solve the optimization problem defined in equation (11) and find a good signaling matrix S* of dimension a × b. Similarly, given a signaling matrix S* we define a procedure evaluate(v, k, a, b, S*, D) which returns the percentage of passwords that a rational adversary will crack, given that the value of a cracked password is v and the cost of checking each password is k. To simulate settings where the defender has imperfect knowledge of the password distribution we use different distributions D1 (training) and D2 (evaluation) to generate the signaling matrix S* ← genSigMat(v, k, a, b, D1) and evaluate the success rate of a rational attacker, evaluate(v, k, a, b, S*, D2). We can also set D1 = D2 to evaluate our mechanism in the idealized setting in which the defender has perfect knowledge of the distribution.

In the remainder of this section we describe how the oracle getStrength() is implemented in different experiments, the password distribution(s) derived from empirical password datasets, and how we implement genSigMat().

A. Password Distribution

We evaluate the performance of our information signaling mechanism using 9 password datasets: Bfield (0.54 million), Brazzers (N = 0.93 million), Clixsense (2.2 million), CSDN (6.4 million), LinkedIn (174 million), Neopets (68.3 million), RockYou (32.6 million), 000webhost (153 million) and Yahoo! (69.3 million). The Yahoo! frequency corpus (N ≈ 7 × 10^7) was collected and released with permission from Yahoo! using differential privacy [6] and other privacy-preserving measures [5]. All the other datasets come from server breaches.

a) Empirical Distribution: For all 9 datasets we can derive an empirical password distribution De where Pr_{pw∼De}[pw_i] = f_i/N. Here, N is the number of users in the dataset and f_i is the number of occurrences of pw_i in the dataset. We remark that for datasets like Yahoo! and LinkedIn which only include frequencies f_i without the original plaintext passwords, we can derive a distribution simply by generating unique strings for each password. The empirical distribution is useful for analyzing the performance of information signaling when the password value v is small; the analysis will be less accurate for larger values of v, i.e., once the rational attacker has an incentive to start cracking passwords with lower frequency. Following an approach taken in [71], we use Good-Turing frequency estimation to identify and highlight regions of uncertainty where the CDF of the empirical distribution might significantly diverge from the real password distribution. To simulate an attacker with imperfect knowledge of the distribution we train a differentially private Count-Mean-Min-Sketch. In turn, the Count-Sketch is used to derive a distribution Dtrain, to implement getStrength() and to generate the signaling matrix S* ← genSigMat(v, k, a, b, Dtrain) (see details below).

b) Monte Carlo Distribution: To derive the Monte Carlo password distribution from a dataset we follow a process from [71]. In particular, we subsample passwords Ds ⊆ D from the dataset and derive guessing numbers #guessing_m(pw) for each pw ∈ Ds. Here, #guessing_m(pw) denotes the number of guesses needed to crack pw with a state-of-the-art password cracking model m, e.g., Probabilistic Context-Free Grammars [22]–[24], Markov models [25]–[28], and neural networks [29]. We used the Password Guessing Service [28] to generate the guessing numbers for each dataset. We then fit our distribution to the guessing curve, i.e., fixing thresholds t_0 = 0 < t_1 < t_2 < ..., we assign any password pw with t_{i−1} < min_m{#guessing_m(pw)} ≤ t_i probability g_i/(|Ds|(t_i − t_{i−1})), where g_i counts the number of sampled passwords in Ds with guessing number between t_{i−1} and t_i. Intuitively, the Monte Carlo distribution Dm models the password distribution from the attacker's perspective. One drawback is that the distribution would change if the attacker were to develop an improved password cracking model.

We extract Monte Carlo distributions from the 6 datasets (Bfield, Brazzers, Clixsense, CSDN, Neopets, 000webhost) for which we have plaintext passwords, so that we can query the Password Guessing Service [28] for password guessing numbers. In the imperfect knowledge setting we repeated the process above twice for each dataset with disjoint sub-samples to derive two distributions Dtrain and Deval.

B. Differentially Private Count-Sketch

When using the empirical distribution De for evaluation, we evaluate the performance of an imperfect-knowledge defender who trains a differentially private Count-Mean-Min-Sketch. As users register their accounts, the server can feed passwords into a Count-Mean-Min-Sketch initialized with Laplace noise to ensure differential privacy.

When working with empirical distributions in the imperfect knowledge setting we split the original dataset D in half to obtain D1 and D2. Our noise-initialized Count-Mean-Min-Sketch is trained on D1. We fix the width dw (resp. depth ds) of our count sketch to dw = 10^8 (resp. ds = 10) and add Laplace noise with scaling factor b = ds/ε_pri = 5 to preserve ε_pri = 2-differential privacy. Since we were not optimizing for space, we set the number of columns dw to be large to minimize the probability of hash collisions and increase the accuracy of frequency estimation. Each cell is encoded by a 4-byte int type, so the total size of the sketch is 4 GB.

We then use this count sketch along with D2 to extract a noisy distribution Dtrain. In particular, for every pw ∈ D2 we query the count sketch to get f̃_pw, a noisy estimate of the frequency of pw in D2, and set Pr_{Dtrain}[pw] = f̃_pw / Σ_{w∈D2} f̃_w. We also use the Count-Mean-Min Sketch as a frequency oracle in our implementation of getStrength() (see details below). We then use Dtrain to derive frequency thresholds for getStrength() and to generate the signaling matrix S* = genSigMat(v, k, a, b, Dtrain).

C. Implementing getStrength()

Given a distribution D and a frequency oracle O, which outputs f(pw) in the perfect knowledge setting and an estimate fˆ(pw) in the imperfect knowledge setting, we can specify getStrength() by selecting thresholds x_1 > ... > x_{a−1} > x_a = 1. In particular, if x_{i+1} ≤ O(pw) < x_i then getStrength(pw) = i, and if O(pw) ≥ x_1 then getStrength(pw) = 0. Let Y_i = Pr_{pw∼D}[x_i ≤ O(pw) < x_{i−1}] for i > 1 and Y_1 = Pr_{pw∼D}[O(pw) > x_1]. We fix the thresholds x_1 ≥ ... ≥ x_{a−1} to (approximately) balance the probability mass of each strength level, i.e., to ensure that Y_i ≈ Y_j. In the imperfect (resp. perfect) knowledge setting we use D = Dtrain (resp. D = Deval) to select the thresholds.

D. Derivative-Free Optimization

Given a value v and hash cost k we want to find a signaling matrix which optimizes the defender's utility. Recall that this is equivalent to minimizing the function f(S) = evaluate(v, k, a, b, S, D) subject to the constraint that S is a valid signaling matrix. In our experiments we treat f as a black box and use derivative-free optimization methods to find good signaling matrices S*.

In our experiments, we choose the BITmask Evolution OPTimization (BITEOPT) algorithm [79] to compute a quasi-optimal signaling matrix S*. BITEOPT is a free, open-source, stochastic, non-linear, bound-constrained derivative-free optimization method (heuristic or strategy). BiteOpt took 2nd place (1st by sum of ranks) in the BBComp2018-1OBJ-expensive competition track [80]. In each experiment we run BITEOPT with 10^4 iterations to generate a signaling matrix S* for each value-to-cost ratio v/Cmax, where Cmax is the server's maximum authentication cost, satisfying k ≤ Cmax. We refer to this procedure as S* ← genSigMat(v, k, a, b, D1).

IX. EMPIRICAL ANALYSIS

We describe the results of our experiments. In the first batch of experiments we evaluate the performance of information signaling against an offline attacker; in the second batch we consider an online attacker, for whom the ratio v/Cmax is typically much smaller.

A. Password Signaling against Offline Attacks

We consider four scenarios, using the empirical/Monte Carlo distribution in settings where the defender has perfect/imperfect knowledge of the distribution.

1) Empirical Distribution: From each password dataset we derived an empirical distribution De and set Deval = De. In the perfect knowledge setting we also set Dtrain = De, while in the imperfect knowledge setting we used a Count-Mean-Min Sketch to derive Dtrain (see details in the previous section). We fix the dimensions of the signaling matrix to 11 by 3 (the server issues 3 signals for 11 password strength levels) and compute the attacker's success rate for different value-to-cost ratios v/Cmax ∈ {i × 10^j : 1 ≤ i ≤ 9, 3 ≤ j ≤ 7} ∪ {(i + 0.5) × 10^j : 1 ≤ i ≤ 9, 6 ≤ j ≤ 7}.

In particular, for each value-to-cost ratio v/Cmax we run S* ← genSigMat(v, k, a, b, De) to generate a signaling matrix and then run evaluate(v, k, a, b, S*, De) to get the attacker's success rate. The same experiment is repeated for all 9 password datasets. We plot the attacker's success rate vs. v/Cmax in Fig. 1. Due to space limitations Fig. 1 only shows results for 6 datasets; additional plots can be found in Fig. 5 in the Appendix.

We follow the approach of [71], highlighting the uncertain regions of the plot where the cumulative density function of the empirical distribution might diverge from the real distribution. In particular, the red (resp. yellow) region indicates E > 0.1 (resp. E > 0.01), where E can be interpreted as an upper bound on the difference between the two CDFs.

Fig. 1 demonstrates that information signaling reduces the fraction of cracked passwords. The mechanism performs best when the defender has perfect knowledge of the distribution (blue curve), but even with imperfect knowledge there is still a large advantage. For example, for the Neopets dataset when v/Cmax = 5 × 10^6 the percentage of cracked passwords is reduced from 44.6% to 36.9% (resp. 39.1%) when the defender has perfect (resp. imperfect) knowledge of the password distribution. Similar results hold for the other datasets. The green curve (signaling with imperfect knowledge) generally lies between the black curve (no signaling) and the blue curve (signaling with perfect knowledge), but sometimes has an adverse effect when v/Cmax is large. This is because the noisy distribution is less accurate for stronger passwords that were sampled only once.

Fig. 1. Adversary Success Rate vs v/Cmax for Empirical Distributions: (a) Bfield, (b) Brazzers, (c) Clixsense, (d) CSDN, (e) LinkedIn, (f) Neopets. Each panel plots the fraction of cracked passwords against v/Cmax (10^3 to 10^8) for no signaling, perfect knowledge signaling, imperfect knowledge signaling, the improvement (black minus blue), and the robustness matrices S* = S(10^5) and S* = S(10^6). The red (resp. yellow) shaded areas denote unconfident regions where the empirical distribution might diverge from the real distribution, E ≥ 0.1 (resp. E ≥ 0.01).
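The getStrength() construction of Section VIII-C, which chooses thresholds so that each strength level carries roughly equal probability mass, can be sketched as follows. The frequency oracle is modeled as a plain dictionary, corresponding to the perfect-knowledge setting; names are illustrative:

```python
def make_strength_oracle(freqs, a):
    """Partition passwords into `a` strength levels of (approximately) equal
    probability mass. freqs: dict pw -> estimated frequency (the oracle O).
    Returns getStrength(pw), with 0 the weakest (most common) level.
    A sketch of the threshold selection described in Section VIII-C."""
    total = sum(freqs.values())
    # most probable passwords first, mirroring the ordering x1 > ... > x_{a-1}
    ordered = sorted(freqs, key=freqs.get, reverse=True)
    level, mass, strength = 0, 0.0, {}
    for pw in ordered:
        strength[pw] = level
        mass += freqs[pw] / total
        # advance to the next level once this one holds ~1/a of the mass
        if mass >= (level + 1) / a and level < a - 1:
            level += 1
    # passwords never seen by the oracle default to the strongest level
    return lambda pw: strength.get(pw, a - 1)
```

For a geometric toy distribution with a = 3, the most likely password lands in level 0, the next in level 1, and the long tail (plus unseen passwords) in level 2.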

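The noise-initialized sketch of Section VIII-B can be illustrated as below. This is a simplified stand-in (a count-min-style table with per-row collision debiasing, SHA-256 row hashes, and inverse-CDF Laplace sampling); the exact estimator, hash family, and parameters in the paper may differ:

```python
import hashlib, math, random

class NoisyCountSketch:
    """Toy Count-Mean-Min-style frequency oracle initialized with Laplace
    noise of scale d_s / eps (illustrative; not the authors' implementation)."""
    def __init__(self, depth=10, width=2 ** 14, eps=2.0, seed=7):
        rng = random.Random(seed)
        scale = depth / eps              # b = d_s / eps_pri, as in Section VIII-B
        self.depth, self.width, self.total = depth, width, 0
        self.rows = [[self._laplace(scale, rng) for _ in range(width)]
                     for _ in range(depth)]

    @staticmethod
    def _laplace(scale, rng):
        u = rng.random() - 0.5           # inverse-CDF Laplace sample
        if u == -0.5:                    # avoid log(0) on the boundary draw
            u = 0.0
        return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

    def _index(self, pw, row):
        digest = hashlib.sha256(f"{row}|{pw}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, pw):                   # called once per registered account
        self.total += 1
        for row in range(self.depth):
            self.rows[row][self._index(pw, row)] += 1

    def estimate(self, pw):              # noisy frequency estimate f~_pw
        ests = []
        for row in range(self.depth):
            c = self.rows[row][self._index(pw, row)]
            # subtract the expected mass of colliding passwords, then rescale
            ests.append((c - self.total / self.width)
                        * self.width / (self.width - 1))
        return max(0.0, min(ests))
```

Normalizing estimate() over all registered passwords yields the noisy distribution Dtrain described in the text, which can then feed the strength oracle and genSigMat().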
a) Which accounts are cracked?: As Fig. 1 demonstrates, information signaling can substantially reduce the overall fraction of cracked passwords, i.e., many previously cracked passwords are now protected. It is natural to ask whether there are any unlucky users u whose password is cracked after information signaling even though their account was safe before signaling. Let X_u (resp. L_u) denote the event that user u is unlucky (resp. lucky), i.e., a rational attacker would originally not crack pw_u, but after information signaling the account is cracked (resp. the reverse). We measure E[X_u] and E[L_u] (see Fig. 2) for various v/Cmax values under each dataset. Generally, we find that the fraction of unlucky users E[X_u] is small in most cases, e.g., ≤ 0.04. For example, when v/k = 2 × 10^7 we have E[X_u] ≈ 0.03% and E[L_u] ≈ 6% for LinkedIn. In all instances the net advantage E[L_u] − E[X_u] remains positive. We remark that the reduction in cracked passwords does not come from persuading the attacker to crack weak passwords, though the attacker might shift his attention. The shift of the attacker's attention is directionless, not necessarily towards weaker passwords. The contribution of attention shift to the reduction in cracked passwords is very small, since the password ordering under the posterior distribution upon receiving a signal is very close to that under the prior distribution; the attacker cracks passwords in (almost) the same order whether given the signal or not. Strength signaling works mainly because the attacker would like to save costs by making fewer futile guesses.

b) Robustness: We also evaluated the robustness of the signaling matrix when the defender's estimate of the ratio v/Cmax is inaccurate. In particular, for each dataset we generated the signaling matrix S(10^5) (resp. S(10^6)), optimized with respect to the ratio v/Cmax = 10^5 (resp. v/Cmax = 10^6), and evaluated the performance of both signaling matrices against an attacker with different v/Cmax ratios. We find that password signaling is tolerant even if our estimate of v/k is off by a small multiplicative constant factor, e.g., 2. For example, in Fig. 1f the signaling matrix S(10^6) outperforms the no-signaling case even when the real v/Cmax ratio is as large as 2 × 10^6. In the "downhill" direction, even if the estimate of v/k deviates from its true value by up to 5 × 10^5 at anchor point 10^6, it is still advantageous for the server to deploy password signaling.

2) Monte Carlo Distribution: We use the Monte Carlo distribution to evaluate information signaling when v/Cmax is large. In particular, we subsample 25k passwords from each dataset for which we have plaintext passwords (excluding Yahoo! and LinkedIn) and obtain guessing numbers from the Password Guessing Service. Then we split our 25k subsamples in half to obtain two guessing curves, and we extract two Monte Carlo distributions Dtrain and Deval from these curves (see details in the last section). In the perfect knowledge setting the signaling matrix is both optimized and tested on Deval, i.e., S* = genSigMat(v, k, a, b, Deval) and P^s_adv = evaluate(v, k, a, b, S*, Deval). In the imperfect knowledge setting the signaling matrix is tuned on Dtrain while the attacker's success rate is evaluated on Deval.

(Fig. 2 panels: (a) bfield, brazzers, clixsense; (b) csdn, LinkedIn, neopets; (c) RockYou, 000webhost, Yahoo!; each plots the proportion of unlucky users (×10^−2) against v/Cmax.)

Fig. 2. Proportion of Unlucky Users for Various Datasets (E[X_u]).

One advantage of simulating the Monte Carlo distribution is that it allows us to evaluate the performance of information signaling against state-of-the-art password cracking models when v/Cmax is large. We consider v/Cmax ∈ {i × 10^j : 1 ≤ i ≤ 9, 5 ≤ j ≤ 10} in the performance evaluation for the Monte Carlo distribution. As before, we set a = 11 and b = 3 so that the signaling matrix has dimension 11 × 3. We present our results in Fig. 3.

Fig. 3 shows that information signaling can significantly reduce the number of cracked passwords. In particular, for the Neopets dataset when v/Cmax = 6 × 10^7 the number of cracked passwords is reduced from 52.2% to 40% (resp. 43.8%) when the defender has perfect (resp. imperfect) knowledge of the distribution. The green curve (signaling with imperfect knowledge) generally lies between the black curve (no signaling) and the blue curve (signaling with perfect information), though we occasionally find points where the green curve lies slightly above the black curve.

B. Password Signaling against Online Attacks

We can extend the experiment from password signaling with perfect knowledge to an online attack scenario. One common way to throttle online attackers is to require the attacker to solve a CAPTCHA challenge [81], or provide some other proof of work (PoW), after each incorrect login attempt [82]. One advantage of this approach is that a malicious attacker cannot lock out an honest user by repeatedly submitting incorrect passwords [83]. However, the solution also allows an attacker to continue trying to crack the password as long as s/he is willing to keep paying the cost to solve the CAPTCHA/PoW challenges. Thus, information signaling could be a useful tool to mitigate the risk of online attacks.

When modeling a rational online password attacker we assume that v/Cmax ≤ 10^5, since the cost of paying a human to solve a CAPTCHA challenge (e.g., $10^−3 to $10^−2 [84]) is typically much larger than the cost of evaluating a memory-hard cryptographic hash function (e.g., $10^−7). Since v/Cmax ≤ 10^5 we use the empirical distribution to evaluate the performance of information signaling against an online attacker. In the previous subsection we found that the uncertain regions of the curve started when v/Cmax ≫ 10^5, so the empirical distribution is guaranteed to closely match the real one.

Since an online attacker will be primarily focused on the most common passwords (e.g., top 10^3 to 10^4), we modify getStrength() accordingly. We consider two modifications of getStrength() which split passwords in the top 10^3 (resp. 10^4) into 11 strength levels. By contrast, our prior implementation of getStrength() would have placed most of the top 10^3 passwords in the bottom two strength levels. As before, we fix the signaling matrix dimension to 11 × 3. Our results are shown in Fig. 4. Due to space limitations the results for the other 6 datasets are in Fig. 6 in the appendix.

Our results demonstrate that information signaling can be an effective defense against online attackers as well. For example, in Fig. 4b, when v/Cmax = 9 × 10^4, our mechanism reduces the fraction of cracked passwords from 20.4% to just 15.3%. Similar observations hold for the other datasets. We observe that the red curve (partitioning the top 10^3 passwords into 11 strength levels) performs better than the blue curve (partitioning the top 10^4 passwords into 11 strength levels) when v/k is small, e.g., v/Cmax < 2 × 10^4 in Fig. 4b; the blue curve performs better when v/Cmax is larger. Intuitively, this is because we want a fine-grained partition of the weaker (top 10^3) passwords that the adversary might target when v/Cmax is small.

a) Implementing Password Signaling: One naive way to implement password signaling in an online setting would be to explicitly send back the noisy signal sig_u in response to any incorrect login attempt. As an alternative, we propose a solution where users with a weaker signal sig_u are throttled more aggressively. For example, if sig_u indicates that the password is strong then it might be reasonable to allow 10 consecutive incorrect login attempts before throttling the account by requiring the user to solve a CAPTCHA challenge before every login attempt; if sig_u indicates that the password is weak, the server might begin throttling after just 3 incorrect login attempts. The attacker can indirectly infer the signal sig_u by measuring how many login attempts s/he gets before throttling begins. This solution might also motivate users to pick stronger passwords.

C. Discussion

While our experimental results are positive, we stress that there are several questions that would need to be addressed

Fig. 3. Adversary Success Rate vs v/k for Monte Carlo Distributions: (a) Bfield, (b) Brazzers, (c) Clixsense, (d) CSDN, (e) Neopets, (f) 000webhost. Each panel plots the fraction of cracked passwords against v/Cmax (10^5 to 10^11) for no signaling, perfect knowledge signaling, imperfect knowledge signaling, and the improvement (black minus blue).
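The guessing-curve fit behind the Monte Carlo distributions evaluated above (Section VIII-A(b)) can be sketched as follows, assuming the guessing numbers #guessing_m(pw) have already been obtained (the paper queries the Password Guessing Service for these); threshold values here are illustrative:

```python
from bisect import bisect_left

def monte_carlo_distribution(guessing_numbers, thresholds):
    """Fit a step-function password distribution to a guessing curve.
    guessing_numbers: one #guessing(pw) value per password in a subsample Ds.
    thresholds: increasing guess-count thresholds t1 < t2 < ... (t0 = 0).
    Returns (densities, counts): a password whose guessing number lies in
    (t_{i-1}, t_i] gets probability g_i / (|Ds| * (t_i - t_{i-1}))."""
    n = len(guessing_numbers)
    counts = [0] * len(thresholds)
    for g in guessing_numbers:
        i = bisect_left(thresholds, g)       # first threshold >= g
        if i < len(thresholds):
            counts[i] += 1                   # guesses beyond t_max are dropped
    densities = []
    prev = 0
    for t, c in zip(thresholds, counts):
        densities.append(c / (n * (t - prev)))
        prev = t
    return densities, counts
```

Each bin's total mass is g_i/|Ds|, so the fitted distribution sums to 1 whenever every sampled guessing number falls below the largest threshold.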

(Fig. 4 panels: (a) Bfield, (b) Brazzers, (c) Clixsense; each plots the fraction of cracked passwords against v/Cmax (10^2 to 10^5) for no signaling, partitioning the top 1k, partitioning the top 10k, and the improvement (black minus blue).)
Fig. 4. Adversary Success Rate vs v/Cmax in Defense of Online Attacks.

before we recommend deploying information signaling to protect against offline attacks.

• Can we accurately predict the value-to-cost ratio v/Cmax? Our results suggest that information signaling is useful even when our estimates deviate by a factor of 2. However, if our estimates are wildly off then information signaling could be harmful.

• While information signaling reduces the total number of cracked passwords, a few unlucky users might be harmed, i.e., instead of being deterred the attacker is helped by the signal to crack a password that they would not otherwise have cracked. The usage of password signaling thus raises important ethical and societal questions. How would users react to such a solution, knowing that they could be one of the unlucky users? One possible way to address these concerns would be to allow users to opt in/out of information signaling. However, each user u would need to make this decision without observing their signal; otherwise the decision to opt in/out might be strongly correlated with the signal, allowing the attacker to perform another Bayesian update. Another possible way to address these concerns would be to modify the objective function (eq. 11) to penalize solutions with unlucky users.

• Can we analyze the behavior of rational targeted attackers? We only consider an untargeted attacker. In some settings an attacker might place a higher value on some passwords, e.g., celebrity accounts. Can we predict how a targeted attacker would behave if the value v_u varied from user to user? Similarly, a targeted adversary could exploit demographic and/or biographical knowledge to improve password guessing attacks, e.g., see [85].

X. CONCLUSIONS

We introduced password strength signaling as a novel, yet counter-intuitive, defense against rational password attackers. We use a Stackelberg game to model the interaction between the defender and the attacker, and present an algorithm for the server to optimize its signaling matrix. We ran experiments to empirically evaluate the effectiveness of information signaling on 9 password datasets. When testing on the empirical (resp. Monte Carlo) password distribution we find that information signaling reduces the number of passwords that would have been cracked by up to 8% (resp. 12%). Additionally, we find that information signaling can help to dissuade an online attacker, saving 5% of all user accounts. We view our positive experimental results as a proof of concept which motivates further exploration of password strength signaling.

ACKNOWLEDGEMENT

This work was supported by NSF grant number 1755708 and by Rolls-Royce through a Doctoral Fellowship.

REFERENCES

[1] J. Blocki and A. Datta, "CASH: A cost asymmetric secure hash algorithm for optimal password protection," in IEEE 29th Computer Security Foundations Symposium, pp. 371–386, 2016.
[2] J. Blocki, B. Harsha, and S. Zhou, "On the economics of offline password cracking," in 2018 IEEE Symposium on Security and Privacy, pp. 853–871, IEEE Computer Society Press, May 2018.
[3] E. Kamenica and M. Gentzkow, "Bayesian persuasion," American Economic Review, vol. 101, pp. 2590–2615, October 2011.
[4] H. Xu and R. Freeman, "Signaling in Bayesian Stackelberg games," in Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems, 2016.
[5] J.
[15] B. Ur, P. G. Kelley, S. Komanduri, J. Lee, M. Maass, M. Mazurek, T. Passaro, R. Shay, T. Vidas, L. Bauer, N. Christin, and L. F. Cranor, "How does your password measure up? The effect of strength meters on password creation," in Proceedings of the USENIX Security Symposium, 2012.
[16] X. Carnavalet and M. Mannan, "From very weak to very strong: Analyzing password-strength meters," in NDSS 2014, The Internet Society, Feb. 2014.
[17] M. Steves, D. Chisnell, A. Sasse, K. Krol, M. Theofanos, and H. Wald, "Report: Authentication diary study," Tech. Rep. NISTIR 7983, National Institute of Standards and Technology (NIST), 2014.
[18] D. Florêncio, C. Herley, and P. C. Van Oorschot, "An administrator's guide to Internet password research," in Proceedings of the 28th USENIX Conference on Large Installation System Administration, LISA'14, pp. 35–52, 2014.
[19] A. Adams and M. A. Sasse, "Users are not the enemy," Communications of the ACM, vol. 42, no. 12, pp. 40–46, 1999.
[20] J. Blocki, S. Komanduri, A. Procaccia, and O. Sheffet, "Optimizing password composition policies," in Proceedings of the Fourteenth ACM Conference on Electronic Commerce, pp. 105–122, ACM, 2013.
[21] R. Morris and K. Thompson, "Password security: A case history," Communications of the ACM, vol. 22, no. 11, pp. 594–597, 1979.
[22] M. Weir, S. Aggarwal, B. de Medeiros, and B. Glodek, "Password cracking using probabilistic context-free grammars," in 2009 IEEE Symposium on Security and Privacy, pp. 391–405, IEEE Computer Society Press, May 2009.
[23] P. G. Kelley, S. Komanduri, M. L. Mazurek, R. Shay, T. Vidas, L. Bauer, N. Christin, L. F. Cranor, and J. Lopez, "Guess again (and again and again): Measuring password strength by simulating password-cracking algorithms," in 2012 IEEE Symposium on Security and Privacy, pp. 523–537, IEEE Computer Society Press, May 2012.
[24] R. Veras, C. Collins, and J. Thorpe, "On semantic patterns of passwords and their security impact," in NDSS 2014, The Internet Society, Feb. 2014.
[25] C. Castelluccia, M. Dürmuth, and D. Perito, "Adaptive password-strength meters from Markov models," in NDSS 2012, The Internet Society, Feb. 2012.
[26] C. Castelluccia, A. Chaabane, M. Dürmuth, and D. Perito, "When privacy meets security: Leveraging personal information for password cracking," arXiv preprint arXiv:1304.6584, 2013.
Bonneau, “The science of guessing: Analyzing an anonymized corpus [27] J. Ma, W. Yang, M. Luo, and N. Li, “A study of probabilistic password of 70 million passwords,” in 2012 IEEE Symposium on Security and models,” in 2014 IEEE Symposium on Security and Privacy, pp. 689– Privacy, pp. 538–552, IEEE Computer Society Press, May 2012. 704, IEEE Computer Society Press, May 2014. [6] J. Blocki, A. Datta, and J. Bonneau, “Differentially private password [28] B. Ur, S. M. Segreti, L. Bauer, N. Christin, L. F. Cranor, S. Komanduri, frequency lists,” in NDSS 2016, The Internet Society, Feb. 2016. D. Kurilova, M. L. Mazurek, W. Melicher, and R. Shay, “Measuring [7] J. Blocki and B. Harsha, “Linkedin password frequency corpus,” 2019. real-world accuracies and biases in modeling password guessability,” [8] J. Campbell, W. Ma, and D. Kleeman, “Impact of restrictive composition in USENIX Security 2015 (J. Jung and T. Holz, eds.), pp. 463–481, policy on user password choices,” Behaviour & Information Technology, USENIX Association, Aug. 2015. vol. 30, no. 3, pp. 379–388, 2011. [29] W. Melicher, B. Ur, S. M. Segreti, S. Komanduri, L. Bauer, N. Christin, [9] S. Komanduri, R. Shay, P. G. Kelley, M. L. Mazurek, L. Bauer, and L. F. Cranor, “Fast, lean, and accurate: Modeling password guess- N. Christin, L. F. Cranor, and S. Egelman, “Of passwords and peo- ability using neural networks,” in USENIX Security 2016 (T. Holz and ple: measuring the effect of password-composition policies,” in CHI, S. Savage, eds.), pp. 175–191, USENIX Association, Aug. 2016. pp. 2595–2604, 2011. [10] R. Shay, S. Komanduri, P. G. Kelley, P. G. Leon, M. L. Mazurek, [30] E. Liu, A. Nakanishi, M. Golla, D. Cash, and B. Ur, “Reasoning ana- L. Bauer, N. Christin, and L. F. Cranor, “Encountering stronger password lytically about password-cracking software,” in 2019 IEEE Symposium requirements: user attitudes and behaviors,” in Proceedings of the Sixth on Security and Privacy (SP), pp. 380–397, IEEE, 2019. 
Symposium on Usable Privacy and Security, SOUPS ’10, (New York, [31] “Hashcast: advanced password recovery.” NY, USA), pp. 2:1–2:20, ACM, 2010. [32] S. Designer, “John the ripper password cracker,” 2006. [11] J. M. Stanton, K. R. Stam, P. Mastrangelo, and J. Jolton, “Analysis of [33] N. Provos and D. Mazieres, “Bcrypt algorithm,” USENIX, 1999. end user security behaviors,” Comput. Secur., vol. 24, pp. 124–133, Mar. [34] B. Kaliski, “Pkcs# 5: Password-based cryptography specification version 2005. 2.0,” 2000. [12] P. G. Inglesant and M. A. Sasse, “The true cost of unusable password [35] C. Percival, “Stronger key derivation via sequential memory-hard func- policies: Password use in the wild,” in Proceedings of the SIGCHI tions,” in BSDCan 2009, 2009. Conference on Human Factors in Computing Systems, CHI ’10, (New [36] D. Boneh, H. Corrigan-Gibbs, and S. E. Schechter, “Balloon hashing: A York, NY, USA), pp. 383–392, ACM, 2010. memory-hard function providing provable protection against sequential [13] R. Shay, S. Komanduri, A. L. Durity, P. S. Huh, M. L. Mazurek, attacks,” in ASIACRYPT 2016, Part I (J. H. Cheon and T. Takagi, eds.), S. M. Segreti, B. Ur, L. Bauer, N. Christin, and L. F. Cranor, “Can vol. 10031 of LNCS, pp. 220–248, Springer, Heidelberg, Dec. 2016. long passwords be secure and usable?,” in Proceedings of the SIGCHI [37] A. Biryukov, D. Dinu, and D. Khovratovich, “Argon2: new generation Conference on Human Factors in Computing Systems, CHI ’14, (New of memory-hard functions for password hashing and other applications,” York, NY, USA), pp. 2927–2936, ACM, 2014. in Security and Privacy (EuroS&P), 2016 IEEE European Symposium [14] S. Komanduri, R. Shay, L. F. Cranor, C. Herley, and S. Schechter, on, pp. 292–302, IEEE, 2016. “Telepathwords: Preventing weak passwords by reading users’ minds,” [38] “Password hashing competition.” https://password-hashing.net/. in 23rd USENIX Security Symposium (USENIX Security 14), (San Diego, [39] J. Alwen, B. Chen, K. 
Pietrzak, L. Reyzin, and S. Tessaro, “Scrypt is CA), pp. 591–606, USENIX Association, Aug. 2014. maximally memory-hard,” in EUROCRYPT 2017, Part III (J.-S. Coron and J. B. Nielsen, eds.), vol. 10212 of LNCS, pp. 33–62, Springer, [63] P. S. Aleksic and A. K. Katsaggelos, “Audio-visual biometrics,” Pro- Heidelberg, Apr. / May 2017. ceedings of the IEEE, vol. 94, no. 11, pp. 2025–2044, 2006. [40] J. Alwen and J. Blocki, “Efficiently computing data-independent [64] J. Bonneau, C. Herley, P. C. van Oorschot, and F. Stajano, “The quest memory-hard functions,” in CRYPTO 2016, Part II (M. Robshaw and to replace passwords: A framework for comparative evaluation of web J. Katz, eds.), vol. 9815 of LNCS, pp. 241–271, Springer, Heidelberg, authentication schemes,” in 2012 IEEE Symposium on Security and Aug. 2016. Privacy, pp. 553–567, IEEE Computer Society Press, May 2012. [41] J. Alwen, J. Blocki, and K. Pietrzak, “Depth-robust graphs and their [65] C. Herley and P. Van Oorschot, “A research agenda acknowledging the cumulative memory complexity,” in EUROCRYPT 2017, Part III (J.-S. persistence of passwords,” IEEE Security & Privacy, vol. 10, no. 1, Coron and J. B. Nielsen, eds.), vol. 10212 of LNCS, pp. 3–32, Springer, pp. 28–36, 2011. Heidelberg, Apr. / May 2017. [66] R. Alonso and O. Cˆamara, “Bayesian persuasion with heterogeneous [42] J. Blocki, B. Harsha, S. Kang, S. Lee, L. Xing, and S. Zhou, “Data- priors,” Journal of Economic Theory, vol. 165, pp. 672–706, 2016. independent memory hard functions: New attacks and stronger construc- [67] S. Dughmi and H. Xu, “Algorithmic bayesian persuasion,” SIAM Journal tions,” in Annual International Cryptology Conference, pp. 573–607, on Computing, vol. 0, no. 0, pp. STOC16–68–STOC16–97, 0. Springer, 2019. [68] M. Hoefer, P. Manurangsi, and A. Psomas, “Algorithmic persuasion with [43] B. Harsha and J. Blocki, “Just in time hashing,” in 2018 IEEE European evidence,” 2020. Symposium on Security and Privacy (EuroS&P), pp. 
368–383, IEEE, [69] T. E. Carroll and D. Grosu, “A game theoretic investigation of deception 2018. in network security,” in 2009 Proceedings of 18th International Confer- [44] A. Everspaugh, R. Chatterjee, S. Scott, A. Juels, and T. Ristenpart, “The ence on Computer Communications and Networks, pp. 1–6, 2009. pythia PRF service,” in USENIX Security 2015 (J. Jung and T. Holz, [70] Z. Rabinovich, A. X. Jiang, M. Jain, and H. Xu, “Information disclosure eds.), pp. 547–562, USENIX Association, Aug. 2015. as a means to security,” in Proceedings of the 2015 International Con- [45] J. Camenisch, A. Lysyanskaya, and G. Neven, “Practical yet universally ference on Autonomous Agents and Multiagent Systems, AAMAS ’15, composable two-server password-authenticated secret sharing,” in ACM (Richland, SC), p. 645–653, International Foundation for Autonomous CCS 2012 (T. Yu, G. Danezis, and V. D. Gligor, eds.), pp. 525–536, Agents and Multiagent Systems, 2015. ACM Press, Oct. 2012. [71] W. Bai and J. Blocki, “Dahash: Distribution aware tuning of password [46] R. W. F. Lai, C. Egger, D. Schr¨oder, and S. S. M. Chow, “Phoenix: hashing costs,” in Financial Cryptography and Data Security, Springer Rebirth of a cryptographic password-hardening service,” in USENIX International Publishing, 2021. Security 2017 (E. Kirda and T. Ristenpart, eds.), pp. 899–916, USENIX [72] S. Schechter, C. Herley, and M. Mitzenmacher, “Popularity is every- Association, Aug. 2017. thing: A new approach to protecting passwords from statistical-guessing [47] J. G. Brainard, A. Juels, B. Kaliski, and M. Szydlo, “A new two-server attacks,” in Proceedings of the 5th USENIX conference on Hot topics in approach for authentication with short secrets,” in USENIX Security security, pp. 1–8, USENIX Association, 2010. 2003, USENIX Association, Aug. 2003. [73] C. Dwork, F. McSherry, K. Nissim, and A. Smith, “Calibrating noise [48] A. Juels and R. L. 
Rivest, “Honeywords: making password-cracking to sensitivity in private data analysis,” in TCC 2006 (S. Halevi and ACM CCS 2013 detectable,” in (A.-R. Sadeghi, V. D. Gligor, and T. Rabin, eds.), vol. 3876 of LNCS, pp. 265–284, Springer, Heidelberg, M. Yung, eds.), pp. 145–160, ACM Press, Nov. 2013. Mar. 2006. [49] R. Canetti, S. Halevi, and M. Steiner, “Mitigating dictionary attacks on [74] G. Cormode, C. Procopiuc, D. Srivastava, and T. T. Tran, “Differen- password-protected local storage,” in CRYPTO 2006 (C. Dwork, ed.), tially private summaries for sparse data,” in Proceedings of the 15th vol. 4117 of LNCS, pp. 160–179, Springer, Heidelberg, Aug. 2006. International Conference on Database Theory, pp. 299–311, 2012. [50] J. Blocki, M. Blum, and A. Datta, “Gotcha password hackers!,” in [75] M. Fossi, E. Johnson, D. Turner, T. Mack, J. Blackbird, D. McKinney, Proceedings of the 2013 ACM workshop on Artificial intelligence and M. K. Low, T. Adams, M. P. Laucht, and J. Gough, “Symantec report security, pp. 25–34, ACM, 2013. on the underground economy,” November 2008. Retrieved 1/8/2013. [51] J. Blocki and H.-S. Zhou, “Designing proof of human-work puzzles for cryptocurrency and beyond,” in TCC 2016-B, Part II (M. Hirt and A. D. [76] M. Stockley, “What your hacked account is worth on the dark web,” Smith, eds.), vol. 9986 of LNCS, pp. 517–546, Springer, Heidelberg, Aug 2016. Oct. / Nov. 2016. [77] L. Ren and S. Devadas, “Bandwidth hard functions for ASIC resistance,” [52] Y. Tian, C. Herley, and S. Schechter, “Stopguessing: Using guessed in TCC 2017, Part I (Y. Kalai and L. Reyzin, eds.), vol. 10677 of LNCS, passwords to thwart online guessing,” in Proc. IEEE European Symp. pp. 466–492, Springer, Heidelberg, Nov. 2017. Security and Privacy (EuroS&P 2019), pp. 17–19, 2019. [78] J. Blocki, B. Harsha, S. Kang, S. Lee, L. Xing, and S. Zhou, [53] J. Blocki and W. 
Zhang, “Dalock: Distribution aware password throt- “Data-independent memory hard functions: New attacks and stronger tling,” arXiv preprint arXiv:2005.09039, 2020. constructions.” Cryptology ePrint Archive, Report 2018/944, 2018. [54] D. A. F. Florˆencio and C. Herley, “One-time password access to any https://eprint.iacr.org/2018/944. server without changing the server,” in ISC 2008 (T.-C. Wu, C.-L. Lei, [79] A. Vaneev, “BITEOPT - Derivative-free optimization method.” Available V. Rijmen, and D.-T. Lee, eds.), vol. 5222 of LNCS, pp. 401–420, at https://github.com/avaneev/biteopt, 2021. C++ source code, with Springer, Heidelberg, Sept. 2008. description and examples. [55] A. Pashalidis and C. J. Mitchell, “Impostor: A single sign-on system [80] “Black box optimization competition.” for use from untrusted devices,” in IEEE Global Telecommunications https://www.ini.rub.de/PEOPLE/glasmtbl/projects/bbcomp/index.html. Conference, 2004. GLOBECOM’04., vol. 4, pp. 2191–2195, IEEE, 2004. [81] L. von Ahn, M. Blum, N. J. Hopper, and J. Langford, “CAPTCHA: [56] M. Kuhn, “Otpw—a one-time password login package,” 1998. Using hard AI problems for security,” in EUROCRYPT 2003 (E. Biham, [57] S. Chiasson, P. C. van Oorschot, and R. Biddle, “Graphical password ed.), vol. 2656 of LNCS, pp. 294–311, Springer, Heidelberg, May 2003. authentication using cued click points,” in ESORICS 2007 (J. Biskup and [82] B. Pinkas and T. Sander, “Securing passwords against dictionary at- J. L´opez, eds.), vol. 4734 of LNCS, pp. 359–374, Springer, Heidelberg, tacks,” in ACM CCS 2002 (V. Atluri, ed.), pp. 161–170, ACM Press, Sept. 2007. Nov. 2002. [58] R. Jhawar, P. Inglesant, N. Courtois, and M. A. Sasse, “Make mine [83] “Hackers find new way to bilk eBay users - CNET,” 2019. a quadruple: Strengthening the security of graphical one-time pin [84] M. Motoyama, K. Levchenko, C. Kanich, D. McCoy, G. M. Voelker, and authentication,” in 2011 5th International Conference on Network and S. 
Savage, “Re: CAPTCHAs-understanding CAPTCHA-solving services System Security, pp. 81–88, IEEE, 2011. in an economic context,” in USENIX Security 2010, pp. 435–462, [59] RSA, “Rsa securid® 6100 usb token,” 2003. USENIX Association, Aug. 2010. [60] B. Parno, C. Kuo, and A. Perrig, “Phoolproof prevention,” in [85] D. Wang, Z. Zhang, P. Wang, J. Yan, and X. Huang, “Targeted online FC 2006 (G. Di Crescenzo and A. Rubin, eds.), vol. 4107 of LNCS, password guessing: An underestimated threat,” in ACM CCS 2016 (E. R. pp. 1–19, Springer, Heidelberg, Feb. / Mar. 2006. Weippl, S. Katzenbeisser, C. Kruegel, A. C. Myers, and S. Halevi, eds.), [61] A. Ross, J. Shah, and A. K. Jain, “From template to image: Reconstruct- pp. 1242–1254, ACM Press, Oct. 2016. ing fingerprints from minutiae points,” IEEE transactions on pattern [86] J. Blocki and A. Datta, “CASH: A cost asymmetric secure hash algo- analysis and machine intelligence, vol. 29, no. 4, pp. 544–560, 2007. rithm for optimal password protection,” in CSF 2016Computer Security [62] J. Daugman, “How iris recognition works,” in The essential guide to Foundations Symposium (M. Hicks and B. K¨opf, eds.), pp. 371–386, image processing, pp. 715–739, Elsevier, 2009. IEEE Computer Society Press, 2016. APPENDIX COMPRESSED PASSWORD DISTRIBUTIONS Given a distribution over N passwords with corresponding probabilities p1 ≥ p2 ≥ ...pN we can often encode the probabilities in compressed form by grouping passwords with equal probability. Viewed in this way we can encode the distri- ′ ′ bution as a sequence of N ≤ N tuples (p1,c1),..., (pN ′ ,cN ) where p1 > p2 >...>pN ′ and ci denotes the number of passwords with probability exactly in the distribution. In all of the distributions we analyze we have N ′ ≪ N e.g., for the RockYou empirical we have N ≥ 3.2 × 107 while N ′ ≤ 2.3 × 103. Thus, it is desirable to ensure that our optimization algorithms scale with N ′ instead of N. 
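The grouping step described above can be sketched in a few lines. This is a minimal illustration of the compressed encoding; the helper name `compress` is ours, not from the paper:

```python
from collections import Counter

def compress(probs):
    """Group a password distribution p_1 >= p_2 >= ... >= p_N into the
    compact form [(p_1, c_1), ..., (p_N', c_N')] with p_1 > ... > p_N',
    where c_i counts the passwords whose probability is exactly p_i."""
    # Counter collapses equal probabilities into one equivalence set;
    # sorting descending puts the most popular set first.
    return sorted(Counter(probs).items(), key=lambda pc: -pc[0])
```

For example, `compress([0.5, 0.25, 0.25])` yields `[(0.5, 1), (0.25, 2)]`, i.e., N = 3 passwords in N′ = 2 equivalence sets.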
Similar observations were used by Blocki and Datta [86] and Bai and Blocki [71]. Bai and Blocki [71, Lemma 1] showed that a rational adversary never splits equivalence sets, i.e., if Pr[pw_i] = Pr[pw_j] then the attacker will either check both passwords or neither. Thus, a rational attacker's optimal strategy will be to check the B∗ most popular passwords, where B∗ is guaranteed to be in the set {0, c_1, c_1 + c_2, ..., Σ_{i=1}^{N′} c_i}. When the distribution is compact this substantially narrows down the search space in comparison to a brute-force search over all possible choices of B∗ ∈ [N].

We observe that if the original password distribution has a compact representation (p_1, c_1), ..., (p_{N′}, c_{N′}) with N′ equivalence sets then, after observing the password signal y ∈ [b], the posterior distribution has a compact representation with at most aN′ equivalence sets, where the dimension of the signaling matrix is a × b, i.e., S ∈ R^{a×b}. To see this, notice that we can partition all passwords pw in equivalence set i into a groups based on the value getStrength(pw) ∈ [a]. If Pr[pw] = Pr[pw′] and getStrength(pw) = getStrength(pw′) then the posterior probabilities are also equal, i.e., Pr[pw | y] = Pr[pw′ | y].

Thus, given a signaling matrix S and a signal y ∈ [b], we can compute the adversary's optimal response (π∗_y, B∗_y) to the signal y by 1) computing the compact representation of the posterior distribution, and 2) checking all aN′ possible values of B∗_y to find the budget that maximizes the attacker's expected utility conditioned on the signal y. After pre-processing the original dataset, the first step only requires time O(aN′ log(aN′)) for each new signaling matrix S and y ∈ [b].
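Step 2 above can be sketched as a single scan over the equivalence-set boundaries. The sketch assumes the utility model of a rational attacker who values each cracked password at v and pays k per guess, i.e., U(B) = v · Pr[cracked] − k · E[#guesses], in the spirit of the economic model of [2]; the function name and parameter names are our notational choices, not the paper's:

```python
def optimal_budget(compressed, v, k):
    """Best response of a rational attacker to a compressed distribution
    [(p_1, c_1), ..., (p_N', c_N')] with p_1 > p_2 > ... > p_N'.

    Checking the top B passwords yields expected utility
        U(B) = v * Pr[cracked] - k * E[#guesses],
    where the attacker stops as soon as a guess succeeds, so
        E[#guesses] = sum_{i <= B} i * p_i + B * (1 - Pr[cracked]).
    By [71, Lemma 1] the attacker never splits an equivalence set, so U
    is evaluated only at boundaries B in {0, c_1, c_1 + c_2, ...}.
    """
    best_B, best_u = 0, 0.0   # checking nothing always yields utility 0
    B, cracked, work = 0, 0.0, 0.0
    for p, c in compressed:
        # The c passwords in this set occupy ranks B+1, ..., B+c, each
        # with probability p: add p * (sum of those ranks) to `work`.
        work += p * (c * B + c * (c + 1) / 2)
        B += c
        cracked += p * c
        u = v * cracked - k * (work + B * (1.0 - cracked))
        if u > best_u:
            best_B, best_u = B, u
    return best_B, best_u
```

For the toy distribution `[(0.5, 1), (0.005, 100)]` with v = 10 and k = 1, the attacker's best response is to check only the single most popular password; checking the long tail costs more in expected guesses than the extra cracked probability is worth.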

EXTRA PLOTS

[Fig. 5. Adversary Success Rate vs v/Cmax for Empirical Distributions. Panels: (a) Rockyou, (b) 000webhost, (c) Yahoo. Each panel plots the fraction of cracked passwords against v/Cmax (10^3 to 10^8) with curves for no signal, perfect knowledge signaling, imperfect knowledge signaling (S = S∗(10^5) and S = S∗(10^6)), and the improvement (black − blue). The red (resp. yellow) shaded areas denote unconfident regions where the empirical distribution might diverge from the real distribution, E ≥ 0.1 (resp. E ≥ 0.01).]

[Fig. 6. Adversary Success Rate vs v/Cmax in Defense of Online Attacks. Panels: (a) Rockyou, (b) 000webhost, (c) Yahoo, (d) CSDN, (e) Linkedin, (f) Neopets. Each panel plots the fraction of cracked passwords against v/Cmax (10^2 to 10^5) with curves for no signal, partition top 1k, partition top 10k, and the improvement (black − blue).]