GPU-Assisted Password Hashing: The Example of scrypt
Thorsten Kranz
Master's Thesis. April 14, 2014.
Chair for Embedded Security (EMSEC), Prof. Dr.-Ing. Christof Paar
Advisor: Dr. Markus Dürmuth

Abstract

Many big leaks of password data in the past have emphasized the importance of protecting stored passwords. Traditionally, cryptographic hash functions have been used to this end. However, an attacker equipped with special hardware is able to run a parallelized guessing attack on such hashes. Therefore, new ideas for password hashing functions have been presented. In 2009, scrypt was published as a password hashing function that is designed to be very expensive to attack with custom hardware.

In this thesis, we present a GPU-based attack on scrypt. We examine the behavior of the different cost-determining parameters for the GPU and the CPU. Furthermore, we compare the hash rates achievable for scrypt and bcrypt. Our measurements show that particularly the block size parameter r of scrypt is well-suited for thwarting a GPU-assisted attack. However, the hash rates show that scrypt is not as successful as bcrypt in preventing GPU-assisted attacks for cost parameters that are reasonable in an interactive login scenario. Our results imply that the factors between costs for ASIC hardware crackers of scrypt and bcrypt that have been estimated by scrypt's author do not hold for graphics cards.

Declaration

I hereby declare that this submission is my own work and that, to the best of my knowledge and belief, it contains no material previously published or written by another person nor material which to a substantial extent has been accepted for the award of any other degree or diploma of the university or other institute of higher learning, except where due acknowledgment has been made in the text.
Affirmation

I hereby affirm that I have written this thesis independently and have used no sources or aids other than those indicated, that all passages taken verbatim or in substance from other sources are marked as such, and that this thesis has not been submitted in the same or a similar form to any other examination authority.

Thorsten Kranz

Contents

1 Introduction
  1.1 Contribution
  1.2 Related Work
  1.3 Outline
2 Background
  2.1 Password Security
    2.1.1 Passwords as Means of Authentication
    2.1.2 Online Attacks vs Offline Attacks
    2.1.3 Naive Password Hashing
    2.1.4 Attacks on Naive Password Hashing
    2.1.5 Advanced Password Hashing
  2.2 scrypt
    2.2.1 Overview
    2.2.2 scryptROMix
    2.2.3 scryptBlockMix
    2.2.4 Salsa20/8 Core
    2.2.5 Size Restrictions
3 Cost Estimations
  3.1 Provided Information
  3.2 Example: Cost Estimations for MD5
  3.3 Counting the Cryptographic Primitives
  3.4 Estimating the Memory Costs for bcrypt and scrypt
  3.5 Compression Functions vs Hashes
4 Implementation
  4.1 CUDA C Programming
    4.1.1 Heterogeneous Model
    4.1.2 Thread Hierarchy
    4.1.3 Memory Model
  4.2 CUDA Based Password Hashing with scrypt
    4.2.1 Host Implementation
    4.2.2 Device Implementation
5 Results
  5.1 CPU Running Times
  5.2 GPU Hash Rates
    5.2.1 The N Parameter
    5.2.2 The r Parameter
    5.2.3 The p Parameter
    5.2.4 More Results
  5.3 Comparing scrypt with bcrypt
6 Conclusion
  6.1 Future Work
A Acronyms
B GeForce GTX 480 - Device Query
C Measurements
List of Figures
List of Tables
List of Algorithms
List of Listings
Bibliography

1 Introduction

In today's world, passwords are omnipresent.
The average Internet user is asked to authenticate with a user name and password multiple times per day: when signing in to an operating system, when checking emails, when logging in to a social network, when using online banking, and so forth. Online services that require a login have to store large numbers of passwords on the server side. Experience has shown that it is not easy at all to protect those passwords from unauthorized access. Just recently, 130 million encrypted passwords of Adobe users showed up on the Internet together with user names and password hints, directly revealing many passwords due to a flawed implementation [Sop13]. Table 1.1 lists the most famous password leaks of recent years.

Year  Affected entity         # of passwords   Implementation details
2013  Adobe [Tec13]           ≈ 130,000,000    Encrypted. Unsalted.
2012  gamigo [For12]          ≈ 8,000,000      Hashed. Unsalted.
2012  LinkedIn [Tec12a]       ≈ 6,000,000      Hashed. Unsalted.
2012  eHarmony [Tec12b]       ≈ 1,500,000      Hashed. Unsalted.
2009  RockYou [Dev10, Tec09]  ≈ 32,000,000     Plain.

Table 1.1: Overview of the most famous password leaks in the last years.

It should be pointed out that Table 1.1 only includes breaches where the corresponding password files have been posted on the Internet and could be associated with an entity. There are three things to note here.

First, it must be assumed that there are also many password leaks that never show up on the Internet, since publication might run counter to the criminal intentions of the attackers. For such a breach to become known, the affected entity must notice it and also make it public. A famous example of this is the breach of Sony's Playstation Network in 2011 [Reu11]. 77 million users were affected and, according to Sony, the stolen data also included hashed passwords [Pla11]. But it would be naive to assume that every breach is detected and also made public as in this particular case.
The second thing to realize is that the number of leaked passwords showing up on the Internet always represents a best-case scenario. Especially in those cases where password hashes are posted without the corresponding user names, it is conceivable that the original attacker published only the fraction of the passwords that he did not manage to crack on his own.

Finally, every day many passwords are published on the Internet that cannot be associated with a definite entity, that is, we do not know where these passwords come from or whether they are actually real.

Putting all those arguments together, one must assume that there is a large number of unreported leaked passwords, even though the numbers we can be sure of are already sufficient to motivate research into better ways of implementing password storage.

Such an implementation can be divided into two layers, as depicted in Figure 1.1. On the one hand, a data breach should not occur in the first place. This first layer in password security involves software engineering, database programming, and security policies. Breaches might, for example, take place due to SQL injections or careless disposal of old backups. As argued above, such breaches prove to be hard to prevent. Therefore, we need a second security layer that takes effect in the event of password theft. Such a scenario is called an offline attack, that is, the attacker has stored the password file on a local disk and is not subject to any restrictions imposed by online authentication. This thesis covers the second of those security layers; preventing password leaks is not taken into account.

Figure 1.1: Passwords are secured with two layers. The first layer prevents data theft, the second layer protects the passwords for the case that the first layer fails.

To thwart the security threats that arise from offline attacks, techniques like hashing and salting are applied.
Nevertheless, since trying out different passwords as input of a hash function is an embarrassingly parallel problem and hash functions are typically designed to be computable as fast as possible, an attacker equipped with parallel hardware still represents a big threat in such a scenario. In most cases of hashed and unsalted leaks, it only takes a few days for well over 50% of the hashes to be cracked. More than 90% of the hashes from gamigo, LinkedIn, and eHarmony have been recovered and are now publicly accessible in the form of so-called password dictionaries.

While applying salting will significantly slow down the cracking process, it is still desirable to have a specially designed hash function that is harder to attack. This is all the more true when we also want to protect against attackers with large resources, such as intelligence agencies. In contrast to "small" attackers who are typically equipped with a GPU, or maybe with an FPGA, those "large" attackers might use ASICs that can compute far more hashes per second.

For that reason, algorithms [PM99, Kal00] and techniques [ALN97, Man96, KSHW98] have been proposed with the specific goal of slowing down the process of trying out many inputs. Since continuous hardware improvements must be taken into account, the computational cost is typically parameterized. One of these algorithms is called scrypt [Per09] and is based on the idea of making parallel hardware implementations expensive by forcing heavy use of RAM. The author claims that scrypt is significantly more expensive to attack than other well-known password hash functions.

This thesis describes an implementation of the scrypt algorithm on a GPU. The cost of cracking passwords is measured for various parameters. Based on the results, the scrypt algorithm is evaluated and compared to the bcrypt algorithm.
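
The effect of salting on an offline guessing attack, as described above, can be illustrated with a minimal Python sketch. It is not taken from the thesis: SHA-256 merely stands in for a fast general-purpose hash function, and the function names, user names, and candidate list are hypothetical. In the unsalted case each guess is hashed once and compared against all leaked hashes at the same time, whereas in the salted case every record requires its own pass over the candidates. In both cases the guesses are independent of each other, which is what makes the problem embarrassingly parallel.

```python
import hashlib
import os


def sha256_hex(data: bytes) -> str:
    # Assumption: SHA-256 stands in for any fast, general-purpose hash.
    return hashlib.sha256(data).hexdigest()


def crack_unsalted(leaked_hashes, candidates):
    """Hash every candidate once and look it up among all leaked hashes."""
    wanted = set(leaked_hashes)
    found = {}
    for guess in candidates:
        digest = sha256_hex(guess.encode())
        if digest in wanted:
            found[digest] = guess
    return found


def crack_salted(leaked_records, candidates):
    """Each (salt, hash) record needs its own pass over the candidates."""
    found = {}
    for user, (salt, digest) in leaked_records.items():
        for guess in candidates:
            if sha256_hex(salt + guess.encode()) == digest:
                found[user] = guess
                break
    return found


# Hypothetical candidate list, standing in for a password dictionary.
candidates = ["123456", "password", "letmein", "qwerty"]

# Unsalted leak: identical passwords produce identical hashes.
unsalted = [sha256_hex(p.encode()) for p in ["password", "qwerty", "password"]]
recovered_unsalted = crack_unsalted(unsalted, candidates)

# Salted leak: a random per-user salt is stored next to each hash.
salted = {}
for user, pw in [("alice", "password"), ("bob", "qwerty")]:
    salt = os.urandom(16)
    salted[user] = (salt, sha256_hex(salt + pw.encode()))
recovered_salted = crack_salted(salted, candidates)
```

In this sketch, the unsalted attack computes one hash per candidate regardless of how many accounts were leaked, while the salted attack computes up to one hash per candidate and account. Salting therefore multiplies the attacker's work by the number of accounts but leaves the cost of a single guess unchanged, which is the gap that parameterized functions such as scrypt and bcrypt are meant to address.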
1.1 Contribution

Our main contribution is the implementation and evaluation of a GPU-based exhaustive search for passwords protected with scrypt.