Brute Force Attack on UNIX Passwords with SIMD
Total pages: 16
File type: PDF, size: 1020 KB
Recommended publications
GPU-Based Password Cracking: On the Security of Password Hashing Schemes Regarding Advances in Graphics Processing Units
Radboud University Nijmegen, Faculty of Science, Kerckhoffs Institute. Master of Science Thesis: GPU-based Password Cracking – On the Security of Password Hashing Schemes regarding Advances in Graphics Processing Units, by Martijn Sprengers. Supervisors: Dr. L. Batina (Radboud University Nijmegen), Ir. S. Hegt (KPMG IT Advisory), Ir. P. Ceelen (KPMG IT Advisory). Thesis number: 646. Final Version.

Abstract. Since users rely on passwords to authenticate themselves to computer systems, adversaries attempt to recover those passwords. To prevent such a recovery, various password hashing schemes can be used to store passwords securely. However, recent advances in graphics processing unit (GPU) hardware challenge the way we have to look at secure password storage. GPUs have proven to be suitable for cryptographic operations and provide a significant speedup in performance compared to traditional central processing units (CPUs). This research focuses on the security requirements and properties of prevalent password hashing schemes. Moreover, we present a proof of concept that launches an exhaustive-search attack on the MD5-crypt password hashing scheme using modern GPUs. We show that it is possible to achieve a performance of 880,000 hashes per second using different optimization techniques. Our implementation, executed on a typical GPU, is therefore more than 30 times faster than equally priced CPU hardware. With this performance increase, 'complex' passwords with a length of 8 characters are now becoming feasible to crack. In addition, we show that between 50% and 80% of the passwords in a leaked database could be recovered within 2 months of computation time on one Nvidia GeForce 295 GTX.
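To make the scale of such an exhaustive search concrete: an alphabet of 36 characters (lowercase letters and digits) gives 36^8 ≈ 2.8 × 10^12 candidates of length 8, which at the reported 880,000 hashes per second corresponds to roughly 37 days on one such GPU. The sketch below shows the shape of the search loop on a CPU only; SHA-256 from libsodium stands in for MD5-crypt, and the deliberately tiny alphabet and length are assumptions of this illustration, not taken from the thesis.

```c
/* Minimal exhaustive-search sketch (CPU only).
 * Assumptions: libsodium is available; SHA-256 stands in for MD5-crypt;
 * the alphabet and maximum length are illustrative only.
 * Build: cc brute.c -lsodium
 */
#include <sodium.h>
#include <stdio.h>
#include <string.h>

static const char ALPHABET[] = "abcdefghijklmnopqrstuvwxyz0123456789";
#define ALPHA_LEN (sizeof(ALPHABET) - 1)
#define MAX_LEN 4   /* keep the demo keyspace tiny: at most 36^4 candidates */

/* Recursively extend the candidate and test every completion of length `len`. */
static int search(char *cand, size_t pos, size_t len,
                  const unsigned char target[crypto_hash_sha256_BYTES])
{
    if (pos == len) {
        unsigned char digest[crypto_hash_sha256_BYTES];
        crypto_hash_sha256(digest, (const unsigned char *)cand, len);
        return memcmp(digest, target, sizeof digest) == 0;
    }
    for (size_t i = 0; i < ALPHA_LEN; i++) {
        cand[pos] = ALPHABET[i];
        if (search(cand, pos + 1, len, target))
            return 1;
    }
    return 0;
}

int main(void)
{
    if (sodium_init() < 0) return 1;

    /* Target digest of the demo "leaked" password. */
    const char *secret = "ab3z";
    unsigned char target[crypto_hash_sha256_BYTES];
    crypto_hash_sha256(target, (const unsigned char *)secret, strlen(secret));

    char cand[MAX_LEN + 1] = {0};
    for (size_t len = 1; len <= MAX_LEN; len++) {
        memset(cand, 0, sizeof cand);
        if (search(cand, 0, len, target)) {
            printf("recovered: %.*s\n", (int)len, cand);
            return 0;
        }
    }
    puts("not found");
    return 0;
}
```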
Argon and Argon2
Designers: Alex Biryukov, Daniel Dinu, and Dmitry Khovratovich (University of Luxembourg, Luxembourg)
https://www.cryptolux.org/index.php/Password
https://github.com/khovratovich/Argon
https://github.com/khovratovich/Argon2
Version 1.1 of Argon, Version 1.0 of Argon2, 31st January, 2015

Contents
1 Introduction
2 Argon
  2.1 Specification
    2.1.1 Input
    2.1.2 SubGroups
    2.1.3 ShuffleSlices
  2.2 Recommended parameters
  2.3 Security claims
  2.4 Features
    2.4.1 Main features
    2.4.2 Server relief
    2.4.3 Client-independent update
    2.4.4 Possible future extensions
  2.5 Security analysis
    2.5.1 Avalanche properties
    2.5.2 Invariants
    2.5.3 Collision and preimage attacks
    2.5.4 Tradeoff attacks
  2.6 Design rationale
    2.6.1 SubGroups
    2.6.2 ShuffleSlices
    2.6.3 Permutation F
    2.6.4 No weakness, no patents
  2.7 Tweaks
  2.8 Efficiency analysis
    2.8.1 Modern x86/x64 architecture
    2.8.2 Older CPU
    2.8.3 Other architectures
3 Argon2
  3.1 Specification
    3.1.1 Inputs
    3.1.2 Operation
    3.1.3 Indexing
    3.1.4 Compression function G
  3.2 Features
    3.2.1 Available features
    3.2.2 Server relief
    3.2.3 Client-independent update
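For a usage-level view of the scheme specified above, the following minimal sketch derives a key from a password with Argon2id through libsodium's crypto_pwhash wrapper. The use of libsodium and its "interactive" cost presets are assumptions of this sketch, not parameter recommendations from the Argon/Argon2 specification.

```c
/* Deriving a 32-byte key from a password with Argon2id via libsodium.
 * Assumptions: a recent libsodium exposing crypto_pwhash_ALG_ARGON2ID13;
 * the cost parameters below are the library's "interactive" presets.
 * Build: cc argon2_demo.c -lsodium
 */
#include <sodium.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (sodium_init() < 0) return 1;

    const char *password = "correct horse battery staple";
    unsigned char salt[crypto_pwhash_SALTBYTES];
    unsigned char key[32];

    randombytes_buf(salt, sizeof salt);     /* fresh random salt per password */

    if (crypto_pwhash(key, sizeof key,
                      password, strlen(password),
                      salt,
                      crypto_pwhash_OPSLIMIT_INTERACTIVE,   /* time cost   */
                      crypto_pwhash_MEMLIMIT_INTERACTIVE,   /* memory cost */
                      crypto_pwhash_ALG_ARGON2ID13) != 0) {
        fprintf(stderr, "out of memory\n");  /* crypto_pwhash fails only on OOM */
        return 1;
    }

    for (size_t i = 0; i < sizeof key; i++) printf("%02x", key[i]);
    putchar('\n');
    return 0;
}
```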
2.5 Classification of Parallel Computers
2.5.1 Granularity

In parallel computing, granularity means the amount of computation in relation to communication or synchronisation. Periods of computation are typically separated from periods of communication by synchronization events.

• fine level (same operations with different data)
  ◦ vector processors
  ◦ instruction level parallelism
  ◦ fine-grain parallelism:
    – Relatively small amounts of computational work are done between communication events
    – Low computation-to-communication ratio
    – Facilitates load balancing
    – Implies high communication overhead and less opportunity for performance enhancement
    – If granularity is too fine, the overhead required for communication and synchronization between tasks may take longer than the computation itself
• operation level (different operations simultaneously)
• problem level (independent subtasks)
  ◦ coarse-grain parallelism:
    – Relatively large amounts of computational work are done between communication/synchronization events
    – High computation-to-communication ratio
    – Implies more opportunity for performance increase
    – Harder to load balance efficiently

2.5.2 Hardware: Pipelining (used in supercomputers, e.g. the Cray-1)

With N elements in the pipeline and L clock cycles per element, the calculation takes L + N cycles; without pipelining it takes L * N cycles. Example of code that pipelines well:

```fortran
do i = 1, k
  z(i) = x(i) + y(i)
end do
```

Vector processors provide fast vector operations (operations on arrays). The example above is also good for a vector processor (vector addition), but recursion, for instance, is hard to optimise for vector processors. Example: Intel MMX – a simple vector processor.
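The vector-addition loop above pipelines and vectorizes well because its iterations are independent; a recurrence, by contrast, forces each iteration to wait for the previous result. A small C sketch of the two patterns (illustrative only, with made-up array names):

```c
/* Independent iterations vs. a loop-carried dependence (illustrative sketch). */
#include <stdio.h>
#define N 8

int main(void)
{
    double x[N], y[N], z[N], s[N];
    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 2.0 * i; }

    /* Like the Fortran example: every iteration is independent, so the loop
     * pipelines/vectorizes well (roughly L + N cycles instead of L * N). */
    for (int i = 0; i < N; i++)
        z[i] = x[i] + y[i];

    /* A recurrence: iteration i needs s[i-1], so the operations cannot be
     * overlapped as easily -- the kind of code that is hard to vectorize. */
    s[0] = x[0];
    for (int i = 1; i < N; i++)
        s[i] = s[i - 1] + x[i];

    printf("z[N-1] = %g, s[N-1] = %g\n", z[N - 1], s[N - 1]);
    return 0;
}
```

Compilers typically auto-vectorize the first loop but not the second, which mirrors the pipelining argument above.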
CASH: a Cost Asymmetric Secure Hash Algorithm for Optimal Password Protection
Jeremiah Blocki, Anupam Datta (Microsoft Research; Carnegie Mellon University), August 23, 2018

Abstract. An adversary who has obtained the cryptographic hash of a user's password can mount an offline attack to crack the password by comparing this hash value with the cryptographic hashes of likely password guesses. This offline attacker is limited only by the resources he is willing to invest to crack the password. Key-stretching techniques like hash iteration and memory-hard functions have been proposed to mitigate the threat of offline attacks by making each password guess more expensive for the adversary to verify. However, these techniques also increase costs for a legitimate authentication server. We introduce a novel Stackelberg game model which captures the essential elements of this interaction between a defender and an offline attacker. In the game the defender first commits to a key-stretching mechanism, and the offline attacker responds in a manner that optimizes his utility (expected reward minus expected guessing costs). We then introduce Cost Asymmetric Secure Hash (CASH), a randomized key-stretching mechanism that minimizes the fraction of passwords that would be cracked by a rational offline attacker without increasing amortized authentication costs for the legitimate authentication server. CASH is motivated by the observation that the legitimate authentication server will typically run the authentication procedure to verify a correct password, while an offline adversary will typically use incorrect password guesses. By using randomization we can ensure that the amortized cost of running CASH to verify a correct password guess is significantly smaller than the cost of rejecting an incorrect password.
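The baseline key-stretching idea referred to above can be shown with plain hash iteration: every password guess costs t hash evaluations instead of one, for attacker and server alike. The sketch below is that generic fixed-cost iteration (SHA-256 via libsodium is an assumption of the sketch); it is not the randomized, cost-asymmetric CASH mechanism the paper proposes.

```c
/* Generic key stretching by hash iteration: out = H^t(salt || password).
 * This is the baseline technique CASH improves on, NOT the CASH scheme itself.
 * Build: cc stretch.c -lsodium
 */
#include <sodium.h>
#include <stdio.h>
#include <string.h>

#define ITERATIONS 100000   /* t: tune so one verification takes a few ms */

static void stretch(unsigned char out[crypto_hash_sha256_BYTES],
                    const unsigned char *salt, size_t salt_len,
                    const char *password)
{
    /* First round hashes salt || password ... */
    crypto_hash_sha256_state st;
    crypto_hash_sha256_init(&st);
    crypto_hash_sha256_update(&st, salt, salt_len);
    crypto_hash_sha256_update(&st, (const unsigned char *)password,
                              strlen(password));
    crypto_hash_sha256_final(&st, out);

    /* ... then the digest is re-hashed t-1 more times. */
    for (unsigned i = 1; i < ITERATIONS; i++)
        crypto_hash_sha256(out, out, crypto_hash_sha256_BYTES);
}

int main(void)
{
    if (sodium_init() < 0) return 1;

    unsigned char salt[16];
    randombytes_buf(salt, sizeof salt);

    unsigned char tag[crypto_hash_sha256_BYTES];
    stretch(tag, salt, sizeof salt, "hunter2");

    for (size_t i = 0; i < sizeof tag; i++) printf("%02x", tag[i]);
    putchar('\n');
    return 0;
}
```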
SU(3) Gluodynamics on Graphics Processing Units
Proceedings of the International School-Seminar "New Physics and Quantum Chromodynamics at External Conditions", pp. 29-33, 3-6 May 2011, Dnipropetrovsk, Ukraine

V. I. Demchik, Dnipropetrovsk National University, Dnipropetrovsk, Ukraine

SU(3) gluodynamics is implemented on graphics processing units (GPU) with the open cross-platform computational language OpenCL. The architecture of the first open computer cluster for high performance computing on graphics processing units (HGPU), with peak performance over 12 TFlops, is described.

1 Introduction

Most of the modern achievements in science and technology are closely related to the use of high performance computing. The ever-increasing performance requirements of computing systems cause high rates of their development. According to the popular TOP-500 list of the most powerful (non-distributed) computer systems in the world [1], the computing power introduced each year completely covers the overall performance of all high-end systems reached before. The obvious desire to reduce the cost of acquiring and maintaining computer systems, together with the growing demands on their performance, shifts supercomputing architecture in the direction of heterogeneous systems. This kind of architecture contains special computational accelerators in addition to traditional central processing units (CPU). One such accelerator is the graphics processing unit (GPU), which is traditionally designed and used primarily for the narrow task of visualizing 2D and 3D scenes in games and applications. The peak performance of modern high-end GPUs (AMD Radeon HD 6970, AMD Radeon HD 5870, nVidia GeForce GTX 580) is about 20 times higher than the peak performance of comparable CPUs (see Fig.
BLAKE2: Simpler, Smaller, Fast As MD5
Jean-Philippe Aumasson (Kudelski Security, Switzerland), Samuel Neves (University of Coimbra, Portugal), Zooko Wilcox-O'Hearn (Least Authority Enterprises, USA), and Christian Winnerlein (Ludwig Maximilian University of Munich, Germany)

Abstract. We present the hash function BLAKE2, an improved version of the SHA-3 finalist BLAKE optimized for speed in software. Target applications include cloud storage, intrusion detection, or version control systems. BLAKE2 comes in two main flavors: BLAKE2b is optimized for 64-bit platforms, and BLAKE2s for smaller architectures. On 64-bit platforms, BLAKE2 is often faster than MD5, yet provides security similar to that of SHA-3: up to 256-bit collision resistance, immunity to length extension, indifferentiability from a random oracle, etc. We specify parallel versions BLAKE2bp and BLAKE2sp that are up to 4 and 8 times faster, by taking advantage of SIMD and/or multiple cores. BLAKE2 reduces the RAM requirements of BLAKE down to 168 bytes, making it smaller than any of the five SHA-3 finalists, and 32% smaller than BLAKE. Finally, BLAKE2 provides comprehensive support for tree hashing as well as keyed hashing (be it in sequential or tree mode).

1 Introduction

The SHA-3 Competition succeeded in selecting a hash function that complements SHA-2 and is much faster than SHA-2 in hardware [1]. There is nevertheless a demand for fast software hashing for applications such as integrity checking and deduplication in filesystems and cloud storage, host-based intrusion detection, version control systems, or secure boot schemes.
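As a usage-level illustration of BLAKE2's keyed hashing mode, the sketch below computes a keyed BLAKE2b digest through libsodium's crypto_generichash API; relying on libsodium here is an assumption of the sketch, not something taken from the paper.

```c
/* Keyed BLAKE2b via libsodium's crypto_generichash (an assumption of this
 * sketch; the paper specifies BLAKE2b/BLAKE2s directly).
 * Build: cc blake2_demo.c -lsodium
 */
#include <sodium.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (sodium_init() < 0) return 1;

    unsigned char key[crypto_generichash_KEYBYTES];   /* 32-byte key */
    randombytes_buf(key, sizeof key);

    const char *msg = "message to authenticate";
    unsigned char tag[crypto_generichash_BYTES];       /* 32-byte digest */

    /* Keyed hashing: a key is passed directly instead of wrapping the hash
     * in an HMAC construction. */
    crypto_generichash(tag, sizeof tag,
                       (const unsigned char *)msg, strlen(msg),
                       key, sizeof key);

    for (size_t i = 0; i < sizeof tag; i++) printf("%02x", tag[i]);
    putchar('\n');
    return 0;
}
```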
Rifflescrambler – a Memory-Hard Password Storing Function ⋆
Karol Gotfryd (Wrocław University of Science and Technology, Faculty of Fundamental Problems of Technology, Department of Computer Science), Paweł Lorek (Wrocław University, Faculty of Mathematics and Computer Science, Mathematical Institute), and Filip Zagórski (Wrocław University of Science and Technology; Oktawave)

Abstract. We introduce RiffleScrambler: a new family of directed acyclic graphs and a corresponding data-independent memory-hard function with password-independent memory access. We prove its memory hardness in the random oracle model. RiffleScrambler is similar to Catena – updates of hashes are determined by a graph (a bit-reversal or double-butterfly graph in Catena). The advantage of RiffleScrambler over Catena is that the underlying graphs are not predefined but are generated per salt, as in Balloon Hashing. Such an approach leads to higher immunity against practical parallel attacks. RiffleScrambler offers better efficiency than Balloon Hashing since the in-degree of the underlying graph is equal to 3 (and is much smaller than in Balloon Hashing). At the same time, because the underlying graph is an instance of a Superconcentrator, our construction achieves the same time-memory trade-offs.

Keywords: Memory hardness, password storing, Markov chains, mixing time.

1 Introduction

In the early days of the computer era, passwords were stored in plaintext in the form of pairs (user, password). Back in the 1960s it was observed that this is not secure. It took around a decade to incorporate a more secure way of storing users' passwords – via a DES-based function crypt, as (user, f_k(password)) for a secret key k, or as (user, f(password)) for a one-way function.
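To make "updates of hashes are determined by a graph" concrete, the toy sketch below fills an array of blocks where each block hashes its predecessor together with one earlier block whose index is derived from the salt alone, so the memory-access pattern is password-independent. This is a generic illustration of graph-driven memory filling, not the RiffleScrambler construction (whose per-salt graphs are Superconcentrator instances with in-degree 3); BLAKE2b via libsodium stands in for the hash function.

```c
/* Toy graph-driven memory fill (NOT RiffleScrambler): block[i] depends on
 * block[i-1] and on block[j], where j is derived from the salt only, so the
 * memory-access pattern is password-independent.
 * Build: cc graphfill.c -lsodium
 */
#include <sodium.h>
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define BLOCKS 1024
#define BLOCK_BYTES crypto_generichash_BYTES   /* 32-byte blocks */

int main(void)
{
    if (sodium_init() < 0) return 1;

    const char *password = "hunter2";
    unsigned char salt[16];
    randombytes_buf(salt, sizeof salt);

    static unsigned char mem[BLOCKS][BLOCK_BYTES];
    crypto_generichash_state st;

    /* block[0] = H(salt || password) */
    crypto_generichash_init(&st, NULL, 0, BLOCK_BYTES);
    crypto_generichash_update(&st, salt, sizeof salt);
    crypto_generichash_update(&st, (const unsigned char *)password,
                              strlen(password));
    crypto_generichash_final(&st, mem[0], BLOCK_BYTES);

    for (uint32_t i = 1; i < BLOCKS; i++) {
        /* Salt-dependent (data-independent) choice of an earlier block j < i. */
        unsigned char idx[BLOCK_BYTES];
        crypto_generichash_init(&st, NULL, 0, sizeof idx);
        crypto_generichash_update(&st, salt, sizeof salt);
        crypto_generichash_update(&st, (unsigned char *)&i, sizeof i);
        crypto_generichash_final(&st, idx, sizeof idx);
        uint32_t j;
        memcpy(&j, idx, sizeof j);
        j %= i;

        /* block[i] = H(block[i-1] || block[j]) */
        crypto_generichash_init(&st, NULL, 0, BLOCK_BYTES);
        crypto_generichash_update(&st, mem[i - 1], BLOCK_BYTES);
        crypto_generichash_update(&st, mem[j], BLOCK_BYTES);
        crypto_generichash_final(&st, mem[i], BLOCK_BYTES);
    }

    /* The final block serves as the password tag in this toy. */
    for (size_t k = 0; k < BLOCK_BYTES; k++) printf("%02x", mem[BLOCKS - 1][k]);
    putchar('\n');
    return 0;
}
```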
Hash Functions
11 Hash Functions

Suppose you share a huge file with a friend, but you are not sure whether you both have the same version of the file. You could send your version of the file to your friend and they could compare it to their version. Is there any way to check that involves less communication than this? Let's call your version of the file x (a string) and your friend's version y. The goal is to determine whether x = y. A natural approach is to agree on some deterministic function H, compute H(x), and send it to your friend. Your friend can compute H(y) and, since H is deterministic, compare the result to your H(x). In order for this method to be fool-proof, we need H to have the property that different inputs always map to different outputs — in other words, H must be injective (1-to-1). Unfortunately, if H : {0,1}^in → {0,1}^out is injective, then out ≥ in (there are 2^in inputs and only 2^out outputs, so a shorter output would force two inputs to share a hash). This means that sending H(x) is no better/shorter than sending x itself! Let us call a pair (x, y) a collision in H if x ≠ y and H(x) = H(y). An injective function has no collisions. One common theme in cryptography is that you don't always need something to be impossible; it's often enough for that thing to be just highly unlikely. Instead of saying that H should have no collisions, what if we just say that collisions should be hard (for polynomial-time algorithms) to find? An H with this property will probably be good enough for anything we care about.
CS 255 (Introduction to Cryptography)
CS 255 (Introduction to Cryptography)
David Wu

Abstract. Notes taken in Professor Boneh's Introduction to Cryptography course (CS 255) in Winter 2012. There may be errors! Be warned!

Contents
1. 1/11: Introduction and Stream Ciphers
  1.1. Introduction
  1.2. History of Cryptography
  1.3. Stream Ciphers
  1.4. Pseudorandom Generators (PRGs)
  1.5. Attacks on Stream Ciphers and OTP
  1.6. Stream Ciphers in Practice
2. 1/18: PRGs and Semantic Security
  2.1. Secure PRGs
  2.2. Semantic Security
  2.3. Generating Random Bits in Practice
  2.4. Block Ciphers
3. 1/23: Block Ciphers
  3.1. Pseudorandom Functions (PRF)
  3.2. Data Encryption Standard (DES)
  3.3. Advanced Encryption Standard (AES)
  3.4. Exhaustive Search Attacks
  3.5. More Attacks on Block Ciphers
  3.6. Block Cipher Modes of Operation
4. 1/25: Message Integrity
  4.1. Message Integrity
5. 1/27: Proofs in Cryptography
  5.1. Time/Space Tradeoff
  5.2. Proofs in Cryptography
6. 1/30: MAC Functions
  6.1. Message Integrity
  6.2. MAC Padding
  6.3. Parallel MAC (PMAC)
  6.4. One-time MAC
  6.5. Collision Resistance
7. 2/1: Collision Resistance
  7.1. Collision Resistant Hash Functions
  7.2. Construction of Collision Resistant Hash Functions
  7.3. Provably Secure Compression Functions
8. 2/6: HMAC and Timing Attacks
  8.1. HMAC
  8.2.
Efficient Hashing Using the AES Instruction Set
Joppe W. Bos and Onur Özen (Laboratory for Cryptologic Algorithms, EPFL, Station 14, CH-1015 Lausanne, Switzerland) and Martijn Stam (Department of Computer Science, University of Bristol, Merchant Venturers Building, Woodland Road, Bristol, BS8 1UB, United Kingdom)

Abstract. In this work, we provide a software benchmark for a large range of 256-bit blockcipher-based hash functions. We instantiate the underlying blockcipher with AES, which allows us to exploit the recent AES instruction set (AES-NI). Since AES itself only outputs 128 bits, we consider double-block-length constructions, as well as (single-block-length) constructions based on RIJNDAEL-256. Although we primarily target architectures supporting AES-NI, our framework has much broader applications by estimating the performance of these hash functions on any (micro-)architecture given AES-benchmark results. As far as we are aware, this is the first comprehensive performance comparison of multi-block-length hash functions in software.

1 Introduction

Historically, the most popular way of constructing a hash function is to iterate a compression function that itself is based on a blockcipher (this idea dates back to Rabin [49]). This approach has the practical advantage—especially on resource-constrained devices—that only a single primitive is needed to implement two functionalities (namely encrypting and hashing). Moreover, trust in the blockcipher can be conferred to the corresponding hash function. The wisdom of blockcipher-based hashing is still valid today. Indeed, the current cryptographic hash function standard SHA-2 and some of the SHA-3 candidates are, or can be regarded as, blockcipher-based designs.
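To show what the AES-NI instructions used in the benchmark look like in code, the fragment below runs the AES round intrinsics over one 16-byte block and applies a Davies-Meyer-style feed-forward, the general shape of a single-block-length, blockcipher-based compression function. The round keys are placeholders rather than a real key schedule, so this only illustrates the intrinsics, not a construction from the paper.

```c
/* AES-NI round instructions with a Davies-Meyer-style feed-forward.
 * The "round keys" are arbitrary placeholders (no real key schedule), so this
 * only demonstrates the intrinsics, not a cryptographically valid construction.
 * Build: cc -maes aesni_sketch.c
 */
#include <immintrin.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    /* 16-byte input block (the chaining value h in a Davies-Meyer layout). */
    uint8_t block[16];
    memcpy(block, "0123456789abcdef", 16);
    __m128i h = _mm_loadu_si128((const __m128i *)block);

    /* Placeholder round keys; a real implementation derives 11 round keys
     * from the 128-bit key (e.g. with AESKEYGENASSIST). */
    __m128i rk[11];
    for (int r = 0; r < 11; r++)
        rk[r] = _mm_set1_epi32(0x01010101 * (r + 1));

    /* AES-128 encryption of h: initial whitening, 9 full rounds, 1 last round. */
    __m128i x = _mm_xor_si128(h, rk[0]);
    for (int r = 1; r < 10; r++)
        x = _mm_aesenc_si128(x, rk[r]);
    x = _mm_aesenclast_si128(x, rk[10]);

    /* Davies-Meyer feed-forward: output = E(h) XOR h. */
    __m128i out = _mm_xor_si128(x, h);

    uint8_t digest[16];
    _mm_storeu_si128((__m128i *)digest, out);
    for (int i = 0; i < 16; i++) printf("%02x", digest[i]);
    putchar('\n');
    return 0;
}
```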
Improving Tasks Throughput on Accelerators Using OpenCL Command Concurrency
A. J. Lázaro-Muñoz (1), J. M. González-Linares (1), J. Gómez-Luna (2), and N. Guil (1)
(1) Department of Computer Architecture, University of Málaga, Spain
(2) Department of Computer Architecture and Electronics, University of Córdoba, Spain

Abstract. A heterogeneous architecture composed of a host and an accelerator must frequently deal with situations where several independent tasks are available to be offloaded onto the accelerator. These tasks can be generated by concurrent applications executing in the host or, in case the host is a node of a computer cluster, by applications running on other cluster nodes that are willing to offload tasks to the accelerator connected to the host. In this work we show that a runtime scheduler that selects the best execution order of a group of tasks on the accelerator can significantly reduce the total execution time of the tasks and, consequently, increase the accelerator use. Our solution is based on a temporal execution model that is able to predict with high accuracy the execution time of a set of concurrent tasks launched on the accelerator. The execution model has been validated on AMD, NVIDIA, and Xeon Phi devices using synthetic benchmarks. Moreover, employing the temporal execution model, a heuristic is proposed which is able to establish a near-optimal task execution ordering that significantly reduces the total execution time, including data transfers. The heuristic has been evaluated with five different benchmarks composed of dominant-kernel and dominant-transfer real tasks. Experiments indicate that the heuristic is always able to find an ordering with a better execution time than the average over every possible execution order and, most times, it achieves a near-optimal ordering (very close to the execution time of the best execution order) with a negligible overhead.
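The benefit of choosing a good execution order can be seen with a toy timing model in which host-to-device transfers share one copy engine, kernels share the compute engine, and a kernel may overlap later tasks' transfers. Both the model and the numbers below are illustrative assumptions, not the temporal execution model from the paper.

```c
/* Toy makespan model: one copy engine + one compute engine; a task's kernel
 * can overlap later tasks' transfers. Illustrative only (not the paper's model).
 */
#include <stdio.h>

struct task { double transfer; double kernel; };   /* times in ms */

/* Total time to run the tasks in the given order. */
static double makespan(const struct task *t, const int *order, int n)
{
    double transfer_end = 0.0, kernel_end = 0.0;
    for (int i = 0; i < n; i++) {
        const struct task *cur = &t[order[i]];
        transfer_end += cur->transfer;                 /* transfers serialize */
        double start = kernel_end > transfer_end ? kernel_end : transfer_end;
        kernel_end = start + cur->kernel;              /* kernels serialize   */
    }
    return kernel_end;
}

int main(void)
{
    /* Task 0 is transfer-dominated, task 1 is kernel-dominated. */
    struct task t[2] = { { 8.0, 2.0 }, { 1.0, 9.0 } };

    int order_a[2] = { 0, 1 };   /* transfer-heavy task first */
    int order_b[2] = { 1, 0 };   /* kernel-heavy task first   */

    printf("order 0,1: %.1f ms\n", makespan(t, order_a, 2));
    printf("order 1,0: %.1f ms\n", makespan(t, order_b, 2));
    return 0;
}
```

In this toy instance, launching the kernel-dominated task first hides most of the other task's transfer behind its kernel, reducing the makespan from 19 ms to 12 ms.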
NISTIR 7620: Status Report on the First Round of the SHA-3 Cryptographic Hash Algorithm Competition
Andrew Regenscheid, Ray Perlner, Shu-jen Chang, John Kelsey, Mridul Nandi, Souradyuti Paul
Information Technology Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899-8930, September 2009
U.S. Department of Commerce, Gary Locke, Secretary; National Institute of Standards and Technology, Patrick D. Gallagher, Deputy Director

Abstract. The National Institute of Standards and Technology is in the process of selecting a new cryptographic hash algorithm through a public competition. The new hash algorithm will be referred to as "SHA-3" and will complement the SHA-2 hash algorithms currently specified in FIPS 180-3, Secure Hash Standard. In October 2008, 64 candidate algorithms were submitted to NIST for consideration. Among these, 51 met the minimum acceptance criteria and were accepted as First-Round Candidates on December 10, 2008, marking the beginning of the First Round of the SHA-3 cryptographic hash algorithm competition. This report describes the evaluation criteria and selection process, based on public feedback and internal review of the first-round candidates, and summarizes the 14 candidate algorithms announced on July 24, 2009 for moving forward to the second round of the competition. The 14 Second-Round Candidates are BLAKE, BLUE MIDNIGHT WISH, CubeHash, ECHO, Fugue, Grøstl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, and Skein.