Improved Forensic Recovery of PKZIP Stream Cipher Passwords

Improved Forensic Recovery of PKZIP Stream Cipher Passwords Sein Coray, Iwen Coisel and Ignacio Sanchez European Commission, Joint Research Centre (DG JRC) - Via Enrico Fermi 2749, 21027 Ispra (VA), Italy Keywords: Stream Cipher, High Performance Computing, Forensics, Passwords, Cybersecurity, Cryptanalysis. Abstract: Data archives are often compressed following the PKZIP format and can optionally be encrypted with either the PKZIP stream cipher or the AES block cipher. In this article, we present new implementations of two attacks against the PKZIP stream cipher. To our knowledge, this is the first time those attacks have been demonstrated on Graphical Processing Unit (GPU). Our first implementation is retrieving archive passwords using the internal state of the PKZIP stream cipher obtained through the known-plaintext attack of Biham and Kocher. Passwords up to length 14 can be recovered within a month considering a single Nvidia 1080 Ti GPU. If one hundred of those cards are available, passwords up to length 15 would be recovered in less than 27 days. The second implementation is a more direct attack designed to retrieve an archive’s password without requiring any additional knowledge than the ciphertext. Experimental results show that our two implementations are at least ten times faster than the state of the art. This is an undeniable asset for investigators who may be particularly interested in further deepening their forensic analysis on an encrypted archive. 1 INTRODUCTION internal state of the cipher using the knowledge of at least 12 bytes of the plaintext. The PKZIP standard Digital forensic investigations often come across the supports nowadays AES encryption, yet, an analy- need to access content protected by a password based sis of the main ZIP encryption software available in authentication mechanism. Forensic investigators the market reveals that in up to 50% of the cases, the have a variety of techniques at their disposal to ac- PKZIP stream cipher remains the default encryption cess such content, usually by retrieving the associated method for ZIP archives. There is no statistics avail- password. The most common password guessing ap- able about the usage of one encryption method or the proaches are an exhaustive search trying all possible other, however, we assume that in those tools where it combinations of characters from a specific alphabet is set as a default method, the original PKZIP stream (e.g. all numbers, letter and special characters) or a cipher encryption is still heavily used. more targeted search exploring most common pass- In this paper, we present a new implementation words with or without classical modification (e.g. dic- of the known-plaintext attack of Biham and Kocher tionary attack with mangling rules). The available (Biham and Kocher, 1994) that takes advantage of computational power as well as the speed of the algo- Graphical Processing Units (GPUs) hardware allow- rithm used to test password candidates heavily limit ing the retrieval of even long passwords (up to 15 the success rate of those attacks. Therefore only short characters depending on the available hardware re- passwords or predictable ones are generally recovered sources) of ZIP archives encrypted using the PKZIP in practice. If the algorithm used by the target of the stream cipher. Furthermore, we also present a novel forensic analysis has a known weakness or vulnera- GPU implementation that can recover the PKZIP bility, it can be sometimes exploited to recover the stream cipher password when no plaintext is known. target’s password in a more efficient way than an ex- It is the first time that such attacks against the haustive search over all possible candidates. PKZIP steam cipher are implemented for GPU hard- The PKZIP standard falls into such category of ware. The obtained benchmark results invalidate vulnerable algorithms as Biham and Kocher (Biham the existing hypothesis1 that GPU implementations and Kocher, 1994) have exhibit a known-plaintext attack against the PKZIP stream cipher used to encrypt 1As discussed in https://hashcat.net/forum/ the compressed archive. Such attack can retrieve the thread-5709.html and https://github.com/magnumripper/ JohnTheRipper/issues/718 328 Coray, S., Coisel, I. and Sanchez, I. Improved Forensic Recovery of PKZIP Stream Cipher Passwords. DOI: 10.5220/0007360503280335 In Proceedings of the 5th International Conference on Information Systems Security and Privacy (ICISSP 2019), pages 328-335 ISBN: 978-989-758-359-9 Copyright c 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved Improved Forensic Recovery of PKZIP Stream Cipher Passwords would not make a meaningful performance difference prepended bytes). ⊕ is the exclusive or, j the inclu- compared to existing CPU implementations due to the sive or, >> is a right shift and LSB and MSB de- nature of the encryption algorithm. Our experimental notes respectively the least and the most significant results show that our two implementations are at least byte of a value. The CRC32 used in this algorithm ten times faster than the current implementations. is the classical cyclic redundancy check code used in The rest of the paper is structured as follows. In this specific case with the reversed polynomial repre- Section 2, we describe the PKZIP stream cipher and sentations 0xEDB88320. review the existing attacks and respective implemen- Algorithm 1: Encryption(P, pwd). tations. Our GPU based implementation to retrieve the PKZIP stream cipher password from a known in- initialize internal state(pwd); for p 2 P do ternal state is described in Section 3. In Section 4, we i c = p ⊕ k ; describe our more generic attack to retrieve the en- i i i key(i+1) = CRC (key(i); p ) cryption password without any plaintext knowledge. 0 32 0 i ; (i+1) (i) (i+1) Finally, we present our conclusions in Section 5. key1 = (key1 + LSB(key0 )) ∗0x08088405 + 1; (i+1) (i) (i+1) key2 = CRC32(key2 ;MSB(key1 )); temp = key(i+1)j3; 2 BACKGROUND 2 k(i+1) = LSB(temp ∗ (temp ⊕ 1) >> 8); PKZIP is a data compression program designed in The initialization phase corresponds to the deriva- 1989 by Phil Katz and his company PKWARE mostly tion of the password into the initial state. The three famous for the introduction of the .zip format. In keys key0, key1, and key2 are respectively set to 1993, PKZIP v2.X was released with the introduction 0x12345678, 0x23456789 and 0x34567890. The in- of the DEFLATE algorithm, an efficient lossless data ternal state is then updated with each byte of the pass- compression algorithm. Whereas the main objective word as it is done with each plaintext byte except of the software is to archive file(s) in a compressed that the keystream byte is not computed. We denote manner, the archive can optionally be encrypted to (1) (1) (1) (key ;key ;key ) the obtained initial internal state secure the data at rest. As mentioned earlier, several 0 1 2 considered to be the secret key of the stream cipher. encryption algorithms are available, yet in this paper we will only focus on the PKZIP stream cipher that is described in what follows. 2.2 Related Attacks Biham and Kocher (Biham and Kocher, 1994) have 2.1 PKZIP Encryption Format designed a known plaintext attack aiming at retrieving the secret key of the stream cipher. Once in posses- The PKZIP stream cipher is a symmetric encryption sion of the internal state, the encrypted archive can be scheme, where the secret key is required for both en- fully decrypted, as well as any other archive encrypted cryption and decryption. This key is used to produce with the same password. It requires the knowledge of a keystream of bytes for the one-time pad algorithm. some consecutive bytes, at least twelve, of the plain- The stream cipher has a 96-bit internal state split text that is encrypted in the archive. into three 32-bit values denoted key0, key1 and key2. This strong requirement is not that inconceivable This internal state is updated every round using a in practice. The filenames contained in the archive are plaintext byte to produce a single byte of keystream. always accessible even when the archive is encrypted. Twelve bytes, denoted p1;:::; p12, are prepended to If headers of known file types can be extracted or if the plaintext in a matter of randomization of the pro- known files (e.g. system files) are included into the duced keystream. The last (or two lasts for some ver- archive, this knowledge can be used to perform the sions of PKZIP) of those bytes, called the checksum attack. Michael Stay later on softened this knowledge byte(s), are dependent of the plaintext and are used as requirement in his article (Stay, 2001) at a price of control bytes to detect wrong decryption (e.g. decryp- a higher complexity of the attack and/or a minimum tion process carried out with an incorrect password). number of files. Jeong et al (Jeong et al., 2011) have The algorithm producing the keystream byte is de- also exploited the presence of several files to reduce scribed in Algorithm 1. The notations used in this the complexity of the attack of Biham and Kocher. algorithm are the following. pi and ci denote re- In all cases, the retrieved internal state does not al- spectively a byte of plaintext and a byte of cipher- low the direct recovery of the password that was used text. P is the complete plaintext (including the twelve to encrypt the archive. An additional process needs to 329 ICISSP 2019 - 5th International Conference on Information Systems Security and Privacy be undertaken to retrieve this password with a com- edge of the plaintext and exploiting the linearity of plexity of 28(l−6) where l is the length of the pass- most of the functions used in the key update.

Load more