Study on a Known-Plaintext Attack on ZIP Encryption
Total Page:16
File Type:pdf, Size:1020Kb
Study on a known-plaintext attack on ZIP encryption Dragos Barosan [email protected] February 8, 2015 Abstract The ZIP file format is one of the most popular compression format and it provides a stream cipher encryption for protecting data. A successful known plaintext attack has been developed since 1994, but there is no open source im- plementation for it. The research has focused on the feasibility of a successful, since the necessary plaintext is considered hard to obtain, and analyzed the al- gorithm. It has been found that, while difficult, plaintext can be found through varied resources. From an implementation point of view the algorithm contains sections that can be run in parallel, improving the execution speed. As future work, a full implementation of the algorithm is planned and it will be released as open source. Contents 1 Introduction 3 2 Research questions 4 3 Related work 5 4 Approach 7 5 Feasibility of obtaining plaintex 8 5.1 ZIP Defaults . 8 5.2 ZIP Encryption . 9 5.3 Difficulty of obtaining plaintext . 9 5.4 Solution . 11 6 Attack Implementation 13 6.1 Overview . 13 6.2 Locate Data . 14 6.3 First stage of the attack . 16 6.4 Implementation . 17 6.5 Measurements . 18 7 Conclusions and Future Work 20 8 Appendices 23 2 Chapter 1 Introduction The ZIP archive file format was originally created in 1989 by Phil Katz to sup- port lossless data compression and replace the ARC archiving system. The first version has been has been released in the PKZIP package from the PKWARE software company[1]. Since then the format has been released in public do- main and new versions are available on PKWAREs website under the name APPNOTE - .ZIP File Format Specification[2]. An important milestone in the history of the format is the introduction of the ZIP encryption starting with version 2.0 in 1993[3]. The encryption was based on an algorithm developed by the mathematician Roger Schlafly. This was the only available method of encryption specified in the standard until 2002 when support was introduced for other ciphers like 3DES, RC4 and AES[4]. The first attack against the ZIP encryption was developed in 1994 by Eli Biham and Paul C. Kocher[5]. They developed a known plaintext attack algo- rithm that is able to break the encryption and recover the original password in a reasonable time . The duration of a successful attack of this type depends primarily on the amount of known text that is available to the attacker. Al- though AES encryption has been introduced starting with the fifth specification of the zip file format, the old encryption is still used by tools today because it is considered to be difficult to obtain the required plaintext from the original file. The paper is focused on analyzing the feasibility of obtaining the necessary amount of plaintext for a successful known plaintext attack and on exploring different implementation options because, although there are tools that make use of the algorithm, there is no open source implementation available. An open source implementation would help research the present weaknesses and raise awareness that people should switch to stronger encryption. 3 Chapter 2 Research questions The investigation focuses on researching the feasibility of the known-plaintext attack on the ZIP encryption, what implementation possibilities are for it and providing a proof of concept. This can be formulated under the following re- search questions: • How feasible is to obtain the known plaintext for a successful attack? • What implementations are possible for the attack? 4 Chapter 3 Related work Even though the encryption algorithm and the first attack on it were developed more than twenty years ago. Based on the literature review, not much research has been done on them. Further on there are mentioned works on which part of this paper is based on. The algorithm on which this investigation is based was originally developed by Eli Biham and Paul C. Kocher[5]. They demonstrated that the encryption can be broken with as little as eleven bytes of known plaintext and with a complexity of at most 238. In the paper the complexity refers to how many items have to be processed at one stage in the algorithm. Peter Conrad developed PkCrack, the only known tool with the source code available that implements the Biham and Kocher algorithm[6]. This implemen- tation is in the C programming language and his tool has been used with good results. He explicitly states that any software using parts of his code without his consent is forbidden and so is the distribution of it in any commercial form. The last update to the code was in January 2003 according to the source files properties. An improvement of the chosen plaintext attack was introduced by Michael Stay[7]. His paper illustrates ways of reducing the amount of plaintext required. First, he makes a refinement of the Biham and Kocher attack that results in a decrease to only six bytes of necessary plaintext if there are at least four encrypted files in the same archive at the trade-off of an increase in complexity to 11* 240. Secondly, he introduces a new attack approach that requires only two bytes of known plaintext with a complexity of 263. Using this attack, by exploiting the pseudo random number generator used by WinZip versions prior to 8.1, an attack could succeed without the need of any plaintext and with a complexity of 239, which is comparable to the original attack . A commercial tool that implements this is the Advanced Archive Password Recovery from Elcomsoft. Another attack was developed by Mike Stevens and Elisa Flanders that ex- ploits the pseudo random number generator provided by the library IBDL32.DLL[8]. Their attack does also not require any known plaintext. No implementations, 5 open source or not, could be found of this attack. 6 Chapter 4 Approach For the development of the proof of concept the Python programming language was used, with the CPython version 2.7 reference implementation available from the Debian distribution packages. It was chosen because the absolute running time of the PoC was not of interest, but the relative speed between multiple implementations. The Linux /bin/usr/time tool was used to measure the running time of the different applications tested. In some cases the Python datetime module was used to measure the running time of certain sections of code. All tests were run on an Intel Core I7-3610QM with four cores running on 2.3 Ghz frequency. For compression and creating archives the Linux zip 3.0 utility was used unless specified otherwise. The PkCrack software was used for multiple tests regarding the ZIP en- cryption. All tests that implied breaking the ZIP encryption were done with PkCrack. The investigation first focused on the research how ZIP encryption and com- pression works and what solutions are available for obtaining plaintext. The second part of the research focused on implementing two proof of concepts: one that will run in parallel and a serial implementation. Only part of the whole algorithm was studied. 7 Chapter 5 Feasibility of obtaining plaintex 5.1 ZIP Defaults The zip encryption is old so it is interesting to see if this method of protecting zip archives is still used. For the investigation, three of the most popular ZIP compression software applications[9] were selected: WinZip, WinRar and 7ZIP. To add to this ones the Linux zip utility and PKcrack from PKWare, which owns the ZIP specification, were also taken into consideration. To check the vulner- ability of each tool, test files were archived and encrypted using the methods available. Then, using PKcrack, it was tested which ones can be decrypted to the original value. The tested versions and the results are presented in table 1.1 Application Version Support for Support Default ZIP encryp- for AES tion encryption WinZIP 19.0 Yes Yes AES WinRAR 5.21 Yes No ZIP encyption 7ZIP 9.2 Yes Yes ZIP encryp- tion PKZIP 14.20 Yes Yes ZIP encryp- tion zip 3.0 Yes No ZIP encryp- tion Table 5.1: Zip utilities As the results from the table illustrate, all considered applications support ZIP encryption, while only WinZip, 7zip and PKZIP support AES encryption. Furthermore, it was noticed that WINRAR and zip do not warn the user about 8 the insecure encryption that is used. The possibility emerges that the average user will choose weak encryption when using applications that do not default to AES as the encryption method. As a result, archives vulnerable to the known plaintext attack are still created and used. 5.2 ZIP Encryption Here the ZIP encryption algorithm, which functions as a byte-oriented stream cipher, will be presented as specified in the zip file format specification[2]. It is important to mention that first 12 random bytes are prepended to the plaintext before the encryption process. No header fields are encrypted, only the data. The cipher mechanism has an internal state of 96 bits that consists of three 32 bit words referred as key0, key1 and key2. These are initialized with 0x12345678, 0x23456789, and 0x34567890. From this internal state the actual encryption key is derived. The encryption key is referred to as key3 and represents an 8 bit value. The internal state, and subsequently key3, is updated as follows: key0i = crc32(key0i−1; characterbyte) (5.1) 3 key1i = (key1i−1 + LSB(key0i)) ∗ 134775813 + 1(mod2 2) (5.2) key2i = crc32(key2i−1; MSB(key1i)) (5.3) tempi = key2iOR3(2LSB) (5.4) key3i = LSB((tempi ∗ (tempiXOR1)) >> 8) (5.5) Equations 5.1 and 5.3 use a linear feedback shift register known as CRC32, the Cyclic Redundancy Check function: given a 32 bit value and an 8 bit value as input it returns another 32 bit value.