Object Oriented Programming Using Java
Zipping and Unzipping using Java Streams

ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories, in compressed or uncompressed form. The ZIP file format supports a number of compression algorithms, DEFLATE being the most common. The format was originally created in 1989 and was first implemented in PKWARE's PKZIP utility. The .ZIP file format was designed by Phil Katz of PKWARE and Gary Conway of Infinity Design Concepts. The ZIP format was quickly supported by many utilities other than PKZIP. Microsoft and Apple operating systems have built-in ZIP support, and most free operating systems have built-in support for the ZIP file format as well.

ZIP files generally use the file extension .zip or .ZIP and the MIME media type application/zip. ZIP allows files to be compressed using many different methods (Shrink (LZW), Reduce (levels 1-4; RLE + probabilistic), Implode, Deflate, Deflate64, bzip2, LZMA, WavPack, PPMd, and an LZ77 variant), as well as simply storing a file without compressing it (Store, no compression). Each file is stored separately, so different files in the same archive can be compressed using different methods. Because the files in a ZIP archive are compressed individually, it is possible to extract them, or add new ones, without applying compression or decompression to the entire archive.

Structure of Zip File

Each entry stored in a ZIP archive is introduced by a local file header, which begins with a 4-byte signature and carries information about the file such as its name and size. The header is followed by optional "extra" data fields, and then by the file data, which may be compressed and/or encrypted. A central directory is placed at the end of the ZIP file. It identifies which files are in the archive and where in the archive each file is located, so ZIP readers can load the list of files without reading the entire archive. Each central directory entry specifies the name of a file or directory within the archive, other metadata about the entry (including any file comment), and an offset into the ZIP file that points to the actual entry data.
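The layout described above can be observed directly in the bytes of a small archive. The following is a minimal sketch, not a ZIP parser: the class ZipLayoutDemo and its helper methods are hypothetical names introduced for this illustration. It assumes an archive written by ZipOutputStream with no archive comment, in which case the end-of-central-directory record occupies the last 22 bytes of the file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for this illustration; not part of the JDK.
final class ZipLayoutDemo {

    // Builds a one-entry archive in memory and returns its raw bytes.
    static byte[] buildArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry("hello.txt"));
            zout.write("hello".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Formats the four bytes starting at offset as lowercase hex.
    static String hexAt(byte[] data, int offset) {
        return String.format("%02x %02x %02x %02x",
                data[offset], data[offset + 1], data[offset + 2], data[offset + 3]);
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildArchive();
        // The archive starts with the local file header of the first entry.
        System.out.println("local file header signature: " + hexAt(zip, 0));
        // Assumption: no archive comment, so the end record is the last 22 bytes.
        System.out.println("end of central directory:    " + hexAt(zip, zip.length - 22));
    }
}
```

The two signatures are the ASCII letters "PK" followed by the bytes 03 04 (local file header) and 05 06 (end of central directory), which is why a ZIP reader can locate the central directory by scanning back from the end of the file.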

Java Support for zip files

• The Java API/JDK provides full support for creating, writing and reading ZIP files in Java.

• java.util.zip contains all the classes related to zipping and unzipping in the standard ZIP and GZIP file formats. The following are the important classes in the java.util.zip package.

  ◦ ZipFile - used to read entries from a ZIP file.
  ◦ ZipEntry - used to represent a ZIP file entry.
  ◦ Deflater - provides support for general-purpose compression using the popular ZLIB compression library.
  ◦ Inflater - provides support for general-purpose decompression using the popular ZLIB compression library.
  ◦ GZIPInputStream - implements a stream filter for reading compressed data in the GZIP file format.
  ◦ GZIPOutputStream - implements a stream filter for writing compressed data in the GZIP file format.
  ◦ ZipInputStream - implements an input stream filter for reading files in the ZIP file format.
  ◦ ZipOutputStream - implements an output stream filter for writing files in the ZIP file format.

DEFLATE is a patent-free algorithm for lossless data compression. There are many open source implementations of the algorithm; the most widely used standard implementation is the zlib library, which provides functions for compressing and decompressing data using DEFLATE/INFLATE. The zlib library also defines a data format, itself called zlib, that wraps DEFLATE-compressed data with a header and a checksum.
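As an illustration of DEFLATE/INFLATE in Java, the sketch below round-trips a byte array through Deflater and Inflater. The class name DeflateRoundTrip and its methods are hypothetical helpers for this example. With the default constructors both classes produce and expect the zlib wrapper format (header plus checksum) rather than raw DEFLATE data.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class for this illustration; not part of the JDK.
final class DeflateRoundTrip {

    // Compresses input into the zlib format (DEFLATE data with header and checksum).
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish(); // no more input will follow
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end(); // release native zlib resources
        return out.toByteArray();
    }

    // Restores the original bytes from zlib-format data.
    static byte[] decompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and no way to continue: the data is incomplete.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported zlib data");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Calling decompress(compress(data)) returns a copy of data. For repetitive input the compressed array is noticeably shorter than the original; for very short or random input it can be slightly longer because of the header and checksum.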

GZIP also compresses data with DEFLATE; in Java, the GZIP stream classes use the zlib-backed Deflater and Inflater internally for the DEFLATE/INFLATE operations. GZIP defines its own data format, also called GZIP, which wraps DEFLATE-compressed data with a header and a checksum.

Important methods of Zip Classes

Methods of ZipFile

Method                                      Purpose
Enumeration entries()                       Returns an enumeration of the ZIP file entries
ZipEntry getEntry(String name)              Searches for and returns the entry with the specified name
InputStream getInputStream(ZipEntry entry)  Returns an InputStream for reading the specified entry
String getName()                            Returns the path name of the ZIP file
int size()                                  Returns the number of entries in the ZIP file
void close()                                Closes the ZIP file
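The ZipFile methods listed above could be combined as in the following sketch. The class ZipFileDemo, its method names, and the sample entry names are hypothetical. The archive is written to a temporary file first so that the example does not depend on any existing file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for this illustration; not part of the JDK.
final class ZipFileDemo {

    // Creates a small two-entry archive in a temporary file.
    static Path createSampleZip() throws IOException {
        Path zip = Files.createTempFile("sample", ".zip");
        try (ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(zip))) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("alpha".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("docs/b.txt"));
            zout.write("beta".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return zip;
    }

    // Collects the entry names using entries().
    static List<String> listNames(Path zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    // Reads one entry as text via getEntry() and getInputStream(); returns null if absent.
    static String readEntry(Path zip, String name) throws IOException {
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            ZipEntry entry = zf.getEntry(name);
            if (entry == null) {
                return null;
            }
            try (InputStream in = zf.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

Because ZipFile reads the central directory, readEntry can jump straight to a named entry without decompressing the entries that precede it.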

Methods of ZipEntry

Method                           Purpose
void setCompressedSize(long)     Sets the size of the compressed entry data
long getCompressedSize()         Returns the size of the compressed entry data
void setSize(long)               Sets the uncompressed size of the entry data
long getSize()                   Returns the uncompressed size of the entry data
String getName()                 Returns the name of the entry
int getMethod()                  Returns the compression method of the entry
void setMethod(int method)       Sets the compression method of the entry
String toString()                Returns a string representation of the ZIP entry
boolean isDirectory()            Returns true if the entry is a directory
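The setters above matter mainly for entries written without compression. The sketch below is one possible way to write a STORED entry; the class StoredEntryDemo and its methods are hypothetical. It relies on setCrc(long), a ZipEntry method not listed in the table, because ZipOutputStream requires the size and CRC-32 of a STORED entry to be set before putNextEntry() is called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for this illustration; not part of the JDK.
final class StoredEntryDemo {

    // Writes data as a single uncompressed (STORED) entry and returns the archive bytes.
    static byte[] zipStored(String name, byte[] data) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        // For STORED entries the sizes and checksum must be known up front.
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);
        entry.setCrc(crc.getValue());

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(entry);
            zout.write(data, 0, data.length);
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Returns the metadata of the first entry found in the archive.
    static ZipEntry firstEntry(byte[] zip) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip))) {
            return zin.getNextEntry();
        }
    }
}
```

Reading the archive back, getMethod() on the returned entry equals ZipEntry.STORED and getSize() equals the length of the original data. An entry whose name ends with "/" is reported as a directory by isDirectory().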

Methods of ZipInputStream

Method                                          Purpose
protected ZipEntry createZipEntry(String name)  Creates a new ZipEntry with the given name
ZipEntry getNextEntry()                         Reads the next ZIP file entry and positions the stream at the beginning of the entry data
int read(byte[] b, int off, int len)            Reads from the current ZIP file entry into an array of bytes
int available()                                 Returns 0 after EOF has been reached for the current entry data, otherwise 1
void closeEntry()                               Closes the current ZIP file entry and positions the stream for reading the next entry
void close()                                    Closes the entire stream
long skip(long n)                               Skips the specified number of bytes in the current ZIP file entry
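A typical read loop built from these methods might look like the following sketch. The class ZipStreamReader, its methods, and the sample entry names are hypothetical, and the entry contents are assumed to be UTF-8 text. The archive is built in memory so that the example is self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for this illustration; not part of the JDK.
final class ZipStreamReader {

    // Builds an in-memory archive with two files and one directory entry.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.putNextEntry(new ZipEntry("a.txt"));
            zout.write("alpha".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("dir/")); // directory entry, no data
            zout.closeEntry();
            zout.putNextEntry(new ZipEntry("dir/b.txt"));
            zout.write("beta".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads every file entry sequentially into a map of entry name to text content.
    static Map<String, String> readAll(InputStream source) throws IOException {
        Map<String, String> contents = new LinkedHashMap<>();
        try (ZipInputStream zin = new ZipInputStream(source)) {
            byte[] buf = new byte[1024];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    int n;
                    // read() returns -1 at the end of the current entry, not of the archive.
                    while ((n = zin.read(buf, 0, buf.length)) != -1) {
                        out.write(buf, 0, n);
                    }
                    contents.put(entry.getName(),
                            new String(out.toByteArray(), StandardCharsets.UTF_8));
                }
                zin.closeEntry();
            }
        }
        return contents;
    }
}
```

Unlike ZipFile, this approach works on any InputStream (a network response, for example), but entries can only be visited in the order in which they were written.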

Methods of ZipOutputStream

Method                                  Purpose
void setMethod(int method)              Sets the default compression method for subsequent entries
void putNextEntry(ZipEntry ze)          Begins writing a new ZIP file entry and positions the stream at the start of the entry data
void write(byte[] b, int off, int len)  Writes an array of bytes to the current ZIP entry data
void finish()                           Finishes writing the ZIP file content without closing the underlying stream
void closeEntry()                       Closes the current ZIP file entry and positions the stream for writing the next entry
void close()                            Closes the entire stream
void setLevel(int level)                Sets the compression level for subsequent DEFLATED entries
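The effect of setLevel() can be seen by writing the same data at different levels. This is a minimal sketch; the class ZipLevelDemo, the method zippedSize, and the entry name data.txt are hypothetical. The level constants come from the Deflater class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for this illustration; not part of the JDK.
final class ZipLevelDemo {

    // Returns the total archive size when data is written as one entry at the given level.
    static int zippedSize(byte[] data, int level) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            zout.setMethod(ZipOutputStream.DEFLATED); // already the default method
            zout.setLevel(level);                     // applies to entries added afterwards
            zout.putNextEntry(new ZipEntry("data.txt"));
            zout.write(data, 0, data.length);
            zout.closeEntry();
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        Arrays.fill(data, (byte) 'a'); // highly repetitive, so it compresses well
        System.out.println("no compression:   " + zippedSize(data, Deflater.NO_COMPRESSION));
        System.out.println("best compression: " + zippedSize(data, Deflater.BEST_COMPRESSION));
    }
}
```

With Deflater.NO_COMPRESSION the archive is slightly larger than the input because headers are added but nothing is compressed; with Deflater.BEST_COMPRESSION the repetitive input shrinks to a small fraction of its size.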

Zipping Files and Folders

Zipping is the process of archiving multiple files and folders into a single file with the .zip extension, which can be easily shared and transferred. The ZipOutputStream class of the java.util.zip package is used for this purpose.

Steps to Create a Zip File

• Create instances of FileOutputStream and ZipOutputStream.
  ◦ Create a FileOutputStream (optionally wrapped in a BufferedOutputStream) with the name of the ZIP file to write.
  ◦ Create a new ZipOutputStream from the FileOutputStream; it is an output stream filter for writing files in the ZIP file format.

• For each file, create a new File instance with the given pathname of the file.
  ◦ Create a FileInputStream to read the content of the file.
  ◦ Create a new ZipEntry with the name of the file.
  ◦ Put the ZipEntry into the ZipOutputStream with the putNextEntry() method, which positions the stream at the start of the entry data.
  ◦ Read up to 1024 bytes at a time from the file into an array of bytes, using the read(byte[] b) method of FileInputStream.
  ◦ Write the data to the current ZipEntry, using the write(byte[] b, int off, int len) method of ZipOutputStream.

• Close the ZipEntry, the ZipOutputStream and the FileInputStream, using the closeEntry() and close() methods.

Program

    import java.io.*;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public class FileZipWrite {

        // Writes each file in the list as one entry of E:/test/ziptest.zip.
        public void zip(List<String> files) {
            // try-with-resources closes the streams even if an exception is thrown
            try (ZipOutputStream zout = new ZipOutputStream(
                    new BufferedOutputStream(new FileOutputStream("E:/test/ziptest.zip")))) {
                byte[] buf = new byte[1024];
                for (String filePath : files) {
                    File input = new File(filePath);
                    System.out.println("Zipping the file: " + input.getName());
                    try (FileInputStream fin = new FileInputStream(input)) {
                        // Start a new entry named after the file (without its folder path)
                        zout.putNextEntry(new ZipEntry(input.getName()));
                        int size;
                        while ((size = fin.read(buf)) != -1) {
                            zout.write(buf, 0, size);
                        }
                        zout.closeEntry();
                    }
                }
                System.out.println("Completed Zipping the files");
            } catch (FileNotFoundException e) {
                System.out.println("Error :" + e.getMessage());
            } catch (IOException e) {
                System.out.println("Error :" + e.getMessage());
            }
        }

        public static void main(String a[]) {
            FileZipWrite fzw = new FileZipWrite();
            List<String> files = new ArrayList<>();
            files.add("E:/test/FileBuffCopy.java");
            files.add("E:/test/FileWordCopy.java");
            files.add("E:/test/FileWordCount.java");
            files.add("E:/test/FileLineCount.java");
            fzw.zip(files);
        }
    }

Unzipping the zip files

Unzipping is the process of reading a ZIP file, listing the entries (files and folders) it contains, and extracting those entries into a specified destination folder. The ZipFile class can be used directly when random access to the entries is needed. To read the entries sequentially from a stream while unzipping, the ZipInputStream class is used.

Steps to unzip a zip file

• Check whether the destination folder exists; if it does not, create a new folder/directory.

• Create instances of FileInputStream and ZipInputStream.
  ◦ Create a FileInputStream with the name of the ZIP file to read.
  ◦ Create a new ZipInputStream from the FileInputStream; it is an input stream filter for reading files and folders from the ZIP file.

• For each entry in the ZIP file, read with the getNextEntry() method:
  ◦ If the entry is a folder/directory, create a directory at the path specified.
  ◦ If the entry is a file, create a FileOutputStream to write the content to a file at the path specified.
  ◦ Read up to 1024 bytes at a time from the ZipInputStream into an array of bytes, using the read(byte[] b) method.
  ◦ Write the data to the FileOutputStream, using the write(byte[] b, int off, int len) method.

• Close the ZipEntry, the FileOutputStream and the ZipInputStream, using the closeEntry() and close() methods.

Program

    import java.io.*;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipInputStream;

    public class FileUnzip {

        private static void unzip(String zipFile, String destFolder) {
            File dir = new File(destFolder);
            if (!dir.exists()) dir.mkdirs();
            byte[] buffer = new byte[1024];
            // try-with-resources closes the streams even if an exception is thrown
            try (ZipInputStream zin = new ZipInputStream(new FileInputStream(zipFile))) {
                File canonicalDir = dir.getCanonicalFile();
                ZipEntry z;
                while ((z = zin.getNextEntry()) != null) {
                    String eName = z.getName();
                    File file = new File(dir, eName);
                    // Reject entry names such as "../x" that would land outside destFolder
                    if (!file.getCanonicalFile().toPath().startsWith(canonicalDir.toPath())) {
                        throw new IOException("Entry is outside the destination folder: " + eName);
                    }
                    System.out.println("Unzipping file " + eName + " to " + file.getAbsolutePath());
                    if (z.isDirectory()) {
                        if (!file.exists() && !file.mkdirs()) {
                            System.out.println("Unable to create Folder");
                        }
                    } else {
                        // An archive may omit separate entries for parent folders
                        File parent = file.getParentFile();
                        if (parent != null) parent.mkdirs();
                        try (FileOutputStream fout = new FileOutputStream(file)) {
                            int count;
                            while ((count = zin.read(buffer)) > 0) {
                                fout.write(buffer, 0, count);
                            }
                        }
                    }
                    zin.closeEntry();
                }
            } catch (IOException e) {
                System.out.println("Error : " + e.getMessage());
            }
        }

        public static void main(String[] args) {
            String zipFile = "e:/test/ziptest.zip";
            String destFolder = "E:/TestUnzip";
            unzip(zipFile, destFolder);
        }
    }


Key Points

• ZIP is an archive file format that supports lossless data compression.

• ZIP files generally use the file extension .zip and store multiple files.

• The Java API provides support for creating, writing and reading ZIP files in Java.

• ZipFile is used to read entries from a ZIP file.

• ZipEntry is used to represent a ZIP file entry in a ZipFile.

• ZipInputStream implements an input stream filter for reading files in the ZIP file format.

• ZipOutputStream implements an output stream filter for writing files in the ZIP file format.

• By default, DEFLATE is used for compression; it is a patent-free algorithm for lossless data compression.

• By default, INFLATE is used for decompression.