Performance of Compression Algorithms Used in Data Management Software

Int'l Conf. Information and Knowledge Engineering | IKE'16 |
ISBN: 1-60132-441-3, CSREA Press ©

Pierre Andre Bouabre
Department of Informatics, University of South Carolina Upstate
800 University Way, Spartanburg, SC 29303
[email protected]

Tyrone S. Toland
Department of Informatics, University of South Carolina Upstate
800 University Way, Spartanburg, SC 29303
[email protected]

Abstract: Information is growing to an almost unmanageable state in today’s society. A challenging and active area of information management is data compression. This paper provides an analysis of compression and decompression algorithms. In particular, this research analyzes the run time efficiency and performance of compression algorithms. Several commercial compression algorithms (e.g., Huffman, Flate/Deflate, LZW) were used to gather empirical results. The results in this study show that the algorithms' performance (i.e., run time, compression efficiency) can differ when the algorithms are executed individually or in combination.

Keywords: Compression algorithms, lossy compression, lossless compression, Huffman, Flate/Deflate, LZW compression

I. INTRODUCTION

Technology has given society the ability to learn and adapt to the environment. Collecting data helps to provide answers to challenging questions; moreover, collecting and managing data is essential to the sustainability of society. Data collected over many years poses the following questions for society: “How can data storage methods be improved for easy accessibility?”; “How much space can be saved when storing information?”; and “How much time does it take to process and access data?”

The amount of data is growing exponentially. Given the yearly rate of increase, managing data can be difficult. The large volume of data growth poses both a technological and an economic dilemma, especially in relation to storage space and processing time. In fact, large amounts of data collected over years can make an organization’s data storage management expensive. Equation 1 shows the computed cost of outsourcing data storage [1, 4].

TotalCost = InitialCost + FloorCost + EnergyCost + ServiceCost + DisposalCost + EnvironmentalCost    (1)

Researchers have developed many ways to improve data storage processing. In [8], methods such as 1) Direct Attached Storage (DAS) (storage subsystems linked locally to a computer), 2) Network Attached Storage (NAS) (storage subsystems linked to a network via a simple file-serving appliance), and 3) Storage Area Networks (SAN) (storage subsystems linked together on a network) are used to access other networks for storage. While these systems solve some of the data storage space issue, storage space can still be expensive in the long run.

Reducing the amount of space by compressing data can help reduce storage cost. Data compression can bring an array of benefits if performed effectively [3]. Access to compressed data depends on the amount of time it takes to decompress the data; also, compressing a larger data set takes additional processing time. Saving space with data compression therefore raises the competing issues of processing time vs. compression efficiency.

To compress and decompress as fast as possible without losing data, the best algorithm must be selected. This research provides an analysis of compression and decompression (hereafter, compression) algorithms. In particular, this research analyzes the run time efficiency and effectiveness of compression algorithms. Several commercial compression algorithms (e.g., Huffman, Flate/Deflate, LZW) are used in this study.

This paper is organized as follows. In Section II, related work is discussed. In Section III, an overview of the compression and decompression process is presented. Section IV discusses the analysis. Section V concludes the paper.

II. RELATED WORK

This section discusses research using compression algorithm methods for data storage. The more efficient data compression methods used in data storage eliminate redundancies among data items in order to improve storage space and to reduce data storage cost [20]. Lossless data compression algorithms are considered the best approach to encode and decode data without losing data in the process.

File system compression is a lossless algorithm that compresses every data component [20]. This algorithm originates from the DiskDoubler and SuperStor methods; these methods were used in early computers to support hard drives that had limited storage capacity. The disadvantage of file system compression is that its processing run time is high [2, 20].

NetApp and its rival EMC Corp are companies that propose data compression solutions [17]. NetApp proposes technology to address the increasing demand of managing data storage. The technology proposed by NetApp is considered one of the most efficient data storage systems; however, the technology does not perform well when 1) locating data (i.e., files or directories) on tape or 2) restoring data from tape [17]. To compete with NetApp, EMC Corp developed an application called Celerra Data Deduplication, which focuses on data deduplication and data storage; this application compresses data before handling deduplication of data [17].

Strom and Wennersten [19] discuss lossless compression of already compressed textures. Because texture codecs are usually not adept at pausing, an issue arises when a texture is downloaded over a network or read from a disc. Texture compression aids rendering by minimizing the footprint in graphics memory. The solution proposed to address compression and decompression efficiency is to predict compression parameters [19]. The limitation of the proposed solution was that the system could only resolve the slow transmission time of data over a network but could not improve the graphics memory footprint [19].

Peel, Wirth, and Zobel [11] discuss how their scheme accomplishes a more efficient (i.e., better scaling) compression for larger data than current compression systems. The space necessary for their compression algorithm is sub-linear in the data size. They achieve compression of multiple files that is several times better than the compression of gzip or 7-zip. Although their application compresses data faster than 7-zip, Peel et al. [11] only compared their application against a small number of systems. Also, the proposed application in [11] avoids reading other files while compressing new data.

Millard, Nunez and Mulvane [9] discuss a hardware solution, which is a high-performance application for a parallel multi-compressor chip. This research shows that input and output choices can have a negative effect on performance with respect to the routing strategies, which shows that the scheme of the parallel compression system is affected by the compression performance of the system [9]. To solve this issue, Millard et al. [9] propose a scalable compression solution for field-programmable gate array hardware, with throughput capable of handling modern high-bandwidth systems for intensive data processing. This process may prove successful, but the performance and run time are still in question.

Throughout the research done on data compression and decompression, all the solutions have run time issues between compressing and decompressing information. Fewer

Fig. 1. Data compression and decompression example taken from [5]

Data compression compacts data so that it requires a smaller 1) amount of disk space in the storage unit and 2) bandwidth on the data transmission channel. Typical examples of systems using data compression are network routers, phones, and most electronic systems of information exchange. Data compression is valuable because compressed data can be transferred faster than uncompressed data. Because compressed data takes less space, it is also cost efficient to store. Two prominent compression concepts are in use: lossy compression and lossless compression [5, 6, 10].

Lossy compression is the class of data encoding that discards a limited quantity of data to represent the data content. Using the lossy method to compress and decompress data may affect data integrity by converting the data into a slightly different state than the original state, i.e., decompressed data may not be completely restored. However, this slightly different state is sufficient for use; by contrast, only when all the bits of data remain in the data file after decompression, as in lossless compression, is it guaranteed that no data is lost. The lossy compression algorithm removes information that is considered insignificant from the original data state when it performs the compression process. The algorithm uses space efficiently to produce an efficient data format; it also typically generates a much smaller compressed data file than the lossless method [5, 6, 10].

Lossless compression is a data compression method that permits the compressed data to be decompressed to the original data without losing data integrity. Lossless data compression algorithms find repeated patterns to reduce data redundancy before encoding the data. If data redundancy is high in the input data, then the file size of the compressed data will be low [5].

IV. ANALYSIS

A. Experiment Overview

This research shows the problem
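
The trade-off between run time and compression efficiency discussed in this paper can be illustrated with a minimal sketch. It assumes Python's standard zlib module (a DEFLATE implementation) as a stand-in for the Flate/Deflate algorithm named in the paper; it is not the authors' experimental setup. The helper name measure_deflate and the sample input are hypothetical, chosen only for illustration.

```python
import time
import zlib


def measure_deflate(data: bytes, level: int = 6) -> dict:
    """Compress and decompress `data` with zlib (DEFLATE) and report basic metrics.

    Hypothetical helper for illustration; `level` ranges from 1 (fastest)
    to 9 (smallest output).
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_seconds = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_seconds = time.perf_counter() - start

    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        # Compressed size divided by original size; lower means better compression.
        "ratio": len(compressed) / len(data) if data else 0.0,
        "compress_seconds": compress_seconds,
        "decompress_seconds": decompress_seconds,
        # Lossless compression must restore the input bit for bit.
        "lossless": restored == data,
    }


# Hypothetical, highly redundant input: repeated text compresses well.
sample = b"data compression reduces redundancy. " * 2000

if __name__ == "__main__":
    for level in (1, 6, 9):
        m = measure_deflate(sample, level)
        print(
            f"level={level} ratio={m['ratio']:.4f} "
            f"compress={m['compress_seconds']:.6f}s "
            f"decompress={m['decompress_seconds']:.6f}s "
            f"lossless={m['lossless']}"
        )
```

Timings depend on the machine and the input, so the sketch prints them rather than asserting particular values. Running it on less redundant input (for example, random bytes) should yield a ratio close to 1, consistent with the observation that high redundancy leads to a small compressed file.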
