Adaptive On-The-Fly Compression
Total Page:16
File Type:pdf, Size:1020Kb
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 17, NO. 1, JANUARY 2006 15 Adaptive On-the-Fly Compression Chandra Krintz and Sezgin Sucu Abstract—We present a system called the Adaptive Compression Environment (ACE) that automatically and transparently applies compression (on-the-fly) to a communication stream to improve network transfer performance. ACE uses a series of estimation techniques to make short-term forecasts of compressed and uncompressed transfer time at the 32KB block level. ACE considers underlying networking technology, available resource performance, and data characteristics as part of its estimations to determine which compression algorithm to apply (if any). Our empirical evaluation shows that, on average, ACE improves transfer performance given changing network types and performance characteristics by 8 to 93 percent over using the popular compression techniques that we studied (Bzip, Zlib, LZO, and no compression) alone. Index Terms—Adaptive compression, dynamic, performance prediction, mobile systems. æ 1 INTRODUCTION UE to recent advances in Internet technology, the networking technology available changes regularly, e.g., a Ddemand for network bandwidth has grown rapidly. user connects her laptop to an ISDN link at home, takes her New distributed computing technologies, such as mobile laptop to work and connects via a 100Mb/s Ethernet link, and computing, P2P systems [10], Grid computing [11], Web attends a conference or visits a coffee shop where she uses the Services [7], and multimedia applications (that transmit wireless communication infrastructure that is available. The video, music, data, graphics files), cause bandwidth compression technique that performs best for this user will demand to double every year [21]. Not only does the depend on the network she is connected to. Moreover, as the volume of data transmitted by these applications increase, load changes at communication end-points or on the network but so does the frequency of transmission. As a result, novel itself, the best-performing compression technique for the techniques are needed that increase the network bandwidth same network can also change. available to Internet applications. Each of these limitations makes it increasingly difficult Compression is one such technique that increases the for users to identify the best compression technique in all amount of bandwidth available to an application. Compres- circumstances. In this paper, we present a novel compres- sion can be used offline (before compressed transfer sion system that identifies the best compression technique commences) or on-the-fly (as the data is generated). automatically and transparently. Our system, called the Compression techniques reduce the amount of data Adaptive Compression Environment (ACE), intercepts transmitted by eliminating much of the redundancy that program communication and applies compression on-the- is characteristic in most data sets. fly. On-the-fly compression is useful to or required by many Unfortunately, there are several limitations inherent in applications due to storage capacity (limited or abundant), the use of compression for efficient data communication. or to the nature of the data-generation process, e.g., First, compression techniques vary in their performance dynamic Web and data streaming applications. characteristics: compressed size, compression time, and ACE adaptively selects a compression technique that best decompression time. Techniques are either optimized to be suites the underlying networking technology, performance fast or to enable significant compaction. This trade-off is available, and data characteristics, to improve communica- fundamental to performing compression due to the algo- tion performance. To enable this, ACE predicts whether rithmic complexity required to remove a significant portion applying compression will be profitable, and when it is, of the redundancy in the data. Thus, no single compression which compression algorithm to apply. ACE selects technique enables the best performance for all data formats. between a number of well-known, competitive, compres- Second, compression performance is dependent upon the sion techniques including Bzip [3], Zlib [26], and LZO [18]. performance and availability of the underlying resources, To make predictions of underlying resource performance, e.g., network bandwidth and latency, CPU loads. This ACE employs the Network Weather Service (NWS) [24], performance varies significantly across technologies as well [23], an efficient and accurate forecasting toolkit used in as over time for the same technology. The latter impacts Computational Grid [9] systems. ACE couples NWS mobile devices like laptops for which the underlying predictions with those from its own internal models that estimate compression performance and changes in data . The authors are with the Computer Science Department, University of compressibility, i.e., entropy. California, Santa Barbara, CA 93106-5110. We empirically compare ACE to commonly used E-mail: {ckrintz, sucu}@cs.ucsb.edu. compression techniques: no compression, Bzip, LZO, and Manuscript received 28 Jan. 2004; revised 16 Nov. 2004; accepted 30 Mar. Zlib. Given different network and end-point performance, 2005; published online 28 Nov. 2005. For information on obtaining reprints of this article, please send e-mail to: each of these techniques outperforms the others for [email protected], and reference IEEECS Log Number TPDS-0027-0104. different transfer scenarios. ACE is able to perform 1045-9219/06/$20.00 ß 2006 IEEE Published by the IEEE Computer Society 16 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 17, NO. 1, JANUARY 2006 similarly to the best performing technique in all scenarios. interface through which ACE can efficiently acquire the The key contribution made by ACE is that it can adapt to predictions and actual measurements; ACE uses the latter changing performance levels (CPU and network perfor- to evaluate the efficacy of its decisions. mance). We investigate three such scenarios in which the ACE intercepts (blocking) TCP/IP socket communica- underlying network technology changes and when CPU tions made by Java programs. When a connection between and network load changes. We show that ACE is able to two hosts is established, ACE registers the host IP addresses improve transfer performance by 8 to 93 percent on average with the (possibly remote) NWS system. Each host executes across the experiments and scenarios that we investigated. an NWS CPU and network bandwidth sensor that are In the following section, we overview the ACE system initiated by ACE if they are not already running. and the extant technologies that it employs and extends. We When ACE determines that a transmission of sufficient overview the ACE design and implementation in Section 3 length is being made, it queries the NWS system for forecasts and its prediction system in Section 4. We then present our of future network bandwidth and latency between the hosts empirical evaluation of ACE in Section 5, the related work and the future CPU availability of each. In addition to in Section 6, and conclude in Section 7. predictions of underlying resource performance, ACE uses past socket behavior, predicted compression ratio, feedback 2 THE ADAPTIVE COMPRESSION ENVIRONMENT on the accuracy of its previous compression decisions, and (ACE) the characteristics of the available compression algorithms, to To evaluate the efficacy with which we can apply compute the predicted transfer performance. Using this in- compression automatically and transparently based on formation, ACE decides when to apply compression and predictions of future resource performance and data which compression technique to use (if any). entropy, we developed the Adaptive Compression Envir- ACE identifies when an incoming stream is compressed onment (ACE). ACE couples two different extant technol- and decompresses it. ACE forwards the decompressed data ogies: The Open Runtime Platform (ORP) [6], [1] from the to the application. We next describe the primary compo- Intel Microprocessor Research Lab (MRL), and the Network nents of the ACE. Weather Service (NWS) [25], [23] from the University of California, Santa Barbara. 3 ACE IMPLEMENTATION ORP is an open-source, dual compiler, adaptive compila- tion system for Java. We selected ORP because of its efficient To enable transparent compression at the TCP socket level, implementation and cutting-edge technologies (dynamic ACE intercepts calls within the native socket implementa- compilation and adaptive optimization). ACE, however, is tion. That is, ACE does not modify the code of the programs not Java-dependent. That is, we can implement the ACE that are executing, in any way. ACE intercepts socket modules in an operating system, e.g., Linux. However, we connect, accept, read, write, send, and recv calls. chose to prototype our techniques in a Java Virtual Machine ACE determines when to compress by monitoring socket due to the reduced complexity of the system implementation write1 behavior. ACE considers data in 32KB blocks. We (over Linux) and to our extensive prior experience with the selected a 32KB block size by collecting compression language and execution environment. performance metrics for a large number of diverse file We extended ORP with a module that transparently types and three different compression algorithms (LZO, intercepts TCP/IP socket communications and automati- Bzip, and Zlib). We empirically evaluated