International Journal of Scientific & Engineering Research, Volume 3, Issue 5, May-2012, ISSN 2229-5518

The Effect of HTTP Compression on the Performance of Core Web Components

Khushali Tirghoda

Khushali Tirghoda is currently working as Assistant Professor in the MCA Department at IITE, Ahmedabad, India. E-mail: [email protected]

Abstract— HTTP compression addresses some of the performance problems of the Web by attempting to reduce the size of resources transferred between a server and a client, thereby conserving bandwidth and reducing user perceived latency. Currently, most modern browsers and web servers support some form of content compression, and a number of browsers are able to perform streaming decompression of gzipped content. Despite this existing support, HTTP compression remains an underutilized feature of the Web today. This can perhaps be explained, in part, by the fact that there is currently little proxy support for the Vary header, which is necessary for a proxy cache to correctly store and handle compressed content. To demonstrate some of the quantitative benefits of compression, I conducted a test to determine the potential byte savings for a number of popular web sites.

Index Terms— HTTP compression comparison, compression ratio measurements, Apache performance

——————————  ——————————

1 INTRODUCTION

User perceived latency is one of the main performance problems plaguing the World Wide Web today. At one point or another every Internet user has experienced just how painfully slow the "World Wide Wait" can be. As a result, there has been a great deal of research and development focused on improving Web performance.

Currently there exist a number of techniques designed to bring content closer to the end user in the hope of conserving bandwidth and reducing user perceived latency, among other things. Such techniques include prefetching, caching and content delivery networks. However, one area that seems to have drawn only a modest amount of attention is HTTP compression.

Many Web resources, such as HTML, JavaScript, CSS and XML documents, are simply ASCII text files. Given that such files often contain many repeated sequences of identical information, they are ideal candidates for compression. Other resources, such as JPEG and GIF images and streaming audio and video files, are precompressed and hence would not benefit from further compression. As such, when dealing with HTTP compression, the focus is typically limited to text resources, which stand to gain the most byte savings from compression.

Encoding schemes for such text resources must provide lossless compression. As the name implies, a lossless data compression algorithm is one that can recreate the original data, bit for bit, from a compressed file. One can easily imagine how the loss or alteration of a single bit in an HTML file could affect its meaning.

The goal of HTTP compression is to reduce the size of certain resources that are transferred between a server and a client. By reducing the size of web resources, compression makes more efficient use of network bandwidth. Compressed content can also provide monetary savings for those users who pay a fee based on the amount of bandwidth they consume. More importantly, though, since fewer bytes are transmitted, clients typically receive the resource in less time than if it had been sent uncompressed. This is especially true for narrowband clients: modems typically present what is referred to as the weakest link, or longest mile, in a data transfer, hence methods to reduce download times are especially pertinent to these users.

Furthermore, compression can potentially alleviate some of the burden imposed by the TCP slow start phase. TCP slow start is a means of controlling the amount of congestion on a network. It works by forcing a small initial congestion window on each new TCP connection, thereby limiting the number of maximum-size packets that can initially be transmitted by the sender [2]. Upon the reception of an ACK packet, the sender's congestion window is increased. This continues until a packet is lost, at which point the size of the congestion window is decreased [2]. This process of increasing and decreasing the congestion window continues throughout the connection in order to maintain an appropriate transmission rate [2]. In this way, a new TCP connection avoids overburdening the network with large bursts of data. Due to slow start, the first few packets transferred on a connection are relatively more expensive than subsequent ones. Also, for the transfer of small files, a connection may never reach its maximum transfer rate because the transfer may complete before it has had the chance to get out of the slow start phase. So, by compressing a resource, more data effectively fits into each packet. This in turn results in fewer packets being transferred, thereby lessening the effects of slow start (and reducing the number of server stalls) [4, 6, 14, 3].
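To make the slow start argument concrete, here is a minimal sketch in Python of an idealized model: it assumes a 1460-byte payload per packet, an initial congestion window of one segment, a window that doubles every round trip and no losses. The file sizes are purely illustrative; the point is only that the compressed transfer finishes in fewer round trips.

    import math

    MSS = 1460  # assumed payload bytes per full-size packet

    def round_trips(size_bytes, initial_cwnd=1):
        """Idealized slow start: the congestion window doubles every round
        trip until the whole resource has been sent (no losses assumed)."""
        packets = math.ceil(size_bytes / MSS)
        cwnd, sent, rtts = initial_cwnd, 0, 0
        while sent < packets:
            sent += cwnd
            cwnd *= 2
            rtts += 1
        return packets, rtts

    # Illustrative sizes: a 60 KB HTML file versus the same file gzipped to 15 KB.
    for label, size in [("uncompressed", 60_000), ("compressed", 15_000)]:
        packets, rtts = round_trips(size)
        print(f"{label}: {packets} packets, {rtts} round trips")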
In the case where an HTML document is sent in a compressed format, it is probable that the first few packets of data will contain more HTML code, and hence a greater number of inline image references, than if the same document had been sent uncompressed. As a result, the client can issue requests for these embedded resources sooner, easing some of the slow start burden. Also, inline objects are likely to be on the same server as the HTML document, so an HTTP/1.1 compliant browser may be able to pipeline these requests onto the same TCP connection [6]. Thus, not only does the client receive the HTML file in less time, he or she is also able to expedite the process of requesting the embedded resources [6, 14].

Currently, most modern browsers and web servers support some form of content compression. Additionally, a number of browsers are able to perform streaming decompression of gzipped content. This means that, for instance, such a browser can decompress and parse a gzipped HTML file as each successive packet of data arrives, rather than having to wait for the entire file to be retrieved before decompressing. Despite all of the aforementioned benefits and the existing support for HTTP compression, it remains an underutilized feature of the Web today.
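The streaming behaviour just described can be reproduced with an incremental decompressor. The following sketch uses Python's zlib module (wbits value 31 selects the gzip container); the fixed 512-byte slices merely stand in for network packets and are not meant to model real packet boundaries.

    import gzip
    import zlib

    html = b"<html><body>" + b"<p>hello world</p>" * 500 + b"</body></html>"
    compressed = gzip.compress(html)

    decompressor = zlib.decompressobj(31)   # 31 = 16 + MAX_WBITS, i.e. expect a gzip wrapper
    recovered = b""
    for i in range(0, len(compressed), 512):         # pretend each 512-byte slice is a packet
        recovered += decompressor.decompress(compressed[i:i + 512])
        # `recovered` grows as each "packet" arrives and could already be parsed here.
    recovered += decompressor.flush()
    assert recovered == html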
2 POPULAR COMPRESSION SCHEMES

Although there exist many different lossless compression algorithms today, most are variations of two popular schemes: Huffman encoding and the Lempel-Ziv algorithm.

Huffman encoding works by assigning a binary code to each of the symbols (characters) in an input stream (file). This is accomplished by first building a binary tree of symbols based on their frequency of occurrence in the file. The assignment of binary codes to symbols is done in such a way that the most frequently occurring symbols are assigned the shortest binary codes and the least frequently occurring symbols the longest codes. This in turn creates a smaller compressed file [7].

The Lempel-Ziv algorithm, also known as LZ77, exploits the redundant nature of data to provide compression. The algorithm utilizes what is referred to as a sliding window to keep track of the last n bytes of data seen. Each time a phrase is encountered that already exists in the sliding window buffer, it is replaced with a pointer to the starting position of the previous occurrence of the phrase in the sliding window, along with the length of the phrase [7].

The main metric for data compression algorithms is the compression ratio, which refers to the ratio of the size of the original data to the size of the compressed data [13]. For example, if we had a 100 kilobyte file and were able to compress it down to only 20 kilobytes, we would say the compression ratio is 5-to-1, or 80%. The contents of a file, particularly the redundancy and orderliness of the data, can strongly affect the compression ratio.
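As a rough, self-contained illustration of these definitions (the numbers depend entirely on the input and are not measurements from this paper), the following compresses the same synthetic HTML with zlib at levels 1 and 9 and reports the result in the terms used above:

    import zlib

    page = (b"<html><head><title>example</title></head><body>"
            + b"<div class='item'><a href='/item'>item</a></div>" * 400
            + b"</body></html>")

    for level in (1, 9):
        out = zlib.compress(page, level)
        ratio = len(page) / len(out)                 # e.g. "5-to-1"
        savings = 100 * (1 - len(out) / len(page))   # e.g. "80%"
        print(f"level {level}: {len(page)} -> {len(out)} bytes "
              f"({ratio:.1f}-to-1, {savings:.0f}% saved)")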
3 PROXY SUPPORT FOR COMPRESSION

Currently, one of the main problems with HTTP compression is the lack of proxy cache support. Many proxies cannot handle the Content-Encoding header and hence simply forward the response to the client without caching the resource. As was mentioned above, IIS attempts to ensure compressed documents are not served stale by setting the Expires time in the past. Caching was handled in HTTP/1.0 by storing and retrieving resources based on the URI [2]. This, of course, proves inadequate when multiple versions of the same resource exist - in this case, a compressed and an uncompressed representation. This problem was addressed in HTTP/1.1 with the inclusion of the Vary response header. A cache can then store both a compressed and an uncompressed version of the same object and use the Vary header to distinguish between the two. The Vary header indicates which request headers should be examined in order to determine the appropriate variant of the cached resource to return to the client [2].
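The negotiation, and the role the Vary header plays for caches, is visible directly in the headers of a single exchange. The sketch below uses Python's urllib with a placeholder URL (www.example.com); not every server will actually answer with gzip, so the decompression step is conditional.

    import gzip
    import urllib.request

    req = urllib.request.Request(
        "http://www.example.com/",              # placeholder URL, not a site from this study
        headers={"Accept-Encoding": "gzip"},    # the client advertises gzip support
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)        # the server chose the compressed variant
        # A cache-friendly server also sends "Vary: Accept-Encoding" so that a proxy
        # stores the compressed and uncompressed variants under separate keys.
        print(resp.headers.get("Content-Encoding"), resp.headers.get("Vary"))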
4 RELATED WORK

In [1], Mogul et al. quantified the potential benefits of delta encoding and data compression for HTTP by analyzing live traces from Digital Equipment Corporation (DEC) and an AT&T research lab. The traces were filtered in an attempt to remove requests for precompressed content, for example references to GIF, JPEG and MPEG files. The authors then estimated the time and byte savings that could have been achieved had the HTTP responses to the clients been delta encoded and/or compressed. The authors determined that in the case of the DEC trace, of the 2465 MB of data analyzed, 965 MB, or approximately 39%, could have been saved had the content been compressed. For the AT&T trace, 1054 MB, or approximately 17%, of the total 6216 MB of data could have been saved. Furthermore, retrieval times could have been reduced 22% and 14% in the DEC and AT&T traces, respectively. The authors remarked that they felt their results demonstrated a significant potential improvement in response size and response delay as a result of delta encoding and compression.

In [8], the authors attempted to determine the performance benefits of HTTP compression by simulating a realistic workload environment. This was done by setting up a web server and replicating the CNN site on this machine. The authors then accessed the replicated CNN main page and ten subsections within this page (i.e. World News, Weather, etc.), emptying the cache before each test. Analysis of the total time to load all of these pages showed that when accessing the site over a 28.8 kbps modem, gzip content coding resulted in 30% faster page loads. They also experienced 35% faster page loads when using a 14.4 kbps modem.

Finally, in [3] the authors attempted to determine the performance effects of HTTP/1.1. Their tests included an analysis of the benefits of HTTP compression via the deflate content coding. The authors created a test web site that combined data from the Netscape and Microsoft home pages into a page called "MicroScape". The HTML for this new page totaled 42 KB, with 42 inline GIF images totaling 125 KB. Three different network environments were used to perform the test: a Local Area Network (high bandwidth, low latency), a Wide Area Network (high bandwidth, high latency) and a 28.8 kbps modem (low bandwidth, high latency). The test involved measuring the time required for the client to retrieve the MicroScape web page from the server, parse and, if necessary, decompress the HTML file on the fly, and retrieve the 42 inline images. The results showed significant improvements for those clients on low bandwidth and/or high latency connections. In fact, looking at the results from all of the test environments, compression reduced the total number of packets transferred by 16% and the download time for the first-time retrieval of the page by 12%.

5 EXPERIMENTS

We will now analyze the results from a number of tests that were performed in order to determine the potential benefits and drawbacks of HTTP compression.

5.1 Compression Ratio Measurements

The first test was designed to provide a basic idea of the compression ratio that could be achieved by compressing some of the more popular sites on the Web. The objective was to determine how many fewer bytes would need to be transferred across the Internet if web pages were sent to the client in a compressed form. To determine this, I first found a web page [15] that ranks the Top 99 sites on the Web based on the number of unique visitors. Although the rankings had not been updated since March 2001, most of the indicated sites are still fairly popular. Besides, the intent was not to find a definitive list of the most popular sites but rather to get a general idea of some of the more highly visited ones.

A freely available program called wget [16] was used to retrieve pages from the Web, and Perl scripts were written to parse these files and extract relevant information. The steps involved in carrying out this test consisted of first fetching the web page containing the list of Top 99 web sites. This HTML file was then parsed in order to extract the URLs of the Top 99 sites. A pre-existing CGI program [17] on the Web that allows a user to submit a URL for analysis was then utilized. The program determines whether or not the indicated site uses gzip compression and, if not, how many bytes could have been saved had the site implemented compression. These byte savings are calculated for all 10 levels of gzip encoding. Level 0 corresponds to no gzip encoding. Level 1 encoding uses the least aggressive form of phrase matching but is also the fastest, as it uses the least amount of CPU time when compared to the other levels, excluding level 0. Conversely, level 9 encoding performs the most aggressive form of pattern matching but also takes the longest, utilizing the most CPU resources [13].
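The per-level measurement performed by the CGI program can be approximated locally. The sketch below is written in Python rather than the Perl/CGI tooling actually used in the study, and the URL is a stand-in; it simply compresses a fetched HTML file at every zlib level and reports the byte savings.

    import zlib
    import urllib.request

    url = "http://www.example.com/"    # stand-in for one of the Top 99 URLs
    with urllib.request.urlopen(url) as resp:
        html = resp.read()

    for level in range(10):            # level 0 stores the data without compression
        size = len(zlib.compress(html, level))
        print(f"level {level}: {size} bytes ({len(html) - size} bytes saved)")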

A Perl script was employed to parse the HTML file returned by this CGI program, with all of the relevant information being dumped to a file that could be easily imported into a spreadsheet. Unfortunately, the CGI program can only determine the byte savings for the HTML file. While this information is useful, it does not give the user an idea of the compression ratio for the entire page, including the images and other embedded resources. Therefore, I set my web browser to go through a proxy cache and subsequently retrieved each of the top 99 web pages. We then used the trace log from the proxy to determine the total size of each of the web pages. After filtering out the web sites that could not be handled by the CGI program, wget or the Perl scripts, I was left with 77 URLs. One of the problems encountered by the CGI program involved the handling of server redirection replies. Also, a number of the URLs referenced sites that either no longer existed or were inaccessible at the time the tests were run.

The results of this experiment were encouraging. First, if we consider the savings for the HTML document alone, the average compression ratio for level 1 gzip encoding turns out to be 74%, and for level 9 this figure is 78%. This clearly shows that HTML files are prime candidates for compression. Next, we factor into the equation the size of all of the embedded resources for each web page. We will refer to this as the total compression ratio and define it as the ratio of the size of the original page, which includes the embedded resources and the uncompressed HTML, to the size of the encoded page, which includes the embedded resources and the compressed HTML. The results show that the average total compression ratio comes to about 27% for level 1 encoding and 29% for level 9 encoding. This still represents a significant amount of savings, especially in the case where the content is being served to a modem user.
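A small worked example of the total compression ratio just defined (all sizes below are hypothetical, chosen only to show why the total ratio is so much lower than the HTML-only ratio): a page made up of 60 KB of HTML and 240 KB of precompressed images shrinks by only about 15% overall even if the HTML alone shrinks by 75%.

    html_kb, embedded_kb = 60, 240            # hypothetical page composition
    html_savings = 0.75                       # assume 75% savings on the HTML alone

    original = html_kb + embedded_kb                        # 300 KB sent uncompressed
    encoded = html_kb * (1 - html_savings) + embedded_kb    # 15 KB + 240 KB = 255 KB
    total_ratio = 100 * (1 - encoded / original)
    print(f"total compression ratio: {total_ratio:.0f}%")   # -> 15%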

Fig. 1 – The total page size (including HTML and embedded resources) for the top ten web sites.

Fig. 1 shows the difference in the total number of bytes transferred for an uncompressed web page versus those transferred with level 1 and level 9 gzip compression. Note the small difference in compression ratios between level 1 and level 9 encoding.

Table 1. Comparison of the total compression ratios (in percent) of level 1 and level 9 gzip encoding for the indicated URLs

URL                  Level 1   Level 9
www.yahoo.com        36.353    38.222
www.aol.com          26.436    25.697
www.msn.com          35.465    37.624
www.microsoft.com    38.850    40.189
www.passport.com     25.193    26.544
www.geocities.com    43.316    45.129
www.ebay.com         29.030    30.446
www.lycos.com        40.170    42.058
www.amazon.com       31.334    32.755
www.angelfire.com    34.537    36.427

Table 1 shows a comparison of the total compression ratios for the top ten web sites. You can see that there is only a slight difference in the total compression ratios for levels 1 and 9 of gzip encoding. Thus, if a site administrator were to enable gzip compression on a web server but wanted to devote as few CPU cycles as possible to the compression process, he or she could set the encoding to level 1 and still maintain favorable byte savings.

Ultimately, what these results show is that, on average, a compression-enabled server could send approximately 27% fewer bytes yet still deliver the exact same web page to supporting clients. Despite this potential savings, out of all of the URLs examined, www.excite.com was the only site that supported gzip content coding. This is indicative of HTTP compression's current popularity, or lack thereof.

5.2 Web Server Performance Test

The tests were designed to determine the maximum throughput of the two servers by issuing a series of requests for compressed and uncompressed documents. Using Autobench, I was able to start by issuing a low rate of requests per second to the server and then increase this rate by a specified step until a high rate of requests per second was being attempted. An example of the command line options used to run some of the tests is as follows:

    ./autobench_gzip_on --low_rate 10 --high_rate 150 --rate_step 10 --single_host --host1 192.168.0.106 --num_conn 1000 --num_call 1 --output_fmt csv --quiet --timeout 10 --uri1 /google.html --file google_compr.csv

These command line options indicate that requests for the google.html file will initially be issued at a rate of 10 requests per second. Requests will continue at this rate until 1000 connections have been made. For these tests each connection makes only one call; in other words, no persistent connections were used. The rate of requests is then increased by the rate step, which is 10, so 20 requests per second will then be attempted until another 1000 connections have been made. This continues until a rate of 150 requests per second is attempted. Note, when looking at the results, that the client may not be capable of issuing 150 requests per second to the server; thus a distinction is made between the desired and the actual number of requests per second.

6 APACHE PERFORMANCE BENCHMARK

We will first take a look at the results of the tests when run against the Apache server. Fig. 2 and Fig. 3 present graphs of some of the results from the respective test cases.

Fig. 2 – Benchmarking results for the retrieval of the Google HTML file from the Apache server.

Fig. 3 – Benchmarking results for the retrieval of the Yahoo HTML file from the Apache server.

Referring to the graphs, we can see that for each test case a saturation point was reached. This saturation point reflects the maximum number of requests the server could handle for the given resource. Looking at the graphs, the saturation point can be recognized as the point at which the server's average response time increases significantly, often jumping from a few milliseconds up to hundreds or thousands of milliseconds. The response time corresponds to the time between when the client sends the first byte of the request and when it receives the first byte of the reply.
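Because autobench writes one CSV row per demanded request rate, the saturation point can also be estimated programmatically rather than read off the graphs. The sketch below is only illustrative: the column names ("dem_req_rate", "resp_time") and the jump threshold are assumptions that would have to be matched against the CSV the tool actually produces.

    import csv

    def saturation_point(csv_path, jump_factor=5.0):
        """Return the demanded request rate at which the average response time
        first jumps by more than `jump_factor` over the previous row."""
        prev = None
        with open(csv_path, newline="") as fh:
            for row in csv.DictReader(fh):
                rate = float(row["dem_req_rate"])    # assumed column name
                resp_time = float(row["resp_time"])  # assumed column name, in ms
                if prev is not None and prev > 0 and resp_time / prev > jump_factor:
                    return rate
                prev = resp_time
        return None

    print(saturation_point("google_compr.csv"))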

So, if we were to look at Yahoo (Fig. 3), for instance, we would notice that the server reaches its saturation point at about the time when the client issues 36 requests per second for uncompressed content. This figure falls slightly, to about 33 requests per second, when compressed content is requested.

Table 2. Estimated saturation points (in requests per second) for the Apache web server based on repeated client requests for the indicated document

Web Site    Uncompressed    Compressed
Google      215             105
Yahoo       36              33
AOL         27              25
EBay        16              15

Refer to Table 2 for a comparison of the estimated saturation points for each test case. These estimates were obtained by calculating the average number of connections per second handled by the server using data available from the benchmarking results. One interesting thing to note from the graphs is that, aside from the Google page, the server maintained almost the same average reply rate for a page up until the saturation point, regardless of whether the content was being served compressed or uncompressed. After the saturation point the numbers diverge slightly, as is noticeable in the graphs. What this means is that the server was able to serve almost the same number of requests per second for both compressed and uncompressed documents.

The Google test case shows it is beneficial to impose a limit on the minimum file size necessary to compress a document. Both mod_gzip and IIS allow the site administrator to set a lower and an upper bound on the size of compressible resources; if the size of a resource falls outside of these bounds it will be sent uncompressed. For these tests all such bounds were disabled, which caused all resources to be compressed regardless of their size.
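The size bounds discussed above amount to a simple guard in front of the compression step. The sketch below is a generic illustration in Python, not the actual mod_gzip or IIS logic, and the bound values are arbitrary examples.

    import gzip

    MIN_SIZE = 1_000        # example lower bound: skip tiny responses
    MAX_SIZE = 1_000_000    # example upper bound: skip very large ones

    def maybe_compress(body: bytes, accepts_gzip: bool):
        """Return (payload, content_encoding) for an HTTP response body."""
        if accepts_gzip and MIN_SIZE <= len(body) <= MAX_SIZE:
            return gzip.compress(body), "gzip"
        return body, None    # served uncompressed, as when a bound is exceeded

    payload, encoding = maybe_compress(b"x" * 50_000, accepts_gzip=True)
    print(len(payload), encoding)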

When calculating results, we will only look at those cases where the demanded number of requests per second is less than or equal to the saturation point. Not surprisingly, compression greatly reduced the network bandwidth required for server replies; the factor by which network bandwidth was reduced roughly corresponds to the compression ratio of the document. Next, I discuss the performance effects that on-the-fly compression imposed on the server. To do so, we compare the server's average response time in serving the compressed and uncompressed documents. The findings are summarized in Table 3.

Table 3. Average response time (in milliseconds) for the Apache server to respond to requests for compressed and uncompressed static documents

Web Site    Uncompressed    Compressed
Google      3.2             10.2
Yahoo       3.3             27.5
AOL         3.4             34.7
EBay        3.4             51.4

The results are not particularly surprising. We can see that the size of a static document does not affect response time when it is requested in an uncompressed form. In the case of compression, however, we can see that as the file size of the resource increases, so too does the average response time. We would certainly expect to see such results because it takes slightly longer to compress larger documents. Keep in mind that the time to compress a document will likely be far smaller on a faster, more powerful computer; the machine running as the web server for these tests has a modest amount of computing power, especially when compared to the speed of today's average web server.

7 SUMMARY / SUGGESTIONS

If the server generates a large amount of dynamic content, one must consider whether the server can handle the additional processing cost of on-the-fly compression while still maintaining acceptable performance. Thus it must be determined whether the price of a few extra CPU cycles per request is an acceptable trade-off for reduced network bandwidth. Also, compression currently comes at the price of cacheability. Much Internet content is already compressed, such as GIF and JPEG images and streaming audio and video. However, a large portion of the Internet is text based and is currently being transferred uncompressed. As we have seen, HTTP compression is an underutilized feature on the web today, despite the fact that support for compression is built into most modern web browsers and servers. Furthermore, the fact that most browsers running in the Windows environment perform streaming decompression of gzipped content is beneficial because a client receiving a compressed HTML file can decompress the file as new packets of data arrive rather than having to wait for the entire object to be retrieved. Our tests indicated that byte reductions of about 27% are possible for the average web site, demonstrating the practicality of HTTP compression. However, in order for HTTP compression to gain popularity a few things need to occur.

First, the design of a new patent-free algorithm that is tailored specifically towards compressing web documents, such as HTML and CSS, could be helpful. After all, gzip and deflate are simply general-purpose compression schemes and do not take into account the content type of the input stream. Therefore, an algorithm that, for instance, has a predefined library of common HTML tags could provide a much higher compression ratio than gzip or deflate.
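Deflate already offers a small step in this direction: a preset dictionary seeded with common HTML tokens lets the matcher find back-references even in short documents. The sketch below uses Python's zlib; the dictionary contents are an arbitrary illustration, not a proposed standard, and the decompressor would need the same dictionary.

    import zlib

    # A toy "library of common HTML tags"; a real scheme would have to standardize this.
    HTML_DICT = (b'<html><head><title></title></head><body><div class="">'
                 b'<a href=""></a><p></p><table><tr><td></td></tr></table></div></body></html>')

    doc = b'<html><head><title>hi</title></head><body><p>hello</p></body></html>'

    plain = zlib.compressobj(9)
    with_dict = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS, 8,
                                 zlib.Z_DEFAULT_STRATEGY, HTML_DICT)

    size_plain = len(plain.compress(doc) + plain.flush())
    size_dict = len(with_dict.compress(doc) + with_dict.flush())
    print(size_plain, size_dict)    # the dictionary version is typically smaller for small inputs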

Secondly, expanded support for compressed transfer coding is essential. Currently, support for this feature is scarce in browsers, proxies and servers. As far as proxies are concerned, Squid appears to support only compressed content coding, not transfer coding, in its current version. According to the Squid development project web site [18], a beta version of a patch was developed to extend Squid to handle transfer coding. However, the patch has not been updated recently and the status of this particular project is listed as idle and in need of developers. Also, in our research we found no evidence of support for compressed transfer coding in Apache.

The most important issue in regard to HTTP compression, in my opinion, is the need for expanded proxy support. As of now, compression comes at the price of uncacheability in most instances. As we saw, outside of the latest version of Squid, little proxy support exists for the Vary header. So, even though a given resource may be compressible by a large factor, this effectiveness is negated if the server has to constantly retransmit the compressed document to clients who should otherwise have been served by a proxy cache.

REFERENCES

[1] J. C. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy. Potential Benefits of Delta Encoding and Data Compression for HTTP. In Proceedings of the ACM SIGCOMM '97 Symposium, Cannes, France, Sept 1997.
[2] B. Krishnamurthy and J. Rexford. Web Protocols and Practice. Addison-Wesley, May 2001.
[3] H. F. Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, and C. Lilley. Network Performance Effects of HTTP/1.1, CSS1, and PNG. In Proceedings of the ACM SIGCOMM '97 Conference, September 1997. http://www.w3.org/Protocols/HTTP/Performance/Pipeline.html
[4] UnxSoft Ltd. WebSwift: Real Time HTML Compression. http://www.unxsoft.com/webswift/. 2001.
[5] D. Mosberger and T. Jin. httperf: A Tool for Measuring Web Server Performance. http://www.hpl.hp.com/personal/David_Mosberger/httperf.html. Jun 29, 1998.
[6] H. F. Nielsen. The Effect of HTML Compression on a LAN. http://www.w3.org/Protocols/HTTP/Performance/Compression/LAN.html. Dec 6, 2000.
[7] G. Goebel. Introduction / Lossless Data Compression. http://www.vectorsite.net/ttdcmp1.html. May 01, 2001.
[8] Google, Inc. The Google Search Engine. http://www.google.com/.
[9] Yahoo! The Yahoo! Home Page. http://www.yahoo.com/.
[10] America Online. The America Online Home Page. http://www.aol.com/.
[11] eBay. The eBay Home Page. http://www.ebay.com/. 2001.
[12] T. J. Midgley. Autobench. http://www.xenoclast.org/autobench/. Jun 27, 2001.
[13] Gzip manual page. http://www.linuxprinting.org/man/gzip.1.html. 2001.
[14] Packeteer, Inc. Internet Application Acceleration Using Packeteer's AppCelera ICX. http://www.packeteer.com/PDF_files/appcelera/icx/icx55_paper.pdf. 2001.
[15] Top9.com. Top 99 Web Sites. http://www.top9.com/top99s/top99_web_sites.html. Mar 2001.
[16] GNU Project. The wget home page. http://www.gnu.org/software/wget/wget.html. Dec 11, 1999.
[17] Leknor.com. Code – gziped? http://leknor.com/code/gziped.php. 2001.
[18] S. R. van den Berg. Squid Development Projects. http://devel.squidcache.org/projects.html. Jan 6, 2002.