Energy Aware Lossless Data Compression by Kenneth C. Barr (Massachusetts Institute of Technology)

Total Pages: 16

File Type: pdf, Size: 1020 KB

Energy Aware Lossless Data Compression
by Kenneth C. Barr
B.S.E. Computer Engineering, University of Michigan, April 2000

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering at the Massachusetts Institute of Technology, September 2002.
© Massachusetts Institute of Technology 2002. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, August 16, 2002
Certified by: Krste Asanović, Assistant Professor, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Abstract

Wireless transmission of a bit can require over 1000 times more energy than a single 32-bit computation. It may therefore be desirable to perform additional computation to reduce the number of bits transmitted. If the energy required to compress data is less than the energy required to send it, there is a net energy savings and, consequently, a longer battery life for portable computers. This thesis is a study of the energy profiles of lossless data compression algorithms. Several distinct algorithms have been selected and are measured on a StrongARM SA-110 processor. This work demonstrates that with several typical compression tools, there is a net energy increase when compression is applied before transmission. Reasons for this increase are explained and suggestions are made to avoid it. Compression and decompression need not be performed by the same algorithm. By choosing the lowest-energy compressor and decompressor on the test platform, rather than using default levels of compression, overall energy to send data can be reduced by 57%. Compared with a system using the same optimized application for both compression and decompression, the asymmetric scheme saves 11% of the total energy.

Thesis Supervisor: Krste Asanović
Title: Assistant Professor

Acknowledgments

This thesis would not have been possible without the help, generosity, and kindness of several people. My advisor, Krste Asanović, not only suggested the intriguing topic of this thesis, but unfailingly provided support, direction, and advice as I worked to complete it. He is an extremely responsible and accessible advisor, and I am thankful for the opportunity to work with him and the Assam group. Scott Ananian and Jamey Hicks came to the rescue with Skiff help early on, allowing me to have a stable platform on which to evaluate the energy of applications. David Wentzlaff, Nathan Shnidman, and John Ankcorn helped me build confidence and acquire equipment in the hardware lab. Ronny Krashinsky patiently explained ways to look at energy and how to measure it. His work in compressing HTTP traffic was the jumping-off point for this thesis. Christopher Batten, my officemate, went above and beyond the call of duty. I appreciate his suggestions, curiosity, creativity, idealism, and perspective. His help made a big difference in the quality of this thesis.
Caroline and my family provided an equally important form of help through unflagging encouragement, motivation, and love.

Contents
1 Introduction
2 Related work
2.1 Energy measurement and estimation
2.2 Data compression for low-bandwidth devices
2.3 Optimizing algorithms for low energy
3 Lossless data compression overview
3.1 Terminology
3.2 Coding
3.2.1 Huffman coding
3.2.2 Arithmetic coding
3.2.3 Lempel-Ziv codes
3.3 Lossless compression algorithms
3.3.1 Sliding window - LZ77
3.3.2 Dictionary - LZ78
3.3.3 Prediction with Partial Match - PPM
3.3.4 Burrows-Wheeler Transform - BWT
3.4 Performance and implementation concerns
4 Evaluation of compression applications
4.1 Benchmark selection
4.2 Methodology
4.2.1 Equipment
4.2.2 Energy calculations
4.2.3 Error analysis
4.2.4 Simulation
4.3 Motivation and misconception
4.3.1 High communication-to-computation ratio...
4.3.2 ...is not exploited by popular compressors
4.4 Energy analysis of popular compressors
4.4.1 Instruction count
4.4.2 Memory hierarchy
4.4.3 Minimizing memory access energy
4.4.4 Instruction mix
4.5 Summary
5 Reducing the energy of transmitting compressed data
5.1 Understanding cache behavior
5.2 Exploiting the sleep mode
5.3 Reducing energy on the Skiff
6 Conclusion and future work

List of Figures
3-1 Hash table implementation of LZ77
4-1 Application comparison by traditional metrics
4-2 Simplified Skiff power schematic
4-3 Using a simulator to predict energy
4-4 Communication energy
4-5 Energy required to send a compressible 1MB file
4-6 Energy required to receive a compressible 1MB file
4-7 Memory, time, and ratio. Memory footprint is indicated by area of circle; footprints shown range from 3KB to 8MB
4-8 Energy required to send a compressible 1MB file
4-9 Energy required to receive a compressible 1MB file
4-10 Cache performance: absolute counts
4-11 Cache performance: data cache miss rate
4-12 Instruction mix. Number in parentheses shows absolute number of instructions (static) and billions of absolute instructions (dynamic)
4-13 Branch behavior
4-14 Average power of compression applications
4-15 Average power of decompression applications
4-16 Total energy as CPU energy decreases
4-17 Total energy as memory energy decreases
4-18 Total energy as both CPU and memory energy decrease
4-19 Total energy as network energy decreases
5-1 Optimizing compress
5-2 Compression + Send energy consumption with varying sleep power
5-3 Receive + Decompression energy consumption with varying sleep power
5-4 Receive + Decompression energy stays constant across zlib parameters
5-5 Choosing an optimal compressor-decompressor pair

List of Tables
3.1 Hash table implementation of LZW
4.1 Compression applications and their algorithms
4.2 Compression ratio
4.3 Statically allocated memory (KB)
4.4 Compression time
4.5 Decompression time
4.6 Maximum measurement error: compression
4.7 Maximum measurement error: decompression
4.8 Total energy of an ADD
4.9 Instructions per bit
4.10 Measured memory energy vs. ADD energy
4.11 Ranking compression applications by four metrics
4.12 Ranking energy of compression applications including network energy

Chapter 1: Introduction

This thesis is motivated by previous work in energy-aware computing and the communication-computation energy gap: wireless communication is an essential component of mobile computing, but the energy required for transmission of a single bit has been measured to be over 1000 times greater than that of a single 32-bit computation. Thus, if 1000 computation operations can compress data by even one bit, energy should be saved. Algorithms which once seemed too resource- or time-intensive might be valuable for saving energy. Implementations which made compression concessions to gain performance might be modified to provide an overall energy savings. Ideally, the effort exerted to compress data should be variable so one may trade speed for energy. While some types of data (e.g., audio and video) may accept some degradation in quality, other data must be transmitted faithfully with no loss of information. Such data presents a unique challenge in that, unlike related work in lossy compression for reducing energy, one cannot sacrifice fidelity of the data to achieve energy goals. This work provides an in-depth examination of the energy requirements of several lossless data compression algorithms.
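To make the introduction's break-even argument concrete, here is a back-of-the-envelope sketch. The constants are illustrative assumptions standing in for the thesis's measured Skiff values; only the roughly 1000:1 ratio is taken from the text.

```python
# Illustrative energy constants (arbitrary units); the ~1000:1 ratio comes from
# the thesis's motivating measurement, not from these particular numbers.
E_OP = 1.0          # energy of one 32-bit operation
E_TX_BIT = 1000.0   # energy to transmit one bit wirelessly

def net_energy(raw_bits: int, compressed_bits: int, compression_ops: int) -> float:
    """Energy of compress-then-send minus energy of sending the raw data.

    A negative result means compression saves energy overall.
    """
    compute_cost = compression_ops * E_OP
    transmit_savings = (raw_bits - compressed_bits) * E_TX_BIT
    return compute_cost - transmit_savings

# Removing a single bit pays for up to ~1000 operations of compression work, so
# halving a 1 MB file can absorb a few billion operations and still come out ahead.
print(net_energy(raw_bits=8 * 2**20, compressed_bits=4 * 2**20, compression_ops=3_000_000_000))
```

Whether a real compressor lands on the saving side of this inequality depends on its memory traffic as much as its instruction count, which is what the measurements in Chapter 4 examine.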
Recommended publications
  • Application Profile Avi Networks — Technical Reference (16.3)
Application profiles determine the behavior of virtual services, based on application type. The application profile types and their options are described in the following sections: HTTP Profile, DNS Profile, Layer 4 Profile, Syslog Profile.

Dependency on TCP/UDP Profile: The application profile associated with a virtual service may have a dependency on an underlying TCP/UDP profile. For example, an HTTP application profile may be used only if the TCP/UDP profile type used by the virtual service is set to type TCP Proxy. The application profile associated with a virtual service instructs the Service Engine (SE) to proxy the service's application protocol, such as HTTP, and to perform functionality appropriate for that protocol.

Application Profile Tab: Select Templates > Profiles > Applications to open the Application Profiles tab, which includes the following functions:
Search: Search against the name of the profile.
Create: Opens the Create Application Profile popup.
Edit: Opens the Edit Application Profile popup.
Delete: Removes an application profile if it is not currently assigned to a virtual service. Note: If the profile is still associated with any virtual services, the profile cannot be removed. In this case, an error message lists the virtual service that is still referencing the application profile.

The table on this tab provides the following information for each application profile:
Name: Name of the profile.
Type: Type of application profile, which will be either:
DNS: Default for processing DNS traffic.
HTTP: Default for processing Layer 7 HTTP traffic.
  • SSL/TLS Implementation CIO-IT Security-14-69
IT Security Procedural Guide: SSL/TLS Implementation, CIO-IT Security-14-69, Revision 6, April 6, 2021
Office of the Chief Information Security Officer

Version History / Change Record (change number; person posting change; change; reason for change; page number of change):

Initial Version, December 24, 2014
N/A; ISE; New guide created.

Revision 1, March 15, 2016
1; Salamon; Administrative updates to align/reference the current version of the GSA IT Security Policy and CIO-IT Security-09-43, IT Security Procedural Guide: Key Management; Clarify relationship between this guide and CIO-IT Security-09-43; pages 2-4.
2; Berlas / Salamon; Updated recommendation for obtaining and using certificates; Clarification of requirements; page 7.
3; Salamon; Integrated with OMB M-15-13 and related TLS implementation guidance; New OMB policy; page 9.
4; Berlas / Salamon; Updates to clarify TLS protocol recommendations; Clarification of guidance; pages 11-12.
5; Berlas / Salamon; Updated based on stakeholder review / input; Stakeholder review / input; throughout.
6; Klemens / Cozart-Ramos; Formatting, editing, review revisions; Update to current format and style; throughout.

Revision 2, October 11, 2016
1; Berlas / Salamon; Allow use of TLS 1.0 for certain servers through June 2018; Clarification of guidance; throughout.

Revision 3, April 30, 2018
1; Berlas / Salamon; Remove RSA ciphers from approved cipher stack; ROBOT vulnerability affected; pages 4-6.
  • Protecting Encrypted Cookies from Compression Side-Channel Attacks
Protecting encrypted cookies from compression side-channel attacks
Janaka Alawatugoda (1), Douglas Stebila (1,2), and Colin Boyd (3)
(1) School of Electrical Engineering and Computer Science and (2) School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia; (3) Department of Telematics, Norwegian University of Science and Technology, Trondheim, Norway
December 28, 2014

Abstract: Compression is desirable for network applications as it saves bandwidth; however, when data is compressed before being encrypted, the amount of compression leaks information about the amount of redundancy in the plaintext. This side channel has led to successful CRIME and BREACH attacks on web traffic protected by the Transport Layer Security (TLS) protocol. The general guidance in light of these attacks has been to disable compression, preserving confidentiality but sacrificing bandwidth. In this paper, we examine two techniques (heuristic separation of secrets and fixed-dictionary compression) for enabling compression while protecting high-value secrets, such as cookies, from attack. We model the security offered by these techniques and report on the amount of compressibility that they can achieve.

(This is the full version of a paper published in the Proceedings of the 19th International Conference on Financial Cryptography and Data Security (FC 2015) in San Juan, Puerto Rico, USA, January 26-30, 2015, organized by the International Financial Cryptography Association in cooperation with IACR.)

Contents: 1 Introduction; 2 Definitions; 2.1 Encryption and compression schemes; 2.2 Existing security notions; 2.3 New security notions; 2.4 Relations and separations between security notions; 3 Technique 1: Separating secrets from user inputs; 3.1 The scheme; 3.2 CCI security of basic separating-secrets technique
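As a rough illustration of the first of those two techniques, the sketch below separates assumed high-value fields from the compressible stream before compression. The field names and split policy are assumptions for illustration, not the paper's actual construction or API.

```python
import zlib

# Header names treated as high-value secrets; an assumption for this sketch.
SECRET_FIELDS = {"cookie", "authorization"}

def compress_separating_secrets(headers: dict) -> tuple[bytes, dict]:
    """Compress only non-secret fields; secrets bypass compression entirely, so
    attacker-influenced input never shares a compression context with them."""
    public = {k: v for k, v in headers.items() if k.lower() not in SECRET_FIELDS}
    secret = {k: v for k, v in headers.items() if k.lower() in SECRET_FIELDS}
    blob = "\r\n".join(f"{k}: {v}" for k, v in sorted(public.items())).encode()
    return zlib.compress(blob), secret  # the secret fields are sent uncompressed

compressed, plain_secrets = compress_separating_secrets(
    {"Host": "example.org", "Accept": "text/html", "Cookie": "session=8c1f2a"}
)
```

The bandwidth cost is that the separated fields are not compressed at all, which is the compressibility trade-off the paper quantifies.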
  • Data Compression for Remote Laboratories
Data Compression for Remote Laboratories. https://doi.org/10.3991/ijim.v11i4.6743
Olawale B. Akinwale, Obafemi Awolowo University, Ile-Ife, Nigeria
Lawrence O. Kehinde, Obafemi Awolowo University, Ile-Ife, Nigeria

Abstract: Remote laboratories on mobile phones have been around for a few years now. This has greatly improved accessibility of these remote labs to students who cannot afford computers but have mobile phones. When money is a factor, however (as is often the case with those who can't afford a computer), the cost of use of these remote laboratories should be minimized. This work addressed this issue of minimizing the cost of use of the remote lab by making use of data compression for data sent between the user and remote lab.

Keywords: data compression; Lempel-Ziv-Welch; remote laboratory; WebSockets

1 Introduction. The mobile phone is now the de facto computational device of choice. They are quite powerful, connected devices which are almost always with us. This is so much so that about every important online service has a mobile version or at least a version compatible with mobile phones and tablets (for this paper we would adopt the phrase "mobile device" to represent mobile phones and tablets). This has caused online laboratory developers to develop online laboratory solutions which are mobile device-friendly [1, 2, 3, 4, 5, 6, 7, 8]. One of the main benefits of online laboratories is the possibility of using fewer resources to serve more people, i.e.
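The keyword list points to Lempel-Ziv-Welch; a textbook LZW encoder (not the authors' implementation) is sketched below to show the kind of computation involved.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit the code of the longest already-seen string, then add
    that string plus the next byte to the dictionary."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    w, codes = b"", []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc
        else:
            codes.append(table[w])
            table[wc] = next_code
            next_code += 1
            w = bytes([b])
    if w:
        codes.append(table[w])
    return codes

print(lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT"))
```

Repetitive instrument readings of the kind a remote lab streams tend to compress well under such a dictionary coder, which is where the cost saving would come from.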
  • A Comprehensive Study of the BREACH Attack Against HTTPS
A Comprehensive Study of the BREACH Attack Against HTTPS
Esam Alzahrani, Justin Nonaka, and Thai Truong, 12/03/13

BREACH Overview: Browser Reconnaissance and Exfiltration via Adaptive Compression of Hypertext. Demonstrated at BlackHat 2013 by Angelo Prado, Neal Harris, and Yoel Gluck.
- A chosen-plaintext attack against HTTP compression.
- The client requests a webpage, and the web server's response is compressed.
- The HTTP compression may leak information that will reveal encrypted secrets about the user.
(Diagram: network intrusion detection system, edge firewall, switch, router, DMZ, clients (victim), attacker.)

BREACH Requirements. Requirements for the chosen-plaintext (side-channel) attack:
- The web server should support HTTP compression.
- The web server should support HTTPS sessions.
- The web server reflects the user's request.
- The reflected response must be in the HTML body.
- The attacker must be able to measure the size of the encrypted response.
- The attacker can force the victim's computer to send HTTP requests.
- The HTTP response contains secret information that is encrypted:
  - Cross-Site Request Forgery token (browser redirection)
  - Session ID (uniquely identifies the HTTP session)
  - VIEWSTATE (handles multiple requests to the same ASP, usually hidden, base64 encoded)
  - OATH tokens (Open Authentication, one-time password)
  - Email address, date of birth, etc. (PII)

SSL/TLS protocol structure:
- X.509 certification authority
- Secure Socket Layer (SSL)
- Transport Layer Security (TLS)
- Asymmetric cryptography for authentication: initialized on OSI layer 5 (session layer); the server public key is used to encrypt the pre-master
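A minimal demonstration of the length side channel behind BREACH, with zlib standing in for the server's HTTP compression; the secret and the page template are made up for the example.

```python
import zlib

SECRET = "csrf=4f9a21"  # hypothetical token reflected in the response body

def compressed_length(reflected: str) -> int:
    # The response reflects attacker-chosen input alongside the secret token.
    body = f"<a href='/search?q={reflected}'>results</a><input name='csrf' value='{SECRET}'>"
    return len(zlib.compress(body.encode(), 6))

# A guess that shares a prefix with the secret usually compresses a byte or two
# better; with a length-preserving cipher the attacker sees the same difference
# in the ciphertext size, which is exactly what BREACH measures.
for guess in ("csrf=4f9a", "csrf=zzzz"):
    print(guess, compressed_length(guess))
```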
  • A Perfect CRIME?
A Perfect CRIME? Only TIME Will Tell
Tal Be'ery, Amichai Shulman

Table of Contents:
1. Abstract
2. Introduction to HTTP Compression
2.1 HTTP compression and the web
2.2 GZIP
2.2.1 LZ77
2.2.2 Huffman coding
3. CRIME attack
3.1 Compression data leaks
3.2 Attack outline
3.3 Attack example
4. Extending CRIME
  • Randomized Lempel-Ziv Compression for Anti-Compression Side-Channel Attacks
Randomized Lempel-Ziv Compression for Anti-Compression Side-Channel Attacks
by Meng Yang
A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Applied Science in Electrical and Computer Engineering. Waterloo, Ontario, Canada, 2018. © Meng Yang 2018.

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Abstract: Security experts confront new attacks on TLS/SSL every year. Ever since the compression side-channel attacks CRIME and BREACH were presented during security conferences in 2012 and 2013, online users connecting to HTTP servers that run TLS version 1.2 are susceptible to being impersonated. We set up three Randomized Lempel-Ziv models, which are built on Lempel-Ziv77, to confront this attack. Our three models change the deterministic characteristic of the compression algorithm: each compression of the same input gives output of a different length. We implemented the SSL/TLS protocol and the Lempel-Ziv77 compression algorithm, and used them as a base for our simulations of the compression side-channel attack. After performing the simulations, all three models successfully prevented the attack. However, we demonstrate that our randomized models can still be broken by a stronger version of the compression side-channel attack that we created. But this latter attack has a greater time complexity and is easily detectable. Finally, from the results, we conclude that our models couldn't compress as well as Lempel-Ziv77, but they can be used against compression side-channel attacks.
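The thesis's three models are not reproduced in this excerpt; the toy parser below only illustrates the general idea of randomizing the parse so that identical inputs compress to outputs of different lengths, and should not be read as the author's construction.

```python
import random

def randomized_lz77(data: bytes, window: int = 1024, p_skip: float = 0.2):
    """Toy LZ77-style parser that sometimes ignores a usable match, so repeated
    runs over the same input emit token streams of varying length."""
    i, tokens = 0, []
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data) and j + length < i
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= 3 and random.random() > p_skip:
            tokens.append(("copy", best_dist, best_len))   # back-reference
            i += best_len
        else:
            tokens.append(("literal", data[i]))
            i += 1
    return tokens

sample = b"abcabcabcabcabcabcabcabc"
print(len(randomized_lz77(sample)), len(randomized_lz77(sample)))  # lengths usually differ
```

As the abstract notes, the price of this nondeterminism is a worse compression ratio than plain Lempel-Ziv77.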
  • Incompressibility and Lossless Data Compression: An Approach by Pattern Discovery
Incompressibility and Lossless Data Compression: An Approach by Pattern Discovery
Oscar Herrera Alcántara and Francisco Javier Zaragoza Martínez
Universidad Autónoma Metropolitana, Unidad Azcapotzalco, Departamento de Sistemas, Av. San Pablo No. 180, Col. Reynosa Tamaulipas, Del. Azcapotzalco, 02200, Mexico City, Mexico. Tel. 53 18 95 32, Fax 53 94 45 34
Article received on July 14, 2008; accepted on April 03, 2009

Abstract: We present a novel method for lossless data compression that aims to get a different performance to those proposed in the last decades to tackle the underlying volume of data of the Information and Multimedia Ages. These latter methods are called entropic or classic because they are based on the Classic Information Theory of Claude E. Shannon and include Huffman [8], Arithmetic [14], Lempel-Ziv [15], Burrows-Wheeler (BWT) [4], Move To Front (MTF) [3] and Prediction by Partial Matching (PPM) [5] techniques. We review the Incompressibility Theorem and its relation with classic methods and our method based on discovering symbol patterns called metasymbols. Experimental results allow us to propose metasymbolic compression as a tool for multimedia compression, sequence analysis and unsupervised clustering.
Keywords: Incompressibility, Data Compression, Information Theory, Pattern Discovery, Clustering.

Resumen (translated from Spanish): We present a novel method for lossless data compression whose main goal is to achieve a performance different from those proposed in recent decades for dealing with the data volumes characteristic of the Information Age and the Multimedia Age.
  • PPM Performance with BWT Complexity: A New Method for Lossless Data Compression
PPM Performance with BWT Complexity: A New Method for Lossless Data Compression
Michelle Effros, California Institute of Technology

Abstract: This work combines a new fast context-search algorithm with the lossless source coding models of PPM to achieve a lossless data compression algorithm with the linear context-search complexity and memory of BWT and Ziv-Lempel codes and the compression performance of PPM-based algorithms. Both sequential and nonsequential encoding are considered. The proposed algorithm yields an average rate of 2.27 bits per character (bpc) on the Calgary corpus, comparing favorably to the 2.33 and 2.34 bpc of PPM5 and PPM* and the 2.43 bpc of BW94, but not matching the 2.12 bpc of PPMZ9, which, at the time of this publication, gives the greatest compression of all algorithms reported on the Calgary corpus results page. The proposed algorithm gives an average rate of 2.14 bpc on the Canterbury corpus. The Canterbury corpus web page gives average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc for PPM7, and 2.23 bpc for BZIP2 (a BWT-based code) on the same data set.

I. Introduction. The Burrows-Wheeler Transform (BWT) [1] is a reversible sequence transformation that is becoming increasingly popular for lossless data compression. The BWT rearranges the symbols of a data sequence in order to group together all symbols that share the same unbounded history or "context." Intuitively, this operation is achieved by forming a table in which each row is a distinct cyclic shift of the original data string.
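The cyclic-shift table described in that introduction can be written down directly. Below is a naive, quadratic-time sketch of the forward transform; production coders use suffix-array or similar structures to reach the linear complexity mentioned in the abstract.

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler Transform: sort all cyclic rotations of the input
    and keep the last column, plus the row index of the original rotation so
    the transform can be inverted."""
    n = len(data)
    rotations = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last_column = bytes(data[(i - 1) % n] for i in rotations)
    return last_column, rotations.index(0)

print(bwt(b"banana"))  # (b'nnbaaa', 3): symbols with similar contexts end up adjacent
```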
  • Dc5m United States Software in English Created at 2016-12-25 16:00
Google's Brotli Compression Algorithm Lands to Windows Edge (2016-12-25, www.infoq.com)

Microsoft has announced that its Edge browser has started using Brotli, the compression algorithm that Google open-sourced last year. Brotli is on by default in the latest Edge build and can be previewed via the Windows Insider Program. It will reach stable status early next year, says Microsoft. Microsoft touts 20% higher compression ratios over comparable compression algorithms, which would benefit page load times without impacting client-side CPU costs. According to Google, Brotli uses a whole new data format, which makes it incompatible with Deflate but ensures higher compression ratios. In particular, Google says, Brotli is roughly as fast as zlib when decompressing and provides a better compression ratio than LZMA and bzip2 on the Canterbury corpus. Brotli appears to be especially tuned for the web, that is, for offline encoding and online decoding of Web assets, or Android APKs. Google claims a compression ratio improvement of 20-26% over its own Zopfli algorithm, which still provides the best compression ratio of any deflate algorithm.
  • Implementing Associative Coder of Buyanovsky (ACB)
Implementing Associative Coder of Buyanovsky (ACB) data compression
by Sean Michael Lambert
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science, Montana State University-Bozeman, Bozeman, Montana, April 1999. © Copyright by Sean Michael Lambert 1999. All rights reserved.

Abstract: In 1994 George Mechislavovich Buyanovsky published a basic description of a new data compression algorithm he called the "Associative Coder of Buyanovsky," or ACB. The archive program using this idea, which he released in 1996 and updated in 1997, is still one of the best general compression utilities available. Despite this, the ACB algorithm is still barely understood by data compression experts, primarily because Buyanovsky never published a detailed description of it. ACB is a new idea in data compression, merging concepts from existing statistical and dictionary-based algorithms with entirely original ideas. This document presents several variations of the ACB algorithm and the details required to implement a basic version of ACB.

Approval: This thesis has been read by each member of the thesis committee and has been found to be satisfactory regarding content, English usage, format, citations, bibliographic style, and consistency, and is ready for submission to the College of Graduate Studies. Brendan Mumey (signature, date). Approved for the Department of Computer Science, J.
  • An Analysis of XML Compression Efficiency
An Analysis of XML Compression Efficiency
Christopher J. Augeri (1), Barry E. Mullins (1), Leemon C. Baird III, Dursun A. Bulutoglu (2), Rusty O. Baldwin (1)
(1) Department of Electrical and Computer Engineering and (2) Department of Mathematics and Statistics, Air Force Institute of Technology (AFIT), Wright-Patterson Air Force Base, Dayton, OH; Department of Computer Science, United States Air Force Academy (USAFA), Colorado Springs, CO

Abstract: XML simplifies data exchange among heterogeneous computers, but it is notoriously verbose and has spawned the development of many XML-specific compressors and binary formats. We present an XML test corpus and a combined efficiency metric integrating compression ratio and execution speed. We use this corpus and linear regression to assess 14 general-purpose and XML-specific compressors relative to the proposed metric. We also identify key factors when selecting a compressor. Our results show XMill or WBXML may be useful in some instances, but a general-purpose compressor is often the best choice.

We expand previous XML compression studies [9, 26, 34, 47] by proposing the XML file corpus and a combined efficiency metric. The corpus was assembled using guidelines given by developers of the Canterbury corpus [3], files often used to assess compressor performance. The efficiency metric combines execution speed and compression ratio, enabling simultaneous assessment of these metrics, versus prioritizing one metric over the other. We analyze collected metrics using linear regression models (ANOVA) versus a simple comparison of means, e.g., X is 20% better than Y.

2. XML Overview. XML has gained much acceptance since first proposed in 1998 by the World-Wide Web Consortium (W3C).
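A hedged sketch of how the two raw quantities behind such a combined metric (compression ratio and execution speed) can be collected with stock general-purpose compressors; this is not the paper's harness, its corpus, or its regression-based analysis.

```python
import bz2
import lzma
import time
import zlib

def measure(name: str, compress, data: bytes) -> None:
    """Print the two quantities a combined efficiency metric would integrate."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    print(f"{name:8s} ratio={len(out) / len(data):.3f} seconds={elapsed:.3f}")

data = open("corpus.xml", "rb").read()  # hypothetical XML test file
measure("zlib-9", lambda d: zlib.compress(d, 9), data)
measure("bzip2-9", lambda d: bz2.compress(d, 9), data)
measure("lzma", lambda d: lzma.compress(d), data)
```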