UCLA
UCLA Electronic Theses and Dissertations

Title: Datacomp: Locally-independent Adaptive Compression for Real-World Systems
Permalink: https://escholarship.org/uc/item/0c3453tc
Author: Peterson, Peter Andrew Harrington
Publication Date: 2013
Peer reviewed | Thesis/dissertation

eScholarship.org, powered by the California Digital Library, University of California

UNIVERSITY OF CALIFORNIA
Los Angeles

Datacomp: Locally-independent Adaptive Compression for Real-World Systems

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science

by

Peter Andrew Harrington Peterson

2013

© Copyright Peter Andrew Harrington Peterson 2013

ABSTRACT OF THE DISSERTATION

Datacomp: Locally-independent Adaptive Compression for Real-World Systems

by

Peter Andrew Harrington Peterson
Doctor of Philosophy in Computer Science
University of California, Los Angeles, 2013
Professor Todd Millstein, Co-chair
Professor Peter Reiher, Co-chair

Typically used to save space, non-lossy data compression can save time and energy during communication if the cost to compress and send data is less than the cost of sending uncompressed data. However, compression can degrade efficiency if it compresses insufficiently or delays the operation significantly, which can depend on many factors. Because predicting the best strategy is risky and difficult, compression (if available) is typically manually controlled, resulting in missed opportunities and avoidable losses.

This dissertation describes Datacomp, a general-purpose Adaptive Compression (AC) framework that improves efficiency in terms of time, space and energy for real-world workloads on real-world systems like laptops and smartphones. Prior systems are limited in important ways or rely on external hosts for prediction and compression, reducing their effectiveness or imposing unnecessary dependencies. In contrast, Datacomp is a Local Adaptive Compression system capable of choosing between numerous compressors using system monitors, a novel compressibility estimation technique and a history mechanism. Datacomp wraps system calls with AC capabilities, enabling applications to benefit with little modification.

I also built Comptool, an off-line “AC oracle” for investigation and validation. Comptool, which includes LEAP energy-measurement capabilities, identifies the best-case compression strategy for a given scenario, highlighting critical factors for AC and providing a valuable standard against which to compare systems such as Datacomp.

I evaluated two Datacomp-enabled utilities: drcp, a throughput-sensitive remote copy tool and dzip, an AC-enabled compression utility. I collected hundreds of megabytes of nine common but distinct classes of data to serve as workloads, including web traces, binaries, email and collections of personal data from volunteers. Experiments were performed using both Comptool and Datacomp while varying the data type, bandwidth, CPU load, frequency, and more. Up to and including 100Mbit/s, Datacomp consistently came within 1-3% of the best strategy identified by Comptool, improving throughput for realistic types by up to 74% over no compression, and up to 45% over zlib compression. Comptool generated strategies that could improve efficiency at gigabit speeds (over no compression) by up to 28% for Wikipedia data and 14% for Facebook data.

The dissertation of Peter Andrew Harrington Peterson is approved.
William Kaiser
Douglas Stott Parker
Junghoo Cho
Peter Reiher, Committee Co-chair
Todd Millstein, Committee Co-chair

University of California, Los Angeles
2013

DEDICATION

“Bernard of Chartres used to say that we are like [puny] dwarfs on the shoulders of giants, so that we can see more than them, and things at a greater distance, not by virtue of any sharpness of sight on our part, or any physical distinction, but because we are carried high and raised up by their giant size.”
– John of Salisbury, Metalogicon [1] (1159)

I like this version of the famous thought because of its humility. Seeing farther by virtue of standing on giants’ shoulders doesn’t require one to be a giant themselves, or even to see especially far. It merely requires a person who is willing to look and some patient, generous and steady giants willing to lift. I have been blessed with many such giants.

First, I would not have completed this process without the love of my life, Anna. She quite literally supported this endeavor in every way as it evolved from a two-year Master’s degree into something much, much larger. She has been incredibly patient and gracious and has helped me grow through this process in many ways that are more personally valuable to me than the degree.

Thank you to my parents, Tom and Sue, for everything, including instilling in me a love of learning, creating, teaching and experimentation. Dad, if you hadn’t brought those TRS-80s and Apple ][s home from the school district over all those summer vacations I would never have ended up here. Mom, thank you for your attention to detail, which has served me well in writing both text and code. (I’ve hidden some typos in here for you to find.)
I thank the Harringtons and the rest of my family for their love and support. Thank you to everyone who gave me the benefit of the doubt during this project. I’ve bit off more than I could chew in the past, but this was something else entirely. I owe you.

I also would never have done any of this if not for the support and encouragement of Dr. Peter Reiher, who gave me the chance to seek this Ph.D. His practical advice and the environment in his lab defined my UCLA experience. Thank you as well to Janice for her support and sharp proofreading, to my lab mates and colleagues in the CSD, and to the giants listed in Section 14.

Finally, many people, perhaps unknowingly, made various contributions to this project: Dr. William Kaiser and Digvijay Singh, Dr. Junghoo Cho, Dr. D. Stott Parker, Dr. Todd Millstein, Dr. Jelena Mirkovic, Dr. Tanya Crenshaw, Dr. Eddie Kohler, Dr. Paul Eggert, Dr. Alan Iliff, Dr. Alice Iverson, Dr. Joe Lill, Dr. Walter M. Gibbs, Dr. Michael Meisel, Dr. Erik Kline, Dr. Alex Afanasyev, Dr. Chuck Fleming, Vahab Pournaghshband, Matt Beaumont-Gay, Elizabeth Harrington, P. Joshua Griffin, Lukas Eklund, Jess Frykholm, Max Peterson, Clint and Charles Bergsten, James Herrick, Eric and Robin Berglund, Louise Ambros, and the “usual gang of idiots,” including Nick Moffitt, Paul Collins, Brian Hicks, Emad El-Haraty, Neale Pickett and Ryan Finnie. Thanks and apologies to those inexplicably omitted.

I have had the privilege to stand on the shoulders of a great group of wonderful, friendly and brilliant people. This dissertation is dedicated to you.

TABLE OF CONTENTS

1 Introduction
2 Non-Lossy Compression
  2.1 Basic Techniques, Compression Tools, and Options
  2.2 A Compression Primer
    2.2.1 EngZip
    2.2.2 Variation by Input and Algorithm
    2.2.3 Variation in Compressibility
    2.2.4 Input Length and Compression Ratio
    2.2.5 Compression Blocks and Sliding Windows
    2.2.6 Throwing Computation at the Problem
  2.3 Fundamental Compression Mechanisms
    2.3.1 Run-length Encoding
    2.3.2 Move To Front Coding
    2.3.3 Huffman Coding
    2.3.4 Lempel-Ziv (Dictionary Methods)
    2.3.5 Burrows-Wheeler Transform
  2.4 Application Requirements
    2.4.1 Sequential vs. Random Access
    2.4.2 Effective Throughput and Latency
    2.4.3 Block Structures and Slack Space
3 Adaptive Compression
  3.1 Overview