UCLA Electronic Theses and Dissertations


Title: Datacomp: Locally-independent Adaptive Compression for Real-World Systems
Permalink: https://escholarship.org/uc/item/0c3453tc
Author: Peterson, Peter Andrew Harrington
Publication Date: 2013
Peer reviewed | Thesis/dissertation

eScholarship.org, powered by the California Digital Library, University of California

UNIVERSITY OF CALIFORNIA
Los Angeles

Datacomp: Locally-independent Adaptive Compression for Real-World Systems

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science

by

Peter Andrew Harrington Peterson

2013

© Copyright Peter Andrew Harrington Peterson 2013

ABSTRACT OF THE DISSERTATION

Datacomp: Locally-independent Adaptive Compression for Real-World Systems

by

Peter Andrew Harrington Peterson
Doctor of Philosophy in Computer Science
University of California, Los Angeles, 2013
Professor Todd Millstein, Co-chair
Professor Peter Reiher, Co-chair

Typically used to save space, non-lossy data compression can save time and energy during communication if the cost to compress and send data is less than the cost of sending uncompressed data. However, compression can degrade efficiency if it compresses insufficiently or delays the operation significantly, which can depend on many factors. Because predicting the best strategy is risky and difficult, compression (if available) is typically manually controlled, resulting in missed opportunities and avoidable losses. This dissertation describes Datacomp, a general-purpose Adaptive Compression (AC) framework that improves efficiency in terms of time, space and energy for real-world workloads on real-world systems like laptops and smartphones. Prior systems are limited in important ways or rely on external hosts for prediction and compression, reducing their effectiveness or imposing unnecessary dependencies.
In contrast, Datacomp is a Local Adaptive Compression system capable of choosing between numerous compressors using system monitors, a novel compressibility estimation technique and a history mechanism. Datacomp wraps system calls with AC capabilities, enabling applications to benefit with little modification. I also built Comptool, an off-line “AC oracle” for investigation and validation. Comptool, which includes LEAP energy-measurement capabilities, identifies the best-case compression strategy for a given scenario, highlighting critical factors for AC and providing a valuable standard against which to compare systems such as Datacomp. I evaluated two Datacomp-enabled utilities: drcp, a throughput-sensitive remote copy tool, and dzip, an AC-enabled compression utility. I collected hundreds of megabytes of nine common but distinct classes of data to serve as workloads, including web traces, binaries, email and collections of personal data from volunteers. Experiments were performed using both Comptool and Datacomp while varying the data type, bandwidth, CPU load, frequency, and more. Up to and including 100 Mbit/s, Datacomp consistently came within 1-3% of the best strategy identified by Comptool, improving throughput for realistic types by up to 74% over no compression, and up to 45% over zlib compression. Comptool generated strategies that could improve efficiency at gigabit speeds (over no compression) by up to 28% for Wikipedia data and 14% for Facebook data.

The dissertation of Peter Andrew Harrington Peterson is approved.
William Kaiser
Douglas Stott Parker
Junghoo Cho
Peter Reiher, Committee Co-chair
Todd Millstein, Committee Co-chair

University of California, Los Angeles
2013

DEDICATION

“Bernard of Chartres used to say that we are like [puny] dwarfs on the shoulders of giants, so that we can see more than them, and things at a greater distance, not by virtue of any sharpness of sight on our part, or any physical distinction, but because we are carried high and raised up by their giant size.” – John of Salisbury, Metalogicon [1] (1159)

I like this version of the famous thought because of its humility. Seeing farther by virtue of standing on giants’ shoulders doesn’t require one to be a giant themselves, or even to see especially far. It merely requires a person who is willing to look and some patient, generous and steady giants willing to lift. I have been blessed with many such giants.

First, I would not have completed this process without the love of my life, Anna. She quite literally supported this endeavor in every way as it evolved from a two-year Master’s degree into something much, much larger. She has been incredibly patient and gracious and has helped me grow through this process in many ways that are more personally valuable to me than the degree.

Thank you to my parents, Tom and Sue, for everything, including instilling in me a love of learning, creating, teaching and experimentation. Dad, if you hadn’t brought those TRS-80s and Apple ][s home from the school district over all those summer vacations I would never have ended up here. Mom, thank you for your attention to detail, which has served me well in writing both text and code. (I’ve hidden some typos in here for you to find.)
I thank the Harringtons and the rest of my family for their love and support. Thank you to everyone who gave me the benefit of the doubt during this project. I’ve bit off more than I could chew in the past, but this was something else entirely. I owe you.

I also would never have done any of this if not for the support and encouragement of Dr. Peter Reiher, who gave me the chance to seek this Ph.D. His practical advice and the environment in his lab defined my UCLA experience. Thank you as well to Janice for her support and sharp proofreading, to my lab mates and colleagues in the CSD, and to the giants listed in Section 14.

Finally, many people, perhaps unknowingly, made various contributions to this project: Dr. William Kaiser and Digvijay Singh, Dr. Junghoo Cho, Dr. D. Stott Parker, Dr. Todd Millstein, Dr. Jelena Mirkovic, Dr. Tanya Crenshaw, Dr. Eddie Kohler, Dr. Paul Eggert, Dr. Alan Iliff, Dr. Alice Iverson, Dr. Joe Lill, Dr. Walter M. Gibbs, Dr. Michael Meisel, Dr. Erik Kline, Dr. Alex Afanasyev, Dr. Chuck Fleming, Vahab Pournaghshband, Matt Beaumont-Gay, Elizabeth Harrington, P. Joshua Griffin, Lukas Eklund, Jess Frykholm, Max Peterson, Clint and Charles Bergsten, James Herrick, Eric and Robin Berglund, Louise Ambros, and the “usual gang of idiots,” including Nick Moffitt, Paul Collins, Brian Hicks, Emad El-Haraty, Neale Pickett and Ryan Finnie. Thanks and apologies to those inexplicably omitted.

I have had the privilege to stand on the shoulders of a great group of wonderful, friendly and brilliant people. This dissertation is dedicated to you.

TABLE OF CONTENTS

1 Introduction
2 Non-Lossy Compression
  2.1 Basic Techniques, Compression Tools, and Options
  2.2 A Compression Primer
    2.2.1 EngZip
    2.2.2 Variation by Input and Algorithm
    2.2.3 Variation in Compressibility
    2.2.4 Input Length and Compression Ratio
    2.2.5 Compression Blocks and Sliding Windows
    2.2.6 Throwing Computation at the Problem
  2.3 Fundamental Compression Mechanisms
    2.3.1 Run-length Encoding
    2.3.2 Move To Front Coding
    2.3.3 Huffman Coding
    2.3.4 Lempel-Ziv (Dictionary Methods)
    2.3.5 Burrows-Wheeler Transform
  2.4 Application Requirements
    2.4.1 Sequential vs. Random Access
    2.4.2 Effective Throughput and Latency
    2.4.3 Block Structures and Slack Space
3 Adaptive Compression
  3.1 Overview